NOAA Technical Memorandum NWS FCST 24

PROBABILITY FORECASTING - REASONS, PROCEDURES, PROBLEMS

Lawrence A. Hughes
National Weather Service Central Region
Kansas City, Mo.

Meteorological Services Division
Silver Spring, Md.
January 1980

UNITED STATES DEPARTMENT OF COMMERCE, Philip M. Klutznick, Secretary
NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION, Richard A. Frank, Administrator
National Weather Service, Richard E. Hallgren, Director

CONTENTS

Abstract
1. Introduction
2. History
3. Purpose and Use of Probability Forecasts
   a. Words vs. Numbers
   b. Use of Forecasts and the Cost/Loss Ratio
   c. Omitting Probabilities
   d. Use of Zero and 100%
   e. Value of Probability Forecasts
4. Definitions and Problem Areas
   a. Probability vs. Chance
   b. Average Point Probability
   c. Splitting Local Area
   d. Point vs. Areal Probability
   e. Measurable vs. Trace
   f. Point Probability vs. Precipitation Amount
   g. Time Periods
   h. First Period Problems
   i. Probability vs. Odds
5. Determining Probabilities
   a. Objective Scheme and Guidance
   b. Consensus
   c. Forecast Principles
   d. Probability in Body of Forecast
6. Types of Probability Forecasts
   a. Point vs. Areal Probability--again
   b. For Severe Storms
   c. For Continuous Variables
   d. For Maximum or Minimum Temperature
   e. Frost and Freeze
   f. Categorical from Probabilistic and Reverse
7. How Verified
   a. Brier Score
   b. Skill Score (regular and sample)
   c. Skill Score vs. Reduction of Variance
   d. Use of Multiple Gages
   e. Modification of Brier Score
8. Problems Shown by Verification
   a. New Forecaster Bias
   b. 6 h and Day-Night Bias
   c. Variations Among Forecasters
   d. Skill vs. Distance
   e. Sample Size
   f. Seasonal Effects
   g. Characteristics of Precipitation Probabilities
9. Improper Bias Reduction
10. Improving on MOS
11. Trend in Scores
12. Forecast Comparisons
13. Combining Probabilities
14. Problems for the Future
15. Public Education
ACKNOWLEDGMENT
REFERENCES

"When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager, unsatisfactory kind."

Sverre Petterssen used this quotation from Lord Kelvin as a frontispiece in his Volume 1 of Weather Analysis and Forecasting, published by McGraw-Hill in 1956.

PROBABILITY FORECASTING - REASONS, PROCEDURES, PROBLEMS

Lawrence A. Hughes
National Weather Service Central Region
Kansas City, MO

ABSTRACT. This paper is intended as a comprehensive discussion of probability forecasting, based mainly on that for precipitation probability. It is primarily intended to cover points of concern to those persons making probability forecasts, but it also covers points pertinent to those making management decisions concerning a probability program.
Also included is a history of probability forecasting, and an extensive set of references for those wanting more information.

1. INTRODUCTION

The purpose of this Memorandum is to discuss the meaning of, the formulation of, and the use of probability forecasts in meteorology. Much of the material is based on the precipitation probability effort of the National Weather Service (NWS), especially that of the special verification program of the NWS Central Region. However, other types of probability forecasts will be mentioned where appropriate.

The purpose of a weather forecast is to help people make better weather-dependent decisions. Thus, we are concerned mainly with the forecast user. As long as the forecaster cannot always make a forecast with complete certainty, i.e., make categorical forecasts that are always correct, the user of the forecasts needs to know the forecaster's degree of certainty (probability) at the time of forecast issuance. Probability words like "chance" and "likely" express probability crudely, even if they are well defined and used consistently.

Making accurate forecasts is not sufficient for the decision maker either. This was noted by both Gringorten (1958) and Borgman (1960), but Borgman put it best when he concluded: "Accurate forecasts are not sufficient to guarantee a high utility unless the forecaster makes the required effort to word his forecasts so that they are usable in decision making." Malone's statement (1956) augments this by indicating that "In communication, cognizance must be taken of indeterminacy in a way that will optimize the usefulness of the forecast in making decisions. This leads directly to probability forecasts." Thus, probability numbers are the most precise means of communicating what the forecaster has in mind in that they clearly qualify certainty. REMEMBER--PROBABILITY IS ONLY A MEANS OF COMMUNICATION. It is thus a much simpler concept, especially for the user, than many people get by connotation from the word "probability."

Because probability is a precise means of communication, it is easy to notice errors in it, and it is easy to make errors in creating it; therefore, it is important that the forecaster use it correctly, or the user can easily be misled. Murphy and Winkler (1971a, 1971b, 1974b) conducted a survey of NWS and other forecasters and noted that forecasters had problems understanding the probability concept. These problems must be resolved for the program to reach its full potential. They suggested, and Murphy (1977b) reiterated, the need for an educational program for both the forecasters and the public. This Technical Memorandum is intended to serve that purpose for the forecaster, but it also has some suggestions for public education by NWS people. To be stressed are the user-oriented items of meaning and usage of probability forecasts, but also treated will be items of interest mainly to the forecaster, such as how such forecasts are made, how combined, how verified, and how they can be used for continuous events like ceiling height or temperature. Because of its precision, probability requires more effort by the forecaster, but less by the user. It is therefore user-oriented, and not only for the sophisticated user, as we shall see.

This memorandum is intended as a comprehensive source of information on probability forecasting, but, of course, it doesn't contain all that is known or even all that might be useful.
However, it is the intent to make the memorandum more comprehensive by referencing many papers which could provide additional information for those wanting more on specific points. To make the memorandum more useful as a reference, the Contents lists the topics covered; a look there should put you close to the right place for the information you wish. Note that areal coverage problems are discussed in two places.

2. HISTORY

Probability forecasting goes back at least as far as Cooke (1906a) in Australia, when he reported on his experiment in the use of confidence factors. Cooke appended one of five confidence factors after each part of the weather forecast (precipitation, temperature, etc.), with five indicating very high confidence and one a so-so confidence (50% probability). These five numbers were equivalent to ten of ours because he could use the five for a categorical "rain" or a categorical "sunny." Verifying about 2000 of these numbers, Cooke found the following percent correct for his categories: 5-99%, 4-94%, 3-79%, 2-56%, 1-58%. There was controversy and misunderstanding in his day, too, because he gave a short follow-up item (Cooke 1906b) in which he attempted clarification. There he said that he wasn't changing the structure of the forecast because the numbers were merely appended to the regular style forecast, which is our reason for the present style.

You may be wondering whether confidence factors and probability are the same thing. Basically, yes. Confidence factors represent probabilities, although generally with some loss in precision, even after they are defined by verification as Cooke did. But there is a slight, but real, difference besides the lowered precision: confidence factors are used with categorical forecasts and represent the forecaster's confidence in his forecast, while the probability is the forecaster's confidence that the event will occur. It is not a confidence in his forecast, because then it would be a probability of a probability.

During World War I, according to a U.S. Army Signal Corps report (1919), the French and British Meteorological Services provided forecasts to the American Expeditionary Force which contained odds in favor of the forecast. It is interesting, again, that these were appended to the forecast. Also, the report indicated that the forecast was an unqualified categorical one, because they did not use qualifiers such as "probable" or "possibly."

Next came a paper from the U.S. Weather Bureau (USWB) on forecasting precipitation in probability terms (Hallenbeck 1920), using 10% increments of probability. This was done mainly to help decisions involving irrigation in the southwestern United States.

Brier (1944), of the USWB, discussed forecaster confidence and the value of probability in a paper that is hard to find, perhaps because it was classified at the time and thus may have had quite limited distribution. In the last paragraph of this paper, Brier made the following important points: "Every one of us is called upon to make decisions of one kind or another, decisions such as whether to carry the umbrella to the office, whether to set up protective devices to save the fruit from a freeze, or whether to send bombers over Berlin on Friday or Saturday night. The decisions of a rational man will, to a large extent, depend upon his estimates of the probabilities of the different events and the consequences of them.
When he is convinced that the weather forecaster's estimates of these probabilities are better than his own, he will come to him for weather information. But, in general, it will be up to this individual (not the forecaster) to decide what course of action to take. He should not be given a 'pessimistic' forecast or some other 'biased' forecast. This will sometimes happen when the forecaster has in mind some particular operation for which the forecast might be used. However, so far as the scientific problem of weather forecasting is concerned, the forecaster's duty ends with providing accurate and unbiased estimates of the probabilities of different weather situations."

Efforts toward making probability forecasts objective increased after World War II. Price (1949), USWB, developed an objective technique for forecasting the probability of thunderstorms; Dickey (1949), USWB, did the same for temperature change; and Berkofsky (1950) did it for fog. Gentry (1950) created a quasi-objective scheme for forecasting areal coverage of summer showers in Florida. Areal coverage forecasts are not as good as the point probability forecasts the NWS uses now, but this paper was a step in the right direction, and his areal coverages were quite close to point probabilities because showers occur almost every day in Florida in summer. The relationship of areal coverage to point probability will be discussed in detail later.

Efforts toward determining probabilities objectively actually go at least as far back as Besson (1904), who related surface observed variables to the probability of precipitation, although his final product was a categorical forecast. An interesting point here is that he noted that combining variables was of little or no use because of the strong dependence among them. However, this effort was with European data and may not be as valid in the United States.

Williams (1951) got probability experiments by Weather Bureau operational forecasters started again with an experiment in the use of confidence factors. He noted the near equality of confidence factors and probability. Later, in the 1950's, The Travelers Insurance Company started broadcasting weather to the public in the Hartford, CT, area, including probabilities for precipitation. These are continuing.

In 1952 Schroeder (1954), of the USWB at Chicago, started an experiment using carefully defined probability words in his fire-weather forecasts. He noted that forecasters showed skill at this use, and he concluded that they should continue to do it. Also started in 1952, going public in the local forecasts in January 1953, and continuing to date were the probability efforts of the USWB office at Hartford, CT, as reported by Wassail (1966).

Roger Allen, when temporarily in charge of the Weather Bureau's St. Louis office for a short time in early 1954, ran a probability experiment for precipitation and temperature, but these forecasts were not released to the public. At that time the aviation briefings of the office gave probabilities for ceiling and visibility. These efforts were not reported in the meteorological literature.

While the above shows a long history of work and experiments in probability forecasting, the present era of probability forecasts to the public is probably best related to experimental probability forecasts started in California after a series of papers on objective methods for making such forecasts was published (Vernon, 1947; Vernon and Stoneback, 1952; Jorgensen, 1949, 1953; Thompson, 1946, 1950).
These forecasts were first issued to the public in 1956 in San Francisco, and probability forecasts have continued there to date. They started in Los Angeles in 1957. The San Francisco efforts were reported on by Root (1958, 1961, 1962).

In early 1960, the eight Research Forecasters of the Weather Bureau (each at one of the major forecast offices--Boston, Massachusetts; Chicago, Illinois; Kansas City, Missouri; Denver, Colorado; Salt Lake City, Utah; San Francisco, California; Seattle, Washington; and Anchorage, Alaska) in one of their meetings noted the success of the California, Hartford, and other probability experiments. They endorsed the probability concept and recommended that each forecast center continue or initiate such a program. This was generally done. The results of some of these efforts were reported by Dickey (1965), Diemer (1965), Hughes (1965), and Stallard et al. (1965). Each also included explanatory material on probability forecasting and verification, with the most comprehensive that of Hughes.

As a result of these and earlier efforts, and with realization of the potential of such forecasts, Weather Bureau Headquarters in 1965, under the authority of R. Simpson and E. Vernon, with technical guidance by C. Roberts, authorized the start of nationwide forecasting for precipitation occurrence. At first, the effort was a trial and learning program, with the forecasts not released to the public. These trials were reported by Dickey (1966), Diemer (1966), Dunn (1966), and Hughes (1966a). The public release began in the first half of 1966. The first public forecasts were reported on by Dickey (1967), Hughes (1967a), and Roberts et al. (1967). There have been some additional verification summaries by the NWS after these, both regional and national, and routine verification continues to date for all NWS Central Region offices and selected NWS offices on a national basis. However, specialized verifications--seasonal, etc.--then took on interest, as discussed later.

Subjective probability guidance from the National Meteorological Center (NMC) started on facsimile in October 1965. The objective probability forecasts produced by NMC through the efforts of the Techniques Development Laboratory (TDL) and statistical methods were discussed early by Glahn (1962) and later by Glahn and Lowry (1972). These objective forecasts began in 1969 for the eastern United States (NWS, 1969) and for the conterminous 48 states about January 1, 1972 (NWS, 1971). These probabilities are generated by combining numerical and statistical models through a technique called Model Output Statistics (MOS).

This history has had to be selective, with the intent to give an overview of the development of probability forecasting, especially from material most likely to be available in NWS offices. Papers on or related to probability were considerably more numerous after 1950, and they became more sophisticated as well as more objective. Those desiring more material should refer to the bibliography on the subject (Murphy and Allen 1970). More recent papers, especially those in the 1970's, are given by Murphy (1976). Many additional papers, especially the more recent ones, will be referenced later in this memorandum.

3. PURPOSE AND USE OF PROBABILITY FORECASTS
H. Roberts (1968) stated the problem of the forecaster well when he said of probability forecasting, "Insight does not come easily: the meaning of probability entails subtleties that invite misunderstanding and tempt misapplication even by those who are most receptive to the idea of expressing forecasts in probabilistic rather than deterministic form." In this section and the next, we will talk about a number of the details forecasters must be aware of in making these forecasts.

a. Words vs. Numbers

Emmons (1940) wrote that it was difficult to know from the forecast what the forecaster had in mind. He concluded that, "What we need, therefore, is a codification of forecast terms, in which precise definitions are given; then the man in the street can judge for himself whether today's weather will be good, bad, or indifferent from his personal point of view." Landsberg (1940) stated much the same thing when he said, "A weather forecast in order to be useful has to be specific and should be worded so that the general public understands what is meant by the forecast." Quite a bit later, Dickey (1956) said, "The important things are the percentage definitions of the terms and the strict adherence to the terms once they have been decided upon and defined."

Landsberg made another point that may be the reason some forecasters and possibly some managers are not fully in favor of probabilities in forecasts. In regard to specific word usage, he said, "There are two approaches to this subject. One is from the viewpoint of the forecaster and the verification of forecasts, the second is from the viewpoint of the customer or user of the forecast. The forecaster wants to see his forecast verified; the less specific his forecast or the more latitude a term according to his own definition...the better his imagined score. The general public wants information." He stated that the public is often unaware of the wide leeway that the forecaster permits himself in using certain terms. Since probability numbers are the most precise expression of the forecaster's certainty, they reduce the leeway and therefore are of maximum potential benefit to the forecast user.

Finally, according to Knox (1969), the indiscriminate use of undefined probability (word) modifiers seriously reduced the effectiveness of the Canadian public forecasts prior to 1946. The Canadians then shifted to purely categorical forecasts, while the USWB stayed with probability modifiers, but, as noted by Hughes (1965, p. 31), they were still not used consistently. In fact, almost the whole spectrum was used when no precipitation was mentioned. This extreme lack of consistency of words vs. numbers is one of the strong arguments for the use of numbers.

In 1966, the USWB established a set of words well defined in probability terms, and these are used rather consistently today. However, public surveys have shown that forecast users do not correctly understand varying certainty from words (Rogell 1972 and 1976, and Eastern Region 1973, all summarized by Hughes 1978a). In the 1973 reference there was sampling in which the public had no idea of the certainty from the probability word, i.e., the frequency of choice of the four probability ranges given was essentially equal to a random selection. On the other hand, everyone knows that the certainty is low when it is 10% and high when it is 70%, and they know it is going up when it is 30% for today and 40% for tonight--these surveys showed this.
Probability numbers are the most precise and, therefore, the most useful way to express varying certainty.

The above is not to say that the probability should carry all the information, and words none. For example, in the precipitation forecast, the probability describes well the certainty as to whether measurable precipitation will occur or not, but the body of the forecast, if it is well done, should contain such other information as the forecaster can give, such as the type, amount, and start or stop time of the precipitation. Forecasters do not do enough of this sort of thing, tending to prove right Landsberg's point quoted above.

b. Use of Forecasts and the Cost/Loss Ratio

Some forecasters think the public wants and needs a yes-no type forecast, because the public's decision is one of doing or not doing some particular thing. This is mostly true, but it is not a sufficient reason for making a categorical forecast. A categorical forecast is really a two-probability forecast in which the probabilities are usually unknown. That is, taking rain forecasts as an example, the two probabilities are the percent correct of the rain forecasts and the percent correct of the no-rain forecasts after subtracting the latter from 100. These are generally around 65% and 15%, depending on the climatic frequency of rain and the lead time to the forecast period. These figures are known only from verification, but are not given to users and are probably not really known by forecasters. But no forecaster would really be happy if forced to issue only those two probabilities, because much of the time it would be clear that the degree of certainty is different from either of them. The user wouldn't be happy either.

Experience clearly indicates that the use of 11 to 13 different probability numbers, mostly in 10% increments, is adequate for the present state of the science. However, when the climatic frequency of the event forecast is considerably below 50%, there is evidence that forecasters can distinguish between low probability numbers quite close together, such as 0, 2, and 5% (see Hughes 1965, p. 26). Such discrimination could have value in drier climates.

Using a fixed set of probability numbers for all climates can lead to less useful forecasts. There is some rationale for saying that there should be about as many probabilities available to the forecaster that are below the climatic frequency of precipitation as there are above this frequency. This is obviously less and less the case with any fixed set of probabilities as the frequency gets low (practically all of the frequencies for the 6- and 12-hour periods in the United States are less than 40%, with cold season frequencies in the Northwest and in the Great Lakes area the most likely exceptions). This indicates that the use of the really low probabilities is more reasonable and necessary in dry climates (low frequency climates).

The user's decision to do or not to do something is the basis for a precise way to use probability numbers. Let us look at several examples. First, let's take a forecast probability that the minimum air temperature will fall below 28°F. In an orchard, in spring, temperatures below this value can cause damage to the future crop if the trees are not protected, so the orchard manager has to decide whether or not to protect his crop with heaters.
He knows that it will cost quite a bit to run those heaters for a number of hours, as well as having an expense for lighting them, putting them out, and refueling them, plus wear and tear by use. He also knows that loss of his crop would be very expensive. If he puts dollar values on these two, i.e., the cost to protect and the loss if caught unprotected when damaging cold temperatures prevailed, and takes the ratio of protection cost over loss, he will get a threshold for decision. A reasonable value of the ratio for the orchardist might be 5%. Thus anytime the forecast calls for a probability higher than that, he takes protective measures; otherwise he does not. This is overly simplified, but nevertheless it is the principle of decision making using probability. The 5% is called the cost-loss (C/L) ratio, and is the threshold for this task. Note that this threshold is quite low.

But you say this is probably a sophisticated decision maker; what about the general public? No problem. The same principle is used when deciding whether or not to carry an umbrella or to buy life insurance. In fact every decision by each of us is made in the same way. Because of this experience of the public in making probabilistic decisions without really knowing it, it is easy for the public to make such decisions from the precipitation probability forecasts of the NWS. For example, as heard on a commuter bus after a passenger got on and was kidded about carrying an umbrella, "30%, that's good enough for me." This is exactly the way to make this probabilistic decision. The 30% was greater than his cost/loss ratio, but he would probably not know what C/L was if you asked. After making such decisions a few times, the decision maker may decide to change the threshold. It is by this experience that people learn to adjust their decisions to the best threshold probability for them. The threshold could easily be different depending on whether the umbrella carrier is in work clothes or dress clothes, as well as the amount of time out of doors. Thus, the threshold changes with conditions, and the thresholds for all the decisions within a metropolitan area may well cover almost all of the probability spectrum (Root 1965).

If you were a private meteorologist forecasting for a particular client, you might also act as consultant-decision-maker. If so, you should be aware of the C/L ratio for particular things. In that case, after you have your probability in mind, you can issue a categorical forecast telling the company to protect or not protect. In such a case you are the decision maker. Thus, a categorical forecast is really only useful when tuned to a particular cost-loss ratio. When there are many ratios, it is a useless concept. For the NWS forecasts, since there is a wide spectrum of C/L values in the area, the forecaster issues the forecast in terms of probability [1] so each decision maker can decide whether it is above or below the C/L value(s) of concern. Only probability numbers will allow this with precision and without ambiguity.

[1] A corollary of this is that forecasts have value only when there are people to make decisions AND there are alternative decisions that depend on the forecast--no people or no way to protect equals no useful forecast. The inverse would also be generally true: the more people making decisions and the more alternatives they have, the more important the forecasts. Of course, the monetary value of the decisions has some significance as well.
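The decision rule just described is simple enough to state in a few lines of code. Below is a minimal sketch, in Python, of the cost/loss threshold rule; the function name and the dollar figures are illustrative assumptions, not values taken from the text.

```python
# A minimal sketch of the cost/loss (C/L) decision rule described above.
# The dollar figures are invented for illustration.

def should_protect(forecast_probability: float, cost: float, loss: float) -> bool:
    """Protect whenever the forecast probability of the adverse event
    exceeds the cost/loss ratio, i.e., the user's decision threshold."""
    threshold = cost / loss  # the C/L ratio
    return forecast_probability > threshold

# The orchardist: protecting (fuel, labor, wear on heaters) might cost
# $1,000, while losing the crop might cost $20,000, so C/L = 0.05 (5%).
if should_protect(forecast_probability=0.10, cost=1_000, loss=20_000):
    print("Probability exceeds the C/L ratio -- light the heaters.")
else:
    print("Probability is below the C/L ratio -- do nothing.")
```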
Surveys and other feedback suggest that the public understands probability more and more (see Murphy 1967), in spite of what some forecasters think. For example, some forecasters say that the public doesn't understand our probability forecasts because every time it is, say, 40% or more they take it as though rain will occur. But the forecasters are wrong. The public is simply saying that 40% is above their threshold for a number of their decisions--carrying an umbrella, washing a car, leaving a car roof open while working, possibly even going to a ball game or outdoor concert.

As a forecaster, suppose you are debating whether to put out a forecast of 30% or 40%. What is the effect of your decision? There is no effect for users with C/L values lower than 30% and higher than 40%--the large majority of users. Only those with C/L values between these two probabilities would be affected. Since most probabilities are low, the higher you are in the probability range the fewer users that probably would be affected. Very few and perhaps none would be affected by uncertainty between 70% and 80%. However, there is something satisfying to a forecaster in correctly forecasting 100% probability, even though 90% may have brought about no change in the decisions made, and almost as good a score in our usual measures.

A very sophisticated usage of the C/L principle, where more than two options are available, can be found in Thompson (1959), which involved snow removal decisions by a municipality. A reprint of this article was distributed to every Weather Bureau office at the time. The use of this principle in the raisin industry was discussed in depth by Kolb and Rapp (1962), and a multiple-option use in fruit-frost decisions was discussed by Murphy and Thompson (1977). Schwerdt (1970) discussed its use on the New York City docks. For more information on the C/L ratio and its use, see Thompson and Brier (1955) and Thompson (1963, 1966).

c. Omitting Probabilities

Some people argue that probabilities shouldn't be mentioned in all forecasts. The reason usually given is that precipitation shouldn't be mentioned all the time. One wonders if such persons realize that omitting probabilities, say below some threshold, is valid only if there are no C/L ratios in the omitted range.

No one has examined the distribution of C/L values in the United States or even in an urban-suburban area. A reasonable distribution, considering that outdoor activities must relate somewhat to the climatic frequency of precipitation (raisin drying outdoors can take place in California but not in Missouri), is that there is a maximum of C/L values fairly near the climatic frequency of precipitation, with very few values close to the extreme probabilities of zero and one. Thus, since most C/L values are low, it is a risky assumption that none are below the cutoff value, especially with a cutoff as high as 30%. On the other hand, omitting values near the top of the probability scale might not matter at all for decision makers. However, maximum utility for the large group of decision makers that exists in almost any of our forecast areas is more likely found with the full spectrum of probabilities. Then all users have a complete opportunity to make the best decision.

An excellent example of this, such as could occur with many people these days, occurred to me one midnight. I was in bed on a summer night when I realized that I had left the roof open on my Volkswagen bus, and that is a big hole.
I turned on the NOAA Weather Radio beside the bed, got a forecast of zero probability, and went to sleep. To get a personal feeling for the C/L ratio, think about what percentage of the time you would accept rain in the car. I am sure it will be small. I thought less than 10%, i.e., C/L = 10%. With the omission of low probabilities, I would have had no choice but to get up and close the roof. P.S., it was a good zero forecast. I personally think there are a number of these minor convenience decisions that low probabilities can help.

Another example would be on an overcast day, with clouds thick enough to prevent shadows. A zero probability on such a day, and many of these are easy to do, really helps our image of good forecasting because to the public it looks like a threatening day. Many people wouldn't mind washing their car, etc., even with the overcast condition, if zero probability were explicitly stated, especially if they had learned from experience that such forecasts are correct.

d. Use of Zero and 100%

There is little point in using zero and 100% without modifiers. Purists will fault you that no one can be that certain, and our verification data tend to bear that out. Also, forecasters generally wouldn't bet their odds (see section 4i). Also, as mentioned in the section above, there probably aren't any C/L ratios very near these extremes. This suggests that the terms "near zero" and "near 100%" are better to use.

e. Value of Probability Forecasts

If one needed more argument for probability forecasting, a paper by Thompson (1962) should do the trick. He showed that of the total possible gain from decision making with perfect forecasts, there is about as much to gain from the creation and use of probability forecasts as from scientific advances; he also stated that decision making using probability is available now, while the comparable advances in knowledge will take a millennium to achieve. Murphy (1977a) put it well when he said, "...the value of day-to-day weather forecasts could be significantly increased, within the context of any decision-making situations, if such forecasts were routinely expressed in probabilistic terms and disseminated to decision makers (including the general public). In this regard, it should be emphasized that the benefits which could be expected from such a probability forecasting program do not depend on scientific advances in the state of the art of weather forecasting...." Thus, while research, development, and numerical prediction can lead to better and more useful forecasts, the use of probability is a highly efficient and simple way to help decision makers (sophisticated and unsophisticated) make better decisions right now.

4. DEFINITIONS AND PROBLEM AREAS

In the routine local (zone) NWS forecast, what is to be forecast is the average point probability of a measurable amount of precipitation for the local area (or zone) in the time periods of the forecast. This sounds straightforward, but there have been questions or uncertainty about each part of this definition. Let us look at the parts.

a. Probability vs. Chance

First, "probability" is the same as "chance", so that the point probability at a point is the same as the chance of the event at a point. Thus, a probability of 30% means that there is a 30% chance that the event forecast will occur [2].
Point probability is used because almost all users need the chance of the event in a very small area such as one's home, business, work site, or play site. Thus, point probability is used because it suits the user's needs best.

[2] One could also say that 30% means that 3 times out of the 10 that 30% is used, the event is expected to occur. It is not necessary that the 10 times be with similar weather situations, as some have said. Also, 3 out of 10 is the same as 3 to 7 odds. It is easy to err when changing probabilities into odds and vice versa, so odds are not recommended (see section 4i).

b. Average Point Probability

The average point probability over the forecast area is used because the value should apply to and be usable at any point within the forecast area. If conditions over the forecast area are uniform (for example, fair, rain everywhere, or a uniform coverage of showers), the point probability would be the same at each point, so the average would be the same also. As long as the conditions are uniform over the forecast area, the point probability is not affected by the size of the forecast area.

Problems arise when the forecaster does not expect conditions to be nearly uniform over the local area. In areas which are climatologically nearly uniform, such as St. Louis, the forecasters are rarely able to distinguish probability differences across the area (Winkler and Murphy 1976). At Rapid City, with its prominent terrain variations in the local area, there are probability variations over the local area the forecaster can forecast (see Murphy and Winkler 1977a). When such conditions can be forecast, the local area should be split (see next section).

An important point relating to the average point probability is that if one were to forecast 100% probability, one should be certain that every point in the local area will get precipitation. Likewise with zero probability, no point should get precipitation. These are not good forecasts unless the conditions are met (see also section 3d), even though, using a single gage for verification, at times the forecasts could be verified as if there were no error (see section 7d on use of multiple gages).

c. Splitting Local Area

Small variations in probability in the forecast area are neglected, even if they could be forecast. But where terrain or other features, or even the movement of weather systems, causes sizable (> 20%) forecastable differences in probability over the area, this is best handled by splitting the forecast area into two parts. For example, in Miami "20% near the shore, 60% inland"; in Denver (or Rapid City) "20% except 50% west portions"; in Chicago "20% except 50% near the lake." It is highly desirable, from the user's viewpoint, for the forecaster to make such space distinctions. If it is not done in a place where sizable space variations are commonplace, forecasts of the average probability are much less useful, and users justifiably complain, sometimes in the formal meteorological literature (see Curtiss 1968).

d. Point vs. Areal Probability [3]

When you make or use a probability, be sure you understand whether it is a point or an areal probability. The NWS public forecasts are and should be for the point probability, but some of the guidance probabilities are areal probabilities, e.g., severe thunderstorms. If in doubt about a guidance product, see the appropriate Technical Procedures Bulletin.
The relationship between point and areal probability that is most useful to the forecaster, given by Hughes (1965), can be stated as

    Pp = Pa C,    (1)

where Pp is the average point probability over the forecast area, Pa is the areal probability, i.e., the chance of precipitation for any place in the forecast area, and C is the conditional areal coverage, i.e., the areal coverage the forecaster expects to exist if precipitation does materialize in the area. Note that the point probability is almost always smaller than the areal probability, that it can never be larger, and that they are equal only when the expected areal coverage is one--100%. Forecasters have difficulty with this point (see Winkler and Murphy 1976). For scattered showers, the areal probability is always larger. Also, the areal probability usually increases as the size of the area increases, so be sure to note this size compared to the size of your forecast area when using guidance given in areal probability terms. The average point probability is much less sensitive to the size of the area.

Forecasters may still be surprised at how large the climatological areal probability can be. Beebe (1952), extrapolating to a very large number of gages, has shown that in summer measurable rain probably occurred within a 50-mile radius of Atlanta or Birmingham on as much as 85% of the 24-hr days. This result is corroborated by Smith and Smith (1978). Thus, showers in the local area in summer are so common in these locations that the main help of the forecast comes in the ability to distinguish differing areal coverages, yet D. Smith (1977) has data that show forecasters even now have little such ability, as noted by Hughes (1977b) and Murphy (1978b). Causey (1953) showed much the same thing as Beebe, but for Lincoln, NE, Peoria, IL, and a point in eastern Ohio; in spite of using only a 35-mile radius, a frequency of about 65% was obtained.

Going back, Eq. (1) was derived with the forecaster's need in mind, because it separates the problem into the parts to be handled, in that it separates the event of "some rain"--as related to whether or not the weather system will reach the forecast area--from the problem of the spottiness of the rain, if it does come. It is thus most useful before the fact. It is obvious from Eq. (1) that the conditional areal coverage, C, usually does not include all the uncertainty, and thus is less desirable for decision making. Another equation is

    Pp = Cu.    (2)

Here the areal coverage Cu is unconditional. This usage is evidently more understandable to mathematicians (see Curtiss 1968). However, it is best used in a climatological sense--after the fact--and it says that the best probability one could forecast for any particular period is the areal coverage that actually occurred in that period. This explains why the spotty showers of summer do not allow as frequent use of the high point probabilities as do the widespread precipitation events of winter. It also shows that since the larger amounts of precipitation must cover even smaller areas than just a measurable amount, high probabilities would be used even less for higher precipitation amounts. Eq. (2) equals Eq. (1) when the forecaster is certain that precipitation will occur in the forecast area, i.e., Pa = 1.0. Since this is not commonly the case, Eq. (1) is more useful at forecast time. Also, this uncertainty is why a forecast of areal coverage is not as useful for most decision making as a point probability. That is, decision making requires knowledge of all uncertainty.

[3] See also sections 6a and 7e.
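Eq. (1) is easy to demonstrate numerically. The following is a minimal sketch with invented values; the function name is illustrative.

```python
# A minimal sketch of Eq. (1): average point probability equals the areal
# probability times the conditional areal coverage. Values are invented.

def average_point_probability(p_areal: float, coverage_if_rain: float) -> float:
    """Pp = Pa * C. Pa is the chance of measurable rain anywhere in the
    forecast area; C is the fraction of the area expected to be wetted
    if rain does materialize."""
    return p_areal * coverage_if_rain

# Scattered summer showers: rain somewhere in the area is quite likely
# (Pa = 0.8), but only about 30% of the area would be wetted (C = 0.3).
pp = average_point_probability(p_areal=0.8, coverage_if_rain=0.3)
print(f"Point probability for any given spot: {pp:.0%}")  # about 24%
```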
e. Measurable vs. Trace [4]

Going back to the definition at the beginning of this section, let us look at another part of it--measurable. Many forecasters feel that a trace of precipitation should be included in the definition of a precipitation event, or that a trace should count as a hit when the probability is for measurable. The main reason for using the measurable criterion is that it has been in use for many years and the users have become accustomed to it. The frequency of only a trace is generally almost as high as the frequency of more-than-a-trace (see Hughes 1965). Thus if one were to include traces as precipitation events, the average forecast probability would have to almost double to adjust to the change in definition. This would cause great confusion to users for quite a time, and, because forecasts are compared to climatology, verification scores need not be higher, since the climatic frequency of a trace or more would be almost twice as high as that for measurable. Of course, it would be completely inappropriate to verify using a trace amount as an event if the forecasts were for a measurable amount. The event forecast and that used to verify must be identical.

Another point is that it is generally considered that a trace amount requires no protective action and thus should be treated the same as no-rain. This is probably not true for everyone, but the Weather Service must aim to help the large majority, and let the others with less common needs be additionally served by private weather services. Of course, the text of the forecast gives additional information on expected events. The probability alone cannot and was never intended to carry the whole precipitation message. This point must be kept in mind when constructing the forecast.

In the wintertime, snow flurries behind cold fronts and to the lee of sizable bodies of unfrozen water are very frequent, and commonly insignificant. Of course there are heavy snow squalls at times in such conditions, but these are less common than light flurries. To call high probabilities for flurry trace events would cause loss of value for the probability on other, more important, days.

[4] See also section 8f.

f. Point Probability vs. Precipitation Amount

Some will say that the public needs the probability of more than a measurable amount of precipitation, and that is true. The problem is how to get the information into the forecast without causing undue confusion. It could be done in specialty forecasts, e.g., agricultural and fire weather, because MOS guidance already exists for QPF (quantitative precipitation forecast) probabilities (Bermowitz and Zurndorfer, 1979). However, there is some QPF information in the probabilities of only measurable precipitation. This was shown for the three forecast periods for one cold season by Wasserman and Rosenblum (1972) for four East Coast locations combined and by Wasserman (1972) for 13 NWS Eastern Region locations--see Table 1. It was also shown for the first forecast period (today) for the warm season by Hashemi and Decker (1969) for the four Weather Bureau offices in Missouri--see Fig. 1.
To use Fig. 1, find the point of intersection between the curved line appropriate to the forecast probability of a measurable amount and the vertical line of the desired precipitation threshold, and note on the left side of the graph the probability of precipitation greater than the threshold. For example, for a 70% forecast probability (60-100% line), the chance of > 0.60 in. is about 17%. These data are probably not universal and need to be calculated for other areas, especially those in the West, and possibly should be

[Table 1.--Relative frequency of observed precipitation equal to or exceeding the amounts indicated (trace, 0.01 in., 0.11 in., 0.26 in., 0.51 in.) as a function of PEATMOS PoP.]

    P = (F - φ)² + φ(1 - φ),    (5)

where φ is the observed frequency of precipitation for all the forecasts of probability F. This was devised by Sanders (1963), and its derivation is easily seen from Hughes (1965, p. 11) [7]. The (F - φ)² term reflects reliability error (bias) and says that the error is zero when the observed frequency is the same as the forecast probability F (see also Fig. 4). The φ(1 - φ) term reflects resolution error and is zero only when the observed frequency is either 0% or 100% for that forecast probability, regardless of what it is (see Fig. 5). Figs. 4 and 5 are from Hughes (1967c) and are discussed further there. Note in Fig. 4 that the bias is relatively small even for moderate unreliability, and resolution (Fig. 5) is relatively large for mid-probability forecasts. The major portion of the Brier score in fact comes from poor resolution.

[7] Take the score for days with rain as φ(F - 1)² and that for days without rain as (1 - φ)F², add the two, add and subtract φ², and rearrange.

[Figure 4.--Reliability. Figure 5.--Resolution.]
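Since the decomposition in Eq. (5) applies within each forecast probability, a short sketch can make the bookkeeping concrete. The following is a minimal illustration with invented forecast/observation pairs; it assumes the half-Brier form, i.e., the mean of (F - E)² over the forecasts.

```python
# A minimal sketch of the Brier score and the Sanders decomposition of
# Eq. (5). The forecast/observation pairs are invented for illustration.
from collections import defaultdict

forecasts = [0.1, 0.1, 0.3, 0.3, 0.3, 0.7, 0.7, 0.9]  # forecast PoPs, F
observed  = [0,   0,   1,   0,   0,   1,   1,   1]    # E: 1 = measurable precip

# Overall (half-)Brier score: the mean of (F - E)^2.
brier = sum((f - e) ** 2 for f, e in zip(forecasts, observed)) / len(forecasts)

# Eq. (5), applied within each forecast probability F:
#   score = (F - phi)^2 + phi*(1 - phi),
# where phi is the observed frequency among the forecasts of F.
groups = defaultdict(list)
for f, e in zip(forecasts, observed):
    groups[f].append(e)

for f, events in sorted(groups.items()):
    phi = sum(events) / len(events)
    reliability = (f - phi) ** 2   # zero when phi equals F
    resolution = phi * (1 - phi)   # zero only when phi is 0 or 1
    print(f"F={f:.1f}: phi={phi:.2f}  reliability={reliability:.4f}  "
          f"resolution={resolution:.4f}  total={reliability + resolution:.4f}")

print(f"Overall Brier score: {brier:.4f}")
```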
However, 100 is unattainable because we forecast the average point probability which equates to the observed areal coverage, so much of the time it is impossible to use the very high probabilities, especially in the warm season (see section 4d). Some people have used the sample skill score (e.g., Sadowski and Cobb 1974). This uses observed frequency of the forecasts—the short-term frequency—instead of the long-term frequency of the forecasts, and is discussed by Murphy (1974). The scores using the sample frequency are poorer than scores from the long-term frequency and the score is undesirable for two other reasons, 1) the sample frequency is not a value known ahead of time and therefore it can not be used to create a forecast ahead of time against which to compete; therefore, comparison with it is a different breed of score from that using long-term climatology, 2) it takes away from the forecaster some score based on the ability to distinguish deviations from the long-term climatology, i.e., wet and dry regimes. This is part of the forecaster's skill for which credit should be given. 35 c. Skill Score vs. Reduction of Variance Many objective schemes for forecasting are now derived via screening-regression, and the measure of the quality of the scheme and the contribution of the individual terms is given by the reduction of variance. It was noted and then proven by Sangster (1970) that the reduction of variance and the skill score are equal under certain conditions. They are exactly equal if the relative frequency of precipitation in the sample used in the derivation is exactly equal to the longer-term frequency used in verification, and the probabilities used in the reduction of variance computation are in the range zero through one. The deviations from these ideal conditions are usually small enough and/or infrequent enough that there is usually only a small difference between the skill score (in percent) and the reduction of variance (in percent). In general the deviations from the ideal conditions are such that the reduction of variance is usually slightly less than the skill score. This means that if the reduction of variance of a forecast scheme is better than your skill score, the scheme should be given a lot of consideration. d. Use of Multiple Gages Forecasters dislike the use of only one rain gage to verify their probability forecasts, even though a point probability is correctly verified that way. However, they do have a point, because their forecast, as noted earlier, is the average point probability in the forecast area, since it should apply equally to any point in the area. If we had a number of gages in the forecast area, how should they be used to properly verify the probability forecast? It would not be correct to have measurable rain at any gage be called a rain event because that would be verifying the areal probability, not the point probability. One correct way would be to apply the forecast probability to each gage-point and verify each gage-point separately, then average the result. One can see from this that as long as there are no local effects in the forecast area that cause some places to regularly get more rain than other places, the result from any gage and result for all gages averaged together would be the same in the long run. Thus the main advantage of multiple gages is that it shortens the time to get a representative sample. 
d. Use of Multiple Gages

Forecasters dislike the use of only one rain gage to verify their probability forecasts, even though a point probability is correctly verified that way. However, they do have a point, because their forecast, as noted earlier, is the average point probability in the forecast area, since it should apply equally to any point in the area. If we had a number of gages in the forecast area, how should they be used to properly verify the probability forecast? It would not be correct to have measurable rain at any gage be called a rain event, because that would be verifying the areal probability, not the point probability. One correct way would be to apply the forecast probability to each gage-point and verify each gage-point separately, then average the result. One can see from this that as long as there are no local effects in the forecast area that cause some places to regularly get more rain than other places, the result from any gage and the result for all gages averaged together would be the same in the long run. Thus the main advantage of multiple gages is that it shortens the time needed to get a representative sample. However, a sample of 12 months of forecasts verified as a set should get close to the same result as using many gages, especially since we are verifying rain frequency. QPF probability verification will be more difficult and require larger samples. The disadvantage of multiple gages is the workload of both gathering and processing the data.

e. Modification of Brier Score

If an adequate number of rain gages existed in each local area, or the actual areal coverage could be obtained reasonably from radar, it should be better to use a further modification of the Brier score, as discussed by Curran and Hughes (1968) and by D. Smith (1977), for verifying the average point probability as forecast for the local area. The form of the score would be the same as the Brier score shown earlier except that the E in the score (Eq. (3)), instead of being only zero or one as in the true Brier score, becomes the areal coverage observed, i.e., the portion of the local area actually receiving measurable precipitation in the time period of concern. Such a score would take out some of the scatter shown earlier and would certainly give much higher scores and therefore a psychological lift to the forecasters. It would greatly alleviate the trace-measurable problem, and it would also alleviate other problems of using one location to obtain the verifying data, a method that could also compromise the probability a bit, at least for the first period, as discussed earlier. This method would also allow a representative sample to be achieved faster because the minimum number of forecasts that could make a forecast probability reliable would be only one (if 10 gages were available), no matter what the probability, instead of, for example, ten forecasts for a 90% probability. The method then overcomes the first two (the main) reasons for non-acceptance of a verification system mentioned by Gringorten (1967). It could be done now in selected locations because some places have reasonably good gage coverage; however, the logistics of verification would be considerably increased as a result.

In a sense this new score would be saying that it is impossible to forecast exactly which part of the local area will be hit by rain when all parts are not hit, and this generally is the case except where strong local effects are acting. Thus a perfect forecast of the average point probability would be that of the observed areal coverage. Such a forecast is possible, so perfect forecasts would be possible. They are not possible now except in areas where the areal coverage is always zero or 100%, if such areas exist. However, the best possible score now is still achieved by forecasts equal to the observed areal coverage, even though the Brier or skill score would show them to be imperfect. Nothing in the above should be taken as saying one should issue forecasts of areal coverage. Point probability is the thing (see section 4d).
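A minimal sketch of this modification follows: the verifying value E becomes the observed areal coverage rather than 0 or 1. The coverages are invented; in practice they would come from a gage network or radar.

```python
# A minimal sketch of the modified Brier score described above: E is the
# observed areal coverage (0.0-1.0) instead of 0 or 1. Values are invented.

def modified_brier(probs, coverages):
    """Mean of (F - E)^2, with E the fraction of the local area that
    actually received measurable precipitation in the period."""
    return sum((f - e) ** 2 for f, e in zip(probs, coverages)) / len(probs)

forecast_pops   = [0.2, 0.4, 0.7, 0.0]  # forecast average point probabilities
areal_coverages = [0.1, 0.4, 0.5, 0.0]  # observed coverage, e.g., from 10 gages

score = modified_brier(forecast_pops, areal_coverages)
print(f"Modified Brier score: {score:.3f}")
# Note that the second and fourth forecasts contribute zero error: a forecast
# equal to the observed areal coverage is "perfect" under this score.
```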
This will probably be the case if justice is to be done to probability forecasts of such things as ceiling height or the amount of precipitation, because in these situations it would seem that a small error should not be penalized as much as a large error. This sensitive-to-distance aspect is discussed in the paper by von Holstein and Murphy and its references.

8. Problems Shown by Verification

The material below is based on Hughes (1979), which in turn is based on over 13 years of monthly probability verification for each of 66 Weather Service offices and for each forecaster in each office. The verification gave rapid feedback to forecasters and extensive subset verification, such as by season, time of day, lead time, individual forecaster, and guidance. Such subsets are essential because bias is the main problem, and many types of bias have positive and negative aspects that can cancel or appear minor when subsets are combined. The basic score used is the skill score discussed above. Bias is given by

    B = 100 (R_F - R_O) / R_O,     (7)

where R_F is the forecast frequency of precipitation of the set of forecasts, which is simply the sum of the probabilities, and R_O is the observed frequency of precipitation for the periods forecast. A value of zero indicates no bias. Bias is subject to manipulation by forecasters, but it should be reduced only by methods which will raise the skill score; therefore, it is necessary to instruct forecasters in the proper and improper ways to do it. The measure is a quick tool to spot problems; however, bias in subsets can compensate when the sets are considered together, so subset usage is highly desirable, as will be shown later. While zero bias is the goal, in subsets and overall, it is impossible and perhaps undesirable to obtain in the usual sample sizes used. In a year's forecasts--about 4400 from one office (from three probabilities per shift and four shifts per day)--an acceptable and probably irreducible overall bias would be less than 10%. However, overall biases of less than 20% are sometimes hard to reduce because the problem is more random error than systematic error. Essentially zero bias should exist in probabilities of 0% and 100% no matter how small the sample size.

a. New Forecaster Bias

Virtually every forecaster, when starting to use probability forecasts, has the systematic error of initially overestimating his/her skill by excessive use of very high and very low probabilities. Forecast skill naturally decreases with increasing forecast lead time (the time between forecast preparation and the beginning of the period for which the forecast is applicable). This means that the usable range of precipitation probabilities available to the forecaster shrinks with lead time to the single value of the climatological frequency at some time in the future, something like that shown in Fig. 6 (Hughes 1966a). Although most forecasters can reliably use precipitation probabilities from zero to 100% in the first 12-h period, by the third such period the probability limits for reliable forecasts have shrunk considerably. A diagram such as Fig. 6 therefore serves as a useful warning to new forecasters who are overly tempted to use extreme probability values in long lead time forecasts. When constructed from past forecasts as discussed next, it can show forecasters the specific limits of their skills. Fig. 7 demonstrates how range limits can be obtained after adequate data become available (data taken from an actual forecast sample).
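Before turning to Fig. 7, it may help to see how such a tabulation could be computed. The following sketch (modern Python notation, with invented sample forecasts) computes the bias of Eq. (7) and the observed frequency for each forecast probability, which is exactly what Fig. 7 plots:

    # Sketch: bias of Eq. (7) and a reliability tabulation of the kind
    # plotted in Fig. 7, from (forecast probability, rained?) pairs.
    from collections import defaultdict

    def bias_percent(forecasts):
        """B = 100 (R_F - R_O) / R_O for a set of (prob, rained) pairs."""
        r_f = sum(prob for prob, _ in forecasts)           # sum of probabilities
        r_o = sum(1 for _, rained in forecasts if rained)  # observed rain events
        return 100.0 * (r_f - r_o) / r_o

    def reliability_table(forecasts):
        """Observed rain frequency for each forecast probability used."""
        counts = defaultdict(lambda: [0, 0])   # prob -> [rain events, forecasts]
        for prob, rained in forecasts:
            counts[prob][0] += int(rained)
            counts[prob][1] += 1
        return {prob: hits / n for prob, (hits, n) in sorted(counts.items())}

    sample = [(0.0, False), (0.0, False), (0.2, False), (0.2, True),
              (0.6, True), (0.6, False), (0.9, True), (0.9, True)]
    print(bias_percent(sample))        # negative here: underforecasting
    print(reliability_table(sample))   # compare each probability with its frequency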
The points indicate the observed frequency of precipitation for each forecast probability. The solid line indicates equality of these two, giving the desirable goal called "perfect reliability." The dashed line indicates the effective upper limit of skill. It was calculated by obtaining the observed frequency of precipitation for the set of unreliable forecasts. These were the high probability forecasts, i.e., 60% or more. This frequency is about 60%. One could also consider 70% as an upper limit, but probabilities of 70% or more have an observed frequency just under 65%, and the set therefore gets a slightly poorer score at 70% than at 60%. For the lower limit, the zero forecast probability shows an observed frequency of about 5%. Thus the limits in this case are 5% and 60% (or possibly 70%). This can be repeated for the other forecast periods and for each forecaster.

These limits can change over a sizable period of time due to a change in forecast skill. They are usually dependent on the climatic frequency of precipitation as well, both tending to be higher (lower) for locations with a climatic frequency that is higher (lower). Once such limits are obtained, they should not be used as absolute boundaries, but should act as a "red flag" warning to proceed with caution. They can be exceeded, but only under unusually favorable conditions, for example with a slowly moving major storm, especially a hurricane, or in a pronounced dry spell.

[Figure 7: reliability diagram; axes are forecast probability (x) and observed frequency (y).]

Reasonable values for the present for 12 h forecast periods in a climatology such as that of Iowa would be "red flag" limits of 2-80% for the second period of the forecast and 5-70% for the third period.

b. 6 h and Day-Night Bias

Two other widespread forecast problems are evident from the bias values given in Table 2. These values were computed as an average from first period forecasts made at 66 stations in a 2 year period during the earlier years of the Central Region program. N indicates forecasts valid for the night period, D indicates day-valid forecasts, and subscripts 6 and 12 indicate forecasts for period lengths of six and twelve hours respectively. The warm season was April through September, the cold season October through March.

The first problem evident from these data is the overforecasting of precipitation in 6 h periods no matter what the season or time of day. This is the most prominent, widespread, and persistent bias noted in the entire verification program. That this overforecasting stems mainly from a misunderstanding on the part of forecasters of the effect of forecast period length on forecast probability, as mentioned earlier, was noted by Winkler and Murphy (1968) and fully confirmed by Murphy and Winkler (1974b). Climatological data such as that of Jorgensen (1967) show that over most of the United States, the climatic frequency of precipitation decreases about one-third when the forecast period length decreases by one-half. Forecasters tend to ignore this when updating a 12 h forecast to a 6 h forecast, so that if conditions have remained essentially unchanged, they tend to leave the precipitation probability unchanged instead of, on the average, reducing it by about one-third, as they should. This produces a considerable overforecast for the 6 h period.
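One reasonable first guess at the adjustment being skipped is to scale the probability by the ratio of the 6 h and 12 h climatic frequencies, so that reliable forecasts stay reliable on the average. The sketch below is only that--a hedged rendering consistent with the one-third rule above, with illustrative frequencies rather than values from Jorgensen:

    # Sketch: a first-guess reduction of a 12 h probability to a 6 h
    # probability, scaling by the ratio of the climatic frequencies so
    # that the average probability stays near the precipitation
    # frequency.  The frequencies below are illustrative only.

    def six_hour_prob(p12, freq6, freq12):
        """Scale a 12 h probability by the 6 h / 12 h climatic frequency ratio."""
        return p12 * (freq6 / freq12)

    # Over most of the United States the 6 h frequency is about two-thirds
    # of the 12 h frequency, so an unchanged situation calls for roughly a
    # one-third reduction:
    print(six_hour_prob(0.60, freq6=0.10, freq12=0.15))   # 0.40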
Of course, in some cases, if later events show that the 12 h forecast being updated was wrong, the 6 h probability may correctly be higher than the 12 h value. However, in the long run, if the 6 h forecasts are to be reasonably reliable, the average probability must be close to the precipitation frequency. For this reason, the probability for very short periods, such as an hour or so, must be quite a bit lower than even the 6 h probability, on the average. This means that with the same lead time it is harder to forecast the very high probabilities reliably in a 6 h period than in a 12 h period. Conversely, it is harder to forecast extremely low probabilities in the 12 h period.

Improperly adjusting for the length of the forecast period is probably a poorer way to think of this problem than improperly adjusting for the climatic frequency. This is especially important and noticeable when the precipitation frequency is markedly different among the 6 h periods.

Table 2. Bias values (%) from forecasts made at 66 stations in a two-year period for night-valid (N) and day-valid (D) 6 and 12 h first periods only.

               N12    N6    D12    D6
Warm season      1    23     12    40
Cold season      6    22     11    41

An excellent example of this is Denver, which has precipitation frequencies for summer, according to Jorgensen (1967), for 6 h periods starting 0000 GMT, of .15 and .05 (early tonight and late tonight, respectively) and .03 and .14 (this morning and this afternoon), while the 12 h periods have .17 and .15--essentially the same. With an update forecast made in late morning, with the first period being "this afternoon" (6 h), the climatic frequency for "this afternoon" is little different from that for "today" (.14 vs. .15), so little or no reduction of the probability is necessary, on the average. However, for the update forecast made 12 hours later, the frequency for "late tonight" is much smaller than for "tonight" (.05 vs. .17), and a large reduction must usually be made if the forecasts are to be reliable. In a 12 month sample of warm season forecasts made for Denver, the bias for "this afternoon" was only 3%--trivial--and about half what it would have been if the 12 h probability had been used as a 6 h forecast. That for "late tonight" was 95%, which is very large but still only about half what it would have been had the 12 h forecast been used. These points indicate insufficient knowledge of the necessary adjustment. If the 12 h climatic frequencies were used as 6 h forecasts, the biases would be (.15 - .14)/.14 = 7% and (.17 - .05)/.05 = 240% for the afternoon and late tonight periods, respectively. This clearly shows that each office must compare its 6 and 12 h probabilities and take appropriate action on the update forecasts or when splitting a 12 h period into two 6 h periods.

Getting forecasters to "think small" when updating a 12 h probability forecast to a 6 h forecast has been a lengthy and difficult task, and it is not yet fully accomplished. It is probably aggravated by the fact that such an adjustment is not necessary for other forecast variables, such as temperature, ceiling, or visibility. However, unless the probability verification is concerned with the 6 h period as a subset (and few are), this bad bias will remain undetected.

The opposite problem with period length came to light recently with a forecast which gave 30% for this afternoon (a 6 h period) and 40% for tonight (a 12 h period).
When queried, the forecaster said that the higher probability for tonight was because a line of showers was due in the area early tonight. This forecaster did not realize that this was an inconsistent set of forecasts for that reasoning. When going from a 6 to a 12 h period one has to do the opposite of thinking small, and think big. A first guess at a 12 h probability starting with the 6 h value of 30% would be about 45%--half again as large, based on climatology. But if the hour-by-hour probability was to increase as the line of showers came in, the 45% should be raised to perhaps 60%. Thus if 30% was a good forecast for this afternoon, the 40% was poor for tonight.

The second problem seen in Table 2 is that of greater overforecasting of precipitation during the daytime regardless of period length. This bias is not universal. Both of these problems are more evident in Table 3, based on forecasts at one particular station for a 12 month set of warm season months. The 6 h bias is severe in the daytime. The last column in the table shows that the prominent daytime bias would revert to a much smaller bias for the 12 h forecasts if day and night forecasts had been combined. What is the reason for this daytime bias? In this case it appeared to be related to the time of occurrence of "afternoon" showers. Even though forecasters think of them as occurring in the afternoon or evening, they expected too many of them to start in the afternoon period, i.e., used too high a probability, for day-valid forecasts. Knowledge of the hourly frequency of precipitation can assist in such forecasts. In the office from which the Table 3 forecasts came, the time of maximum precipitation frequency was essentially at the dividing line between the afternoon and night periods (see Topil 1963). Realization of their bias and learning of this climatological fact greatly reduced the bias in the future. By the way, some might say that the dividing time between forecast periods should not be near a time of maximum precipitation frequency, but should be adjusted so as to bracket that time. However, the forecast periods should really be proper for the decision maker, not the forecaster. They are reasonably well suited to the user now, as mentioned earlier, because weather dependent decisions are usually either for the work day or for the evening play period.

c. Variations Among Forecasters

Some of the variation in scores among forecasters could be due to attitude. H. Roberts (1968) said that "small stakes may not provide the incentive for a careful assessment." The large majority of forecasters are very conscientious and work hard to get the best possible product. Perhaps a comprehensive verification with individual forecaster scores available to forecasters and their supervisors, plus the comparison with MOS guidance and a comparative ranking of forecast offices, provides additional incentive for maximum effort.

Table 4, from real data for a particular station, further emphasizes the importance of forecast verification by subsets. These bias data demonstrate the large differences which can exist among forecasters at a station. Forecaster 5 appears to have no appreciable problem. Forecaster 1 has mainly a 6-h problem, but it is bad. Forecaster 2 has a prominent 6 h bias plus more of a problem on shifts producing forecasts near 4 am and 4 pm (lead times of 2, 14, and 26 hours), which require three 12 h forecasts instead of one 6 h and two 12 h. Since the latter set of forecasts is made at update time
and is for essentially the same forecast periods as those made by the previous forecaster, whereas the others are more original, this could reflect less experience or skill.

Table 3. One-station forecast bias (%) for two warm seasons, all periods, for various lead times.

Lead time (h)      Night     Day     Average
12 h forecasts
  26                   1      39          20
  20                  -1      33          16
  14                   3      40          21
   8                   1      30          15
   2                  -2      39          18
6 h forecasts         17      91          54

Table 4. Individual forecaster bias (%) for various lead times.

                             Forecaster
Lead time (h)         1       2       3       4       5
12 h forecasts
  26                 13      28      24      12       5
  20                 21       6      22      16      -2
  14                  8      20      26      20       8
   8                  8      22      37     -13       2
   2                  4      32      19      17     -15
6 h forecasts        82      56      27      36      --
Number of
forecasts           876     840     819     792     834

Another bias of some forecasters, which was fairly common and pronounced early in the program, is that of overuse or avoidance of particular probabilities. If the verification data show the number of forecasts in each probability, it is easy to note irregular variations in frequency. Most frequency curves for three forecast periods combined are bimodal, especially in the warm season, with a maximum at zero probability and another near the climatic frequency. A major deviation from a smooth-curve frequency distribution suggests bias if it occurs in a sizable sample of forecasts. The most common bias is the underuse of 50% probability, although overuse also exists. The avoidance of 90% is another abnormality. The 50% problem exists to some extent even today as a vestige of the categorical concept. Since the best forecast on a particular day would be the areal coverage observed on that day, as stated earlier in the discussion of areal coverage, and there is little reason to believe any particular areal coverage has a much different frequency from a nearby value (other than zero frequency), there is no natural reason for other than a smooth frequency curve, and 50% probability is the best forecast on some days as a result. See D. Smith (1977) for areal coverage frequencies derived from radar. These verify the above.

d. Skill vs. Distance

Another problem is that forecaster skill may deteriorate as the distance of the forecast area from the forecaster's home station increases. This problem is also more serious for some forecasters than for others. Table 5 is based on data collected by Sanford Miller, an NWS forecaster now retired. He used three years of routine forecasts made by himself and fellow forecasters for three 12-h periods twice a day at his office. A is the home station, and the distance of the other stations from the home increases from left to right in the table, with the maximum distance about 550 km. Note that the average score decreases as the distance from home increases. However, the precipitation frequency also mostly decreases with distance and may contribute to the result (see Figure 8 and its discussion below), but this would not be decisive. Note also that the scores were nearly homogeneous at the home station, a result in accord with the study by Gregg (1969), since Miller and his fellow forecasters had long experience at that location. However, the scores are far from homogeneous at the other locations, suggesting that Gregg's finding applies only to the home station. Note also that the scores for forecaster 1 have a small range, while those for forecaster 5 have a large range, with most of the problem at stations D and E. This suggests that forecaster 5 may be deficient in knowledge of the climatology of these locations, and further training is required.
Table 5. Individual forecaster skill scores for stations at different distances.

                             Station
Forecaster         A       B       C       D       E
1                 21      17      15      14      14
2                 25      23      22      13       9
3                 26      26      19      12      17
4                 26      25      22      13       8
5                 25      19      19       6       7
Average           25      22      19      12      11
Precipitation
frequency      0.187   0.172   0.189   0.135   0.126

Figure 8.--Skill score vs. precipitation frequency (winter)

While Gregg's result of station homogeneity may well apply to the home location and experienced forecasters who have worked together for some time, it doesn't necessarily apply to less trained persons or persons less experienced at a particular location. Thus even verification for only the home location can show up major differences among forecasters. As an example from a particular station, the six forecasters on station had skill scores of 18, 8, 27, 29, 18, and -98. It's obvious that the score of -98 indicates a forecaster greatly in need of help. One must be certain to check that the number of forecasts involved is sufficient to make the score representative, and that the precipitation frequency was not highly abnormal for this one forecaster. However, this was not the case. Instead, he had come from a part of the country with a completely different climatology, and he had not yet adjusted. His guidance was much better than -98, so he was told to follow his guidance without change until he had adjusted to the climatic conditions at the new station.

e. Sample Size

For routine verification, we experimented with the combinations of months to be treated as a group. We soon felt that seasonal differences should be significant, mainly because of areal coverage differences. We first used a 6-month season, using only one season at a time. This is satisfactory for whole-station figures, but the number of forecasts for individual forecasters, even those forecasting regularly, was small. There was not a sufficient number of forecasts of the lesser used (high) probabilities to adequately judge bias. While we have continued into the present with monthly verification of 12 months of data (two 6-month seasons; add a month's data and drop a month's each time), we have also made some special combinations to bring out specific features in the data. Some of these are discussed below. There now exist over 13 years of forecast and guidance information for 66 stations, and the potential of these data has not been exhausted.

f. Seasonal Effects

In Fig. 8, from Hughes (1968d), the skill score is plotted for each station in the region against the observed precipitation frequency for all forecasts in two 3-month winter (D-J-F) seasons. The regression line shown (solid) is for the points left of the dashed line and clearly shows a dependence of skill score on precipitation frequency, with a correlation coefficient of 0.40. If the points to the right of the dashed line were included, there would be little dependence, especially as determined by linear regression. These points were omitted because they are a unique group with a special problem. They are all frequently in air that has lake-effect precipitation, as they are all the NWS offices in upper and lower Michigan, except Detroit, plus Chicago, Fort Wayne, and South Bend, around the south end of Lake Michigan, and finally Duluth on the west shore of Lake Superior. The three stations closest to but to the left of the dashed line probably also have some lake effect, as they are Detroit, Milwaukee, and Green Bay. The lake effect that is apparently suppressing scores is probably the lake-created snow flurries acting through the trace problem.
Many times in the winter season the lake effect produces small amounts of precipitation in which the possibility of it being a trace instead of the measurable amount being forecast is quite great, especially since the precipitation is snow and therefore harder to measure accurately. The problem shows up through the equation

    P_m = P_a - P_t,     (8)

where P_m is the probability of measurable precipitation being forecast, P_a is the probability of any precipitation, and P_t is the probability of only a trace amount. Thus even if the forecaster feels strongly that there will be lake-effect snow in his area (P_a is large), but the amount will be small, as is common, P_t will be fairly large, so his forecast of P_m cannot be large. This causes the deviations from climatology to be smaller, and a poorer skill score results. The problem is further aggravated because the lake effect creates many more days with precipitation. Because these days are mostly with small amounts, the skill score stays low for them, and, on the other hand, there are now fewer no-precipitation days on which to catch up some points in the score by forecasting considerably lower than the climatic frequency.

Similar studies with the fall and spring seasons (Hughes 1968c and 1969a) have shown that the dependence of score on the precipitation frequency is about the same in these seasons as in winter. However, the lake effect is present in the fall but not in the spring. This variation in the lake effect is reasonable when one considers the seasonal variations in air-water temperature difference, so critical in lake-induced showers. The maximum difference is in the depths of winter because the lake temperature changes little after October as it approaches and then slightly passes the temperature of maximum density (about 4 C), while the air continues to cool quite a bit, possibly into early February. Thus the lake effect is strongest in winter, but is strong even in the fall. Of course, in spring the warmer air over the cold water suppresses convection. Incidentally, the skill score average of all stations was a little lower in the fall than in the winter, and a little lower still in spring.

To check out this lake effect on later data, 12 months of cold season (October-March) data ending March 1978 were examined. The data
Experience clearly shows that forecasters underplay deviations from normal, thus not showing the full effect of variations in precipitation frequency. But when we have different stations with different climatologies, the scores generally do reflect this difference, except as seen next. Figure 9 (from Hughes, 1968b) shows the same type of data for summer. Note that there is no appreciable dependence of score on precipitation frequency. Note also that the skill score is much lower, the lowest of the year, averaging about half that of winter. The reason for the lower value is most likely areal coverage differences, as discussed earlier using Eq. (1) and (2). Because summer rain is mostly spotty in showers, the areal coverage is smaller than in winter, thus the average point probability tends to be less, and forecast probability can't deviate from the climatic frequency, resulting in a lower skill score. It is questionable that the differences in score in summer are due to variations in the ability of the forecasters at different locations, but the reasons are unknown. However, nine of the top ten scores are from Great Lakes states. An attempt to see if the scatter was related to differences in the average areal coverage among stations was not conclusive because of the lack of adequate data for many locations, but the limited data suggested that no major areal coverage differences exist in the region in the summer. When use of electronically digitized radar systems becomes widespread, data from them can be used to reasonably evaluate the areal coverage for many locations somewhat in the manner used by D. Smith (1977), and thus prove or disprove this point. 50 25- 20- o) 15- o u I 10- CO 5- — -5 1 • • • • • I J L 5 10 15 20 25 30 35 Precipitation Frequency Figure 9. --Skill score vs. precipitation frequency (summer) 51 Further scatter might be eliminated if one were to normalize the data for the persistence of precipitation, such as done by Glahn and Jorgensen (1970). This is because when precipitation extends over a long period, i.e., it persists, it is fairly easy to forecast higher probabilities, and thus obtain better scores, in those forecast periods which come after the start of the precipitation. Another way out of this problem would be to use a skill score which compares the forecast probabilities to the persistence probability rather than the climatological probability. This might be better for the first period forecast, but less reasonable for other periods. This could be done for many locations, because persistence probabilities for various period lengths and lead times exist in the work of Jorgensen and Klein (1970). Figure 8 and the above discussion show that the skill score has at times some dependence on precipitation frequency and a lot of dependence on the trace frequency. These factors should be allowed for in comparing skill scores of stations because they are factors over which the forecasters have no control. The NWS Central Region now has a program in operation (see section 12) which compares the forecasts at its WSFOs on a 3-month seasonal basis after adjustment of scores for precipitation frequency, the frequency of small precipitation amounts, the persistence frequency, and the quality of the MOS guidance. g. Characteristics of Precipitation Probabilities As a result of points made earlier, some characteristics of precipitation probabilities are: 1. 
1. If there is to be perfect correspondence between the forecast probability and the relative frequency--perfect reliability--the average forecast probability must equal the relative frequency of precipitation. Therefore probabilities tend to be lower in drier climates.

2. Probabilities tend to be lower the shorter the length of the forecast period, because the relative frequency is usually lower and 1. above applies.

3. Probabilities tend to be lower as the chance of a trace amount increases, because account must be taken of the possibility that the weather system will yield only a trace.

4. Probabilities tend to be lower when the precipitation is by spotty showers rather than a widespread rain shield, because the probability at any given point in an area is directly related to the areal coverage of the precipitation.

5. Probabilities tend to be lower when there is less persistence of precipitation, because certainty is directly related to persistence.

6. Probabilities tend to be less extreme, and the use of the very low and especially the very high values is less frequent, as the lead time to the forecast period increases, until eventually the range has shrunk to the single value of the climatic frequency.

9. Improper Bias Reduction

Many of the problems noted here have dealt only with bias--a lack of reliability. When these biases are of an organized type, such as those discussed, they can be removed or reduced by a vigorous verification program, resulting in a better score and more useful forecasts. However, artificial adjusting of bias is bad because it lowers the ultimate worth of the forecasts. An example of such improper adjustment would be to intentionally downgrade a 100% probability forecast to a 70% forecast simply because this would improve the bias of the 70% category, which is known to have had too few precipitation events in past 70% forecasts. This is trying to play the system, but, as mentioned earlier, it has been proven that the system is not playable, and it hurts the skill score to do it in any form.* Also, as the earlier forecasts are gradually dropped from the set being verified, the bias of the 70% forecasts would change sign.

To properly correct the 70% forecasts, look at nearby values. If the 60% and 80% probabilities have about enough precipitation events, or too many, the 70% problem is most likely random error, and it should be neglected because it will right itself eventually without any active adjustment. However, if some surrounding probabilities, particularly higher ones, are also deficient in precipitation events, then overforecasting bias is likely and should be adjusted for in the future by easing off the usage of the high probabilities.

Other systems have been tried and also found to hurt the skill score and the utility of the forecasts. THERE ARE NO KNOWN WAYS TO HONESTLY BEAT THE SCORING SYSTEM.

*This is easy to prove in this case, as follows: Once a forecast is made and filed, there is no way the forecaster can change its score. Thus all past forecasts will have the same score no matter what the forecaster does on the present forecast. Therefore, since a 70% forecast gets a poorer Brier score when rain occurs than a 100% forecast, the score of the whole set would also be poorer if 70% is forecast. This reasoning can be used to prove false any artificial scheme involving past forecasts. Correctly "playing the numbers game" involves only how to make the forecast under consideration get a better score.
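The footnote's argument is easy to check numerically. In the sketch below (modern Python notation, with invented forecasts), the past 70% forecasts are already filed and score the same either way, so hedging a deserved 100% down to 70% can only worsen the set score:

    # Sketch: numeric check that "fixing" the 70% category by
    # downgrading a deserved 100% forecast always hurts the set score.

    def set_score(forecasts):
        """Total half Brier score of a set of (prob, rained) pairs."""
        return sum((prob - int(rained)) ** 2 for prob, rained in forecasts)

    past = [(0.7, False), (0.7, False), (0.7, True)]   # filed forecasts, fixed
    honest = past + [(1.0, True)]   # today rain is certain: forecast 100%
    hedged = past + [(0.7, True)]   # "fix" the 70% category instead

    print(set_score(honest))   # 1.07 -- better (lower) ...
    print(set_score(hedged))   # 1.16 -- ... than the hedged set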
10. Improving on MOS

The MOS forecasts should be verified in the same way as the forecaster's forecasts. This will then show the bias and relative quality of the MOS forecasts. However, since MOS equations are derived on a sizable sample of data, and the derivation method forces the probabilities to be essentially reliable on the dependent data, any biases noted may be transitory, i.e., due to the idiosyncrasies of the smaller sample being verified, and not universal and long-lasting. This is especially so where there are no pronounced local effects in the area to which a particular set of MOS equations applies. Where there could be major differences in local effects in a MOS area, as in the western U.S., biases noted could be long-lasting and should be worth determining. Eventually, single-station equations for precipitation probability, such as now exist for maximum temperature, should further improve MOS, especially in the West.

The relative quality of the MOS forecasts compared to those of the forecasters who use it as guidance is important because it can be an additional standard by which to judge a forecaster's efforts. This in effect is saying that the amount of "improvement" of the forecasters on MOS could be a universal measure of quality. The difficulty with this is that MOS, because its equations are for areas, not points, may have its most difficult time in places where there are strong local effects, while the forecasters do the best in such places. This is one reason why the normalization procedure discussed in section 12 has the MOS probability as a variable in the regression equation. Nevertheless, the relative quality of MOS is important because decisions as to whether or not to continue to modify MOS will be made from such data.

From a listing of several years' forecasts made by the WSFOs in the Central Region, it was noted, when considering all three forecast periods, that in the cold season 25% to 50% of the time the forecasts by the WSFOs were exactly the same as those of MOS. The higher percentages were in the drier portions of the region, and the range was smaller in the warm season with its smaller range of precipitation frequencies (see Figs. 8 and 9). Also, about 96% of the time the WSFO forecasts were within 30% of the MOS value. Perhaps surprisingly, none of these values changed significantly with lead time, on the average.

Also examined was the portion of the probability range where the forecasters made the most improvement on MOS. This was done by taking many months of forecasts and putting them in an array of 169 boxes dependent on both the 13 MOS and the 13 forecaster probabilities. It is obvious and well known that the forecaster's improvement on MOS is greatest for the first period of the forecast. From the arrays it was also noted that the improvement on MOS in the first period was much greater for forecasts which were higher than MOS. It was the same way, but to a lesser extent, in the second period, with little difference in the third period. Interestingly, this was in spite of the forecasters making fewer forecasts higher than MOS in the first and second periods than in the third period. This strongly suggests that the forecasters are quite a bit more successful in using the extra data they have to improve on the beginning of the precipitation event than on the ending of the event. Naturally this success would diminish as the lead time to the forecast increases.
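A minimal sketch of the kind of counting array described (the 13 probability categories are those of the forecasts; the sample pairs are invented, and only the bookkeeping is the point here):

    # Sketch: the 13 x 13 array of forecasts by MOS probability (rows)
    # and forecaster probability (columns).

    CATS = [0, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

    def improvement_array(pairs):
        """Count forecasts in each (MOS category, local category) box.

        pairs -- [(mos_prob, local_prob), ...], probabilities in percent.
        """
        index = {cat: i for i, cat in enumerate(CATS)}
        grid = [[0] * len(CATS) for _ in CATS]
        for mos, local in pairs:
            grid[index[mos]][index[local]] += 1
        return grid

    # Boxes to the right of the diagonal hold forecasts raised above MOS,
    # boxes to the left hold forecasts lowered; verifying each box
    # separately shows where the forecaster improves on the guidance.
    grid = improvement_array([(30, 40), (30, 30), (30, 40), (10, 0)])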
One way to maximize the forecaster input compared to MOS is to be aware of what the scoring system is saying about what is best to do for the current forecast. This is "playing the numbers game" the right way, by working to get the best current forecast. This is good if scores improve, because score is directly related to utility, and it can improve scores. Let us look at some examples, using the Brier scores given in Table 6. As said earlier, making a major change in guidance yields a big gain if it is in the right direction. This is fairly obvious, and is mostly so in any scoring system, but the squaring part of the Brier score is what makes it so advantageous. Of course, if the forecaster is wrong, guidance makes the big gain instead of the forecaster. The key point here is the fact that changes are worth more at some probabilities than at others, and in one direction more than the other. Proper use of this can improve scores. For example, correctly changing MOS of 10% to 0% gains the forecaster only 0.01 units of Brier score, but correctly changing 80% to 70% gains 0.15 units (.64 - .49)--15 times as much. Also, correctly changing MOS guidance of 10% to 20% gains 17 times as much (.81 - .64) as changing it correctly to 0%. Not to take these points into account in changing guidance is like playing craps in Vegas without knowing the odds of various rolls--you go broke. These points are probably the reason that consensus forecasts do so well. But if a stupid thinker like consensus can do well, the thinking forecaster should be able to do better if acting with full knowledge of the consequences of various actions.

To look at it a different way and show why consensus works well, look at this example. If you change a 10% MOS forecast to 0%, you gain .01 if you are right, and lose .19 if wrong (1.00 minus .81). Thus if you are correct in such changes only 19 times out of 20, you have gained nothing for your effort; you must do better than 19 out of 20. On the other hand, if you were to raise the 10% to 20%, you gain .17 if right (.81 - .64), and lose only .03 if wrong (.04 - .01). Thus you need to be right only about one time out of six such changes, a dramatic difference.

Table 6. Half Brier score.

Probability (%)    No "rain" observed    "Rain" observed
      0                  .00                  1.00
      2                  .0004                .9604
      5                  .0025                .9025
     10                  .01                  .81
     20                  .04                  .64
     30                  .09                  .49
     40                  .16                  .36
     50                  .25                  .25
     60                  .36                  .16
     70                  .49                  .09
     80                  .64                  .04
     90                  .81                  .01
    100                 1.00                  .00

All these points have been discussed in detail by Hughes (1969b) and summed up as follows (the arithmetic behind them is sketched after the list):

1. All correct changes to guidance are a gain to the forecaster, and even small gains are significant because the average gain over guidance is usually small.

2. A correct change in the probability guidance that is initially in the direction of 50% will net more to the forecaster than a like change away from 50%, with the amount of the gain greater the farther the guidance value is from 50%. Or, a small correct change toward 50% is usually of more value than a much larger correct change away from 50%.

3. Verifications show that forecasters make changes away from 50% much more often than toward 50%. This may be because of the conservativeness of the guidance or the lack of awareness by the forecaster of the opportunity for gain with changes toward 50%.

4. THEREFORE, forecasters should be alert for opportunities to make changes toward 50%, and they should be aware of how conservative they must be in making changes away from 50% if they are to make such efforts of value.
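The arithmetic behind these points can be written out directly. In the sketch below (illustrative Python, using the half Brier scores of Table 6), the break-even value is the true rain probability at which a proposed change to guidance neither gains nor loses expected score; the two examples reproduce the 10%-to-0% and 10%-to-20% cases discussed above:

    # Sketch: break-even arithmetic for a proposed change to guidance.

    def expected_half_brier(prob, p_rain):
        """Expected half Brier score of forecasting `prob` when rain
        occurs with true probability `p_rain`."""
        return p_rain * (prob - 1.0) ** 2 + (1.0 - p_rain) * prob ** 2

    def break_even_rain_prob(guidance, changed):
        """Solve expected(guidance) = expected(changed) for p_rain."""
        a = (guidance - 1.0) ** 2 - (changed - 1.0) ** 2  # score change if rain
        b = guidance ** 2 - changed ** 2                  # score change if dry
        return b / (b - a)

    print(break_even_rain_prob(0.10, 0.00))  # 0.05: lowering 10% to 0% pays only
                                             # if the true rain chance is under
                                             # 5%, i.e., right 19 times in 20
    print(break_even_rain_prob(0.10, 0.20))  # 0.15: raising to 20% pays whenever
                                             # the chance exceeds 15% -- about
                                             # one time in six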
These points may seem to be contrary to improving the resolution of the forecasts, but this is not so. All changes to MOS which are in the right direction, i.e., toward zero if it doesn't rain or toward 100% if it does, improve on MOS through resolution, regardless of whether the changes are toward 50% or away from 50%. This can be proven for sets of forecasts as follows: Take a set of perfectly reliable MOS forecasts of 30%. Assume the forecaster makes changes only toward 50%, say only to 40%. The set being changed to 40% will have a better Brier score if the observed frequency of precipitation on the set is over 35.0%, i.e., more than halfway from the MOS value. The set of unchanged forecasts will get a better score if the set removed has a frequency larger than 30.0%. Thus both sets can have better Brier scores, and yet both sets can have (and at least one set will have) imperfect reliability. We can therefore say that the resolution has improved so much that it overcomes the reliability loss. Clearly, then, going toward 50% can improve resolution.

A listing of all the possible gains and losses is given in Tables 7(a) and (b). These were created by Wilbur Wray of NSSFC, who said that he had improved his score by use of the tables. It would be wise to have these tables at the forecast desk for easy reference.

[Tables 7(a) and 7(b): Brier-score gains and losses for each combination of local and guidance probabilities, for the cases of no "rain" observed and "rain" observed.]

11. Trend in Scores

Have scores improved due to this type of verification and other factors? Since there have been no breakthroughs in forecasting, one would expect only small changes from year to year, with some ups and downs. Because of this, it was decided to take 4-year average
scores for the first period forecast (when a 12-h forecast), and compare the post-MOS 4-year result (1972-75) with the pre-MOS 4-year period immediately previous. This was done for five WSFOs in the Central Region (Milwaukee, Chicago, Louisville, St. Louis, Denver). The skill scores for each station were obtained for each calendar year and then averaged for each 4-year period, with the result shown in Table 8. Note that four of the five stations have improved local scores from the earlier period (ΔLCL). The 3-year score for 1973-75 gave even better results. Note also that the MOS objective guidance (ΔGUID) had materially improved on its subjective predecessor at only two locations. Also, the sizable difference between the scores of MOS and the forecaster (LCL-GUID) shows that MOS had a long way to go to be competitive for the first period forecast.

These forecaster improvements with time are contrary to those found by others, for example Sanders (1973) and Cook and Smith (1977). However, Sanders and Cook and Smith are correct for their single-station samples (station A in Table 8 is the one Cook and Smith used), but generalizing from one location has its dangers. However, as we will see next, while their conclusions were not general at the time, they may be more general now, although the five locations used here are still not a large sample of stations and are not very geographically dispersed.

Table 9 is an update of Table 8 prepared for the 2-year period of April 1976 through March 1978 (the period just after that of Cook and Smith) by averaging the skill score from the first period forecasts of a 12-month warm season set and a 12-month cold season set of forecasts. Note that station A (used by Cook and Smith) made the most substantial gain, but its MOS guidance gained even more. Note also in the average gain that the WSFOs had about a zero value, some going up and some down, while the MOS gained a bit at every location. Thus the forecasters changed little, while the MOS scores continued to improve, narrowing the margin between them. MOS is certainly competitive now, especially when it is realized that the forecasters have the advantage of later data and that their gain, if any, would be smaller for the second and third periods of the forecast.
12. Forecast Comparisons

Panofsky and Brier (1963) discussed the pitfalls of verification and indicated that one of the greatest dangers lies in attempts to compare relative abilities when there are climatological differences. This danger still exists in probability forecasts. The climatological factors of importance in precipitation forecast comparisons were discussed earlier (section 8), and earlier yet it was mentioned that the climatological skill score removes some dependence on precipitation frequency.

Table 8. Average station (LCL) and guidance (GUID) skill scores for 1972-75 and change in score from 1968-71, all for first 12 h period.

                                   Change from 1968-71
WSFO    LCL    GUID    LCL-GUID     ΔLCL     ΔGUID
A        40     35         5          -6       -2
B        42     32        10           2        5
C        43     30        13           4        1
D        46     34        12           9        9
E        34     22        12           7       -1

Table 9. Average station (LCL) and guidance (GUID) skill scores from April 1976-March 1978 and change from the 1972-75 period (first 12 h period only).

                                   Change from 1972-75
WSFO      LCL     GUID    LCL-GUID    ΔLCL     ΔGUID
A        45.5     41.5       4.0       5.5      6.5
B        38.5     33.5       5.0      -3.5      1.5
C        41.5     35.0       6.5      -1.5      5.0
D        48.0     40.5       7.5       2.0      6.5
E        29.5     24.5       5.0      -4.5      2.5
Average                               -0.4      4.4

So if comparisons are made using this score, comparing forecasters at the same location is the safest; but even then one needs a sizable sample (2 years of regular shift forecasting?) to be able to reasonably neglect climatic differences. Comparing offices is much more difficult, and having a large sample size is by no means adequate. This is obvious from the discussion of the various climatic factors given earlier. The safest way to compare offices is to adjust for the climatic differences in some manner. Hughes and Sangster (1969b) discuss in detail a method for doing this by screening regression to select and weight the climatic effects. The terms in the equations involve precipitation frequency, frequency of small amounts of precipitation, and persistence of precipitation, all in a variety of forms and involving long-term and short-term (sample) values. The score of MOS was also included. However, even such an adjustment is not fully definitive, as they added a subjective adjustment to the objective ranking. It is likely, though, that when a number of offices are compared on a sizable sample, those at the top of the list should have something going for them other than climatology, especially compared to those offices near the bottom of the list. They also found that a comparative score which is simply the difference in the Brier scores was little affected by the climatological factors they used.

13. Combining Probabilities

There are times when users need a probability for a period that is considerably longer than the usual 6 or 12 h period for which we issue probabilities for some parameters. How to do this with the precipitation probabilities routinely issued has been discussed in depth by Hughes and Sangster (1974, 1979a). The essence of this and its main results are given here for convenience. An equation for the 3-period probability from three 1-period probabilities is, according to a basic addition formula for probabilities:

    P_123 = P_1 + P_2 + P_3 - P_1P_2 - P_1P_3 - P_2P_3 + P_1P_2P_3.     (9)

Here P_1, P_2, and P_3 are the probabilities (as decimal fractions) for the three periods separately, and P_123 is the probability for the three periods taken as a whole. If a 2-period probability combination is desired, the terms on the right involving P_3 should be omitted.
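Eq. (9) is the inclusion-exclusion expansion of a simpler product form, which is often the easier way to compute it. A sketch (in Python, for any number of periods):

    # Sketch: Eq. (9) computed via the equivalent product form
    # 1 - (1 - P1)(1 - P2)(1 - P3), which expands algebraically to
    # Eq. (9) and works for any number of independent periods.

    def combined_prob(probs):
        """Probability of precipitation in at least one period, assuming
        the periods are independent.  probs are decimal fractions."""
        dry = 1.0
        for p in probs:
            dry *= 1.0 - p
        return 1.0 - dry

    print(combined_prob([0.30, 0.20]))        # 0.44  (2-period form)
    print(combined_prob([0.30, 0.20, 0.40]))  # 0.664 (3-period form, Eq. 9)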
Incidentally, this equation is appropriate whether or not the three periods are contiguous or are the same length. The main problem with using this equation for the task at hand is that it assumes the events are independent of each other, an assumption not likely to be correct for precipitation for the three contiguous periods of a weather forecast. It also assumes the probabilities reliable. To determine the effect of these assumptions, the actual locally- made 12 h forecasts for 28 stations distributed rather uniformly across the north-central U. S. for one warm season (April -September) and one cold season (October-March) were treated. Each of the seven terms on the right side of the above equation was determined from the actual forecasts and then fed into a regression program as predictors, with the observed precipitation as a predictand. The regression estimate of the 36 h probability for the warm season is: P = .03 + 1.04P 1 + .81P 2 + .97P 3 - .86P 1 P 2 - 1.3SPjP 2 - -76P 2 P 3 + 123 + 1. 19P 1 P 2 P 3 (10) Note that the terms in the equation have the same sign as in the independent equation given earlier and that the coefficients are quite close to the 1.0 of the independent equation. The constant is not the zero value of Eq. (9), but .03 (3%), and is probably brought about by a lack of reliability of the forecasts in the yery low probabilities numbers. However, it is small enough that it will not have an appreciable effect on the final result. With this equation, one could calculate the 36 h probability for any warm season set of three 12 h probabilities. The cold season result was not as nice in that its coefficients were much farther from 1.0, and the signs were not always the same as the independent equation. The result from these warm and cold season equations indicates that the warm season events are nearly independent, one period from another, while in the cold season they are much more dependent. This is what one would suspect from the longer time span of precipitation in the cold season. To get around this and other problems with this approach, another basic equation was used, given as: 12 1 '2 '12 This is for combining two periods only, but it can be repeated any number of times. P 12 is the probability of precipitation occurring 63 in both periods 1 and 2. The equation thus does not have the assumption of independence of the events. Using the observed frequencies of precipitation in the same sets of probability forecasts as before, a warm season and a cold season value of P 12 was calculated, including an adjustment for the bias of the probabilities. Tables 10 and 11 result from the use of the equations, and tables 12 and 13 show the amount of deviation from independent events. Once the 24 h probability is obtained from a table, the table can be used again with that value, interpolating as necessary, and the third probability to get a probability for a 36 h period, etc. Note that the warm season events are still fairly close to independent, and the cold season is close except when both probabilities are middle values. The tables can be used with those forecasts starting with a 6 h period, but since the independence is slightly less with a 6 h period than with a 12 h period, the final probability should be a bit lower (maximum 2%) than the tables indicate. A quick estimate of the 36 h probability can be obtained from Table 14 and some judgement. Start with the highest of the three probabilities. This gives the lower limit for the 36 h probability. 
Table 10. Combination probabilities (percent) for warm season.

PROB1\PROB2    0    2    5   10   20   30   40   50   60   70   80   90  100
    0          0    2    5   10   20   30   40   50   60   70   80   90  100
    2          2    4    7   12   21   31   41   51   61   70   80   90  100
    5          5    7    9   14   23   33   42   52   62   71   81   90  100
   10         10   12   14   18   27   36   45   54   63   72   81   91  100
   20         20   21   23   27   34   41   49   58   66   74   83   91  100
   30         30   31   33   36   41   47   54   62   69   77   84   92  100
   40         40   41   42   45   49   54   59   65   72   79   86   93  100
   50         50   51   52   54   58   62   65   69   75   81   87   94  100
   60         60   61   62   63   66   69   72   75   78   83   89   94  100
   70         70   70   71   72   74   77   79   81   83   85   90   95  100
   80         80   80   81   81   83   84   86   87   89   90   92   96  100
   90         90   90   90   91   91   92   93   94   94   95   96   96  100
  100        100  100  100  100  100  100  100  100  100  100  100  100  100

Table 11. Combination probabilities (percent) for cold season.

PROB1\PROB2    0    2    5   10   20   30   40   50   60   70   80   90  100
    0          0    2    5   10   20   30   40   50   60   70   80   90  100
    2          2    4    7   11   21   31   41   51   60   70   80   90  100
    5          5    7    9   14   23   32   42   52   61   71   81   90  100
   10         10   11   14   17   26   35   44   53   62   72   81   91  100
   20         20   21   23   26   32   40   48   56   65   74   82   91  100
   30         30   31   32   35   40   45   52   60   67   75   83   92  100
   40         40   41   42   44   48   52   56   63   70   77   85   92  100
   50         50   51   52   53   56   60   63   66   72   79   86   93  100
   60         60   60   61   62   65   67   70   72   75   81   87   93  100
   70         70   70   71   72   74   75   77   79   81   82   88   94  100
   80         80   80   81   81   82   83   85   86   87   88   89   95  100
   90         90   90   90   91   91   92   92   93   93   94   95   95  100
  100        100  100  100  100  100  100  100  100  100  100  100  100  100

Table 12. Probabilities from Table 10 minus those for independent events.

PROB1\PROB2    0    2    5   10   20   30   40   50   60   70   80   90  100
    0          0    0    0    0    0    0    0    0    0    0    0    0    0
    2          0   -0   -0   -0   -0   -0   -0   -0   -0   -0   -0   -0    0
    5          0   -0   -0   -0   -1   -1   -1   -1   -0   -0   -0   -0    0
   10          0   -0   -0   -1   -1   -1   -1   -1   -1   -1   -1   -0    0
   20          0   -0   -1   -1   -2   -3   -3   -2   -2   -2   -1   -1    0
   30          0   -0   -1   -1   -3   -4   -4   -3   -3   -2   -2   -1    0
   40          0   -0   -1   -1   -3   -4   -5   -5   -4   -3   -2   -1    0
   50          0   -0   -1   -1   -2   -3   -5   -6   -5   -4   -3   -1    0
   60          0   -0   -0   -1   -2   -3   -4   -5   -6   -5   -3   -2    0
   70          0   -0   -0   -1   -2   -2   -3   -4   -5   -6   -4   -2    0
   80          0   -0   -0   -1   -1   -2   -2   -3   -3   -4   -4   -2    0
   90          0   -0   -0   -0   -1   -1   -1   -1   -2   -2   -2   -3    0
  100          0    0    0    0    0    0    0    0    0    0    0    0    0

Table 13. Probabilities from Table 11 minus those for independent events.

PROB1\PROB2    0    2    5   10   20   30   40   50   60   70   80   90  100
    0          0    0    0    0    0    0    0    0    0    0    0    0    0
    2          0   -0   -0   -0   -0   -0   -0   -0   -0   -0   -0   -0    0
    5          0   -0   -1   -1   -1   -1   -1   -1   -1   -1   -0   -0    0
   10          0   -0   -1   -2   -2   -2   -2   -2   -2   -1   -1   -0    0
   20          0   -0   -1   -2   -4   -4   -4   -4   -3   -2   -2   -1    0
   30          0   -0   -1   -2   -4   -6   -6   -5   -5   -4   -3   -1    0
   40          0   -0   -1   -2   -4   -6   -8   -7   -6   -5   -3   -2    0
   50          0   -0   -1   -2   -4   -5   -7   -9   -8   -6   -4   -2    0
   60          0   -0   -1   -2   -3   -5   -6   -8   -9   -7   -5   -3    0
   70          0   -0   -1   -1   -2   -4   -5   -6   -7   -9   -6   -3    0
   80          0   -0   -0   -1   -2   -3   -3   -4   -5   -6   -7   -3    0
   90          0   -0   -0   -0   -1   -1   -2   -2   -3   -3   -3   -4    0
  100          0    0    0    0    0    0    0    0    0    0    0    0    0

A quick estimate of the 36 h probability can be obtained from Table 14 and some judgement. Start with the highest of the three probabilities. This gives the lower limit for the 36 h probability. The second column gives the highest possible 36 h probability, which occurs when all three probabilities are the same as that in the first column and are independent. In such a case a downward adjustment for dependence (a maximum of 10 percent at middle probabilities and zero at the extremes) will give the desired result. If the three probabilities are not the same, one can use the value in the right-hand column. This value is obtained by adding half the range to the highest probability (first column) and reducing this mainly for dependence of events, with some rounding for convenience in memory. These figures are probably more appropriate for the cold season, so the mid-values should be a bit higher for warm season showers because they are more independent. Notice that for first-column probabilities of 20-70%, the rounded last-column values are simply the highest single probability plus 10%--easy to remember.

Table 14. 36-h probability combination estimate.

Highest 12-h prob.   Highest 36-h prob.           Highest 12-h prob.   36-h est.
(lowest 36-h)        (independent)        Range   plus 1/2 range       (dependent)
      5                   14                9            9                  8
     10                   27               17           18                 15
     20                   49               29           34                 30
     30                   66               36           48                 40
     40                   78               38           59                 50
     50                   88               38           69                 60
     60                   94               34           77                 70
     70                   97               27           83                 80
     80                   99               19           89                 87
     90                  100               10           95                 94
    100                  100                0          100                100
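One possible rendering of the Table 14 recipe as a function (Python, illustrative): the taper of the dependence reduction--largest near mid-range probabilities, zero at the extremes--is an assumption based on the text, so the output only approximates the table's last column:

    # Sketch: the Table 14 rule of thumb.

    def quick_36h_estimate(p1, p2, p3):
        """Rough 36 h probability (percent) from three 12 h probabilities."""
        hi = max(p1, p2, p3)
        # Second column of Table 14: all three periods at `hi`, independent.
        independent = 100.0 * (1.0 - (1.0 - hi / 100.0) ** 3)
        half_range = (independent - hi) / 2.0
        # Reduce mainly for dependence of events, most at mid-range values.
        reduction = 10.0 * (1.0 - abs(hi - 50.0) / 50.0)
        return round(hi + half_range - reduction)

    print(quick_36h_estimate(40, 30, 20))   # about 50: the highest value plus 10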
14. Problems for the Future

We can easily see what to do about bias, and that part has been pushed vigorously. However, experience clearly shows that only a small gain in Brier score can result from such changes (see Hughes 1965, and Sanders 1963 and 1973). But to consider small gains insignificant is unwise. For one reason, the history of forecast improvement is one of only small gains. Secondly, small gains can be economically significant to a large population of decision makers. Much of the deviation of the skill score from perfection would be eliminated if the score were changed to that recommended here (section 7e) by using areal coverage instead of 0 and 1 in the Brier score, but that is only a cosmetic change for a psychological and logistic advantage. The remaining deviation of the score from perfection would be due to the lack of knowledge of forecasting. Increasing the basic knowledge of so many duty forecasters is a very difficult task. It involves getting a better understanding of meteorological processes, making better use of observations, especially of radar and satellite, and getting a better understanding of the strengths and weaknesses of the various forms of numerical and statistical guidance given to the forecaster.

Comprehensive verification should continue so as to help the new forecaster reduce bias and to keep the experienced forecaster on his/her toes. But we are now to the point where, if the forecaster is to remain ahead of the inevitably improving guidance material and thus contribute to the forecast, the "state of the art" must improve. Comparative verification and station ranking will help encourage forecasters to a resurgence of effort. However, the main way to go is through training, formal or otherwise. The forecaster also must have time in the forecast routine to use the knowledge gained. It appears that field office automation will help do this, as well as provide tools heretofore unavailable to the forecaster, which could also help.

Since the main value of the human forecaster will be in the first part of the forecast, it is also likely that a return to more diagnostics is in order. That is, obtaining a fuller understanding of what exists at the initial time, and why it exists, would lead to better short term forecasts. The "why" is the harder part, and it will require more emphasis on the use of initial charts, satellite, radar, and peripheral data, and less on progs. Properly answering the "why" will also require considerable meteorological understanding, and thus will require well-trained meteorologists. With the gradually improving guidance, forecasters are approaching a "sink or swim" period, as seen for precipitation forecasting by Tables 5 and 6, and treading water won't do much longer. Things will have to be done differently or they may not be done by people. Automation of Field Operations and Services (AFOS) should help here, too, because it will require changing old habits and, hopefully, the new ones will be more suited to the needs of the present.

Probability may well play a larger role in the future. However, Dexter (1962) showed how not to do it when he cluttered the forecast with so many numbers that chaos resulted.
Increasing the basic knowledge of so many duty forecasters is a very difficult task. It involves getting a better understanding of meteorological processes, making better use of observations, especially of radar and satellite, and getting a better understanding of the strengths and weaknesses of the various forms of numerical and statistical guidance given to the forecaster. Comprehensive verification should continue, both to help the new forecaster reduce bias and to keep the experienced forecaster on his/her toes. But we are now at the point where, if the forecaster is to remain ahead of the inevitably improving guidance material and thus contribute to the forecast, the "state of the art" must improve. Comparative verification and station ranking will help encourage a resurgence of effort among forecasters. However, the main way to go is through training, formal or otherwise. The forecaster also must have time in the forecast routine to use the knowledge gained. It appears that field office automation will help provide that time, as well as provide tools heretofore unavailable to the forecaster, which could also help.

Since the main value of the human forecaster will be in the first part of the forecast, it is also likely that a return to more diagnostics is in order. That is, obtaining a fuller understanding of what exists at the initial time, and why it exists, would lead to better short-term forecasts. The "why" is the harder part; it will require more emphasis on the use of initial charts, satellite, radar, and peripheral data, and less on progs. Properly answering the "why" will also require considerable meteorological understanding, and thus well-trained meteorologists. With the gradually improving guidance, forecasters are approaching a "sink or swim" period, as seen for precipitation forecasting in Tables 5 and 6, and treading water won't do much longer. Things will have to be done differently or they may not be done by people. Automation of Field Operations and Services (AFOS) should help here, too, because it will require changing old habits and, it is hoped, the new ones will be better suited to present needs.

Probability may well play a larger role in the future. However, Dexter (1962) showed how not to do it when he cluttered the forecast with so many numbers that chaos resulted. What is needed is an approach which is gradual and perhaps implicit (see Hughes 1978a), so as to get the user the information needed for the best decision making. Probabilities for the public need to expand to give probabilities of various precipitation amounts. This and other limitations of the present NWS probability program are discussed by Myers (1974).

While not in favor now, there may well come a time when probability will be used in warnings. There is already a paper on this by Franceschini (1960) on thunderstorm warnings, and Barton (1956) noted that we must eliminate loose terminology because a "warning defeats itself if people are confused." What is the first thing you do when you hear a warning for your area? If you are like the people interviewed in disaster surveys, you take a look outside to see what it looks like. In effect, you are trying to establish a probability so as to know where the threat stands compared to your C/L ratio. Do you duck, or just remain cautious and watchful? If the event warned about is too far away to be seen or identified out the window, like a flash flood or hurricane, people seek confirmation (probability) from other sources. The public is thus ripe for more information in probability terms. For discussion of this with regard to the so-called Boulder wind events, see Hughes (1978b). For very recent thoughts on probability from several people, see Pielke (1977) and the responses to it by M. Smith (1977), Murphy (1978c), Hughes (1978a), M. Smith (1978), and Pielke (1978).

As mentioned near the beginning, Brier (1944) said, "The decisions of a rational man will to a large extent depend upon his estimates of the probabilities of the different events and the consequences of them. When he is convinced that the weather forecaster's estimates of these probabilities are better than his own, he will come to him for weather information." He also said in closing that "so far as the scientific problem of weather forecasting is concerned, the forecaster's duty ends with providing accurate and unbiased estimates of the probabilities of different weather situations." Thus we must do it well, and it is likely that we will do it more in the future.

15. Public Education

Many NWS forecasters feel that the public needs education concerning probability forecasts, and the need for such education has been suggested recently in the meteorological literature, e.g., Murphy (1977b). We all know, and Murphy mentions in the same reference, that there is a lack of understanding of weather forecasts by the public whether they are probabilistic or not. There will always be some misunderstanding. The point I want to make here is that while such education may seem necessary, it may only be desirable. As long as we put out the type of forecasts needed by the decision makers--point probability forecasts--it is sufficient that the public know only how to use them. I contend, as mentioned earlier, that the public does know how to use them: people have unknowingly made probabilistic decisions in all of their decisions to date, and using a probability forecast is rather easy, as was discussed to some extent under the cost/loss ratio concept (section 3b). If the public understands how to use them, is any education program worthwhile?

Yes, mainly because of the confidence in the forecasts that such knowledge will help build, and possibly also because some details of the program--for example, that we are forecasting measurable precipitation--may be of value to some decision makers.

What should be discussed? The most obvious thing is the quality of the forecasts in the manner most obvious to the user--the reliability of a sizable set of forecasts. This can be done by showing a set of forecast probabilities and their observed frequencies for the home location, as given in the routine probability printout, or it can be shown in the usual reliability diagram (Fig. 7). Reliability is an important attribute of such forecasts, and knowing the high degree of reliability that exists in practically all sets of such forecasts should increase user confidence in and acceptance of the forecasts. Discussion of how the forecasts are made--in an orderly and scientific way, as discussed earlier in this memo (section 5c), and not in the ways the cartoons show--would also increase confidence in them.
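The tabulation behind such a display is simple. A minimal sketch (in Python, with made-up data; an illustration, not the routine printout program) groups forecasts by their stated probability and compares each group with its observed relative frequency:

    from collections import defaultdict

    def reliability_table(forecasts, outcomes):
        """For each stated probability, count the cases and compute the
        observed relative frequency of the event (1 = measurable
        precipitation occurred, 0 = it did not)."""
        groups = defaultdict(list)
        for f, o in zip(forecasts, outcomes):
            groups[f].append(o)
        return {f: (len(obs), sum(obs) / len(obs))
                for f, obs in sorted(groups.items())}

    # A reliable set of forecasts verifies near its stated values.
    forecasts = [0.2] * 5 + [0.5] * 4 + [0.8] * 5
    outcomes = [0, 0, 1, 0, 0] + [1, 0, 1, 0] + [1, 1, 0, 1, 1]
    for prob, (n, freq) in reliability_table(forecasts, outcomes).items():
        print(f"forecast {prob:.1f}: {n} cases, observed frequency {freq:.2f}")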
Then there is the trace-measurable problem, and also the fact that the amount of precipitation tends, on the average, to be greater with the larger probabilities, as discussed earlier (section 4f). Exactly what the forecast periods are, especially one like "remainder of tonight," is useful, as is discussion of time changes in skill--the lesser use of the very high and very low probabilities in the later forecast periods. Finally there is the areal coverage problem, discussed in depth earlier (sections 4d and 6a), which explains why you are still carrying a 40% probability when "it is raining here now," and the concept of the probability being the average point probability for the local area when only one value is given for a forecast period.

The most difficult part of probability forecasting is obtaining adequate forecaster understanding. That was mentioned earlier too. To do a good job, the forecaster should know the answers to any questions on the above points. Off-hand comments in answer to questions from the public may be incorrect, and they can be very hard or impossible to correct later in the minds of the public.

The public wants probability-type information, as noted by Murphy and Winkler (1974b); they have asked for such information in areas where it is not now available. When we find a way to provide such information, we should do so. We have already started in some of the specialty forecasts--fire weather and agriculture. It is likely that the expansion of probability into other areas depends more on our knowledge that it gives better service than on the public's interest and understanding.

ACKNOWLEDGMENT

The author wishes to thank Roger A. Allen (NWS, retired), Allan H. Murphy (NCAR), and H. R. Glahn (NWS) for help in polishing the draft report and in removing the small errors that can crop up in a work of this scope.

REFERENCES

Barton, S., 1956: Confusion must be eliminated in weather warnings. Abstract. Bull. Amer. Meteor. Soc., 37, 179.

Beebe, R. G., 1952: The distribution of summer showers over a small area. Mon. Wea. Rev., 80, 95-98.

Bennett, E., J. E. Newman, P. W. Dailey, Jr., C. E. Petzhold, and L. H. Smith, 1968: Precipitation probabilities in weather forecasts. Publication AY-174, Coop. Extension Serv., Purdue Univ., 14 pp.

Berkofsky, L., 1950: An objective determination of probability of fog formation. Bull. Amer. Meteor. Soc., 31, 158-162.

Bermowitz, R. J., and E. A. Zurndorfer, 1979: Automated guidance for predicting quantitative precipitation. Mon. Wea. Rev. (in press, Feb. 1979).
Besson, L., 1904: Essai de prevision methodique du temps (Attempts at methodical forecasting of the weather). Annuaire de la Societe Meteorologique de France, Paris, April, pp. 92-97. English translation by R. A. Edwards, 1904, Mon. Wea. Rev., 32, 311-313.

Borgman, L. E., 1960: Weather-forecast profitability from a client's viewpoint. Bull. Amer. Meteor. Soc., 41, 347-356.

Bosart, L. F., 1975: SUNYA experimental results in forecasting daily temperature and precipitation. Mon. Wea. Rev., 103, 1013-1023.

Brier, G. W., 1944: Verification of a forecaster's confidence and the use of probability statements in forecasting. USWB Research Paper No. 16, 10 pp.

____, 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1-3.

____, and R. A. Allen, 1951: Verification of weather forecasts. Compendium of Meteorology, Amer. Meteor. Soc., Boston, MA, 841-848.

Causey, O. Y., 1953: The distribution of summer showers over small areas. Mon. Wea. Rev., 81, 111-114.

Charba, J. P., 1977: Operational system for predicting thunderstorms two to six hours in advance. NOAA Tech. Memo. NWS TDL-64, 24 pp.

____, 1979: Two to six hour severe local storm probabilities: an operational forecasting system. Mon. Wea. Rev. (in press).

Cook, D., and D. R. Smith, 1977: Trends in skill of public forecasts at Louisville, KY. Bull. Amer. Meteor. Soc., 58, 1045-1049.

Cooke, W. E., 1906a: Forecasts and verifications in Western Australia. Mon. Wea. Rev., 34, 23-24.

____, 1906b: Weighting forecasts. Mon. Wea. Rev., 34, 274-275.

Curran, J. T., and L. A. Hughes, 1968: The importance of areal coverage in precipitation probability forecasting. ESSA Tech. Memo. WBTM CR 24, 9 pp.

Curtiss, J. H., 1968: An elementary mathematical model for the interpretation of precipitation probability forecasts. J. Appl. Meteor., 7, 3-17.

Dexter, R. V., 1962: Confidence factors are fictional. Weather, 17, 132-135.

Dickey, W. W., 1949: Estimating the probability of a large fall in temperature at Washington, D.C. Mon. Wea. Rev., 77, 67-78.

____, 1956: The use of strictly defined terms in summertime forecasts. Mon. Wea. Rev., 84, 179-188.

____, 1965: Probability forecasting. ESSA Tech. Memo. WBTM SR 6, 14 pp.

____, 1966: Summary of probability of precipitation forecasts in the Southern Region for the period January through March 1966. ESSA Tech. Memo. WBTM SR 13, 5 pp.

____, 1967: Verification of operational probability of precipitation forecasts, April 1966-March 1967. ESSA Tech. Memo. WBTM WR 25, 6 pp.

Diemer, E. D., 1965: Some notes on probability forecasting. ESSA Tech. Memo. WBTM WR 1, 18 pp.

____, 1966: Final report on precipitation probability test programs, October to March 1966. ESSA Tech. Memo. WBTM WR 7, 40 pp.

Dunn, C. R., 1966: Preliminary evaluation of probability of precipitation experiment. ESSA Tech. Memo. WBTM ER 10, 2 pp.

Eastern Region Staff Notes, NOAA, Nat. Wea. Serv., 1973 (2/26): Comments on a survey.

Emmons, G., 1940: Suggestions for improved presentation of weather information to the public. Bull. Amer. Meteor. Soc., 21, 311-316.

Franceschini, G. A., 1960: A probability factor for thunderstorm warnings. Bull. Amer. Meteor. Soc., 41, 28-30.

Gentry, R. C., 1950: Forecasting local showers in Florida during the summer. Mon. Wea. Rev., 78, 41-49.

Glahn, H. R., 1962: An experiment in forecasting rainfall probabilities by objective methods. Mon. Wea. Rev., 90, 59-67.
____, 1974: Problems in the use of probability forecasts. Preprints Fifth Conf. on Weather Forecasting and Analysis, St. Louis, Amer. Meteor. Soc., 32-35.

____, 1978: Computer worded public weather forecasts. NOAA Tech. Memo. NWS TDL-67, 24 pp.

____, and J. R. Bocchieri, 1972: Use of model output statistics for predicting ceiling height. Mon. Wea. Rev., 100, 869-879.

____, and D. L. Jorgensen, 1970: Climatological aspects of the Brier P-score. Mon. Wea. Rev., 98, 136-141.

____, and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203-1211.

____, E. A. Zurndorfer, J. R. Bocchieri, G. M. Carter, D. J. Vercelli, D. B. Gilhousen, P. J. Dallavalle, and K. F. Hebenstreit, 1978: The role of statistical weather forecasts in the National Weather Service's operational system. Preprints Conf. on Weather Forecasting and Analysis and Aviation Meteorology, Silver Spring, Amer. Meteor. Soc., 382-389.

Gregg, G. T., 1969: On comparative rating of forecasters. ESSA Tech. Memo. WBTM SR 48. (PB 188 039)

____, 1977: Probability forecasts of a temperature event. Nat. Wea. Digest, 1, No. 2, 33-34.

Gringorten, I. I., 1958: On the comparison of one or more sets of probability forecasts. J. Meteor., 15, 283-287.

____, 1967: Verification to determine and measure forecasting skill. J. Appl. Meteor., 6, 742-747.

Hallenbeck, C., 1920: Forecasting precipitation in percentages of probability. Mon. Wea. Rev., 48, 645-647.

Hashemi, F., and W. Decker, 1969: Using climatic information and weather forecasts for decisions in economizing irrigation water. Agricul. Meteor., 6, 245-257.

Heffernan, M. H., and H. R. Glahn, 1979: Users guide for TDL's computer worded forecast program. TDL Office Note 79-6, 44 pp.

Hughes, L. A., 1965: On the probability forecasting of the occurrence of precipitation. ESSA Tech. Note 20-CR-3, 36 pp. (Available in all NWS offices.)

____, 1966a: Precipitation probability forecast verification summary, Nov. 1965-Mar. 1966. ESSA Tech. Memo. WBTM CR 1, 10 pp.

____, 1966b: Climatic frequency of precipitation at Central Region stations. ESSA Tech. Memo. WBTM CR 8, 51 pp.

____, 1967a: Public probability forecasts. ESSA Tech. Memo. WBTM CR-11, 22 pp.

____, 1967b: Improving precipitation probability forecasts using the Central Region verification printout. ESSA Tech. Memo. WBTM CR-15, 14 pp.

____, 1967c: On the use and misuse of the Brier verification score. ESSA Tech. Memo. WBTM CR-18, 14 pp.

____, 1968a: Probability verification results (24 months). ESSA Tech. Memo. WBTM CR-19, 15 pp.

____, 1968b: Seasonal aspects of probability forecasts. 1. Summer. ESSA Tech. Memo. WBTM CR-22, 8 pp. (PB 185 733)

____, 1968c: Seasonal aspects of probability forecasts. 2. Fall. ESSA Tech. Memo. WBTM CR-23, 15 pp. (PB 185 734)

____, 1968d: Seasonal aspects of probability forecasts. 3. Winter. ESSA Tech. Memo. WBTM CR-26, 15 pp. (PB 185 735)

____, 1969a: Seasonal aspects of probability forecasts. 4. Spring. ESSA Tech. Memo. WBTM CR-27, 8 pp. (PB 185 736)

____, 1969b: How to play the system and get a better product. NWS Central Region Technical Attachment B, December, 3 pp.

____, 1977a: A program of chart analysis. NOAA Tech. Memo. NWS CR-63, 96 pp.

____, 1978a: Quantification of weather forecasts. Bull. Amer. Meteor. Soc., 59, 431-432.

____, 1978b: A pitch for probability. Bull. Amer. Meteor. Soc., 59, 1036-1037.

____, 1979: Precipitation probability forecasts - problems seen via a comprehensive verification. Mon. Wea. Rev., 107, 129-139.

____, and W. E. Sangster, 1970: A note on the categorical verification of probability forecasts. ESSA Tech. Memo. WBTM CR-35.
____, and W. E. Sangster, 1974: Thirty-six hour precipitation probabilities. Proc. 5th Conf. Wea. Fcstng. and Anal., St. Louis, Amer. Meteor. Soc., 29-31.

____, and W. E. Sangster, 1979a: Combining precipitation probabilities. Mon. Wea. Rev., 107, 520-524.

____, and W. E. Sangster, 1979b: Precipitation probability - comparing offices for skill. Mon. Wea. Rev., 107 (in press).

Jorgensen, D. L., 1949: An objective method of forecasting rain in central California during the raisin-drying season. Mon. Wea. Rev., 77, 31-46.

____, 1953: Estimating precipitation at San Francisco from concurrent meteorological variables. Mon. Wea. Rev., 81, 101-110.

____, 1962: Note on the measurement of skill of probability forecasts. Mon. Wea. Rev., 83, 139-142.

____, 1967: Climatological probabilities of precipitation for the conterminous United States. ESSA Tech. Report WB-5.

____, and W. H. Klein, 1970: Persistence of precipitation at 108 cities in the conterminous United States. ESSA Tech. Memo. WBTM TDL 31. (PB 193 599)

Knox, J. L., 1969: The use of numerical probability factors in public forecasts for the prediction of precipitation occurrence. Paper given at Second Annual Congress of Can. Meteor. Soc.

Kolb, L. L., and R. R. Rapp, 1962: The utility of weather forecasts to the raisin industry. J. Appl. Meteor., 1, 8-12.

Landsberg, H., 1940: Weather forecasting terms. Bull. Amer. Meteor. Soc., 21, 317-320.

McDonald, J. E., 1959: "It rained everywhere but here" -- the thunderstorm encirclement illusion. Weatherwise, 12, 159-160.

Malone, T. F., 1956: The role of probability in the communication of weather information. Abstract. Bull. Amer. Meteor. Soc., 37, 180.

Muller, R. H., 1944: Verification of short range weather forecasts (a survey of the literature). Bull. Amer. Meteor. Soc., 25, 18-27, 47-53, and 88-95.

Munn, R. E., 1953: Group participation in forecasting a meteorological event. Bull. Amer. Meteor. Soc., 34, 468-469.

Murphy, A. H., 1966: A note on the utility of probabilistic predictions and the probability score in the cost/loss ratio decision situation. J. Appl. Meteor., 5, 534-537.

____, 1969: Measure of the ability of probabilistic predictions in cost/loss ratios is incomplete. J. Appl. Meteor., 8, 863-873.

____, 1970: The ranked probability score and the probability score: a comparison. Mon. Wea. Rev., 98, 917-924.

____, 1973: Hedging and skill scores for probability forecasts. J. Appl. Meteor., 12, 215-223.

____, 1974: A sample skill score for probability forecasts. Mon. Wea. Rev., 102, 48-55.

____, 1976: Probability in meteorology: status, problems, and prospects. Proc. 6th Conf. on Wea. Fcstng and Anal., Albany, NY, Amer. Meteor. Soc., 9-12.

____, 1977a: The value of climatological, categorical and probabilistic forecasts in the cost/loss ratio situation. Mon. Wea. Rev., 105, 803-816.

____, 1977b: On the misinterpretation of precipitation probability forecasts. Bull. Amer. Meteor. Soc., 58, 1297-1299.

____, 1978a: Hedging and the mode of expression of weather forecasts. Bull. Amer. Meteor. Soc., 59, 371-373.

____, 1978b: Letter to the editor. Nat. Wea. Digest, 3, No. 1, 36-37.

____, 1978c: Quantification of weather forecasts. Bull. Amer. Meteor. Soc., 59, 430-431.

____, and R. A. Allen, 1970: Probabilistic prediction in meteorology: a bibliography. ESSA Tech. Memo. WBTM TDL 35, 60 pp. (PB 194-415)

____, and E. S. Epstein, 1967: A note on probability forecasting and "hedging". J. Appl. Meteor., 6, 1002-1004.
____, and J. C. Thompson, 1977: On the nature of the nonexistence of ordinal relationships between measures of the accuracy and value of probability forecasts: an example. J. Appl. Meteor., 16, 1015-1021.

____, and R. L. Winkler, 1971a: Forecasters and probability forecasts: the responses to a questionnaire. Bull. Amer. Meteor. Soc., 52, 158-165.

____, and R. L. Winkler, 1971b: Forecasters and probability forecasts: some current problems. Bull. Amer. Meteor. Soc., 52, 239-247.

____, and R. L. Winkler, 1974a: Credible interval temperature forecasts: some experimental results. Mon. Wea. Rev., 102, 784-794.

____, and R. L. Winkler, 1974b: Probability forecasts: a survey of National Weather Service forecasters. Bull. Amer. Meteor. Soc., 55, 1449-1453.

____, and R. L. Winkler, 1977a: Experimental point and area precipitation probability forecasts for a forecast area with significant local effects. Atmosphere, 15, 61-78.

____, and R. L. Winkler, 1977b: Probabilistic tornado forecasts: some experimental results. Proc. 10th Svr. Storms Conf., Omaha, NE, Amer. Meteor. Soc., 403-409.

Myers, J. N., 1974: Limitations of precipitation probability forecasts. Proc. 5th Conf. on Wea. Fcstng and Anal., St. Louis, MO, Amer. Meteor. Soc., 36-39.

National Weather Service, 1969: Operational forecasts with Subsynoptic Advection Model (SAM) -- No. 3. Tech. Proc. Bull. No. 21, NOAA, U.S. Department of Commerce, 12 pp.

____, 1971: Operational forecasts derived from Primitive Equations and Trajectory Model Output Statistics (PEATMOS) -- No. 1. Tech. Proc. Bull. No. 68, NOAA, U.S. Department of Commerce, 7 pp.

____, 1978: The use of Model Output Statistics for predicting ceiling, visibility, and cloud amount. Tech. Proc. Bull. No. 234, NOAA, U.S. Department of Commerce, 14 pp.

Panofsky, H., and G. W. Brier, 1963: Some Applications of Statistics to Meteorology. Penn State Press, 206 pp.

Peterson, C. R., K. J. Snapper, and A. H. Murphy, 1972: Credible temperature interval forecasts. Bull. Amer. Meteor. Soc., 53, 966-970.

Pielke, R. A., 1977: An overview of recent work in weather forecasting and suggestions for future work. Bull. Amer. Meteor. Soc., 58, 506-519.

____, 1978: Response. Bull. Amer. Meteor. Soc., 59, 433.

Price, S., 1949: Thunderstorm today? Try a probability forecast. Weatherwise, 2, 61-63 and 67.

Ramage, C. S., 1978: Further outlook -- hazy. Bull. Amer. Meteor. Soc., 59, 18-21.

Reap, R. M., and D. S. Foster, 1977: Operational thunderstorm and severe local storm probability forecasts based on Model Output Statistics. Preprints Tenth Conf. on Severe Local Storms, Omaha, NE, Amer. Meteor. Soc., 376-381.

Roberts, C. F., 1965: On the use of probability statements in weather forecasts. ESSA Wea. Bur. Tech. Note 8 FCST-1, 15 pp.

____, J. M. Porter, and G. F. Cobb, 1967: Report on the forecast performance of selected Weather Bureau offices for 1966-67. ESSA Tech. Memo. WBTM FCST 9, 52 pp.

Roberts, H. B., 1968: On the meaning of the probability of rain. Proc. 1st Statistical Meteor. Conf., Hartford, CT, Amer. Meteor. Soc., 133-141.

Rogell, R. H., 1972: Weather terminology and the general public. Weatherwise, 25, 126-132.

____, 1976: The weather test. Louisville Courier-Journal and Times, Dec. 12, 1976.

Root, H. E., 1958: An experiment in probability forecasting. USWB San Francisco, manuscript. Abstract. Bull. Amer. Meteor. Soc., 39, 180.

____, 1961: Probability statements in weather forecasting. Abstract. Bull. Amer. Meteor. Soc., 42, 290.

____, 1962: Probability statements in weather forecasting. J. Appl. Meteor., 1, 163-168.
____, 1965: The value of weather forecasts. ESSA Tech. Note 27-WR-3, 8 pp.

Sadowski, A. F., and G. F. Cobb, 1974: National Weather Service April 1972 to March 1973 public forecast verification summary. NOAA Tech. Memo. NWS FCST-21.

Sanders, F., 1963: On subjective probability forecasting. J. Appl. Meteor., 2, 191-201.

____, 1967: The verification of probability forecasts. J. Appl. Meteor., 6, 756-761.

____, 1973: Skill in forecasting daily temperature and precipitation: some experimental results. Bull. Amer. Meteor. Soc., 54, 1171-1179.

Sangster, W. E., 1970: A note on the comparability of the "S-score" and the reduction of variance. ESSA Nat. Wea. Serv. Central Region Tech. Attachment 70-74.

Schroeder, M. J., 1954: Verification of "probability" fire-weather forecasts. Mon. Wea. Rev., 82, 257-260.

Schwerdt, R. W., 1970: The influence of weather on the economics of shipping. Mariners Wea. Log, 14, 194-195.

Smith, D. L., 1977: An examination of probability of precipitation forecasts in light of rainfall areal coverage. Nat. Wea. Digest, 2, No. 2, 15-26.

____, and M. Smith, 1978: A comparison of probability of precipitation forecasts and radar estimates of rainfall areal coverage. NOAA Tech. Memo. NWS SR-96, 17 pp.

Smith, M., 1977: Comments. Bull. Amer. Meteor. Soc., 58, 1090.

____, 1978: Response. Bull. Amer. Meteor. Soc., 59, 432-433.

Snellman, L. W., 1977: Operational forecasting using automated guidance. Bull. Amer. Meteor. Soc., 58, 1036-1044.

Stael von Holstein, C.-A. S., and A. H. Murphy, 1978: The family of quadratic scoring rules. Mon. Wea. Rev., 106, 917-924.

Stallard, G., and Staff, 1965: An experiment in probability precipitation forecasting. ESSA Eastern Region Tech. Note No. 1, 15 pp.

Thompson, J. C., 1946: Progress report on objective rainfall forecasting research program for the Los Angeles area. USWB Research Paper 25, 35 pp.

____, 1950: A numerical method for forecasting rainfall in the Los Angeles area. Mon. Wea. Rev., 78, 113-124.

____, 1959: The snowfall probability factor. The American City, 74, 80-83.

____, 1962: Economic gains from scientific advances and operational improvements in meteorological prediction. J. Appl. Meteor., 1, 13-17.

____, 1963: Weather decision making - the pay-off. Bull. Amer. Meteor. Soc., 44, 75-78.

____, 1966: A note on meteorological decision making. J. Appl. Meteor., 5, 532-533.

____, and G. W. Brier, 1955: The economic utility of weather forecasts. Mon. Wea. Rev., 83, 249-254.

Thompson, P. D., 1977: How to improve accuracy by combining independent forecasts. Mon. Wea. Rev., 105, 228-229.

Topil, A. B., 1963: Precipitation probability at Denver related to length of period. Mon. Wea. Rev., 91, 293-297.

U.S. Army Signal Corps, 1919: The Signal Corps meteorological service, A.E.F. (excerpts from Annual Report of Chief Signal Officer). Mon. Wea. Rev., 47, 870-871.

Vernon, E. M., 1947: An objective method of forecasting precipitation 24-48 hours in advance at San Francisco, California. Mon. Wea. Rev., 75, 211-219.

____, and M. E. Stoneback, 1952: A numerical approach to rainfall forecasting for San Francisco. Abstract. Bull. Amer. Meteor. Soc., 33, 127.

Wassall, R. B., 1966: Verification of probability forecasts at Hartford, Connecticut for period 1963-1965. ESSA Tech. Memo. No. 7, 16 pp.

Wasserman, S. E., 1972: PEATMOS probability of precipitation forecasts as an aid in predicting precipitation amounts. NOAA Tech. Memo. NWS ER 50, 12 pp.

____, and H. Rosenblum, 1972: Use of primitive-equation model output to forecast winter precipitation in the north coastal sections of the United States. J. Appl. Meteor., 11, 16-22.
Williams, P. W., 1951: The use of confidence factors in forecasting. Bull. Amer. Meteor. Soc., 32, 279-281.

Winkler, R. L., and A. H. Murphy, 1968: Evaluation of subjective precipitation probability forecasts. Proc. 1st Statistical Meteor. Conf., Hartford, CT, Amer. Meteor. Soc., 148-157.

____, and A. H. Murphy, 1976: Point and areal precipitation probability forecasts: some experimental results. Mon. Wea. Rev., 104, 86-95.

____, A. H. Murphy, and R. W. Katz, 1977: The consensus of subjective probability forecasts: are two, three heads better than one? Proc. 5th Conf. Prob. and Stat., Las Vegas, Amer. Meteor. Soc., 57-62.

Wray, W., 1977: Personal communication.