College and Research Libraries Research Notes 501

16. Not found. This is the catch-all category to which an item is consigned whenever it cannot be located and no reason is determined. Presumably some items have been lost or stolen; others are in transit or in use. After time has been allowed for an item to turn up, a decision is made whether to attempt to replace it. The gifts and exchanges unit may be able to obtain a free copy, or it may have to be purchased.

Using Time-Series Regression to Predict Academic Library Circulations

Terrence A. Brooks

Four methods were used to forecast monthly circulation totals in 15 midwestern academic libraries. In a test of one-month forecasting accuracy, the dummy regression method, a sophisticated forecasting method for cyclical data, exhibited the smallest average error. In a test of six-month forecasting accuracy, the monthly mean method, a naive forecasting method for cyclical data, exhibited the smallest average error. Straight-line predictive methods, whether naive or sophisticated, had significantly greater error in both accuracy tests. A remaining research question is, Why do naive forecasting methods outperform more sophisticated forecasting methods with monthly library circulation data in long-range forecasts? It is suggested that high levels of randomness in library-output statistics inhibit the performance of sophisticated forecasting methods.

INTRODUCTION

A time series is a chronological sequence of observations on a variable.1 An example from the field of librarianship of such a variable is circulation check-outs. Library circulation counts are commonly compiled on a daily basis and aggregated into monthly or semester reports. A series of these monthly counts is a chronological sequence of observations on the variable library circulation. Consequently, it is fair to conclude that the most typical type of statistical data libraries produce is time-series data.
Library literature, however, reveals little awareness of the ways that time-series data can be used for forecasting and planning.

Time-series regression techniques are regression procedures used to predict future values of a time series. They are unique only in that they use past values of a time series to predict future values of the same time series. This paper reports the application of two types of time-series regression to the problem of forecasting academic library circulation.

Terrence A. Brooks is assistant professor at the School of Library and Information Science, University of Iowa, Iowa City, Iowa 52242.

FORECASTING

"In library planning and decision-making, predictions are invariably required."2 Despite Hamburg's statement, there has been little theoretical work on forecasting methodologies for library statistics and little practical application of them. This is in sharp contrast to the acceptance of forecasting in other disciplines. Forecasting, or trend analysis, is considered an integral part of scientific management and rational decision making. Makridakis and Wheelwright3 describe forecasting as a tool that permits management to shield an organization from the vagaries of chance events and become more methodical in dealing with its environment.

Like bureaucracies everywhere, academic libraries need tools that will enhance planning and rational decision making. Filley and House4 would characterize most academic libraries today as third-stage growth organizations. Large and complex, these organizations have developed beyond the early rapid growth stages identified by Filley and House and now have become institutionalized with a corps of bureaucrats who plan, organize, direct, and control. Many academic librarians are similarly charged with the tasks of planning, organizing, directing, and controlling library operations. One tool to help accomplish these managerial tasks is forecasting.
There are two forecasting studies in library literature worthy of note. The first is by Drake,5 who considered linear regression as a predictive technique. She concluded that straight-line trend projections are not the most efficient predictors in all library situations. The reason is that library data, especially circulation data, show monthly or seasonal fluctuations. Cyclicity may be one of the reasons that forecasting techniques have been slow to be applied to library statistics. Cyclicity in library-output statistics means that a variable such as monthly circulation fluctuates up and down throughout the academic year. Such cyclical data demand forecasting techniques that can model their seasonality.

The most sophisticated forecasting study in library literature to date is by Kang.6 He forecasted the requests for interlibrary loan services received by the Illinois Research and Reference Centers from 1971 through 1978 using several methods, including methods that can model cyclical data, and found regression to be the best predictive technique. He used a weighted regression formula that gave less predictive weight to older observations and greater weight to the most recent ones. The generalizability of Kang's study is severely limited, however, because data from only one library were used.

College & Research Libraries, November 1984

METHODOLOGY

The purpose of this paper is to evaluate time-series regression forecasting methods with monthly academic library circulation totals. Time-series regression is a methodology that is new to library and information science but has been used extensively in the social sciences, business, and economic literatures.

Makridakis and Wheelwright7 give two versions of time-series regression approaches. The first time-series regression approach uses independent variables that are past values of the time series itself.8 An example of such an approach would be using the monthly circulation total of several months past as the predictor of next month's circulation total. This simply means that a library's circulation time series is regressed on itself at a certain time lag. There are two caveats with this technique. First, it produces a straight prediction line and thus should suffer the same problem of poor fit that was noted by Drake. Second, it is a new application, meaning that the choice of time lag has not been studied sufficiently with academic library circulation data. Hence, the choice of any particular time lag is completely arbitrary.

The second time-series regression method uses qualitative or dummy variables.9 In the context of multiple regression, a dummy variable is a special independent variable that can take only a limited number of values, such as 1 or 0. To use dummy regression for forecasting, some monthly totals of the time series are tagged by a 1, while other months of the year are given 0s. The result is a multiple regression equation that can model the seasonal patterns of library circulation totals and should perform as a more efficient predictor than straight-line methods.

To provide benchmarks for performance comparisons, two averaging methods were also used as forecasting methods. These averaging methods were used because they represent the most direct and naive approach that any academic librarian could use for forecasting. For instance, a future circulation total could simply be forecast from the average of all past values of the time series. Alternatively, a particular future monthly total could be forecast from an average of past values of that particular month.

In all, four forecasting methodologies were used with Minitab, a statistical software program, and circulation data from several libraries:

1. Dummy time-series regression was used to find an equation to predict one month and six months in advance for each library. This is a sophisticated forecasting method that can model cyclical data.

2. Lagged time-series regression was used with each library's data lagged one month and lagged six months. The decision to use a one-month time lag and a six-month time lag was arbitrary. This is a sophisticated forecasting method that makes straight-line predictions. It cannot model cyclical data.

3. A simple average was made of each library's circulation totals to provide a straight-line benchmark for comparison purposes. This is a naive straight-line forecasting technique.

4. A monthly average was computed for each library for one month and six months in advance. This provided a seasonal benchmark for comparison purposes. For instance, if January and June represent the forecasts for one and six months, then data from previous Januarys would be averaged to give a forecast for the month of January. Similarly, previous Junes would be averaged to give the June forecast. This is a naive forecasting method that can model cyclical data.

DATA

A random sample of fifteen academic libraries in the Midwest submitted monthly circulation data for analysis. The states of Illinois, Ohio, Michigan, and Missouri were each represented by three academic libraries, Iowa was represented by two academic libraries, and Minnesota by one academic library. The holdings of these fifteen libraries ranged from a maximum of four million book titles down to a minimum of two hundred thousand book titles. Ten libraries contributed time series of 60 months' duration, three libraries contributed time series of 72 months' duration, one contributed 66 months, and one contributed a time series of 53 months.

The most recent six months' data for each library were set aside to provide a basis for evaluating the performance of each of the four forecasting methods. Forecasts were made with each method for each of the fifteen libraries for one month and six months in advance. Each forecasted monthly total was then compared to the actual total reported by the library, and an absolute percentage error (APE) was calculated. The average of the APE values for each forecasting method (the mean absolute percentage error) was then found. An accurate forecasting method would, relative to other methods, have a small mean absolute percentage error (MAPE) across the sample of the fifteen academic libraries. An analysis of variance was performed comparing the MAPEs to see if there was a statistically significant difference among the four forecasting methods.

RESULTS

Table 1 shows the results of forecasting one month in advance. The dummy regression method had the smallest MAPE, followed by the monthly mean method. These methods are capable of modeling the seasonal patterns of academic library circulations. The two straight-line prediction methods followed with the largest MAPEs.

TABLE 1
ANALYSIS OF VARIANCE OF MEAN ABSOLUTE PERCENTAGE ERROR FOR ONE-MONTH FORECASTS

Methods             N     MAPE (%)   SD (%)
Dummy regression    15    12.22      11.08
Monthly mean        15    15.52      12.65
Lag 1 regression    15    26.26      15.45
Simple mean         15    30.19      25.48

Analysis of Variance
Source    df    SS       MS      F
Factor     3     3288    1096    3.74
Error     56    16391     293
Total     59    19679

(p = .0160)

An analysis of variance (ANOVA) test on the difference among the MAPEs of the four methods proved to be statistically significant (p=0.0160). Since the null hypothesis of no difference among the population MAPEs was rejected, a multiple comparison of the sample means was indicated. The Newman-Keuls procedure, as outlined by Meyer,11 was used. A significant difference (p < 0.05) was found between the MAPEs of the dummy regression and simple mean methods. There was insufficient evidence that any other pair of means differed significantly.
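The analysis-of-variance entries in Table 1 can be checked against the per-method means and standard deviations printed in the same table. The sketch below is only a consistency check, not the original Minitab run. It assumes that each method has 15 observations (one per library), that the last column of the table is a sample standard deviation, and that a one-way ANOVA with equal group sizes was used. All variable names are illustrative.

```python
# Recompute the one-way ANOVA entries of Table 1 from summary statistics.
# Assumptions (not stated in the source): equal group sizes of 15 and
# sample standard deviations (n - 1 denominator).

n = 15  # libraries per forecasting method

# Mean absolute percentage error (%) per method, copied from Table 1.
mape = {
    "dummy regression": 12.22,
    "monthly mean": 15.52,
    "lag 1 regression": 26.26,
    "simple mean": 30.19,
}

# Standard deviation (%) per method, copied from Table 1.
sd = {
    "dummy regression": 11.08,
    "monthly mean": 12.65,
    "lag 1 regression": 15.45,
    "simple mean": 25.48,
}

k = len(mape)                        # number of methods (groups)
grand_mean = sum(mape.values()) / k  # unweighted mean is valid for equal n

# Between-group (factor) and within-group (error) sums of squares.
ss_factor = n * sum((m - grand_mean) ** 2 for m in mape.values())
ss_error = (n - 1) * sum(s ** 2 for s in sd.values())

df_factor = k - 1       # 3
df_error = k * (n - 1)  # 56

ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_stat = ms_factor / ms_error

print(f"SS factor = {ss_factor:.1f}, SS error = {ss_error:.1f}")
print(f"MS factor = {ms_factor:.1f}, MS error = {ms_error:.1f}")
print(f"F({df_factor}, {df_error}) = {f_stat:.3f}")
```

Under these assumptions the recomputed sums of squares, mean squares, and F ratio agree with the published table to within the rounding of the printed means and standard deviations, which supports reading the last column as a standard deviation.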
Table 2 shows the results of forecasting six months in advance. Dummy regression and the monthly mean methods, the two techniques that can model the seasonal patterns of academic library circulations, performed better than the straight-line methods. But the relative positions of the techniques have changed: the averaging methods now outperformed regression methods in both the cyclical and straight-line cases.

An ANOVA test on the difference among these MAPEs proved to be statistically significant (p=0.0166). The Newman-Keuls procedure showed that the monthly mean method had a significantly (p