College and Research Libraries

GEORGE PITERNICK

ARL Statistics - Handle With Care

The 1975-76 issue of ARL Statistics presents conclusions concerning academic library collection growth. The procedures of drawing these inferences exhibit conceptual and computational deficiencies which impair their validity and usefulness.

George Piternick is professor, School of Librarianship, University of British Columbia, Vancouver.

James T. Gerould began collecting and issuing statistics of college and university libraries in 1920 when he was librarian of the University of Minnesota, and continued this activity after moving to Princeton University.¹ The "Princeton Statistics," as they came to be known, were issued by that university until the Association of Research Libraries (ARL) undertook their compilation and distribution in 1961; they have appeared yearly as "ARL Statistics" since that date. During this long period there have been many changes and increases in the number of libraries described, in the data included, and in the format of presentation. The first compilations issued were of one or two mimeographed pages, the last (1975-76) a forty-six-page booklet.² Data on twenty-seven colleges and universities were contained in the first issue for 1919-1920; ninety-four academic libraries and eleven non-academic research libraries are covered in the 1975-76 edition. Six categories of data were reported for 1919-20; twenty-one categories appear in the 1975-76 compilation.

Throughout the years the statistics have been based upon voluntary submissions, by the individual libraries, of answers to specific questions in the light of rules and definitions, tight or loose, provided by the compilers. There has been little or no policing of these submissions to insure compliance with the rules and conformity with the definitions and instructions; it is easy to understand why.
Libraries generally have felt free to make their own interpretations and change them when they have felt it advisable to do so. It is evident that not all libraries interpret the rules in the same way and that individual libraries have made major changes in their own methods of gathering and reporting over the years, resulting in otherwise inexplicable discontinuities in their data. The categories themselves are not always mutually exclusive or constant throughout a time series.

Suffice it to say that the overall reliability of the statistics is not impressive, and, more lamentably, their deficiencies are unnecessarily dramatized by the spurious precision with which they are expressed. To show a given library's volume holdings to seven significant figures is, under the circumstances, ludicrous. Lastly, while the statistics, as published, do include a large number of notes, qualifications, explanations, and exceptions to the data presented in numerical form, these tend to be completely disregarded in subsequent handling of the data in drawing interinstitutional comparisons and inferential conclusions. It is not the purpose here to document or belabor these deficiencies; Oboler has listed a large number of them.³ Although his remarks refer specifically to statistics at one time published by the U.S. Office of Education,⁴ his criticisms are, for the most part, applicable to ARL statistics also.

College & Research Libraries • September 1977

ARL Statistics are used in the production of other statistical compilations⁵ and widely used by librarians in making interinstitutional comparisons, largely in budget arguments. The deficiencies of ARL Statistics are generally, if dimly, realized by most librarians; the user tends to rely upon them as the drunk is said to rely on the streetlamp, for support rather than for illumination.
The safety with which inferences may be drawn from these figures is greatest when single libraries are studied over a short period of time; it is much smaller when different libraries are compared, or when the characteristics of the entire population are summarized over lengthy time periods or predictions about them made.

INTERPRETATION OF STATISTICS

It is in this latter area, that of inferring trends and changes in academic library behavior by use of the statistics, that ARL has expanded its activities recently. Median values calculated for the various categories of statistical information appeared in the 1962-63 statistics and have appeared yearly since then. Rank order tables for many of the categories appeared first in the 1961-62 statistics, disappeared for a short period, and reappeared to stay in the 1965-66 compilation, presumably to aid librarians in preparing budget arguments. During these years special tables and analyses have been included from time to time.

The 1975-76 statistics include a new table (p.16-17) in which percentage changes in median values over the last eight years are displayed; in the introduction a series of statements is made, describing trends in academic libraries, based for the most part on these changes in median values. These statements cannot be accepted as they stand: they are misleading at best, erroneous at worst. The calculations on which the statements are based exhibit errors both conceptual and computational.

The analysis to follow is based upon a single statement in the introduction to the 1975-76 statistics; the methodological criticisms made will apply, in some measure, to all those statements in the introduction which summarize statistical findings.
This particular statement has been selected for analysis for several reasons: it has been identified as an especially important finding by the compilers, it is the statement which seems most likely to be widely cited and quoted by virtue of its high emotive content in these days of academic library austerity, and it treats of a category of data whose importance as a gauge of library health is indisputable.

    Inflation affected all library operations, but monograph purchases clearly suffered the most. The median number of monographs added last year was 10.3% less than the previous year, and was the lowest in the eight years reported. The median number of volumes added in 1976 was 64,800, a dramatic contrast to the 89,800 added in 1969.⁶

The category "number of monographs added" is not used in ARL Statistics. The categories related most closely are "Volumes Added, Gross" and "Volumes Added, Net," and both of these clearly include not only monographs but also bound volumes of serials, a significant component of the aggregate figure. Furthermore, the figure of -10.3 percent refers to the change in median values for "Volumes Added, Net" between 1974-75 and 1975-76.

Here we have the first instance of drawing inferences from data which cannot possibly support them. "Volumes Added, Gross" represents the number of volumes acquired or cataloged during a given period and hence reflects, more or less closely, the financial situation during that period. But "Volumes Added, Net" is this gross value reduced by the number of volumes withdrawn as lost, missing, mutilated, donated, etc., during that period. Assiduity in operations which result in withdrawals hardly can be considered an inevitable correlate or result of financial stringency. "Volumes Added, Gross" is obviously the pertinent datum.
Comparing median values for "Volumes Added, Gross," as was done for "Volumes Added, Net," results in a minuscule increase of 0.16 percent compared to the "dramatic" drop of 10.3 percent. But, however reasonable the comparison of medians, it is not reasonable to compare medians in a time series when the composition of the distribution has changed. The median for 1974-75 is the median value of a distribution of eighty-eight libraries. During 1975-76 six new libraries were added to ARL. There are no longer any Harvards or Yales to add; new members of ARL are, for the most part, smaller libraries just recently, and barely, grown to research library dimensions. Hence their inclusion in computing the median value for 1975-76 automatically depresses this datum. If the comparison is made between median values for "Volumes Added, Gross" for the eighty-eight institutions for which data are available for both years, deleting the values for the six added libraries, the median value for "Volumes Added, Gross" is 80,479, not 78,085, a percentage rise from the previous year of 3.2 percent. Modest, but a rise, not a decline.

But is the median a very good basis for comparison in situations of this kind? It must be emphasized that the median is a measure of central tendency which is not concerned primarily with the absolute magnitude of the variable but only with its relative magnitude. The median value in any distribution is simply that value above which half of the members of the distribution find themselves and below which the other half are located. Its absolute size, except to fix its ranking, is of no moment; hence to compare absolute median values for the same distribution from year to year is deceptive and pointless. Changes in the median, moreover, tell next to nothing about the changes in the other values of the distribution.
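The effect of a changed membership on the median can be illustrated with a minimal sketch. The figures below are invented for illustration and are not ARL data; the library labels and the helper `pct_change` are assumptions introduced here. The sketch only shows that admitting smaller libraries can turn a rise in every continuing library into an apparent fall in the median.

```python
from statistics import median


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Hypothetical "Volumes Added, Gross" figures (invented, not ARL data).
prev_year = {"A": 150_000, "B": 110_000, "C": 90_000, "D": 80_000, "E": 70_000}
this_year = {
    "A": 152_000, "B": 112_000, "C": 92_000, "D": 81_000, "E": 71_000,
    # Two newly admitted, smaller libraries with no figure for the prior year.
    "F": 40_000, "G": 35_000,
}

# Naive comparison: each year's median over whoever is a member that year.
naive = pct_change(median(prev_year.values()), median(this_year.values()))

# Matched comparison: restrict both years to libraries present in both.
common = prev_year.keys() & this_year.keys()
matched = pct_change(
    median(prev_year[k] for k in common),
    median(this_year[k] for k in common),
)

print(f"naive change in median:   {naive:+.1f}%")
print(f"matched change in median: {matched:+.1f}%")
```

In this invented panel every continuing library adds more volumes than before, yet the naive median falls, because the two new, smaller members shift the middle of the distribution downward.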
The usefulness, therefore, of adducing median values, even when properly done, is quite limited and the practice of comparing them hazardous.

Are there better analytic procedures available? If the effects of recent financial "cut-backs" in academic library acquisitions are being investigated, it seems appropriate to measure gross changes for ARL libraries as a whole. If the "Volumes Added, Gross" totals for the years 1974-75 and 1975-76 (helpfully provided by ARL) are adjusted to represent the same libraries by deleting values for the six libraries added in 1975-76, it is seen that a slight increase has occurred. In the aggregate, the eighty-eight ARL academic libraries show gross additions of 7,875,033 volumes in 1975-76. These same libraries added 7,753,746 volumes in 1974-75. The increase of 1975-76 over 1974-75 is 1.6 percent.

The same result is obtained, of course, by using the arithmetic mean instead of the totals, in this case 89,489 (1975-76) and 88,111 (1974-75). The use of the arithmetic mean, the sum of n values divided by n, as an indication of the central value of a distribution can be questioned here. Conditioned as we are to the use of the arithmetic mean as a measure of central tendency in any distribution, we frequently forget that its use is proper only when the distribution is itself "normal," that is, when it resembles the familiar bell curve, with values more or less symmetrically distributed and median and arithmetic mean very close together or coincident. But the distribution of values for "Volumes Added, Gross" is not normal, but "lognormal." This type of distribution is highly skewed; it has the interesting property that the logarithms of the values, not the values themselves, are distributed normally; hence its name.⁷ The geometric mean, defined as the nth root of the product of n values, is the appropriate measure of central tendency in lognormal distributions, and in this case the geometric mean for 1975-76 is 79,322 volumes, up 2.4 percent from the corresponding value of 77,473 volumes for 1974-75.

Still another measure suggests itself. This simple, but far from contemptible, device, familiar to all who listen to stock-market reports on the radio, is that of comparing "advances" and "declines." On this basis, forty-seven academic ARL libraries added more volumes (gross) in 1975-76 than they did in 1974-75, and forty-one added fewer.

CONCLUSIONS

It thus appears that ARL academic libraries did not add fewer volumes in 1975-76 than they did in 1974-75, however one computes it. Indeed, all calculations indicate a slight increase in volumes added but an increase not so large as to be obviously significant, given the possible errors in the individual library values.

Statistical inference always involves risk; it is essential, therefore, that any inferences be made with much care and some humility. It is clear that ARL's dramatic statement on additions to ARL library collections, quoted earlier, is not supported by the data. It is contended here that a statement such as:

    Additions to ARL library collections remained at a generally static level in 1975-76, with forty-seven libraries adding more volumes and forty-one libraries adding fewer volumes than they did in the previous year.

conveys not only more information but more accurate information. This is not, of course, to say that 1975-76 was a good year in terms of collection growth. But it was not the catastrophic year, at least for collection growth, that the ARL statement reports.
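The alternative measures used in the analysis (matched totals, arithmetic mean, geometric mean, and advances versus declines) can be sketched as follows. The totals and the two geometric means are the figures reported in the text for the eighty-eight matched libraries; the per-library values needed to recompute the geometric mean or the advance/decline counts are not reproduced in this article, so the small panel used to exercise those two functions is invented, and all function names are assumptions made for this sketch.

```python
import math


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


def geometric_mean(values):
    """nth root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def advances_declines(prev, curr):
    """Count libraries that added more (advances) or fewer (declines) volumes."""
    advances = sum(1 for k in prev if curr[k] > prev[k])
    declines = sum(1 for k in prev if curr[k] < prev[k])
    return advances, declines


# Figures reported in the text for the 88 matched academic libraries.
N = 88
TOTAL_1974_75 = 7_753_746
TOTAL_1975_76 = 7_875_033

total_change = pct_change(TOTAL_1974_75, TOTAL_1975_76)  # about 1.6 percent
mean_1974_75 = TOTAL_1974_75 / N                         # about 88,111
mean_1975_76 = TOTAL_1975_76 / N                         # about 89,489
# Dividing both totals by the same N leaves the percentage change unchanged.
mean_change = pct_change(mean_1974_75, mean_1975_76)

# Geometric means as reported in the text (not recomputed from raw data).
gm_change = pct_change(77_473, 79_322)                   # about 2.4 percent

# Invented three-library panel, only to exercise the two helper functions.
prev = {"A": 100_000, "B": 50_000, "C": 20_000}
curr = {"A": 104_000, "B": 49_000, "C": 21_000}
print(round(geometric_mean(prev.values())), advances_declines(prev, curr))
print(f"{total_change:.1f}% {mean_change:.1f}% {gm_change:.1f}%")
```

Working through logarithms in `geometric_mean` avoids multiplying eighty-eight large numbers together, and it mirrors the property noted in the text that the logarithms of lognormally distributed values are themselves normally distributed.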
It probably is not reasonable to expect any spectacular improvement in the quality of ARL statistics themselves, given the limited power ARL has to enforce any rules and guidelines it promulgates, the major internal procedural changes individual libraries might have to make in order to conform to them, and the basic fact that any such rules would of necessity call for a fair measure of variable interpretation.

It is reasonable, however, to expect that ARL publish its statistics in such a form as to be consonant with their intrinsic accuracy, avoiding the semblance of great precision where little, in fact, exists. And it is reasonable to expect ARL to hold back from the issuance of statistical analyses of its data and conclusions drawn therefrom unless it is willing to make a serious attempt to develop an adequate analytical machinery. Compilation and publication of simple and uncritical rank-orders, ratios of medians, etc., provide little beyond increased opportunity for oversimplification and error. If ARL wants to improve the quality and usefulness of its statistical publication, it must investigate other methods of gathering, analyzing, and publishing its data. That these data are in machine-readable form should facilitate such experimentation.

REFERENCES

1. College and University Library Statistics 1919/20 to 1943/44, Compiled from Figures Supplied by the Participating Libraries (Princeton, N.J.: Princeton Univ. Library, 1947).
2. ARL Statistics, 1975/76 (Washington, D.C.: Assn. of Research Libraries, 1976).
3. Eli Oboler, "The Accuracy of Federal Library Statistics," College & Research Libraries 25:494-96 (Nov. 1964).
4. U.S. Office of Education, Library Statistics of Colleges and Universities, 1962-1963. Institutional Data (Washington, D.C.: 1964).
5. For instance, Oliver Dunn and others, The Past and Likely Future of 58 Research Libraries, 1951-1980.
1st-9th issues (West Lafayette, Ind.: Instructional Media Research Unit, Purdue Univ., 1965-73) and its successor, Miriam A. Drake, Academic Research Libraries: A Study of Growth (West Lafayette, Ind.: Libraries and Audio-Visual Center, Purdue Univ., 1977).
6. ARL Statistics, 1975/76, p.2.
7. A good discussion of the lognormal distribution and its prevalence is found in Allan D. Pratt, "The Analysis of Library Statistics," Library Quarterly 45:275-85 (July 1975).