College and Research Libraries

M. CARL DROTT

Random Sampling: a Tool for Library Research

Questions about the accuracy of library records, the behavior or attitudes of patrons, or the conditions of the books in the collection can often be answered by a random sampling study. Use of this time- and money-saving technique requires no special mathematical ability or statistical background. The concept of accuracy is discussed and a table is provided to simplify the determination of an appropriate sample size. A method of selecting a sample using random numbers is shown. Three examples illustrate the application of the technique to library problems.

Mr. Drott is with the Community Systems Foundation, Ann Arbor, Michigan.

LIBRARIANS ARE continually called upon to make decisions based on imperfect data. How many books in the collection are in need of repair? What percentage of our patrons use the card catalog? Which categories of books can be stored without greatly inconveniencing our patrons? The cost of keeping accurate records to answer this type of question is great. On the other hand, the librarian should have something better than an informed guess. One way of providing this information is to study a small sample of the collection or user population, and to draw conclusions based on this sample.

Sampling is, of course, a compromise measure. If unlimited amounts of time and money were available there would be no need for using methods of approximation. But if one is faced with both a need for information and limited resources, sampling is an important management tool. In order to make the most effective use of this tool, statisticians have developed scientific sampling methods. These methods, based on mathematical precepts, assure maximum usefulness and validity of sampling data.

ACCURACY AND SAMPLE SIZE

One of the first things to decide upon is the size of a sample. Intuitively one recognizes that larger samples give rise to more accurate data.
To quantify this accuracy one must distinguish between two types of error possible in a sampling study.

The first type of error is one which might be called tolerance. Most commonly, results are reported as percentages, for example, "25 percent of the books in our collection have circulated in the last two years." Because they are based upon a sample these figures are not exact. Thus we say something like, "Between 23 and 27 percent of our collection has circulated in the last two years." This tolerance is commonly written 25% ± 2 and is read "twenty-five percent plus or minus two." This tolerance is a measure of the accuracy of our result.

College & Research Libraries • March 1969

The second type of error measure is called confidence. It is a measure of how certain one is that the true answer lies within the limits stated in the tolerance. For instance, a confidence of 90 per cent means that there is one chance in ten (10 per cent) that the true value of the number being predicted lies outside of the tolerance that has been set. A statement that 10% ± 4 of the patrons entering the library go directly to the card catalog, reported with a 95 per cent confidence, means that there is only one chance in twenty (5 per cent) that the actual percentage of patrons going directly to the catalog is either greater than 14 per cent or less than 6 per cent.

Confidence can also be interpreted in terms of the results expected if a sampling study were repeated. Thus a 90 per cent confidence means that if a sampling study were repeated ten times (using the same sample size and tolerance but each time using a different sample) we would expect the results to be correct within the specified tolerance for about nine of the replications.

Once a tolerance and a confidence have been decided upon one can use Table 1 to find the appropriate sample size. There are several important points in the use of the sample size table.
First, the sample size needed for a given tolerance and confidence is dependent on the relative percentages which are observed. The necessary correction can be made by applying a very simple formula. First estimate what per cent of the sample will be in the most important category you are dealing with. For example, if we were sampling the number of patrons who go directly to the card catalog, we might estimate on the basis of preliminary observation that the number is no greater than 20 per cent. We write this as the decimal fraction .20. Next we subtract the fraction from one (1.00) and multiply our two fractions together. Thus:

    1.00 - .20 = .80
    .20 × .80 = .16

This result multiplied by four (4) gives our correction factor.

    .16 × 4 = .64

This factor is to be multiplied by the sample size in Table 1 to give a revised sample size. If in the example above we had decided on a confidence of 95 per cent and a tolerance of 2 per cent, Table 1 gives a sample size of 2,401. Multiplying by our correction factor gives a revised sample size of 1,537. If we cannot predict our sample percentage in advance we can use the sample sizes directly from the table since these represent the most conservative size estimates.

Two further points should be observed in using the sample size table. The first is that the approach used in preparing the table is valid only for sample sizes which are greater than thirty but less than 10 per cent of the total population. A second point is that the sample size must be calculated before performing the survey. The table is not appropriate for calculating the confidence and tolerance of a sample already collected.

SELECTING THE SAMPLE

The entire validity of this sampling technique is based upon the use of an unbiased sample. That is to say, the sample must be as representative as possible of the entire population.
To this end one should use the mathematical concept of a random sample. Randomness in this sense means that for each selection (collecting one datum) every member of the population has an equal chance of being drawn. For example, if we wanted to select several cards randomly from a new (ordered) deck of playing cards we might throw the cards into a hat, stir them around, and draw the sample with our eyes closed. On the other hand, suppose we were to close our eyes, remove a small stack of cards from the top of the deck, draw the next card for our first datum, remove another small stack, draw our next datum, and so on. This would not be a random sample since by removing the stack of cards we gave them no chance of being selected in the following draws.

This kind of non-randomness most often appears in so-called fractional sampling. An example of this would be sampling a card catalog by taking every twenty-fifth card (from a random starting point). The effects of this type of violation of mathematical randomness are often difficult to determine. Sometimes the results of a study may be invalidated; other times the violation may have no effect. The critical factor is the order in which the population is arranged.

Let us consider two examples from public opinion surveys. In the first situation respondents were selected by calling at every twenty-fifth house. The interviewers proceeded from block to block in an orderly manner. As they went around each block they stopped at every twenty-fifth dwelling. When the results were compared to known data they were found to be unbiased. The order in which people's houses are arranged in a neighborhood seems to be independent of their opinions. Thus the fractional technique did not introduce any bias. In another survey respondents were selected by contacting every twenty-fifth person in the telephone book.
The results of this survey were found to be incorrect. It was later recognized that the relationship between names and ethnic groups introduced opinion bias when the names were in an alphabetical list.

To summarize, sampling techniques which are non-random can produce serious and often undetectable errors. Techniques do exist for using certain types of structured samples but these designs require careful statistical analysis and should only be employed after careful consideration.

We have stressed the importance of choosing a random sample, but how does one assure randomness? One method makes use of random number tables. Many books on sampling or statistics include tables of random numbers (see bibliography). For example, in a random number table we may find a column of numbers like this:

    174393
    533251
    081831
    987384
    381849

To use these numbers in sampling we must develop rules for each sampling situation. Suppose we wish to draw a sample from a shelflist which consists of 9 drawers, each drawer having no more than 1,600 cards (about 16 inches). First we select a drawer. For this we can use a very simple rule. Namely, let the first digit of the random number equal the drawer number. We will delete any numbers which begin with zero, since there is no drawer zero. Next, to select a card within a drawer we could use the next four digits of each random number and count that number of cards into each drawer. This, however, would make the data collection extremely tedious. We may decide that measuring a distance into each drawer would be sufficiently unbiased for our purposes. Thus we will wish to choose a number of inches between zero and fifteen and a number of sixteenths of an inch between zero and fifteen. In combination this will allow us to have measurements of from zero to almost sixteen inches.
First let us devise a rule for converting the second and third digits of our random number to a number of inches between zero and fifteen. The two random digits form one hundred combinations from 00 to 99. Since we want sixteen numbers (counting zero and fifteen) each group will contain six numbers. This is because sixteen goes into one hundred a little over six times. Our rule will be:

    If the random digits are:    Convert them to inches:
    00 to 05                     0
    06 to 11                     1
    12 to 17                     2
    18 to 23                     3
    24 to 29                     4
    30 to 35                     5
    36 to 41                     6
    42 to 47                     7
    48 to 53                     8
    54 to 59                     9
    60 to 65                     10
    66 to 71                     11
    72 to 77                     12
    78 to 83                     13
    84 to 89                     14
    90 to 95                     15
    96 to 99                     Delete

To get a number of sixteenths of an inch we want to convert the fourth and fifth random digits to numbers between zero and fifteen. We can use exactly the same sixteen-division rule developed above. Thus to convert a random number to a card location we convert the first random digit to a drawer number, the second and third digits to a number of inches, and the fourth and fifth to sixteenths of an inch. For example, if our random number is 174393 we would draw a card from drawer number one at a distance of 12 6/16 inches from the front.

Note that we have not permitted a sixteen since this could have given us 16/16. This illustrates a very important point. Suppose for example we had allowed 8 16/16 to equal nine (9). Then there would be two ways to get a nine (9 0/16 or 8 16/16) but only one way to get a number like 9 3/16. This means that whole numbers (like 9) would be more likely to occur than fractional numbers (like 9 3/16). But our definition of randomness required that all numbers be equally likely. This is the reason that sixteen has been excluded.

Now let us consider some applications of this sampling technique to specific problems. These examples involve the three most commonly sampled items in the library: card files, patrons, and the collection itself.
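Before turning to the examples, the shelflist rule developed above can be sketched in Python. This is only an illustration under stated assumptions: the names `to_group` and `card_location` are hypothetical, and the sketch treats each rule as an equal-width grouping of two-digit values in which the leftover values are deleted, as in the sixteen-division rule.

```python
def to_group(value, groups, width):
    """Map a random value onto 0..groups-1 using equal-width divisions.

    Values at or beyond groups * width are leftovers and return None
    ("Delete"), so that every group remains equally likely.
    """
    if value >= groups * width:
        return None
    return value // width


def card_location(random_number):
    """Convert a six-digit random number (as a string) to a card location.

    Assumes the 9-drawer shelflist of the text: digit 1 is the drawer,
    digits 2-3 give whole inches, digits 4-5 give sixteenths of an inch.
    Returns (drawer, inches, sixteenths), or None if the number is deleted.
    """
    drawer = int(random_number[0])
    if drawer == 0:  # there is no drawer zero
        return None
    inches = to_group(int(random_number[1:3]), groups=16, width=6)
    sixteenths = to_group(int(random_number[3:5]), groups=16, width=6)
    if inches is None or sixteenths is None:
        return None
    return drawer, inches, sixteenths


# The column of random numbers quoted in the text.
for number in ["174393", "533251", "081831", "987384", "381849"]:
    print(number, card_location(number))
```

For 174393 the sketch gives drawer 1, 12 inches, and 6 sixteenths, which agrees with the worked example; 081831 is deleted because it begins with zero.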
Example 1

A large research library is concerned about recent discoveries of inaccuracies in its holding records. An inventory would be extremely expensive and would consume a great deal of professional time; thus the librarian wishes to conduct a sample study to determine if an inventory is actually necessary.

In order to set a tolerance and confidence the librarian considers what he will do with the results of his study. The librarian has decided that if less than 2 or 3 per cent of the collection is missing he will take no action. If more than 6 per cent of the collection is missing he is certain that he will conduct an inventory. He is not sure what action he will take if the percentage missing is between 3 and 6 per cent. We can see that the tolerance must be less than 3 per cent; in making a decision it will be important to distinguish between 3 and 6 per cent. The librarian believes that a tolerance of 1 per cent will make this information most useful to him. He is not certain exactly what confidence he desires but because of the costs involved in being wrong (e.g., performing an unnecessary inventory) he has tentatively set a confidence of 99 per cent. The table indicates a sample size of 16,590 for these values. The librarian is certain that no more than 10 per cent of the collection is missing. Therefore we can calculate a correction factor for the sample size as follows:

    .10 × .90 × 4 = .36

We multiply our sample size by this correction factor to get a revised sample size of 5,973. Because of the importance of this measurement the librarian is willing to take a sample of the required size. Thus readjustment of the tolerance and confidence is not necessary.

The sample will be drawn from the shelflist. Since this source is biased against serials and other series, only monographic entries will be considered.
The shelflist consists of 1,200 drawers (all numbered consecutively) each containing up to 14 inches of cards. The sample will be drawn by measuring the cards. The random number table used by the library has the digits arranged in columns thus:

    47 38 18 55 97
    68 14 11 60 83
    96 42 30 53 71
    90 64 98 30 30

Drawer numbers must be numbers from 0000 to 1199. To make drawer numbers the first two random digits must be converted to numbers between 00 and 11, while the next two digits must become numbers between 00 and 99. To convert the first two digits the following rule is developed:

    If random digits are:    Convert them to:
    00 to 07                 0
    08 to 15                 1
    16 to 23                 2
    24 to 31                 3
    32 to 39                 4
    40 to 47                 5
    48 to 55                 6
    56 to 63                 7
    64 to 71                 8
    72 to 79                 9
    80 to 87                 10
    88 to 95                 11
    96 to 99                 Delete

Each group of random numbers includes eight numbers because twelve (the number of numbers in the range 0 to 11) goes into 100 (the number of numbers in the range 00 to 99) eight times plus a fraction. The second part of each drawer number can come directly from the random list.

To pick a number of inches between 0 and 13 a similar rule is used, except that this time each division contains seven numbers. Thus the rule will be:

    00 to 06                 0
    07 to 13                 1
    ...
    91 to 97                 13
    98 to 99                 Delete

To get sixteenths of an inch we use a rule with six numbers per division:

    00 to 05                 0
    06 to 11                 1
    ...
    90 to 95                 15
    96 to 99                 Delete

This is the same rule developed earlier. We can now use all of our rules to pick a sample. For example, using the random numbers above and reading down each column, we would have:

    Random Number    Drawer    Inches    16's
    47 68 96 90      568       13        15/16
    38 14 42 64      414       6         10/16
    18 11 30 98      211       4         Delete
    55 60 53 30      660       7         5/16
    97 83 71 30      Delete

To make the actual task of taking the sample easier the selection can be ordered by drawer number before collecting the data. If in collecting the data we should find too few cards in a drawer to take the required measurement, the data point should be deleted. This is important in order to preserve randomness.
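The drawing procedure of Example 1 can be sketched in Python as follows. This is an illustrative sketch, not part of the original method: the helper names `to_group` and `locate` are hypothetical, and the sketch assumes that a random number is deleted as a whole whenever any one of its parts falls in a "Delete" range.

```python
def to_group(value, groups, width):
    """Equal-width division rule; leftover values return None ("Delete")."""
    return value // width if value < groups * width else None


# Random number table as printed, one row per line.
# Each column, read downward, is one random number.
rows = [
    [47, 38, 18, 55, 97],
    [68, 14, 11, 60, 83],
    [96, 42, 30, 53, 71],
    [90, 64, 98, 30, 30],
]


def locate(pairs):
    """Apply the Example 1 rules to four two-digit groups."""
    hundreds = to_group(pairs[0], groups=12, width=8)     # 00 to 11
    inches = to_group(pairs[2], groups=14, width=7)       # 0 to 13 inches
    sixteenths = to_group(pairs[3], groups=16, width=6)   # 0 to 15 sixteenths
    if None in (hundreds, inches, sixteenths):
        return None                                       # delete the data point
    drawer = hundreds * 100 + pairs[1]                    # second pair used directly
    return drawer, inches, sixteenths


sample = [locate(list(column)) for column in zip(*rows)]

# Order the surviving points by drawer number before collecting the data.
ordered = sorted(point for point in sample if point is not None)
print(sample)
print(ordered)
```

The three surviving points, (568, 13, 15), (414, 6, 10), and (660, 7, 5), agree with the table above; the third and fifth random numbers are deleted.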
Example 2

A library is taking a survey of users' opinions about library services. The data will be collected by handing out questionnaires to a random sample of patrons as they enter the library. This survey will be only one of many things which the librarian will use in deciding on changes in user service. Thus a tolerance of 5 per cent and a confidence of 90 per cent seem adequate. The librarian has no idea what percentages of the users will hold various opinions; thus the sample size of 271 will be used directly from the table. The librarian has decided that the survey should cover a period of two weeks to assure a representative sample of users.

The sample will be drawn by converting random numbers to times. The library is open Monday through Friday from 9:00 A.M. to 9:00 P.M. and Saturdays from 9:00 A.M. to 6:00 P.M. We will need rules to convert random numbers to twelve days, twelve hours, and sixty minutes. Our rule for converting to days will use the first two random digits.

    If random digits are:    Convert them to:
    00 to 07                 1
    08 to 15                 2
    16 to 23                 3
    ...
    88 to 95                 12
    96 to 99                 Delete

The same rule can be used on the next two random digits to give us the time. In this case one will be equivalent to 9:00 A.M., two to 10:00 A.M., and so on, with twelve being 8:00 P.M. Next we need to convert to minutes. A rule with sixty steps would be tedious to construct and to use. We may decide that no bias would be introduced by using time in five-minute intervals. Since there are twelve five-minute intervals in an hour, we need a rule with twelve divisions. We can use the same rule that we developed above. We can convert by setting one equal to five minutes after the hour, two equal to ten minutes after, and so on, with twelve being equal to sixty minutes after, which is the next whole hour. Part of our sample would look like this:

    Random Number    Day       Hour      Minute
    0301 1594        1         9 A.M.    10
    8460 8881        11        4 P.M.    60 (5 P.M.)
    8393 6703        11        8 P.M.    45
    6694 4640        9         8 P.M.    30
    9632 0005        Delete

Note that even though we have used the same rule we used different random digits. The last two random digits in each line were not used. In taking the survey a questionnaire will be given to the first person (old enough to understand it) to enter the library after each sampling time.

Example 3

A librarian wishes to determine whether significant shelf space can be obtained by removing little-used books from the collection. The criterion has been established that a little-used book is one which has not circulated in the last five years. The sample will be drawn by examining the date due slips and book cards in the back of randomly selected books. If 15 per cent or more of the collection can be removed, the librarian will take action. A confidence of 95 per cent and a tolerance of 3 per cent are desired. But budgetary restrictions limit the sample size to 500. The librarian believes the confidence to be more important and thus adjusts the tolerance to 5 per cent. The sample size from Table 1 is 384. There is little doubt that the number of books satisfying the criterion will be less than 25 per cent of the collection. Thus the correction factor is:

    .25 × .75 × 4 = .75

This gives a final sample size of 288.

TABLE 1*
CONFIDENCE AND TOLERANCE DETERMINE SAMPLE SIZE

    CONF.    TOL.      SIZE        CONF.    TOL.      SIZE
    99%      ± .5%     66,358      90%      ± .5%     27,060
             1.0       16,590               1.0       6,765
             2         4,147                2         1,691
             3         1,843                3         752
             5         664                  5         271
             7         339                  7         138
             10        166                  10        68
    95%      ± .5%     38,416      80%      ± .5%     16,435
             1.0       9,604                1.0       4,109
             2         2,401                2         1,027
             3         1,067                3         457
             5         384                  5         164
             7         196                  7         84
             10        96                   10        41

* Values in this table are based upon formulae derived in Report No. MG-ML-100, Community Systems Foundation, Ann Arbor, Michigan.

The collection consists of about 19,000 volumes arranged on 234 sections of shelving. Each section has six shelves and there are 25 books or less on each shelf.
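The sample-size arithmetic used in the examples can be sketched in Python as follows. The sizes are copied from Table 1; the names `TABLE_1`, `revised_sample_size`, and `within_valid_range` are hypothetical, and rounding a fractional result up to the next whole number is an assumption (it agrees with the revised size of 5,973 in Example 1).

```python
import math

# Sample sizes copied from Table 1: TABLE_1[confidence %][tolerance %].
TABLE_1 = {
    99: {0.5: 66358, 1: 16590, 2: 4147, 3: 1843, 5: 664, 7: 339, 10: 166},
    95: {0.5: 38416, 1: 9604, 2: 2401, 3: 1067, 5: 384, 7: 196, 10: 96},
    90: {0.5: 27060, 1: 6765, 2: 1691, 3: 752, 5: 271, 7: 138, 10: 68},
    80: {0.5: 16435, 1: 4109, 2: 1027, 3: 457, 5: 164, 7: 84, 10: 41},
}


def revised_sample_size(confidence, tolerance, expected_fraction=None):
    """Look up the table size and apply the correction factor if one is given.

    expected_fraction is the estimated share of the population in the
    category of interest, written as a decimal fraction (e.g. 0.25).
    """
    size = TABLE_1[confidence][tolerance]
    if expected_fraction is None:
        return size  # the most conservative estimate
    factor = 4 * expected_fraction * (1 - expected_fraction)
    # Round to six places first so floating-point noise cannot push an
    # exact result (such as 288.0) up to the next integer.
    return math.ceil(round(size * factor, 6))


def within_valid_range(sample_size, population):
    """Check the stated validity range: more than 30, less than 10 per cent."""
    return 30 < sample_size < 0.10 * population


print(revised_sample_size(95, 5, 0.25))   # Example 3
print(revised_sample_size(99, 1, 0.10))   # Example 1
print(revised_sample_size(90, 5))         # Example 2, no correction
print(within_valid_range(288, 19000))     # Example 3 collection
```

For Example 3 the sketch returns 288, and that sample lies within the range for which the table is stated to be valid, since 10 per cent of 19,000 volumes is 1,900.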
To pick a section we will want rules to convert the first random digit to a number between zero and two. This calls for a rule with three numbers per division.

    0 to 2 is converted to    0
    3 to 5                    1
    6 to 8                    2
    9                         Delete

We can use the second and third random digits directly as the second and third digits of the section number. It is important that we recognize that the second and third digits must range from 00 to 99 since we need to be able to obtain section numbers such as 095 and 173.

To select a shelf within a section we need a rule with six divisions (there is no shelf zero). We will use the fourth and fifth random digits. Each division will have sixteen numbers in it.

    00 to 15 is converted to  1
    16 to 31                  2
    32 to 47                  3
    48 to 63                  4
    64 to 79                  5
    80 to 95                  6
    96 to 99                  Delete

To pick a book from the chosen shelf we need a number between 00 and 25. We can use the rule developed for the section number to convert the sixth random digit to a number between zero and two. The seventh random digit can be taken directly to be the second digit of the book number. Combining all of our rules we can draw our sample.

    Random number    Section    Shelf    Book
    17340 37589      073        3        14
    70322 63492      175        6        Delete
    44906 96988      203        2        25
    75172 26401      234        6        06

Again in drawing the sample it may be convenient to convert all of our random numbers first and order them by section and by shelf before drawing the sample.

FINAL REMARKS

Random sampling is not necessarily an easy operation. Much thought must go into selecting a confidence and tolerance and developing rules for converting random numbers. Furthermore, the tasks of actually converting random numbers and drawing the sample may be tedious. On the other hand, the entire job of running a library is becoming more complex. To make decisions which are more technical and involve larger amounts of money, librarians need both data and an understanding of how accurate it is.
The material presented in this article should make it possible for librarians to perform many sampling studies by themselves. For more complex studies there are specially developed statistical techniques. Among these are methods of analyzing data in order to obtain more information from them, methods for more efficient sampling, and methods for recognizing and avoiding biases. Finally, computers may be used for generating the sample and for analyzing the results. These techniques, however, are in the domain of the specialized researcher rather than that of the librarian.
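As a small illustration of the remark that computers may be used for generating the sample, the following Python sketch draws shelf locations for a collection laid out as in Example 3. It is a sketch under stated assumptions and not the procedure of the article: it replaces the random number table and conversion rules with Python's standard `random` module, and the function name `draw_locations` and its parameters are hypothetical.

```python
import random


def draw_locations(n, sections=234, shelves=6, books_per_shelf=25, seed=None):
    """Draw n (section, shelf, book position) triples, each equally likely.

    Every draw is independent, so the same location may occasionally be
    drawn twice, just as a random number table may repeat a number.
    A position that holds no book on the actual shelf should be deleted
    when the data are collected.
    """
    rng = random.Random(seed)
    locations = []
    for _ in range(n):
        section = rng.randint(1, sections)      # randint includes both ends
        shelf = rng.randint(1, shelves)
        book = rng.randint(1, books_per_shelf)
        locations.append((section, shelf, book))
    # Ordered by section and shelf to make the walk through the stacks easier.
    return sorted(locations)


locations = draw_locations(288, seed=1969)
print(len(locations))
print(locations[:5])
```

Passing a fixed `seed` makes the draw repeatable, which may be convenient if the list of locations has to be regenerated later.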