College and Research Libraries


M. CARL DROTT 

Random Sampling: a Tool for Library Research

Questions about the accuracy of library records, the behavior or attitudes of patrons, or the condition of the books in the collection can often be answered by a random sampling study. Use of this time- and money-saving technique requires no special mathematical ability or statistical background. The concept of accuracy is discussed, and a table is provided to simplify the determination of an appropriate sample size. A method of selecting a sample using random numbers is shown. Three examples illustrate the application of the technique to library problems.

Librarians are continually called upon to make decisions based on imperfect data. How many books in the collection are in need of repair? What percentage of our patrons use the card catalog? Which categories of books can be stored without greatly inconveniencing our patrons? The cost of keeping accurate records to answer this type of question is great. On the other hand, the librarian should have something better than an informed guess. One way of providing this information is to study a small sample of the collection or user population and to draw conclusions based on this sample.

Sampling is, of course, a compromise measure. If unlimited amounts of time and money were available, there would be no need for methods of approximation. But when one is faced with both a need for information and limited resources, sampling is an important management tool. To make the most effective use of this tool, statisticians have developed scientific sampling methods. These methods, based on mathematical principles, assure maximum usefulness and validity of sampling data.

Mr. Drott is with the Community Systems Foundation, Ann Arbor, Michigan.

ACCURACY AND SAMPLE SIZE

One of the first things to decide upon is the size of a sample. Intuitively one recognizes that larger samples give rise to more accurate data. To quantify this accuracy, one must distinguish between two types of error possible in a sampling study.

The first type of error is one which might be called tolerance. Most commonly, results are reported as percentages, for example, "25 percent of the books in our collection have circulated in the last two years." Because they are based upon a sample, these figures are not exact. Thus we say something like, "Between 23 and 27 percent of our collection has circulated in the last two years." This tolerance is commonly written 25% ± 2 and is read "twenty-five percent plus or minus two." The tolerance is a measure of the accuracy of our result.



College & Research Libraries • March 1969

The second type of error measure is called confidence. It is a measure of how certain one is that the true answer lies within the limits stated in the tolerance. For instance, a confidence of 90 per cent means that there is one chance in ten (10 per cent) that the true value of the number being predicted lies outside the tolerance that has been set. A statement that 10% ± 4 of the patrons entering the library go directly to the card catalog, reported with a 95 per cent confidence, means that there is only one chance in twenty (5 per cent) that the actual percentage of patrons going directly to the catalog is either greater than 14 per cent or less than 6 per cent.

Confidence can also be interpreted in terms of the results expected if a sampling study were repeated. Thus a 90 per cent confidence means that if a sampling study were repeated ten times (using the same sample size and tolerance but each time a different sample), we would expect the results to be correct within the specified tolerance for about nine of the replications.
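The repeated-study interpretation can be checked by simulation. The Python sketch below assumes a hypothetical population in which exactly half of the patrons go directly to the catalog. It draws many independent samples of 271 patrons, the Table 1 size for 90 per cent confidence and a 5 per cent tolerance, and counts how often the sample percentage falls within the tolerance. The function name `coverage`, the seed, and the 50 per cent population are illustrative assumptions. Each draw is treated as independent, which is reasonable when the sample is a small fraction of the population.

```python
import random

def coverage(p_true, n, tolerance, replications, seed=1969):
    """Share of repeated samples whose estimate falls within +/- tolerance of p_true."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated exactly
    hits = 0
    for _ in range(replications):
        # Each of the n draws is a patron who does (True) or does not go to the catalog.
        count = sum(rng.random() < p_true for _ in range(n))
        if abs(count / n - p_true) <= tolerance:
            hits += 1
    return hits / replications

# 271 is the Table 1 size for 90 per cent confidence and a 5 per cent tolerance.
rate = coverage(p_true=0.50, n=271, tolerance=0.05, replications=2000)
print(f"{rate:.3f}")  # a value near 0.90
```

With these settings the share of successful replications should come out close to nine in ten. It will not be exactly 0.90, because a simulation, like a sampling study, has its own chance variation.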

Once a tolerance and a confidence have been decided upon, one can use Table 1 to find the appropriate sample size. There are several important points in the use of the sample size table. First, the sample size needed for a given tolerance and confidence depends on the relative percentages which are observed. The table assumes the least favorable case, in which the category of interest makes up 50 per cent of the population. When a smaller percentage is expected, the sample size can be reduced by applying a very simple correction. Begin by estimating what per cent of the sample will be in the most important category you are dealing with. For example, if we were sampling the number of patrons who go directly to the card catalog, we might estimate on the basis of preliminary observation that the number is no greater than 20 per cent. We write this as the decimal fraction .20. Next we subtract the fraction from one (1.00) and multiply our two fractions together.

Thus:
1.00 - .20 = .80

.20 × .80 = .16

This result multiplied by four (4) gives our correction factor.

.16 × 4 = .64

This factor is to be multiplied by the sample size in Table 1 to give a revised sample size. If in the example above we had decided on a confidence of 95 per cent and a tolerance of 2 per cent, Table 1 would give a sample size of 2,401. Multiplying by our correction factor (2,401 × .64 = 1,536.6) and rounding up gives a revised sample size of 1,537. If we cannot predict our sample percentage in advance, we can use the sample sizes directly from the table, since these represent the most conservative size estimates.
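The correction can be expressed as a small function. In this Python sketch, `correction_factor` and `revised_size` are hypothetical names. Rounding the revised size up to a whole observation is an assumption of the sketch, following a common convention that keeps the sample from falling short. It agrees with the revised sizes worked out in Examples 1 and 3.

```python
import math

def correction_factor(p):
    """4 * p * (1 - p), where p is the estimated share in the key category (0 to 1)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a fraction between 0 and 1")
    return 4 * p * (1 - p)

def revised_size(table_size, p):
    """Scale a Table 1 sample size by the correction factor, rounding up."""
    return math.ceil(table_size * correction_factor(p))

print(round(correction_factor(0.20), 2))  # 0.64
print(revised_size(2401, 0.20))           # 1537
```

At p = .50 the factor is exactly 1, so the table size is used unchanged. This is why the table entries are the conservative choice when nothing is known in advance.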

Two further points should be observed in using the sample size table. The first is that the approach used in preparing the table is valid only for sample sizes which are greater than thirty but less than 10 per cent of the total population. The second is that the sample size must be calculated before performing the survey. The table is not appropriate for calculating the confidence and tolerance of a sample already collected.

SELECTING THE SAMPLE 

The entire validity of this sampling technique is based upon the use of an unbiased sample. That is to say, the sample must be as representative as possible of the entire population. To this end one should use the mathematical concept of a random sample. Randomness in this sense means that for each selection (collecting one datum) every member of the population has an equal chance of being drawn. For example, if we wanted to select several cards randomly from a new (ordered) deck of playing cards, we might throw the cards into a hat, stir them around, and draw the sample with our eyes closed. On the other hand, suppose we were to close our eyes, remove a small stack of cards from the top of the deck, draw the next card for our first datum, remove another small stack, draw our next datum, and so on. This would not be a random sample, since the cards in each removed stack had no chance of being selected in the following draws.

This kind of non-randomness most often appears in so-called fractional sampling. An example of this would be sampling a card catalog by taking every twenty-fifth card (from a random starting point). The effects of this type of violation of mathematical randomness are often difficult to determine. Sometimes the results of a study may be invalidated; other times the violation may have no effect. The critical factor is the order in which the population is arranged.
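The importance of the order of the population can be seen in a small constructed case. The Python sketch below assumes a hypothetical card file of 10,000 cards in which every twenty-fifth card has some property, such as a filing error, so the true rate is 4 per cent. Taking every twenty-fifth card then gives either 0 or 100 per cent, depending on the starting point. A random sample of the same size lands near 4 per cent. The periodic arrangement is deliberately extreme. Real files are rarely this regular, which is why the effect of fractional sampling is hard to predict.

```python
import random

# Hypothetical card file of 10,000 cards in which every 25th card (positions
# 0, 25, 50, ...) has some property, such as a filing error. True rate: 4 per cent.
N, STEP = 10_000, 25
has_property = [i % STEP == 0 for i in range(N)]

def systematic_estimate(start):
    """Fractional sampling: every 25th card, beginning at position `start`."""
    picked = has_property[start::STEP]
    return sum(picked) / len(picked)

def random_estimate(n, seed):
    """Simple random sample of n different cards."""
    picked = random.Random(seed).sample(has_property, n)
    return sum(picked) / len(picked)

# Each of the 25 possible starting points yields either 0 or 100 per cent.
print(sorted({systematic_estimate(s) for s in range(STEP)}))  # [0.0, 1.0]
print(random_estimate(400, seed=7))  # close to 0.04
```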

Let us consider two examples from public opinion surveys. In the first situation, respondents were selected by calling at every twenty-fifth house. The interviewers proceeded from block to block in an orderly manner, and as they went around each block they stopped at every twenty-fifth dwelling. When the results were compared to known data, they were found to be unbiased. The order in which people's houses are arranged in a neighborhood seems to be independent of their opinions, so the fractional technique did not introduce any bias. In another survey, respondents were selected by contacting every twenty-fifth person in the telephone book. The results of this survey were found to be incorrect. It was later recognized that, because the names were in an alphabetical list, the relationship between names and ethnic groups introduced opinion bias.

To summarize, sampling techniques which are non-random can produce serious and often undetectable errors. Techniques do exist for using certain types of structured samples, but these designs require careful statistical analysis and should be employed only after careful consideration.

We have stressed the importance of choosing a random sample, but how does one assure randomness? One method makes use of random number tables. Many books on sampling or statistics include tables of random numbers (see bibliography). For example, in a random number table we may find a column of numbers like this:

174393 
533251 
081831 
987384 
381849 
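Where a printed table is not at hand, a computer's pseudo-random number generator can serve the same purpose. The following is a minimal Python sketch using the standard `random` module. The helper name `random_number_table` is hypothetical. A fixed seed is used only so that the same table can be produced again.

```python
import random

def random_number_table(rows, width, seed=None):
    """Return `rows` strings of `width` random decimal digits, leading zeros kept."""
    rng = random.Random(seed)  # pseudo-random; pass a seed to reproduce a table
    return ["".join(rng.choice("0123456789") for _ in range(width))
            for _ in range(rows)]

for line in random_number_table(rows=5, width=6, seed=42):
    print(line)
```

Each string plays the role of one six-digit entry in a printed column, such as 081831 above. The same conversion rules apply to it.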

To use these numbers in sampling, we must develop rules for each sampling situation. Suppose we wish to draw a sample from a shelflist which consists of 9 drawers, each drawer having no more than 1,600 cards (about 16 inches). First we select a drawer. For this we can use a very simple rule: let the first digit of the random number equal the drawer number. We will delete any numbers which begin with zero, since there is no drawer zero. Next, to select a card within a drawer, we could use the next four digits of each random number and count that number of cards into each drawer. This, however, would make the data collection extremely tedious. We may decide that measuring a distance into each drawer would be sufficiently unbiased for our purposes. Thus we will wish to choose a number of inches between zero and fifteen and a number of sixteenths of an inch between zero and fifteen. In combination, this will allow us to have measurements of from zero to almost sixteen inches.
First let us devise a rule for converting the second and third digits of our random number to a number of inches between zero and fifteen. The two random digits form one hundred combinations, from 00 to 99. Since we want sixteen numbers (counting zero and fifteen), each group will contain six of these combinations. This is because sixteen goes into one hundred a little over six times (16 × 6 = 96). The four combinations left over, 96 to 99, are deleted. Our rule will be:

If the random digits are:    Convert them to inches:
00 to 05                     0
06 to 11                     1
12 to 17                     2
18 to 23                     3
24 to 29                     4
30 to 35                     5
36 to 41                     6
42 to 47                     7
48 to 53                     8
54 to 59                     9
60 to 65                     10
66 to 71                     11
72 to 77                     12
78 to 83                     13
84 to 89                     14
90 to 95                     15
96 to 99                     Delete

To get a number of sixteenths of an inch, we want to convert the fourth and fifth random digits to numbers between zero and fifteen. We can use exactly the same sixteen-division rule developed above. Thus to convert a random number to a card location, we convert the first random digit to a drawer number, the second and third digits to a number of inches, and the fourth and fifth to sixteenths of an inch. For example, if our random number is 17439, we would draw a card from drawer number one at a distance of 12 6/16 inches from the front (digits 74 give 12 inches and digits 39 give 6 sixteenths).
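The sixteen-division rule and the drawer rule can be combined into one conversion. In this Python sketch, the helper names `divide` and `locate_card` are hypothetical. `divide` gives each output value the same number of two-digit combinations and returns `None` for the leftovers, which corresponds to deleting the random number.

```python
def divide(two_digits, k):
    """Map a two-digit random number (00-99) onto k equally likely values 0..k-1.

    Each value receives 100 // k consecutive combinations; the leftover
    combinations at the top (96-99 when k = 16) return None, meaning delete.
    """
    size = 100 // k
    if two_digits >= size * k:
        return None
    return two_digits // size

def locate_card(number):
    """Five-digit string -> (drawer, inches, sixteenths), or None if deleted."""
    drawer = int(number[0])
    if drawer == 0:                              # there is no drawer zero
        return None
    inches = divide(int(number[1:3]), 16)        # second and third digits
    sixteenths = divide(int(number[3:5]), 16)    # fourth and fifth digits
    if inches is None or sixteenths is None:
        return None
    return drawer, inches, sixteenths

print(locate_card("17439"))  # (1, 12, 6): drawer 1, 12 6/16 inches
```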

Note that we have not permitted a sixteen, since this could have given us 16/16. This illustrates a very important point. Suppose, for example, we had allowed 8 16/16 to equal nine (9). Then there would be two ways to get a nine (9 0/16 or 8 16/16) but only one way to get a number like 9 3/16. This means that whole numbers (like 9) would be more likely to occur than fractional numbers (like 9 3/16). But our definition of randomness required that all numbers be equally likely. This is the reason that sixteen has been excluded.
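The equal-likelihood argument can be checked by counting. This Python sketch lists every pair of an inch value from 0 to 15 and a sixteenths value, then counts how many pairs produce each total distance. The name `length_counts` is hypothetical. Allowing sixteen sixteenths gives whole-inch distances two chances each; limiting the sixteenths to 0 through 15 gives every distance exactly one.

```python
from collections import Counter

def length_counts(max_sixteenths):
    """How many (inches, sixteenths) pairs give each total length, counted in sixteenths."""
    counts = Counter()
    for inches in range(16):                          # 0 to 15 inches
        for sixteenths in range(max_sixteenths + 1):
            counts[inches * 16 + sixteenths] += 1
    return counts

with_sixteen = length_counts(16)     # 16/16 allowed
without_sixteen = length_counts(15)  # the rule actually used
print(with_sixteen[9 * 16], with_sixteen[9 * 16 + 3])        # 2 1
print(without_sixteen[9 * 16], without_sixteen[9 * 16 + 3])  # 1 1
```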

Now let us consider some applications of this sampling technique to specific problems. These examples involve the three most commonly sampled items in the library: card files, patrons, and the collection itself.

Example 1 

A large research library is concerned about recent discoveries of inaccuracies in its holdings records. An inventory would be extremely expensive and would consume a great deal of professional time; thus the librarian wishes to conduct a sample study to determine whether an inventory is actually necessary.

In order to set a tolerance and confidence, the librarian considers what he will do with the results of his study. The librarian has decided that if less than 2 or 3 per cent of the collection is missing, he will take no action. If more than 6 per cent of the collection is missing, he is certain that he will conduct an inventory. He is not sure what action he will take if the percentage missing is between 3 and 6 per cent. We can see that the tolerance must be less than 3 per cent, because in making a decision it will be important to distinguish between 3 and 6 per cent. The librarian believes that a tolerance of 1 per cent will make this information most useful to him. He is not certain exactly what confidence he desires, but because of the costs involved in being wrong (e.g., performing an unnecessary inventory) he has tentatively set a confidence of 99 per cent. The table indicates a sample size of 16,590 for these values. The librarian is certain that no more than 10 per cent of the collection is missing. Therefore we can calculate a correction factor for the sample size as follows:

.10 × .90 × 4 = .36



We multiply our sample size by this correction factor to get a revised sample size of 5,973 (16,590 × .36 = 5,972.4, rounded up). Because of the importance of this measurement, the librarian is willing to take a sample of the required size. Thus readjustment of the tolerance and confidence is not necessary.

The sample will be drawn from the shelflist. Since this source is biased against serials and other series, only monographic entries will be considered. The shelflist consists of 1,200 drawers (all numbered consecutively), each containing up to 14 inches of cards. The sample will be drawn by measuring the cards. The random number table used by the library has the digits arranged in columns, thus:

47    68    96    90
38    14    42    64
18    11    30    98
55    60    53    30
97    83    71    30

Drawer numbers must be numbers from 0000 to 1199. To make drawer numbers, the first two random digits must be converted to numbers between 00 and 11, while the next two digits must become numbers between 00 and 99. To convert the first two digits, the following rule is developed:

If random digits are:    Convert them to:
00 to 07                 0
08 to 15                 1
16 to 23                 2
24 to 31                 3
32 to 39                 4
40 to 47                 5
48 to 55                 6
56 to 63                 7
64 to 71                 8
72 to 79                 9
80 to 87                 10
88 to 95                 11
96 to 99                 Delete

Each group of random numbers includes eight numbers because twelve (the number of numbers in the range 0 to 11) goes into 100 (the number of numbers in the range 00 to 99) eight times plus a fraction. The second part of each drawer number can come directly from the second pair of random digits. To pick a number of inches between 0 and 13, a similar rule is used, except that this time each division contains seven numbers (fourteen goes into 100 seven times plus a fraction), so the rule will be:

00 to 06    0
07 to 13    1
. . .
91 to 97    13
98 to 99    Delete

To get sixteenths of an inch, we use a rule with six numbers per division:

00 to 05    0
06 to 11    1
. . .
90 to 95    15
96 to 99    Delete

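This is the same rule developed earlier. We can now use all of our rules to pick a sample: the first pair of random digits gives the first part of the drawer number, the second pair its last two digits, the third pair the inches, and the fourth pair the sixteenths. For example, using the random numbers from the library's table, we would have: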

Random Number    Drawer    Inches    16ths
47 68 96 90      568       13        15/16
38 14 42 64      414       6         10/16
18 11 30 98      211       4         Delete
55 60 53 30      660       7         5/16
97 83 71 30      Delete
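The four rules of this example can be applied mechanically. The Python sketch below reads each random number as four pairs, as in the sample above. The first pair becomes the first part of the drawer number, the second pair is used directly, the third becomes inches, and the fourth becomes sixteenths. The function names are hypothetical. The surviving points are sorted by drawer number, which makes the trip to the shelflist easier.

```python
def divide(two_digits, k):
    """00-99 -> one of k equally likely values 0..k-1, or None (delete) for leftovers."""
    size = 100 // k
    if two_digits >= size * k:
        return None
    return two_digits // size

def shelflist_point(row):
    """'47 68 96 90' -> (drawer, inches, sixteenths), or None if any part is deleted."""
    a, b, c, d = (int(pair) for pair in row.split())
    hundreds = divide(a, 12)      # 0-11, the first part of drawers 0000-1199
    inches = divide(c, 14)        # 0-13 inches (seven combinations each)
    sixteenths = divide(d, 16)    # 0-15 sixteenths (six combinations each)
    if None in (hundreds, inches, sixteenths):
        return None
    return hundreds * 100 + b, inches, sixteenths   # b is used directly

rows = ["47 68 96 90", "38 14 42 64", "18 11 30 98", "55 60 53 30", "97 83 71 30"]
points = sorted(p for p in map(shelflist_point, rows) if p is not None)
for drawer, inches, sixteenths in points:   # ordered by drawer for collection
    print(f"drawer {drawer:04d}: {inches} {sixteenths}/16 inches")
```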

To make the actual task of taking the sample easier, the selection can be ordered by drawer number before collecting the data. If in collecting the data we should find too few cards in a drawer to take the required measurement, the data point should be deleted. This is important in order to preserve randomness: if we instead took, say, the last card in such a drawer, the cards at the back of partly filled drawers would be more likely to be chosen than other cards.
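Why deletion preserves randomness can be shown with exact probabilities. The Python sketch below assumes two hypothetical drawers: one is full (four measured positions) and the other holds only two cards. It compares deleting an out-of-range measurement with the tempting alternative of taking the last card instead. Under deletion every card has the same chance. Under the last-card rule, the final card of the short drawer is three times as likely as any other.

```python
from fractions import Fraction

# Hypothetical: drawer A is full (positions 0-3); drawer B holds only two cards.
DRAWERS = {"A": 4, "B": 2}
MAX_POSITIONS = 4   # measured positions 0-3 are drawn equally often in every drawer

def card_probabilities(rule):
    """Exact chance of each card under 'delete' or 'last-card' handling of short drawers."""
    weights = {}
    for drawer, size in DRAWERS.items():
        for pos in range(MAX_POSITIONS):
            if pos >= size:
                if rule == "delete":
                    continue          # the data point is deleted
                pos = size - 1        # 'last-card': take the final card instead
            card = (drawer, pos)
            weights[card] = weights.get(card, 0) + 1
    total = sum(weights.values())
    return {card: Fraction(w, total) for card, w in weights.items()}

print(set(card_probabilities("delete").values()))  # {Fraction(1, 6)}
print(card_probabilities("last-card")[("B", 1)])    # 3/8
```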

Example 2

A library is taking a survey of users' opinions about library services. The data will be collected by handing out questionnaires to a random sample of patrons as they enter the library. This survey will be only one of many things which the librarian will use in deciding on changes in user service. Thus a tolerance of 5 per cent and a confidence of 90 per cent seem adequate. The librarian has no idea what percentages of the users will hold various opinions, so the sample size of 271 will be used directly from the table. The librarian has decided that the survey should cover a period of two weeks to assure a representative sample of users. The sample will be drawn by converting random numbers to times. The library is open Monday through Friday from 9:00 A.M. to 9:00 P.M. and Saturdays from 9:00 A.M. to 6:00 P.M. We will need rules to convert random numbers to twelve days, twelve hours, and sixty minutes. Our rule for converting to days will use the first two random digits.

If random digits are:    Convert them to:
00 to 07                 1
08 to 15                 2
16 to 23                 3
. . .
88 to 95                 12
96 to 99                 Delete

The same rule can be used on the next two random digits to give us the hour. In this case one will be equivalent to 9:00 A.M., two to 10:00 A.M., and so on, with twelve being 8:00 P.M. Next we need to convert to minutes. A rule with sixty steps would be tedious to construct and to use. We may decide that no bias would be introduced by using time in five-minute intervals. Since there are twelve five-minute intervals in an hour, we need a rule with twelve divisions, and we can use the same rule that we developed above. We convert by setting one equal to five minutes after the hour, two equal to ten minutes after, and so on, with twelve being equal to sixty minutes after, which is the next whole hour. Part of our sample would look like this:

Random Number    Day       Hour      Minute
0301 1594        1         9 A.M.    10
8460 8881        11        4 P.M.    60 (5 P.M.)
8393 6703        11        8 P.M.    45
6694 4640        9         8 P.M.    30
9632 0005        Delete

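Note that even though we have used the same rule, we applied it to different random digits. The last two random digits in each line were not used. A time that falls on a Saturday after the 6:00 P.M. closing cannot be sampled and should be deleted, just as measurements beyond the last card in a drawer were deleted in Example 1. In taking the survey, a questionnaire will be given to the first person (old enough to understand it) to enter the library after each sampling time.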
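The day, hour, and minute rules can be sketched in Python as follows. The sketch reads each random number as eight digits in pairs: day, hour, minute, and an unused pair. Two points are assumptions not stated in the article and are labeled in the code. Day 1 is taken to be a Monday, so days 6 and 12 are Saturdays. A Saturday time after the 6:00 P.M. closing is deleted. The function names are hypothetical.

```python
def divide_from_one(two_digits, k):
    """00-99 -> one of k equally likely values 1..k, or None (delete) for leftovers."""
    size = 100 // k
    if two_digits >= size * k:
        return None
    return 1 + two_digits // size

def sampling_time(number):
    """'0301 1594' -> (day, clock time), or None if the number is deleted."""
    digits = number.replace(" ", "")
    day = divide_from_one(int(digits[0:2]), 12)
    hour = divide_from_one(int(digits[2:4]), 12)    # 1 = 9 A.M. ... 12 = 8 P.M.
    step = divide_from_one(int(digits[4:6]), 12)    # 1 = :05 ... 12 = :60
    if None in (day, hour, step):
        return None
    minutes_after_nine = (hour - 1) * 60 + step * 5
    # Hypothetical assumption: day 1 is a Monday, so days 6 and 12 are Saturdays,
    # when the library closes at 6:00 P.M. (540 minutes after 9:00 A.M.).
    if day in (6, 12) and minutes_after_nine > 540:
        return None
    h, m = divmod(9 * 60 + minutes_after_nine, 60)  # 24-hour clock
    suffix = "A.M." if h < 12 else "P.M."
    return day, f"{(h - 1) % 12 + 1}:{m:02d} {suffix}"

for n in ["0301 1594", "8460 8881", "8393 6703", "6694 4640", "9632 0005"]:
    print(n, sampling_time(n))
```

The rows printed agree with the part of the sample shown above; the "60 (5 P.M.)" entry simply appears as 5:00 P.M.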

Example 3 

A librarian wishes to determine whether significant shelf space can be obtained by removing little-used books from the collection. The criterion has been established that a little-used book is one which has not circulated in the last five years. The sample will be drawn by examining the date due slips and book cards in the back of randomly selected books. If 15 per cent or more of the collection can be removed, the librarian will take action. A confidence of 95 per cent and a tolerance of 3 per cent are desired, but budgetary restrictions limit the sample size to 500 (Table 1 calls for 1,067 at that tolerance). The librarian believes the confidence to be more important and thus adjusts the tolerance to 5 per cent. The sample size from Table 1 is 384. There is little doubt that the number of books satisfying the criterion will be less than 25 per cent of the collection. Thus the correction factor is:

.25 × .75 × 4 = .75

This gives a final sample size of 288 (384 × .75).

TABLE 1*
CONFIDENCE AND TOLERANCE DETERMINE SAMPLE SIZE

Tolerance    Sample size at a confidence of:
             99%        95%        90%        80%
± .5%        66,358     38,416     27,060     16,435
± 1%         16,590      9,604      6,765      4,109
± 2%          4,147      2,401      1,691      1,027
± 3%          1,843      1,067        752        457
± 5%            664        384        271        164
± 7%            339        196        138         84
± 10%           166         96         68         41

* Values in this table are based upon formulae derived in Report No. MG-ML-100, Community Systems Foundation, Ann Arbor, Michigan.
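As far as we can check, the entries of Table 1 can be reproduced by the familiar normal-approximation formula for a proportion, n = z² p(1 - p) / E², taken at p = .50 (the most conservative case). Here E is the tolerance as a fraction and z is the critical value for the chosen confidence. At p = .50 this reduces to z² / (4E²), and multiplying that by 4p(1 - p) gives back the general formula, which is where the correction factor comes from. The Python sketch below uses the conventionally tabulated z values. That choice is an assumption, since the report cited in the footnote is not reproduced here. With it, every entry matches the table except the 99 per cent, 1 per cent entry, which comes out as 16,589 rather than 16,590, a difference in rounding.

```python
# Critical values commonly tabulated for two-sided confidence levels. These are an
# assumption: the report behind Table 1 is not reproduced here.
Z = {0.99: 2.576, 0.95: 1.96, 0.90: 1.645, 0.80: 1.282}

def table_size(confidence, tolerance):
    """Conservative sample size (50 per cent in the key category): z^2 / (4 * E^2)."""
    z = Z[confidence]
    return round(z * z / (4 * tolerance ** 2))

for tol in (0.005, 0.01, 0.02, 0.03, 0.05, 0.07, 0.10):
    sizes = [table_size(c, tol) for c in (0.99, 0.95, 0.90, 0.80)]
    print(f"+/- {tol:.1%}", sizes)
```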



The collection consists of about 19,000 volumes arranged on 234 sections of shelving. Each section has six shelves, and there are 25 books or fewer on each shelf. To pick a section, we will want a rule to convert the first random digit to a number between zero and two. This calls for a rule with three numbers per division.

0 to 2 is converted to    0
3 to 5                    1
6 to 8                    2
9                         Delete

We can use the second and third random digits directly as the second and third digits of the section number. It is important to recognize that the second and third digits must range from 00 to 99, since we need to be able to obtain section numbers such as 095 and 173.

To select a shelf within a section, we need a rule with six divisions (there is no shelf zero). We will use the fourth and fifth random digits. Each division will have sixteen numbers in it.

00 to 15 is converted to    1
16 to 31                    2
32 to 47                    3
48 to 63                    4
64 to 79                    5
80 to 95                    6
96 to 99                    Delete

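To pick a book from the chosen shelf, we need a number between 00 and 25. We can use the rule developed for the section number to convert the sixth random digit to a number between zero and two. The seventh random digit can be taken directly to be the second digit of the book number. Because these rules can produce section numbers up to 299 and book numbers up to 29, any number that does not correspond to an actual section, or to a book actually on the chosen shelf, should be deleted, as in Example 1. Combining all of our rules, we can draw our sample.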

Random Number    Section    Shelf    Book
17340 44906      073        3        14
37589 96988      175        6        Delete
70322 75172      203        2        25
63492 26401      234        6        06

Again, in drawing the sample it may be convenient to convert all of our random numbers first and to order them by section and by shelf before drawing the sample.
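The rules for this example can be sketched in Python in the same way. The sketch reads each random number as ten digits, of which the first seven are used. It deletes any result with no matching section (above 234) or with a book number above 25. Whether a given shelf actually holds that many books still has to be checked at the shelf. The function names and the `sections` and `max_books` parameters are hypothetical. Sorting the results by section and shelf gives the order suggested above.

```python
def three_way(digit):
    """One random digit -> 0, 1 or 2 (three digits per division); 9 -> None (delete)."""
    return None if digit == 9 else digit // 3

def shelf_number(two_digits):
    """00-15 -> 1, 16-31 -> 2, ... 80-95 -> 6; 96-99 -> None (delete)."""
    return None if two_digits >= 96 else 1 + two_digits // 16

def stack_point(number, sections=234, max_books=25):
    """'17340 44906' -> (section, shelf, book), or None if the number is deleted."""
    d = number.replace(" ", "")
    lead = three_way(int(d[0]))
    shelf = shelf_number(int(d[3:5]))
    book_tens = three_way(int(d[5]))
    if None in (lead, shelf, book_tens):
        return None
    section = f"{lead}{d[1:3]}"          # second and third digits used directly
    book = f"{book_tens}{d[6]}"          # seventh digit used directly
    # Numbers with no matching section or book position are deleted too.
    if int(section) > sections or int(book) > max_books:
        return None
    return section, shelf, book

rows = ["17340 44906", "37589 96988", "70322 75172", "63492 26401"]
sample = sorted(p for p in map(stack_point, rows) if p is not None)
for section, shelf, book in sample:     # ordered by section, then shelf
    print(section, shelf, book)
```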

FINAL REMARKS 

Random sampling is not necessarily an easy operation. Much thought must go into selecting a confidence and tolerance and into developing rules for converting random numbers. Furthermore, the tasks of actually converting random numbers and drawing the sample may be tedious. On the other hand, the entire job of running a library is becoming more complex. To make decisions which are more technical and involve larger amounts of money, librarians need both data and an understanding of how accurate those data are. The material presented in this article should make it possible for librarians to perform many sampling studies by themselves. For more complex studies there are specially developed statistical techniques. Among these are methods of analyzing data in order to obtain more information from them, methods for more efficient sampling, and methods for recognizing and avoiding biases. Finally, computers may be used for generating the sample and for analyzing the results. These techniques, however, are in the domain of the specialized researcher rather than that of the librarian.