Evaluating Reference Service in a Large Academic Library

Cheryl Elzy, Alan Nourie, F. W. Lancaster, and Kurt M. Joseph

An unobtrusive study of the ability of professional librarians to deal with factual questions was conducted at the Milner Library, Illinois State University. Students were recruited to pose questions, for which answers were known, to 19 librarians in five departments. In all, 190 test "incidents" (10 questions for each of the 19 librarians) were used. Librarians were evaluated on the accuracy of the responses given and on their responsiveness and helpfulness, as judged by the student proxies. The methods used in the study are described, including the accuracy and attitude scales developed, the major results are presented, and suggestions are made on the follow-up action that seems appropriate after a study of this kind has been performed.

Cheryl Elzy is Head of the Education and Psychology Division, and Alan Nourie is Associate University Librarian for Public Services and Collection Development at Milner Library, Illinois State University. F. W. Lancaster is Professor at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801. Kurt M. Joseph is a Graduate Assistant in the Department of Psychology at Illinois State University, Normal, Illinois 61761. The study described was supported in part by the Council on Library Resources through its Cooperative Research Grant Program.

Several investigations have suggested that reference librarians may provide complete and correct answers to factual questions only about half the time.1 Concern over this disappointing performance, and questions about its applicability to reference service at Illinois State University, prompted this study.

Illinois State University (ISU), a multipurpose university of more than 22,000 students, offers 191 degree programs in 33 academic departments organized into five colleges. Master's degrees are offered in most areas and doctorates in nine. Milner Library, the central library facility, is organized into five subject divisions on six floors: Education/Psychology, General Reference and Information, Social Sciences/Business, Science/Government Publications, and Humanities/Special Collections. The five divisions are staffed by 20 members of the library faculty, 19 classified employees, and student assistants.

OBJECTIVES

The objectives of the study were (1) to estimate the probability that a user, walking into the library with a factual question, would receive or be led to a complete and correct answer, (2) to determine to what extent student users of the library judge staff members to be responsive and helpful, (3) to identify conditions under which members of the reference staff perform well and conditions under which they perform poorly, and thus (4) to identify ways in which the service might be improved. In other words, the focus of attention would be on the accuracy with which factual questions are answered by or with the aid of library staff.
While we would agree with such writers as Jo Bell Whitlatch that success in answering this type of question is not the only criterion by which an academic reference service should be judged, we have little sympathy with others who state or imply that accuracy is of little concern to library users, who are more concerned with such things as convenience, timeliness, and the librarian's attitude.2,3

The study was not performed as an academic exercise. No hypotheses were formulated. Our main concerns were to get a better idea of the quality of reference service at ISU and, in particular, to identify possible problem areas.

METHODS

The following decisions were made at the outset:

1. The evaluation should be performed unobtrusively. Questions of the type that might reasonably be put to the various departments of the Milner Library would be collected, and students recruited to pose them to members of the library faculty as though these questions represented their actual information needs.

2. Faculty members would be evaluated on attitudinal characteristics and on whether or not they were able to supply complete and correct answers.

3. The study would be conducted between April 17 and April 24, 1989, to allow sufficient time to recruit and hire students. It would occur during a peak time of activity in the academic calendar, midway between spring break and the end of the semester. Group and individual training sessions for student proxies would be held one week prior to the study. There would be one week of class and one week of finals during which proxies could pose their test questions if they somehow failed to do so during the test period.

Unobtrusive studies of reference service have been performed several times in the last 20 years, and they have been reviewed and evaluated elsewhere.4 The present study differs from most earlier ones in several ways:

1. In most unobtrusive studies the questions have been posed by telephone rather than by personal visit to the library.

2. Most studies have been performed in order to compare libraries, and perhaps to identify broad categories of factors that might influence reference performance, rather than to derive detailed data on a single library.

3. This study, involving 190 reference transactions, appears to be the largest unobtrusive study of reference service yet to be attempted within a single library.

The director of the library requested that a memorandum be sent to all public service librarians stating that the study would be performed some time during the coming year and that the results would not affect their annual performance evaluations. This was done in January 1989. A few individuals expressed reservations, but after they learned more about the study and its background and intent, their concerns were overcome.

An evaluation form was created for students to complete after each test question had been posed. It recorded the test questions and the answers given by librarians; it also provided space for observations on the attitude and demeanor of the librarians. Each item was followed by a request for open comments from the students.

It was decided that at least 10 questions should be posed to each librarian. More would be desirable, but probably would be too unwieldy. Fewer might not provide a true picture of that librarian's attitude and skills. Therefore, 190 incidents would be recorded by the students: 10 questions for each of 19 librarians.
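For readers who want to tabulate the data from a study of this kind, the information captured on the evaluation form and later coded for analysis can be pictured as a single record per test incident. The Python sketch below is illustrative only; the field names are our own and are not those of the actual Milner form.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class TestIncident:
    """One completed evaluation form: a single test question posed to a librarian.
    Field names are illustrative, not taken from the actual Milner form."""
    student_id: str                 # code assigned to the student proxy
    librarian_id: int               # 1 through 19, identified by desk nameplate
    floor: str                      # division/floor code, e.g. "A" through "E"
    question_id: int                # 1 through 58
    asked_at: datetime              # date and time the question was posed
    minutes_spent: Optional[float]  # minutes in contact with the librarian
    answer_given: str               # answer or directions recorded by the student
    source_cited: str               # title, edition, page number, and so on
    accuracy_code: Optional[int]    # 0-15 scale assigned later by the investigators
    attitude_ratings: dict = field(default_factory=dict)  # 24 items, each rated 0-10
    comments: str = ""              # open comments from the student

    def attitude_score(self) -> Optional[float]:
        """Mean of the 24 attitude ratings, the basis of librarian and floor scores."""
        if not self.attitude_ratings:
            return None
        return sum(self.attitude_ratings.values()) / len(self.attitude_ratings)
```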
Recruiting students for an unobtrusive study is difficult because researchers cannot simply advertise in the student newspaper or on the bulletin board and still keep the project confidential. Some contacts were made through professors known to the researchers at ISU and also at Illinois Wesleyan University across town. They were provided with a job announcement describing the project and the responsibilities required of students. Some students did apply from both institutions through these personal contacts. In the end, however, the best source for recruitment proved to be Milner Library's own records of students who had applied for positions but could not be hired for administrative or financial reasons. The person at Milner who is in charge of student employees provided files on 18 ISU students who might be suitable candidates, and these were contacted. Together with two students from Wesleyan, they formed the group of proxies. They were to be paid $75 each for approximately 10 to 15 hours of work, including attending a training session, being interviewed individually by the investigators, asking the questions, filling out the evaluation forms, and attending a follow-up, or debriefing, session.

All 20 students were undergraduates: 7 males and 13 females; 4 freshmen, 4 sophomores, 8 juniors, and 4 seniors. Ages ranged from 17 to 23. Although the students came from a wide variety of disciplines, no attempt was made to match reference questions to student majors.

The most difficult and time-consuming task was creating the questions to be used. The researchers put together an initial pool of hundreds of questions gathered from reference texts, other reference studies, and the investigators' own backgrounds and experiences.5 Most were rejected on the first reading. Approximately 150 different questions were selected for research in the Milner Library collection and evaluated for possible use. All questions that could not be answered from Milner's collection were eliminated. Those of a type or subject not usually asked in this library were also discarded. From those that remained, questions were matched by subject and specialty with appropriate floors and librarians, and 58 questions were finally selected. Most were asked more than once. For example, many questions were asked of each librarian on a particular floor because librarians are responsible not only for reference work in their own subject areas, but for all disciplines housed on their floor (e.g., the music librarian has to answer questions in music, fine arts, literature, and languages). The same question was asked on different floors where it was appropriate to do so. Through repetition of questions, 190 test incidents were created.

The set of questions used differed from that of most earlier studies in that it contained a blend of factual questions (e.g., Who was the secretary of state when Sumner Welles was his assistant?) and of questions that were more research oriented (e.g., I need some articles discussing the short story entitled "The Lottery"). Because all questions could be answered from the resources of the Milner Library, the study was really an evaluation of the librarians' ability to exploit the library's resources.
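The expansion of 58 questions into 190 test incidents amounts to crossing each question with the librarians for whom it is appropriate, capped at roughly 10 questions per librarian. The sketch below illustrates the idea with hypothetical data; it is not the assignment procedure actually used, which also had to balance proxies, days, and desk schedules (discussed next).

```python
# Hypothetical example data: each question is matched, by subject, with the
# librarians (numbered 1-19) for whom it is appropriate.
eligible_librarians = {
    1: [1, 2, 3],   # a question suitable for every librarian on one floor
    2: [4, 5],      # a question appropriate on two different floors
    # ... entries for the remaining questions ...
}

def build_incidents(eligible, per_librarian=10):
    """Expand question/librarian matches into individual test incidents,
    capping each librarian at roughly `per_librarian` questions."""
    counts = {}
    incidents = []
    for question_id, librarians in eligible.items():
        for librarian_id in librarians:
            if counts.get(librarian_id, 0) >= per_librarian:
                continue  # this librarian already has a full set
            counts[librarian_id] = counts.get(librarian_id, 0) + 1
            incidents.append((question_id, librarian_id))
    return incidents

incidents = build_incidents(eligible_librarians)
print(f"{len(incidents)} test incidents generated")
```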
The scheduling of when a student was to pose a question was critical because each student was to seek out a particular librarian, identified by desk nameplate. This proved difficult with so many people and questions involved. The researchers wanted to avoid a question being asked twice of the same librarian. Another goal was to spread the questions out evenly over mornings, afternoons, and evenings, and through the several days of the study. Librarians work the reference desks only at certain times, and students had to try to match their schedules with those of the librarians. The researchers also wanted to spread the student proxies out among the five floors and over the evaluation period so that they would not become too familiar to anyone on a given floor or in any time frame. The questions were laid out against the floors and the librarians' schedules, and 20 question sets of 8 to 10 questions each were created.

The week before the actual study, group training sessions were organized for the student proxies. The background for the project was discussed, as was how the study would be conducted and how the results would be used. Packets were given to each student with a schedule of when and to whom each question should be asked, along with a list of questions and an evaluation form for each. The forms were explained in great detail. Students were encouraged to complete the forms immediately after the encounter so that information and impressions would be fresh in their minds, and comments were strongly encouraged. Finally, their pay, the time frame, and other requirements were discussed. An individual training session was set up with each proxy over the next three days to assist each one in understanding the project, questions, and forms, and to help each construct a "cover story" for the questions in the event of an extended reference interview.

A number of unexpected problems occurred during the study itself, such as librarians being on vacation or out sick, or nameplates missing from desks (making identification difficult). But only about 14 questions had to be carried into a third week; all 190 forms were returned, and the students performed diligently. A follow-up session was held with the students a few days after the evaluation period. They were eager to share their thoughts and experiences. Comments ranged from how the time of day affected how librarians handled questions to the perception that the better the students dressed, the better the service received.

The work of verifying and grading answers, and of summarizing the mounds of data collected, continued through summer and fall. The investigators from Milner rechecked each answer on each floor and scored it. The attitudinal scores from each form were compiled, and comments on each librarian recorded. Each individual student, librarian, floor, and question was given a unique code, and the SPSS program was used to manipulate the data. Demographics on each student were entered, as well as information about each of the 190 incidents (e.g., time of day and date the question was asked, minutes spent with the librarian, and so on).

RESULTS

Each student was asked to record the details of the answer and search results provided by the librarian for each question posed and also to supply information on the source of the answer: title, edition, page number, and so on.
Also requested were the date and time of day the question was posed and how many minutes were spent in contact with the librarian. The researchers checked each answer in the 190 cases for completeness and correctness and noted sources used by the librarians. Each incident was assigned a code reflecting the relative success of the librarian in answering the question (see table 1).

Arriving at a viable scoring procedure was difficult. It is relatively easy to score the results of a question posed to a library by telephone. The library can be considered as simply a "black box" and the librarian scored on a binary scale: giving the correct answer or not. For a walk-in to an academic library, scoring is more complicated. It was decided to score from the viewpoint of the student user. Because we were evaluating the role of the library in answering questions rather than in bibliographic instruction, we decided that the best possible result was one in which the user was given the complete and correct answer. Anything less should receive a lower score. Being led to appropriate sources by the librarian was judged less satisfactory than receiving a correct answer, but being led to sources was judged better than being pointed to them. It was also thought that being led or pointed to several sources, one of which included the complete and correct answer, was less satisfactory than being led or pointed to only one source that contained the complete and correct answer. Finally, the worst result was one in which the user finished with an incorrect answer. These principles are reflected in the scoring method used.

TABLE 1
SCORING METHOD USED
                                                                            Points
Student provided with complete and correct answer                              15
Student led to a single source, which provided complete and correct answer     14
Student led to several sources, at least one of which provided complete
  and correct answer                                                           13
Student directed to a single source, which provided complete and correct
  answer                                                                       12
Student directed to several sources, at least one of which provided
  complete and correct answer                                                  11
Student given an appropriate referral to a specific person or source,
  which provided complete and correct answer                                   10
Student provided with partial answer                                            9
Student given an appropriate referral to the card catalog or another floor      8
Librarian did not find an answer or suggest an alternative source               5
Student given an inappropriate referral to catalog, floor, or source, or
  librarian unlikely to provide complete and correct answer                     3
Student given inappropriate sources                                             2
Student given incorrect answer                                                  0

Some findings from earlier investigations support the scoring method used in this study. Wyma J. Hood and Monte J. Gittings found that librarians accompanied users about 54% of the time when answering reference questions in one academic library.6 About 46% of the time, they showed them how to find the answer, and about 34% of the time, they found the answer for the user.
Thomas Childers, in an unobtrusive study of reference service in public libraries, found that when proxies were directed to library tools, such as indexes or catalogs, they were not accompanied by the librarian in more than half the cases.7 When directed to browse through shelves, books, chapters, or articles to find answers to their questions, they were not accompanied by the librarian in about 80% (129 out of 159) of the cases. Charles A. Bunge found that library users received correct answers only 47% of the time when merely directed to appropriate sources by a busy librarian. This improved to 59% when a busy librarian helped the users search and to 65% when a librarian not otherwise busy helped the patrons search.8

The scoring method we adopted served our purposes, but it can hardly be considered definitive, and other researchers might well disagree with it. Its major limitation is that, to avoid complications in data processing, a zero was assigned to the situation in which a student received an incorrect answer when, in fact, it would make much more sense to give this situation a minus score. If a zero is associated with an incorrect answer, it is necessary to give certain other unsatisfactory outcomes (e.g., an inappropriate referral) a positive value because they are considered somewhat less heinous, but this hardly seems logical. In fact, a more logical scoring procedure would probably assign a zero to the situation in which no answer was provided (which scored 5 in table 1) and would give minus values to inappropriate referrals and to incorrect answers.

The attitude scale, in contrast to the accuracy scale, was a very simple one. The students rated the librarians on their helpfulness and approachability on a continuous scale of 0 to 10 in answer to 24 questions (e.g., Looks approachable? Acknowledges user's approach to desk? Friendly attitude?). The attitude value for a librarian or a floor is merely the mean of all values recorded by the students for that librarian or floor for all 24 questions.

The scores for the 190 reference incidents are summarized in table 2. In 58 cases, the librarian received a score of 15, meaning that he or she provided a complete and correct answer. However, the philosophy of service in academic libraries often is to provide the appropriate sources or point the student in the right direction. If that level of service is accepted as adequate, then scores of 10 or above would be considered acceptable. This was true of 111 cases (58% of all incidents). If an appropriate referral to the card catalog or another floor is considered acceptable, the library's score increases to 121 cases (about 64% of all incidents). It goes to 128 cases (about 67% of the total) if partial answers are considered acceptable. In 18 of 190 cases (9.5%), the librarian could not or did not find an answer. In 36 instances (19%), inappropriate referrals and sources or outright wrong answers were given.

TABLE 2
ACCURACY OF ANSWERS PROVIDED
Answer Code    Frequency    Percent
15                 58         30.5
14                 24         12.6
13                 13          6.8
12                  5          2.6
11                  8          4.2
10                  3          1.6
 9                  7          3.7
 8                 10          5.3
 5                 18          9.5
 3                 10          5.3
 2                 16          8.4
 0                 10          5.3
Missing*            8          4.2
Total             190        100.0
* Some students failed to provide enough information upon which to base judgments, or asked the question in such a way as to change the expected response, thus invalidating the question.

The time of day the question was asked was coded to determine whether it seemed to affect the accuracy or the behavior of librarians. Accuracy decreased during evening hours, but the decrement was not statistically significant. Time of day also had no significant effect on how well the students perceived they had been treated (attitudinal scores).
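The cumulative figures quoted above follow directly from the frequency distribution in table 2. As a minimal illustration (the frequencies are those published; the threshold groupings and labels are ours), they can be recomputed as follows:

```python
# Frequencies of accuracy codes from table 2 (code: number of incidents).
# The 8 incidents with missing data are excluded from the distribution.
frequencies = {15: 58, 14: 24, 13: 13, 12: 5, 11: 8, 10: 3,
               9: 7, 8: 10, 5: 18, 3: 10, 2: 16, 0: 10}
TOTAL = 190  # all incidents, including the 8 scored as missing

def count_and_share(codes):
    """Number and percentage of the 190 incidents falling in the given codes."""
    count = sum(frequencies[c] for c in codes)
    return count, 100 * count / TOTAL

acceptable = [15, 14, 13, 12, 11, 10]               # answered, led, directed, or referred
with_catalog_referral = acceptable + [8]            # plus referral to catalog or another floor
with_partial_answers = with_catalog_referral + [9]  # plus partial answers

for label, codes in [("complete and correct answer", [15]),
                     ("score of 10 or above", acceptable),
                     ("plus catalog/floor referrals", with_catalog_referral),
                     ("plus partial answers", with_partial_answers)]:
    n, pct = count_and_share(codes)
    print(f"{label}: {n} cases ({pct:.0f}% of incidents)")
# Prints 58 (31%), 111 (58%), 121 (64%), and 128 (67%), matching the text.
```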
Table 3 shows how often each of the 58 questions was posed, the accuracy scores achieved (the mean score for questions posed more than once), and the mean time spent per question. Because of minor mix-ups, questions 7 and 17 were not asked. In a few other cases, students failed to record needed data, particularly minutes spent with the librarian, or sufficient information on answers to allow the investigators to verify them. The number of these cases is noted in parentheses in table 3.

TABLE 3
QUESTION BY QUESTION RESULTS
Question   Times Posed   Accuracy   Mean Minutes Spent on Question
1          2             12.0000    13.5
2          2             13.0000     5.0
3          2              7.5000     4.0
4          2(1)*         15.0000     3.25
5          2             14.0000     6.5
6          4(1)†          5.5000     6.0
7          not asked
8          2              8.0000     9.0
9          5             10.2000     4.2
10         2             14.0000     3.0
11         4              9.7500     4.2
12         4             13.2500     8.0
13         2(1)*         14.0000     5.0
14         2             15.0000     3.5
15         2             11.5000     3.0
16         5(1)†          5.2500     4.3
17         not asked
18         3             10.5000    15.0
19         5(1)†         10.4000     2.5
20         7              9.7143     8.63
21         4(1)†         13.0000     4.66
22         5              5.8000     6.4
23         7             10.1667     4.93
24         2              8.5000     2.0
25         1             11.0000     2.0
26         6             15.0000     6.66
27         3             11.3333     5.83
28         4              3.4000     4.0
29         4              7.2500     2.5
30         4              7.6667     9.0
31         1             15.0000    12.5
32         3              9.3333     5.6
33         2(2)*                     6.0
34         3             15.0000    11.66
35         2             15.0000     7.5
36         3             15.0000     9.3
37         3(1)*          5.0000     6.3
38         5             11.6000     6.9
39         2              5.0000    12.5
40         3              4.3333     5.3
41         4              7.5000     5.0
42         3             10.3333     7.3
43         3              2.3333     7.83
44         3              7.0000     5.66
45         3             14.3333     3.0
46         2              8.0000     5.25
47         3              6.3333     9.33
48         7             11.5714     9.86
49         8(1)*          7.1429     3.69
50         3             15.0000     4.83
51         6             12.3333     5.25
52         3             10.0000     3.1
53         3(1)*         14.0000     4.3
54         3              6.3333     6.83
55         7              9.5714     6.43
56         2             14.0000     7.5
57         3             12.0000     3.0
58                       14.0000     5.0
* Missing data in accuracy code.
† Missing data in minutes spent.

The frequency with which a question was asked ranged from 1 to 8 times. Mean accuracy ranged from a low of 2.3333 on question 43 to a perfect 15, with 8 questions scoring 15 and 7 scoring 14 on the accuracy scale. Minutes spent went from a low of 0 (the librarian did not leave his or her office, but conducted the transaction through the student assistant covering the desk) to a high of 25 minutes. Mean minutes spent went from 2 on questions 24 and 25 to 15 on question 18. There was no appreciable correlation between minutes spent with students and accuracy (r = .1008, n = 179, p > .09). The accuracy variable was then collapsed into three categories for further analysis (15 to 10 as acceptable, 9 to 6 as minimal, 5 to 0 as unacceptable), but there was still no relationship between minutes spent and accuracy (r = .0299, n = 179, p = .346). Time spent with the student did not reflect how complete and correct the answer was.
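A minimal sketch of this check, assuming per-incident records like those described under METHODS (the variable names are ours, and scipy here stands in for the SPSS routines actually used), might look like this:

```python
from scipy.stats import pearsonr

def collapse(code):
    """Collapse the 0-15 accuracy code into the three categories used in the
    study: 15-10 acceptable, 9-6 minimal, 5-0 unacceptable."""
    if code >= 10:
        return 2   # acceptable
    if code >= 6:
        return 1   # minimal
    return 0       # unacceptable

# Toy records standing in for the 179 incidents with both values recorded;
# each pair is (minutes spent with the librarian, accuracy code assigned).
incidents = [(4.0, 15), (2.5, 8), (7.0, 13), (1.0, 0), (12.0, 14), (3.0, 5)]

minutes = [m for m, _ in incidents]
raw_codes = [a for _, a in incidents]
collapsed = [collapse(a) for a in raw_codes]

r_raw, p_raw = pearsonr(minutes, raw_codes)
r_collapsed, p_collapsed = pearsonr(minutes, collapsed)
print(f"raw codes: r = {r_raw:.4f}, p = {p_raw:.3f}")
print(f"collapsed: r = {r_collapsed:.4f}, p = {p_collapsed:.3f}")
# The study itself found r = .1008 (raw) and r = .0299 (collapsed) over
# n = 179 incidents, neither of which is statistically significant.
```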
Table 4 shows the accuracy and attitude scores achieved by floors or divisions. Floor E achieved both the lowest accuracy figure and the lowest attitude scores. Floor C received the highest marks for attitude, while floor B was the most accurate. Table 5 records the attitude and accuracy scores by librarian, along with the number of questions each librarian was asked and the minutes the librarian spent with the students. Attitude scores ranged from 8.75 on a 10-point scale down to 5.74. Accuracy ranged from 13.889 out of 15 down to 7.125. While the librarian who scored highest in attitude also received the highest marks in accuracy, the same was not true of the lowest scores in each category. Both accuracy and attitudinal scores are discriminating. One individual, for example, scored a low 7.2222 on accuracy and a low 5.74 on attitude.

TABLE 4
ACCURACY AND ATTITUDE SCORES BY FLOOR
Floor    Questions    Accuracy    Attitude
A        30(3)*       10.4074     8.2100
B        30           12.7333     8.2067
C        20(2)*       11.7778     8.5200
D        71(2)*        9.6377     7.7141
E        39(1)*        8.1053     7.1256
Mean     190(8)*      10.1538     7.8342
* Missing data for accuracy scores.

TABLE 5
ACCURACY AND ATTITUDE SCORES FOR EACH LIBRARIAN
Librarian    Number of Questions Asked    Attitude    Accuracy    Mean Minutes Spent
1            10(1)*                       8.1900      10.3333     4.35
2            10                           7.0000       7.6000     5.45
3            10                           7.6300       7.5000     6.975
4             9(1)*                       7.6000       7.1250     5.65
5            10(1)*                       8.7500      13.8889     7.88
6            10                           8.2100      13.0000     4.85
7            10                           7.7200      11.8000     6.7
8            10                           8.2300      10.8000     6.3
9            10(1)*                       8.2900       9.6667     4.3
10           10                           7.8000       9.5000     7.6
11           10(1)*                       5.7400       7.2222     2.15
12           10(1)*                       7.3600      11.8889     3.95
13           10(1)*                       7.7800      11.2222     6.95
14           10                           7.8700       8.6000     8.05
15           10                           8.1800       9.7000     5.85
16           12                           7.0750       8.5833     4.75
17           10                           8.6900      13.4000     7.30
18            9                           8.2444      10.2222     8.05
19           10(1)*                       8.6600       9.6667     8.5
Mean         190(8)*                      7.8342      10.1538
* Missing data for accuracy scores.

Accuracy was found to be only minimally associated with attitudinal scores (r(182) = .2482, p < .0001). Answering a question correctly and completely was not a good predictor of how well the students in this study perceived they were being treated. Conversely, librarians who project positive images do not necessarily answer questions with the highest accuracy. Minutes spent with students apparently did affect the attitudinal scores the students assigned to the librarians. Librarians who spent 4 or more minutes with students tended to be assigned a higher attitude score than those who spent less time (F = 7.592, p < .00001).

It was thought that differences might be found between questions of a ready reference nature and those involving more extended research. Therefore, the 58 questions were divided into two groups based on the number of sources needed to find the complete and correct answers and the level of difficulty of the questions (as perceived by the investigators), to test the prediction that when ready reference questions are asked, the accuracy and attitudinal scores will be higher. However, the type of question affected neither attitudinal scores (t(186) = -.30, p = .768) nor accuracy (t(177) = 1.10, p = .271). The difficulty of the question had no significant effect on either accuracy or the students' attitude ratings.
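A comparable sketch of the ready-reference versus research-question comparison, again with hypothetical records and scipy standing in for SPSS, might run as follows:

```python
from scipy.stats import ttest_ind

# Toy incident records; the keys are our own, not those of the original data file.
incidents = [
    {"question_type": "ready", "accuracy": 15, "attitude": 8.2},
    {"question_type": "ready", "accuracy": 12, "attitude": 7.9},
    {"question_type": "ready", "accuracy": 8, "attitude": 6.5},
    {"question_type": "research", "accuracy": 13, "attitude": 8.0},
    {"question_type": "research", "accuracy": 5, "attitude": 7.1},
    {"question_type": "research", "accuracy": 10, "attitude": 7.6},
]

for measure in ("accuracy", "attitude"):
    ready = [i[measure] for i in incidents if i["question_type"] == "ready"]
    research = [i[measure] for i in incidents if i["question_type"] == "research"]
    t, p = ttest_ind(ready, research)  # independent-samples t-test on the two groups
    print(f"{measure}: t = {t:.2f}, p = {p:.3f}")

# The study found no significant difference for either measure:
# attitude t(186) = -.30, p = .768; accuracy t(177) = 1.10, p = .271.
```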
SIGNIFICANCE AND USE OF THE RESULTS

The study was intended as a practical one: to gain information and insights on how reference service at ISU might be improved. While the significance of certain relationships was examined statistically, no formal hypotheses were formulated or tested. This section of the paper, then, deals with the authors' perceptions of the value of the results to ISU and with how these results have been and are being used.

As full faculty, ISU librarians are evaluated each year for the distribution of merit dollars. Three areas of performance are scrutinized: (1) practice of librarianship (the equivalent of teaching performed by general faculty), (2) research and scholarly activity, and (3) service. Librarianship (the most heavily weighted component) may also be the most difficult to evaluate in many instances, especially for public service librarians. In evaluating reference activity, impressionistic anecdotes or testimonials from colleagues often replace more objective data. Teaching faculty have traditionally been subjected to regular student evaluations. In a similar fashion, unobtrusive evaluations, such as that reported here, furnish a comparable examination of reference performance from several perspectives, accuracy and deportment among them. Such evaluations allow the quality and character of reference service to be discussed and evaluated at a level more concrete than opinion, conjecture, or speculation.

In considering the results of this study, a consensus must first be reached on exactly what is an acceptable level of accuracy and of attitude. Is 70% accuracy acceptable? Is 50%? Is an attitude score of 7.8 on a 10-point scale what an institution should be aiming for or should tolerate? What level is unacceptable: 7, 6, 5? Is the fact that 15% of the questions were dealt with in less than two minutes significant? That 37% were dealt with in less than four minutes?

In making use of the results, the librarians involved should be made thoroughly familiar with the methodology of the project and the instrument used. Once the group recognizes that there very well could be problems in the level of service furnished, ideas on how to address them can be solicited, or presented, and discussed in an informal meeting. On one level, simply recognizing that one may be perceived in a certain way by a patron, or that two or three minutes may not be an appropriate amount of time to give all questions, or that one may have developed a tendency over the years to point students in the direction of sources rather than lead them, might be enough to solve the problem. With some librarians, the mere fact that they are reminded of possible problems or weaknesses in their performance may be enough to create a self-correcting situation. However, this will not always be the case, and other options should be explored, for example: (1) personal interviews for the librarians falling at the low end of the rating scales; (2) use of outside speakers to present a workshop on improving reference service and combating and reducing the effects of burnout; and (3) identification of the types of questions most likely to be dealt with inadequately.

From an unobtrusive study of the type described, improvement in reference service can be addressed at several levels: personal, divisional, and institutional. If warranted, personal conferences with the librarians can be conducted to discuss, for example, undesirable elements of service. This might be a tendency to use inappropriate reference sources, to conduct peripheral business at the reference desk, or to give an undesirable impression of one's approachability, friendliness, or willingness to help. At this personal level, one can simply run through the list of comments made by the surrogate users and discuss the individual questions with the librarians.
On the divisional or institutional level, the collective consciousness relating to reference service can be heightened by broad, nonconfrontational group discussion of patterns detected. Traditional assumptions and platitudes about the excellence of the service furnished can be challenged, and strengths and weaknesses pointed out. Librarians with an accuracy score below some selected level should be consulted privately. The pattern of time spent on questions may be worth discussing with some librarians (one librarian spent one minute or less on half the questions received and less than three minutes on 80% of them), as would the collection of comments made by observers (about 7 pages for each librarian).

The third level for discussion would occur at the divisional level. Here, if the assessment of performance showed real excellence, as it did in some instances, it can be commended and serve as a morale builder. If, on the other hand, undesirable trends were disclosed (e.g., reluctance to handle questions dealing with a certain collection located on the floor), they should be discussed and existing policy regarding them clarified or revised. One unfortunate aspect of providing anonymity in such a project is that, while the identities of the underachievers are protected, so too are the identities of the stars: the librarians whose performance is truly exemplary and who should be used as role models.

After conducting personal interviews, general and divisional meetings, and an in-house developmental institute, the library should implement a similar project, after an appropriate amount of time has passed, to determine what changes, if any, have occurred as a result of the evaluation process.

REFERENCES AND NOTES

1. Terence Crowley, "Half-Right Reference: Is It True?" RQ 25:59-68 (Fall 1985).
2. Jo Bell Whitlatch, "Unobtrusive Studies and the Quality of Academic Library Reference Services," College & Research Libraries 50:181-94 (Mar. 1989).
3. Duane E. Webster, "Examining the Broader Domain," Journal of Academic Librarianship 13:79-80 (May 1987); and Joan C. Durrance, "Reference Success: Does the 55 Percent Rule Tell the Whole Story?" Library Journal 114:31-36 (Apr. 15, 1989).
4. Ronald Rowe Powell, "Reference Effectiveness: A Review of Research," Library and Information Science Research 6:3-19 (Jan.-Mar. 1984); Crowley, "Half-Right Reference," p. 59-68; F. W. Lancaster, If You Want to Evaluate Your Library... (Champaign: Univ. of Illinois Graduate School of Library and Information Science, 1988); and Peter Hernon and Charles R. McClure, Unobtrusive Testing and Library Reference Services (Norwood, N.J.: Ablex, 1987).
5. Rolland E. Stevens and Donald G. Davis, Jr., Reference Books in the Social Sciences and Humanities, 4th ed. (Champaign, Ill.: Stipes, 1977); Thomas P. Slavens, Informational Interviews and Questions (Metuchen, N.J.: Scarecrow, 1978); Marcia J. Myers and Jassim M. Jirjees, The Accuracy of Telephone Reference/Information Services in Academic Libraries: Two Studies (Metuchen, N.J.: Scarecrow, 1983); Charles R. McClure and Peter Hernon, Improving the Quality of Reference Service for Government Publications (Chicago: American Library Assn., 1983); and Janine Schmidt, "Reference Performance in College Libraries," Australian Academic and Research Libraries 11:87-95 (June 1980).
6. Wyma Jane Hood and Monte James Gittings, Evaluation of Service at the General Reference Desk, University of Oregon Library (Eugene: Univ. of Oregon, 1975), ERIC Document Reproduction Service No. ED 110 038.
7. Thomas Childers, The Effectiveness of Information Service in Public Libraries: Suffolk County. Final Report (Philadelphia: Drexel Univ., School of Library and Information Science, 1978), passim.
8. Charles A. Bunge, "Factors Related to Reference Question Answering Success: The Development of a Data-Gathering Form," RQ 24:482-86 (Summer 1985).