PAL: Toward a Recommendation
System for Manuscripts
Scott Ziegler and
Richard Shrake
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2018 84
Scott Ziegler (sziegler1@lsu.edu) is Head of Digital Programs and Services, Louisiana State
University Libraries. Prior to this position, Ziegler was the Head of Digital Scholarship and
Technology, American Philosophical Society. Richard Shrake (shraker13@gmail.com) is a Library
Technology Consultant based in Burlington, Vermont.
ABSTRACT
Book-recommendation systems are increasingly common, from Amazon to public library interfaces.
However, for archives and special collections, such automated assistance has been rare. This is partly
due to the complexity of descriptions (finding aids describing whole collections) and partly due to the
complexity of the collections themselves (what is this collection about and how is it related to
another collection?).
The American Philosophical Society Library is using circulation data collected through the collection-
management software package, Aeon, to automate recommendations. In our system, which we’re
calling PAL (People Also Liked), recommendations are offered in two ways: based on interests
(“You’re interested in X, other people interested in X looked at these collections”) and on specific
requests (“You’ve looked at Y, other people who looked at Y also looked at these collections”).
This article will discuss the development of PAL and plans for the system. We will also discuss
ongoing concerns and issues, how patron privacy is protected, and the possibility of generalizing
beyond any specific software solution.
INTRODUCTION
The American Philosophical Society Library (APS) is an independent research library in
Philadelphia. Founded in 1743, the library houses a wide variety of material in early American
history, history of science, and Native American linguistics. The majority of the library’s holdings
are manuscripts, with a large amount of audio material, maps, and graphics, nearly all of which are
described in finding aids created using Encoded Archival Description (EAD) standards.
Like similar institutions, the APS has long struggled to find new ways to help library users
discover material relevant to their research. In addition to traditional in-person, email, and phone
reference, the APS has spent years creating search and browse interfaces, subject guides, and web
exhibitions to promote the collections.1
As part of these ongoing efforts to connect users with collections, the APS is working on an
automated recommendation system to reuse circulation data gathered through Aeon. Developed
by Atlas Systems, Aeon is a “request and workflow management software specifically designed for
special collections libraries and archives,” and it enables the APS to gather statistics on both the
use of our manuscript collections and on aspects of the library’s users.2 The automated
recommendation system, which we’re calling PAL, for “People Also Liked,” is an ongoing effort.
This article presents a snapshot of current work.
PAL: TOWARD A RECOMMENDATION SYSTEM FOR MANUSCRIPTS | ZIEGLER AND SHRAKE 85
https://doi.org/10.6017/ital.v37i3.10357
LITERATURE REVIEW
The benefits of recommendations in library OPACs have long been recognized. Writing in 2008
about the library recommendation system BibTip, itself started in the early 2000s, Mönnich and
Spiering observe that “library services are well suited for the adoption of recommendation
systems, especially services that support the user in search of literature in the catalog.” By 2011
OCLC Research and the Information School at the University of Sheffield began exploring a
recommendation system for OCLC’s WorldCat.3
Recommendations for library OPACs commonly fall into one of two categories: content-based or
collaborative filtering. Content-based recommendations pair specific users to library items based
on the metadata of the item and what is known about the user. For example, if a user indicates in
some way that they enjoy mystery novels, items identified as mystery novels might be
recommended to them. Collaborative filtering combines users in some way and creates
recommendations for one user based on the preferences of another user.
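As a minimal JavaScript sketch of the distinction (the sample data, the field names, and both
function names are hypothetical and are not drawn from any system discussed here), content-based
matching compares item metadata with a user's stated preferences, while collaborative filtering
borrows from the history of users whose requests overlap:

```javascript
// Hypothetical sample data, for illustration only.
const items = [
  { id: "A", genre: "mystery" },
  { id: "B", genre: "mystery" },
  { id: "C", genre: "history" },
];
const history = {
  ann: ["A", "C"],
  bob: ["A", "B"],
  cat: ["C"],
};

// Content-based: match item metadata against what the user says they like.
function contentBased(likedGenres, seen) {
  return items
    .filter((item) => likedGenres.includes(item.genre) && !seen.includes(item.id))
    .map((item) => item.id);
}

// Collaborative filtering: recommend what users with overlapping history chose.
function collaborative(user) {
  const mine = new Set(history[user]);
  const picks = new Set();
  for (const [other, theirs] of Object.entries(history)) {
    if (other === user) continue;
    if (!theirs.some((id) => mine.has(id))) continue; // no shared item, skip
    theirs.filter((id) => !mine.has(id)).forEach((id) => picks.add(id));
  }
  return [...picks].sort();
}
```

The first function never consults other users; the second never consults item metadata. That
difference is what separates the two categories.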
There can be a dark side to recommendations. The algorithms that determine which users are
similar and thus which recommendations to make are not often understood. Writing about
algorithms in library discovery systems broadly, Reidsma points out that “in librarianship over the
past few decades, the profession has had to grapple with the perception that computers are better
at finding relevant information than people.”4 The algorithms that are doing the finding, however,
often carry the same hidden biases that their programmers have. Reidsma encourages a broader
understanding of algorithms in general and deeper understanding of recommendation algorithms
in particular.
The history of recommendation systems in libraries has informed the ongoing development of
PAL. We use both the content-based and the collaborative filtering approach to offering
recommendations to users. For the purposes of communicating them to nontechnical patrons, we
refer to them as “interest-based” and “request-based,” respectively. Furthermore, we are cautious
about the role algorithms play in determining which recommendations users see. Our help text
reinforces the continued importance of working directly with in-house experts, and we promote
PAL as one tool among the many offered by the library.
We are not aware of any literature on the development of recommendation tools for archives or
special-collections libraries. The nature of the material held in these institutions presents special
challenges. For example, unlike book collections, many manuscript and archival collections are
described in aggregate: one description might refer to many letters. These issues are discussed in
detail below.
PUTTING DATA TO USE: RECOMMENDATIONS BASED ON INTERESTS AND REQUESTS
The use of Aeon allows the APS to gather and store data, including both data that users supply
through the registration form and data concerning which collections are requested. PAL uses both
types of data to create recommendations.
Interest-Based Recommendations
The first type of recommendation uses self-identified research interest data that researchers
supply when creating an Aeon account. When registering, a user has the option to select from a list
of sixty-four topics grouped into seven broad categories (figure 1). The APS selected these
interests based on suggestions from researchers as well as categories common in the field of
academic history.
Upon signing in, a registered user sees a list of links (figure 2); each link leads to a full-page view
of collection recommendations (figure 3). These recommendations follow the model, “You’re
interested in X, other people interested in X looked at these collections.”
Request-Based Recommendations
Using the circulation data that Aeon collects, we are able to automate recommendations in PAL
based on request information. Upon clicking a request link in a finding aid, the user is presented
with a list of recommendations on the sidebar in Aeon (figure 4). Each link opens the finding aid
for the collection listed.
Figure 1. List of interests a user sees when registering for the first time. A user can also revisit this
list to modify their choices at any point by following links through the Aeon interface. The selected
interests generate recommendations.
Figure 2. List of links appearing on the right-hand sidebar, based on interests that users select.
Figure 3. Recommended collections, based on interest, showing collection name (with a link to
finding aid), call number, number of requests, and number of users who have requested from the
collections. The user sees this list after clicking an option from the sidebar, as shown in figure 2.
Figure 4. Request-based recommendation links appearing on the right-hand sidebar after a
patron requests an item from a finding aid.
THE PROCESS
Currently, the data that drives these two functions is obtained from a semidynamic process via
daily, automated SQL query exports. Usernames are employed to tie together requests and
interests but are subsequently purged from the data before the results are presented to users and
staff. This section explains the process in detail and presents code snippets where available. All
code is available on GitHub.5
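The join-then-purge step can be sketched in JavaScript as follows. This is an illustration under
assumptions, not the production process: the record shapes, the sample call numbers, and the
function name `joinAndPurge` are hypothetical, and the real export is produced by SQL queries
rather than by this code.

```javascript
// Hypothetical records standing in for rows from the Users and Transactions tables.
const users = [
  { username: "u1", researchTopics: "EA-AmRev" },
  { username: "u2", researchTopics: "EA-AmRev,EA-EarlyNat" },
];
const transactions = [
  { username: "u1", callNumber: "mss.example.1" },
  { username: "u2", callNumber: "mss.example.2" },
  { username: "u9", callNumber: "mss.example.3" }, // no matching user record
];

// Use the username only as a join key, then leave it out of the output rows.
function joinAndPurge(users, transactions) {
  const topicsByUser = new Map(users.map((u) => [u.username, u.researchTopics]));
  return transactions
    .filter((t) => topicsByUser.has(t.username))
    .map((t) => ({
      researchTopics: topicsByUser.get(t.username),
      callNumber: t.callNumber,
    }));
}
```

Because the output rows are built from scratch rather than copied, no username can leak into
whatever is generated from them.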
Interest-Based Recommendations
For interest-based recommendations, we employ two queries. The first query pulls every
collection requested by a user for each topic for which that user has expressed an interest. The
second aggregates the data for every user in the system. The following queries get data from the
Microsoft SQL database, via a Microsoft Access intermediary, that Aeon uses to store data. Because
of the number of interest options in the registration form, and the character length of some of
them (“Early America - Colonial History,” for example) we encode the interests in shortened form.
“Early America - Colonial History” becomes “EA-ColHist” so as not to run into character limits in
the database. This section explores each of these queries in more detail and provides example
code.
The first query gathers research topics for all users who are not staff (user status is ‘Researcher’),
and where at least one research topic is chosen (‘ResearchTopics’ is not null). The data is exported
into an XML file that we call “aeonMssReg.”
SELECT AeonData.dbo.Users.ResearchTopics, AeonData.dbo.Transactions.CallNumber,
AeonData.dbo.Transactions.Location
FROM AeonData.dbo.Transactions INNER JOIN AeonData.dbo.Users ON (AeonData.dbo.Users.UserName =
AeonData.dbo.Transactions.Username) AND (AeonData.dbo.Transactions.Username =
AeonData.dbo.Users.UserName)
WHERE (((AeonData.dbo.Users.ResearchTopics) Is Not Null) AND
((AeonData.dbo.Transactions.CallNumber) Like 'mss%' Or (AeonData.dbo.Transactions.CallNumber)
Like 'aps.%') AND ((AeonData.dbo.Users.Status)='Researcher'))
FOR XML RAW ('aeonMssReq'), ROOT ('dataroot'), ELEMENTS;
The second query combines all data for all users and exports an XML file ‘aeonMssUsers.’
SELECT DISTINCT AeonData.dbo.Users.ResearchTopics, AeonData.dbo.Transactions.CallNumber,
AeonData.dbo.Transactions.Location, AeonData.dbo.Transactions.Username
FROM AeonData.dbo.Transactions INNER JOIN AeonData.dbo.Users ON (AeonData.dbo.Users.UserName =
AeonData.dbo.Transactions.Username) AND (AeonData.dbo.Transactions.Username =
AeonData.dbo.Users.UserName)
WHERE (((AeonData.dbo.Users.ResearchTopics) Is Not Null) AND
((AeonData.dbo.Transactions.CallNumber) Like 'mss%' Or (AeonData.dbo.Transactions.CallNumber)
Like 'aps.%') AND ((AeonData.dbo.Users.Status)='Researcher'))
FOR XML RAW ('aeonMssUsers'), ROOT ('dataroot'), ELEMENTS;
Each query produces an XML file. These files are parsed using XSL stylesheets into subsets for
each research interest. The stylesheets also generate counts of users requesting a collection and
number of total requests for a collection by users sharing an interest.
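The counting step can be illustrated outside XSLT. The JavaScript below is a sketch under
assumptions, not the stylesheet itself: the row shape, the sample call numbers, and the function
name `countsForTopic` are hypothetical. It assumes each row represents one request and carries a
username solely so that distinct users can be counted.

```javascript
// Hypothetical rows: one per request, with the requester's topics string.
const rows = [
  { username: "u1", researchTopics: "EA-AmRev", callNumber: "mss.example.1" },
  { username: "u1", researchTopics: "EA-AmRev", callNumber: "mss.example.1" },
  { username: "u2", researchTopics: "EA-AmRev,EA-EarlyNat", callNumber: "mss.example.1" },
  { username: "u2", researchTopics: "EA-AmRev,EA-EarlyNat", callNumber: "mss.example.2" },
  { username: "u3", researchTopics: "EA-EarlyNat", callNumber: "mss.example.2" },
];

// For one topic, tally total requests and distinct users per collection.
function countsForTopic(rows, topic) {
  const tally = new Map();
  for (const row of rows) {
    if (!row.researchTopics.split(",").includes(topic)) continue;
    if (!tally.has(row.callNumber)) {
      tally.set(row.callNumber, { requests: 0, users: new Set() });
    }
    const entry = tally.get(row.callNumber);
    entry.requests += 1;
    entry.users.add(row.username);
  }
  // Most-requested collections first; usernames are reduced to a count.
  return [...tally.entries()]
    .map(([callNumber, e]) => ({ callNumber, requests: e.requests, users: e.users.size }))
    .sort((a, b) => b.requests - a.requests);
}
```

Calling `countsForTopic(rows, "EA-AmRev")` yields one entry per collection, each with a request
total and a distinct-user total, which are the two figures shown alongside each recommendation.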
An example is the following stylesheet for the topic “Early America - Colonial History,” which pulls
from the XML file “aeonMssReg”:
The collections most frequently requested from researchers who expressed an interest in
You expressed interest in:
To ensure a user only sees the links that match the interests they have selected, we use JavaScript
to determine the expressed interests of the current user and display the corresponding links to the
HTML pages in a sidebar. This approach works well, but we must account for two quirks. The first
is that many interests in the database do not conform to the current list of options because many
users predate our current registration form and wrote in free-form interests. The second is that
Aeon stores the research topics as an array in a single field rather than in a separate table, so
any user's array can contain both controlled and uncontrolled vocabulary.
First, we set the array as a variable so we can look for a value that matches our controlled
vocabulary and separate the array into individual values for manipulation:
// Use var message to check for presence of controlled list of topics
var message = "<#USER field='ResearchTopics'>";
// Use var values to separate topics that are collected in one string
var values = "<#USER field='ResearchTopics'>".split(",");
We also create variables to generate the HTML entries and links out when we have extracted our
research topics:
var open = " "
Next we set a conditional to determine if one of our controlled vocabulary terms appears in the
array:
//Determine if user has an interest topic from the controlled list
if ((message.indexOf("EA-ColHis") > -1) || (message.indexOf("EA-AmRev") > -1) ||
(message.indexOf("EA-EarlyNat") > -1) || (message.indexOf("EA-Antebellum") > -1) ||
…
If the array contains a value from our controlled vocabulary, we generate a link and translate our
internal code back into a human-friendly research topic (“EA-ColHist,” for example, becomes once
again “Early America - Colonial History”):
for (var i = 0; i < values.length; ++i) {
  if (values[i]=="EA-ColHis"){
    document.getElementById("topic").innerHTML +=
      (open + values[i] + middle + "Early America-Colonial History" + close);}
  else if (values[i]=="EA-AmRev"){
    document.getElementById("topic").innerHTML +=
      (open + values[i] + middle + "Early America-American Revolution" + close);}
  else if (values[i]=="EA-EarlyNat"){
    document.getElementById("topic").innerHTML +=
      (open + values[i] + middle + "Early America-Early National" + close);}
  else if (values[i]=="EA-Antebellum"){
    document.getElementById("topic").innerHTML +=
      (open + values[i] + middle + "Early America-Antebellum" + close);}
…
See figure 2 for how this appears to the user. Users only see the links that correspond to their
stated interest.
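The same translation can also be expressed with a lookup table rather than a chain of
conditionals. The following is a sketch only, not the code PAL uses: the `topicLabels` object and
the `labelsFor` function are hypothetical names, the table holds just the four codes shown in the
conditional chain above, and trimming whitespace around each value is an added assumption.

```javascript
// Codes and labels copied from the conditional chain above; the full list is longer.
const topicLabels = {
  "EA-ColHis": "Early America-Colonial History",
  "EA-AmRev": "Early America-American Revolution",
  "EA-EarlyNat": "Early America-Early National",
  "EA-Antebellum": "Early America-Antebellum",
};

// Split the stored string, keep only controlled terms, and pair each code with its label.
function labelsFor(researchTopics) {
  return researchTopics
    .split(",")
    .map((value) => value.trim())
    .filter((value) => Object.prototype.hasOwnProperty.call(topicLabels, value))
    .map((code) => ({ code, label: topicLabels[code] }));
}
```

With this shape, adding a research topic means adding one table entry, and free-form interests
written by earlier users simply fall through the filter. An empty result corresponds to the case
in which no controlled term is present.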
If the array does not contain a value from our controlled vocabulary, we display the research-topic
interests associated with the user account, note that we don’t currently have a recommendation,
and provide a link to update the research topics for the account.
else
{document.getElementById("notopic").innerHTML = "
Collection
Call Number
# of Requests
# of Users
We are unable to provide a specific collection recommendation for you. Please visit our User Profile page to select from our list of research topics.
" }
Request-Based Recommendations
In addition to interest-based recommendations, PAL supplies recommendations based on past requests
a user has made. This section details how these recommendations are generated.
Aeon allows users to request materials directly from a finding aid (see figure 6). To generate our
request-based recommendations we employ a query depicting the call number and user of every
request in the system and export the results to an XML file called “aeonLikeCollections.”
SELECT subquery.CallNumber, subquery.Username,
IIf(Right(subquery.trimLocation,1)='.',Left(subquery.trimLocation,Len(subquery.trimLocation)-1),subquery.trimLocation)
AS finallocation
FROM (
SELECT DISTINCT AeonData.dbo.Transactions.CallNumber, AeonData.dbo.Transactions.Username,
IIf(CHARINDEX(':',[Location])>0,Left([Location],CHARINDEX(':',[Location])-1),[Location]) AS
trimLocation
FROM AeonData.dbo.Transactions INNER JOIN AeonData.dbo.Users ON (AeonData.dbo.Users.UserName =
AeonData.dbo.Transactions.Username) AND (AeonData.dbo.Transactions.Username =
AeonData.dbo.Users.UserName)
WHERE (((AeonData.dbo.Transactions.CallNumber) Like 'mss%' Or (AeonData.dbo.Transactions.CallNumber)
Like 'aps.%') AND ((AeonData.dbo.Transactions.Location) Is Not Null) AND
((AeonData.dbo.Users.Status)='Researcher'))) subquery
ORDER BY subquery.CallNumber
FOR XML RAW ('aeonLikeCollections'), ROOT ('dataroot'), ELEMENTS;
We then process the “aeonLikeCollections” file through a series of XSLT stylesheets, creating lists
of every other collection that every user of the current collection has requested. First the
stylesheets remove collections that have only been requested once. Then we count the number of
times each collection has been requested: