Issues in Science and Technology Librarianship
Spring 1998
DOI:10.5062/F48913VH

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

Taking Local Resources Global: The NCSTRL Experience at UC Berkeley Library

Ann Jensen
Kresge Engineering Library
University of California at Berkeley
ajensen@library.berkeley.edu

Abstract

NCSTRL (Networked Computer Science Technical Report Library) is an international collection of computer science technical reports from over 100 academic and industrial research institutions. Dienst software, developed at Cornell University specifically for the NCSTRL project, makes these distributed collections searchable as an integrated collection from anywhere in the world. At most participating institutions, the technical reports are mounted and maintained on NCSTRL by staff in the computer science or research departments. Early in 1997, the UC Berkeley Library began a partnership with the UCB CS department, whereby the Library took over this function. This paper will discuss UC Berkeley Library's experience in taking over as the provider of these technical reports. It will include a brief non-technical description of NCSTRL; a discussion of the expectations of the Library and the reasons for our proposal; the benefits, costs, and technical challenges to the Library in both development and maintenance of the data; an evaluation of the benefits to the user group; and conclusions about the effectiveness of the library as the provider of local research reports for a World Wide Web audience.

NCSTRL may be accessed at: {http://www.ncstrl.org/}

Introduction

Technical reports are the vehicle of choice for timely introduction of computer science research to the academic and commercial community.
In addition to seeing heavy use near the time of their publication, many of these reports retain archival value for years after their introduction. The Networked Computer Science Technical Report Library, NCSTRL (NCSTRL Documentation), is a distributed electronic collection developed in 1995 under the leadership of Carl Lagoze and Jim Davis of Cornell, who developed the Dienst software (Dienst Documentation) that allows the distributed collections to function as one (Davis et al. 1996). Within NCSTRL, Berkeley's reports are combined with computer science technical reports from more than 100 academic and industrial research institutions to form a collection of the best in worldwide computer research. The actual reports and the engines for searching them reside at repositories distributed among the participating institutions. Users from around the world may search, browse, read, and download reports using a WWW interface as if the repositories were one unified collection.

Technical reports from the UC Berkeley Computer Science Division were part of NCSTRL from the outset, and were managed and served from a machine in the Computer Science Department. Early in 1996, the UCB Library was looking for important collections which would lend themselves to digitization and electronic delivery, and for which high demand by users would test viability. At the same time, the UCB Computer Science Department recognized that the maintenance and service of these reports was of lesser interest to them than had been their participation in the groundbreaking beginnings of distributed collections of computer science technical reports (Anderson et al. 1996). The lower priority given to NCSTRL support over time had resulted in unstable presentation and consequent frustration among users.
After preliminary discussions between Library and CSD staff, a working partnership was forged wherein the Library would move the existing computer science technical report archive over to a Library server, and take over management of the archive and the addition and distribution of new reports. While the Library had long carried on the traditional services of archiving, cataloging, and circulating paper copies of these reports, it had not interacted with the Computer Science Division concerning them other than to receive their donated copies. The collaboration born from this project is direct and ongoing between the Library and the Department, and occasionally involves individual authors of the technical reports.

Both parties in the collaboration had clear objectives at the outset. Through this project, the Library wanted: 1) to take over electronic archiving and distribution of UC Berkeley computer science technical reports and measure staff and other costs in doing so, and 2) to gather factual data to assess whether the same system could be expanded to include research reports from other campus disciplines. The Computer Science Department saw an opportunity to continue high visibility for their reports without the responsibility for maintenance and ongoing support, which for them had become routine and no longer in line with their research goals.

Costs

The Library estimated its major costs to be programmer and librarian time during the first year of the project; 9 gigabytes of disk space to hold the initial archive, plus additional disk space as new reports are added; and staff costs to maintain the service into the future. Costs to the CSD were expected to be limited to staff time to register each report, liaison with Library staff over problems, semi-annual publication of printed price lists, and continuation of the departmental sale of paper copies.
In Fall 1996, the Library loaded the Dienst software onto our local system (http://sunsite.berkeley.edu/), and spent several months testing and adapting it to our local environment. This included incorporating a major Dienst upgrade that occurred at the same time. The adaptation consumed about 30% of a Programmer Analyst's time for approximately three months. While the programming was being done, we edited the bibliographic records of the existing files (more than 800 reports), which were in varying states of legibility and accuracy. In December 1996, the entire file of Berkeley reports was transferred to the Library Sunsite, where it now resides, and the Library became the active server of this resource.

Estimates for the developmental phase of the project proved to be accurate, but the library staff time necessary for ongoing maintenance is slightly higher than anticipated. Files created with certain versions of software load more easily than others; for example, files created in FrameMaker 4.0 load easily, while files created in FrameMaker 5.0 often need tweaking. This and other problems with the PostScript format of the reports as they are received into the system mean that loading certain new reports takes more time than anticipated.

Submission of new reports is a semi-automated process. The author requests a unique technical report number in the form CSD-xx-xxx from the publications assistant in the Computer Science Division. The NCSTRL librarian receives an electronic notification of that assignment, along with the title, author name(s), and abstract of the corresponding report. The author then delivers a PostScript file to the Library server via FTP, and notifies the NCSTRL librarian by e-mail that it has been sent. The NCSTRL librarian completes an HTML template and submits it electronically to the Library server. Using a series of three UNIX commands, the NCSTRL librarian loads and indexes the new report and its bibliographic file into NCSTRL.
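
The submission steps just described can be sketched as a small ordered checklist. This is only an illustrative model, not the Library's actual tooling: the `Submission` class, the step names, and the assumption that each "x" in CSD-xx-xxx stands for a single decimal digit are all hypothetical, and the three UNIX commands themselves are not named in this paper, so they are not reproduced here.

```python
import re
from dataclasses import dataclass, field

# Assumption: each "x" in the form CSD-xx-xxx is one decimal digit.
REPORT_NUMBER = re.compile(r"CSD-(\d{2})-(\d{3})")

# Hypothetical names for the steps described in the text, in order.
STEPS = (
    "number_assigned",      # publications assistant issues CSD-xx-xxx
    "metadata_received",    # librarian is notified of title, authors, abstract
    "postscript_uploaded",  # author sends the PostScript file by FTP
    "template_submitted",   # librarian submits the completed HTML template
    "loaded_and_indexed",   # librarian runs the UNIX load/index commands
)


@dataclass
class Submission:
    """Tracks one report through the submission steps (illustrative only)."""
    report_number: str
    completed: list = field(default_factory=list)

    def __post_init__(self):
        # Reject numbers that do not match the assumed CSD-xx-xxx pattern.
        if not REPORT_NUMBER.fullmatch(self.report_number):
            raise ValueError(
                f"not of the form CSD-xx-xxx: {self.report_number!r}"
            )

    def next_step(self):
        """Return the next pending step, or None when all are done."""
        if len(self.completed) == len(STEPS):
            return None
        return STEPS[len(self.completed)]

    def complete(self, step):
        """Mark a step done; steps must be completed in order."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)
```

Modelling the steps as a fixed sequence reflects the point that a failure at any one step (for example, a PostScript file that will not load) holds up everything after it until the problem is resolved.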
The master index is rebuilt several times a week at Cornell. When all systems work as planned, this submission process is routine and rapid, taking from 3 to 10 minutes, depending on the length of the report. When any detail of the process fails, trouble-shooting is required, which can include e-mail correspondence between the NCSTRL librarian and the author, additional formatting and FTPing of the file, and multiple attempts to reload the file. On occasion, specific assistance is requested and supplied from the Dienst masters at Cornell.

Staffing

The library team consists of a Project Manager who is a librarian in the Engineering Library and a Programmer Analyst with UNIX expertise from the Library Systems Office. We anticipated that once the project was in production mode, the mechanics of loading files could be performed by a library assistant rather than a librarian, with backup support from the Library Systems Office technical staff. While there is still logic in that plan, suitable staffing has not been identified, so the Project Manager continues to load new reports. An administrative assistant within the Computer Science Division handles permissions, funding allocations for research, and paper copy production and distribution, and is the liaison to the faculty and graduate student authors. The Library's project manager works closely with her in a productive and collegial link between the CS Division and the Library.

Evaluation

Several enhancements were realized immediately by the transition of these reports to the Library: service became stable (the server at the Computer Science Division was old, and the responsibility for management of the collection moved unpredictably from one clerk to another); access and subject questions from patrons around the world are readily answered via electronic mail; and reference help is provided as needed. Editing the archive of bibliographic records made keyword searching more reliable.
Format options now include TIFF, GIF, PostScript (when available), thumbnails, and PDF format for all reports from 1995 to the present. Many of the older documents were scanned, and are available in OCR (optical character recognition) format only. Instructions to users indicate they should be able to download these older scanned documents for printing by a PostScript printer. Depending upon the printer capability of the user, this is sometimes impossible. An up-to-date price list is available online.

Prior to electronic availability, users could obtain these reports either by purchase from the Computer Science Division, or through interlibrary or direct borrowing of the paper copy held in the UC Berkeley Engineering Library. The UC Berkeley catalog contains catalog records for these reports as a series, and individually after 1992. These catalog records now include a link to NCSTRL:

Multipol : a distributed data structure library / Soumen Chakrabarti ... [et al.].
Berkeley, Calif. : Computer Science Division (EECS), University of California, [1995]
Chakrabarti, Soumen.
Report (University of California, Berkeley. Computer Science Division) ; no. UCB/CSD 95/879.
Electronic location: http://sunsite.berkeley.edu/NCSTRL/
Engineering TK7885.A1.R46 no.95:

Users around the world look at about 4500 Berkeley computer science technical reports per month, or 150 per day. This is dramatically higher than the print collection's use. NCSTRL is currently one of the UC Berkeley Sunsite's most heavily used specialty online services. About 40% of these accesses are from outside of the United States. Users of the system readily send queries when they experience difficulty accessing reports or have related technical report questions. Questions are answered within 24 hours in most cases and always within 72 hours by either the librarian or the technical expert.
In addition to taking over the technical delivery of this collection, the Library added the value of its traditional care and support in the form of answers to reference queries and trouble-shooting issues of access, thus providing richer and broader access to the information. Technical queries related to downloading or viewing are answered by our UNIX expert; other queries are answered by the engineering librarian on the project, as are any other questions related to the collections. Reports in the NCSTRL collection, though electronic and housed all over the world, are an integral part of the Engineering Library's collection.

Continuing Issues

When the Berkeley Library took over as archivist, we committed to the clean-up and enhancement of reports from 1995 onward, including all future reports. Many of the pre-1995 reports were scanned using optical character recognition software, and contain errors and omissions which limit their usefulness to the public. We realized going into the project that we didn't have the resources to retrospectively re-load faulty reports, or seek out new versions of them. We do, however, try to augment access on a report-by-report basis when a patron contacts us with a problem. We consider a complete and stable continuation of the collection to be a large part of the value added by the Library, and so we are willing to spend quite a bit of time to assure continuity in our collection as it grows with additional reports.

Summary

The Library had two goals at the outset of this project: 1) to take over electronic archiving and distribution of UC Berkeley Computer Science Technical Reports; and 2) to assess whether the same system could be expanded to include research reports from other campus disciplines. The first goal has been met successfully. Progress towards the second goal is less definitive.
The uniqueness of NCSTRL as a digital archive is in the access it provides to distributed related collections of material in computer science through the Dienst system architecture. Dienst software could theoretically be adapted for other campus collections, but in the absence of a need for unified access to widely distributed but related collections, it would be more complex than necessary. Cornell supports the Dienst software and has provided vital technical support to us as we have taken over this service for Berkeley. If we imagined creating another distributed collection, and using Dienst software as the architecture, we would not be able to rely on Cornell for such support, and we would have to take on the actual software support and maintenance ourselves.

There is no question that electronic availability increases the audience for and use of these locally generated technical papers. The Berkeley experience with the NCSTRL project has shown us what level and form of support is necessary to begin and sustain a service of this kind. Costs per use are low due to the high levels of use of reports in this particular series. Yet the staff time necessary to attend to electronic submissions when they are problematic can be high.

As an individual project, Berkeley's experience as server of the Computer Science technical reports represents an additional work load. As projects such as this become ubiquitous, and more campus collections become electronically available and supported electronically, it may well represent a change in work flow and resource allocation for many layers of workers within the library system. Staff costs per access will diminish as the electronic collections scale up. For larger portions of campus collections to become electronic, and to become more integrated into campus holdings, new and creative distribution of tasks will be necessary.
A different mix of high-level, lower-level, and routine tasks may be the result. While maintenance duties for Berkeley's NCSTRL reports are for the most part routine, they need to be undertaken by staff who are willing to learn new skills, and to work collegially with technical experts as needed. New partnerships were formed in this project, and directions were identified that will define new partnerships within the library workplace in support of electronic delivery of information.

References

Anderson, Greg, Lasher, Rebecca, and Reich, Vicky. 1996. The Computer Science Technical Report (CS-TR) Project: A Pioneering Digital Library Project Viewed from a Library Perspective. The Public-Access Computer Systems Review 7(2). [Online]. Available: {http://epress.lib.uh.edu/pr/v7/n2/ande7n2.html} [March 31, 1998]

Davis, James R., Lagoze, Carl and Kraft, Dean B. 1996. Dienst: Building a Production Technical Report Server in Digital Libraries. Research and Technology Advances (ADL 95 Forum). 259-71.

Davis, James R. and Lagoze, Carl. 1994. A protocol and server for a distributed digital technical report library. TR94-1418, June 24, 1994. [Online]. Available: {http://dl.acm.org/citation.cfm?id=866695} [March 31, 1998]

Dienst Documentation 1995. [Online]. Available: {http://www.ncstrl.org/Dienst/htdocs/Info/protocol4.html} [March 31, 1998]

Fox, Edward A. 1995. World-Wide Web and Computer Science Reports. Communications of the ACM 38(4):43.

French, James C. et al. 1995. Wide Area Technical Report Service: Technical Reports Online. Communications of the ACM 38(4):47.

French, James C. and Charles L. Viles. 1996. Ensuring Retrieval Effectiveness in Distributed Digital Libraries. Journal of Visual Communication and Image Representation 7(1):61-73.

Lagoze, Carl and Davis, James R. 1995. Dienst: An Architecture for Distributed Document Libraries. Communications of the ACM 38(4):47.

NCSTRL Documentation [Online].
Available: {http://www.ncstrl.org/Dienst/htdocs/Info/about-ncstrl.html} [March 31, 1998]