College and Research Libraries

RICHARD DE GENNARO

A Strategy for the Conversion of Research Library Catalogs to Machine Readable Form

This paper describes in very general terms a strategy for converting the retrospective catalogs of the nation's research libraries into machine readable form. The method envisages a class-by-class conversion and printing out in main entry order of the shelflist of the Library of Congress. The larger libraries would compare their shelflists against these lists, adding their location symbols and unique titles to the master machine record and pulling from the master record machine readable catalog copy for their own holdings in each class. The resulting augmented LC master record would become a kind of national union catalog in machine readable form.

Mr. De Gennaro is Associate University Librarian for Systems Development, Harvard University.

Until a few years ago librarians were rather skeptical about the technical and economic feasibility of converting the massive catalogs of multi-million volume research libraries into machine readable form. The view was generally held that while current input into these catalogs could be computerized, the problem of converting the retrospective file into machine readable form was so enormous that future technological advances would have to be awaited before it could be undertaken. The science and medical librarians, citing the rapid obsolescence of their literature, concentrated their efforts where the most immediate payoff was available: in computerizing the record for current acquisitions. While librarians of humanistic collections could not completely turn their backs on the bibliographical heritage of the past, many of them were prepared to settle either for maintaining the retrospective catalog in its traditional format or for reproducing it in book form by offset photography.
Thus, the computer would give us a powerful handle on current acquisitions but could not relate them to the total record.

These views are beginning to change. Many librarians are now becoming less pessimistic about the technical feasibility of converting mass catalogs. Practical experience in conversion has been acquired, photocomposition devices and print chains with upper- and lower-case and diacritical marks are available, keyboarding equipment has been improved, and new online keyboarding devices and techniques are being introduced. The extremely high cost of converting mass catalogs still remains a chief obstacle, but even here the picture is beginning to change and there is reason for optimism.

With the federal government's growing interest in research libraries it seems reasonable to hope that funds may eventually be made available to convert to machineable form certain library catalogs or bibliographical records of national importance. Since the National Union Catalog is the largest and most comprehensive and therefore potentially the most useful record available, attention has been focused on it as the most likely candidate for conversion. One study has already been made of the feasibility of such a project and the techniques by which it might be accomplished, and a committee of the Association of Research Libraries is presently exploring the problem.

College & Research Libraries • July, 1967

While there are many advantages to starting with the NUC there are also some serious disadvantages. It is an alphabetical file of fifteen million cards, all of which would have to be converted before much real use could be made of it, since a portion of an alphabet is of limited utility.
The conversion of fifteen million entries complete with notes and added entries is a formidable undertaking and would require several years and a considerable investment of editorial effort, which might spell the death of the project if allowed to get out of control. The end product, in spite of its tremendous usefulness, would still be incomplete and inaccurate by the standards that are used to judge the catalogs of large research libraries. Advances in computer and communication technology will tend to make these standards even less acceptable in the future than they are now.

The purpose of this brief paper is to suggest as a possible alternative a method of converting the retrospective catalogs of the nation's research libraries and eventually creating a national union catalog in machine readable form as a byproduct of that effort. The strategy would be to avoid a frontal assault on a multi-million card dictionary catalog and a straight A-to-Z conversion, and to divide this massive single conversion project into a series of smaller and more manageable projects, each of which would utilize and build on the experience gained in the previous ones, generating useful outputs as the effort progresses. A similar approach is being used with considerable success in the Widener library shelflist conversion project at Harvard.1

The starting point for this conversion effort would be the shelflist of the Library of Congress, a bibliographical record that is relatively accurate and up to date. Since it is a unit-card shelflist, each entry is complete with notes, subject, and added entries, and once converted to machine form would serve as the basic record from which all other secondary records could be generated by computer.
What is being suggested here is that the LC shelflist might be converted class-by-class to form the basis for constructing a master machineable bibliographical record in LC classification order and alphabetically in main entry order within each class. Other libraries could compare their shelflists against these basic LC lists, adding their own location symbols and unique titles to the master file and pulling from it machineable catalog copy for their own holdings in each class. The resulting augmented LC master record would eventually become an accurate and serviceable national union catalog in machine readable form. The problem is to develop strategies and techniques to facilitate not only the conversion of the basic LC file, but also the comparison and addition of new titles and locations for the titles held by each succeeding library as it enters the system, and the extraction by a library of catalog entries for its own holdings from the record.

1. Richard De Gennaro, "A Computer Produced Shelflist," CRL, XXVI (July 1965), 311-15, 353.

If we can assume that a MARC-type standardized format for inputting bibliographical data into a system will have been developed and adopted within the next few years, then one could envisage a project being funded to re-create LC's catalog in machine readable form using a class-by-class shelflist approach. Initially, a subdivision of a science class such as physics or geology, and a part of a history or literature class, might be selected as pilot projects to test assumptions and develop techniques. For the sake of discussion, however, let us suppose that LC started its conversion with the E-F or American history class. Upon completion of the conversion of the entire class or a logical segment of it such as U.S. history, a printout would be produced listing the entries alphabetically by main entry. The American or U.S.
history holdings of another research library, that of a university for example, could then be compared with this list. One possible way of doing this would be to search the entries of the university library's American history shelflist against this alphabetical main entry printout. Each time a match was encountered, the local call number would be noted on the main entry printout. At the end of this comparison, the local library would have an annotated printout accounting for a large proportion of the titles in its collection. It could then pull those entries held in common with LC from the master tape by simply keyboarding the LC card number (or a special machine-assigned identification number) together with its own call number and other local information, and having the computer create a new local tape combining the LC entries with the local ones.

The entries present in the university library's shelflist that were not present in the LC list could be duplicated by photography and converted, using the standard input format that had been used for the LC list. This could be done at the university library, but it might be preferable to send them to a central facility for further searching and conversion and for entry into both the master LC file and the university library file. These entries would also have to be assigned LC class numbers.

The university library would then have in its tape file the bibliographical information it needs to re-create its shelflist and catalog and to produce other listings either in hard copy or machine form. The central master file would now be augmented by certain titles in the local library that were not held by LC, along with locations for all the titles held by the local library.
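The comparison and augmentation steps just described can be illustrated with a minimal sketch in a present-day programming language. It is offered only as an illustration under stated assumptions, not as the procedure the paper proposes in detail: the record fields, the function names, the "M"-prefixed machine-assigned numbers, and the rule that two entries match when main entry and title agree after ignoring case and spacing are all hypothetical, and the sample records are invented.

```python
from dataclasses import dataclass, field


@dataclass
class MasterEntry:
    """One record in the master file. Field names are illustrative assumptions."""
    record_id: str   # LC card number, or a machine-assigned number
    main_entry: str
    title: str
    lc_class: str    # for example "E" or "F" for American history
    locations: dict = field(default_factory=dict)  # library symbol -> local call number


def match_key(main_entry, title):
    """Assumed matching rule: main entry plus title, ignoring case and spacing."""
    return (" ".join(main_entry.lower().split()), " ".join(title.lower().split()))


def main_entry_printout(master, class_prefixes):
    """Entries of the chosen classes, alphabetical by main entry and then title.

    class_prefixes is a string or a tuple of strings, e.g. ("E", "F").
    """
    chosen = [e for e in master.values() if e.lc_class.startswith(class_prefixes)]
    return sorted(chosen, key=lambda e: match_key(e.main_entry, e.title))


def add_library(master, symbol, shelflist):
    """Compare one library's shelflist with the master file.

    Matched titles receive the library's symbol and call number. Unmatched
    titles are added to the master under a machine-assigned number. Returns
    the library's own file and the count of titles that had to be added.
    """
    index = {match_key(e.main_entry, e.title): e for e in master.values()}
    local_file, added = [], 0
    for card in shelflist:
        key = match_key(card["main_entry"], card["title"])
        entry = index.get(key)
        if entry is None:
            # Title not held by LC or any earlier library: augment the master.
            added += 1
            entry = MasterEntry(f"M{len(master) + 1:06d}", card["main_entry"],
                                card["title"], card["lc_class"])
            master[entry.record_id] = entry
            index[key] = entry
        entry.locations[symbol] = card["call_number"]
        local_file.append((card["call_number"], entry.record_id))
    return local_file, added


# Invented sample data: two "LC" records and a two-card university shelflist.
master = {
    "lc-1": MasterEntry("lc-1", "Author, A.", "First Title", "E"),
    "lc-2": MasterEntry("lc-2", "Author, B.", "Second Title", "F"),
}
university = [
    {"main_entry": "Author, B.", "title": "Second Title", "call_number": "US 2", "lc_class": "F"},
    {"main_entry": "Author, C.", "title": "Third Title", "call_number": "US 3", "lc_class": "E"},
]
local_file, added = add_library(master, "UnivA", university)
```

In this toy run one university title matches an existing record and one is added to the master, so a later library holding the same third title would find it already present. A real project would need a far more careful matching rule than the one assumed here.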
Several problems remain, such as reconstructing the syndetic apparatus, or the complex of cross references in the catalog, and accounting for the titles in American history held by the university library but classified elsewhere for local reasons, such as in reference or rare book collections. The latter problem would be the responsibility of the local library while the former one would have to be dealt with by the central authority.

The same techniques could be applied to each successive segment of the LC shelflist as the conversion effort progressed. As classes were completed the computer could sort them into a single main entry list and eventually re-create a version of the dictionary catalog. After the contents of several major collections had been compared with and added to the augmented master LC file, the comparison and conversion effort of each additional library would become progressively easier because the number of titles not found in the master file would be decreasing. The comparison procedure would be easiest for those libraries which are classified according to the LC system because there would be a relatively close correlation of scope in the two shelflists. For this reason it might be better if the pilot comparison effort took place in such libraries rather than in those which do not use LC.

This problem of scope of shelflists could well be one of the most serious objections to the strategy being suggested. Many of the older libraries with rich collections, such as the New York Public Library, Harvard, Yale, etc., have classification systems which may be difficult to correlate effectively with LC's classes. This difficulty might not be as serious as it may seem at first glance if one bears in mind that the comparison or searching is done in a printout of a class of the LC shelflist that has been sorted by computer into main entry alphabetical order rather than the list in classified order. Thus the American literature class of a library with its own scheme would be searched against the equivalent part of the LC schedule arranged by main entry. Nevertheless, the problem remains and should not be minimized. On the other hand, the catalogs of these libraries, because of their uniqueness, age, size, and complexity, are going to present serious problems of compatibility in any future national bibliographical system based on computers, and sooner or later these problems will have to be tackled and solved.

The techniques outlined for comparing, searching, annotating, and adding to files are here described in terms of today's familiar technology for the sake of clarity. In an actual project the whole process would presumably be considerably streamlined by the use of advanced online computer technology with visual display consoles, mass random access storage, and sophisticated means of communication. Thus, instead of actually producing a computer printout of the segment of the LC shelflist to be used for comparison, it could be in random access storage and accessible through a cathode-ray tube or visual display console. The local card shelflist entries would be searched in sequence by calling for the appropriate part of the alphabet on the console display unit. Each time a match was encountered a symbol would be added to the machine record together with the local call number and any other necessary local information. This would greatly facilitate the entire process and reduce keyboarding to a minimum.

The ultimate goal of the effort is to create in machine readable form an inventory of the holdings of the nation's major research libraries. The method suggested looks toward building this record in a gradual, orderly, and economical manner. Each bibliographical record would be in a standardized format, and the master file would be the basic record which would be put into mass random-access storage for online long-distance consultation when these techniques become economically feasible in the future. The file would serve as a data bank from which extracts of various types and for various uses could be drawn. While it is theoretically possible to produce the entire contents of this file periodically in printed form, this would be extremely expensive and probably unnecessary. It might be far more useful to produce a large variety of shorter and more specialized lists based on class, subject, language, date of publication, etc.

Some of the principal advantages of this conversion strategy are summarized below.

1. The master record is based on a relatively accurate and solid foundation, i.e., the current inventory records of LC and the participating libraries: their shelflists.
2. It is a gradual process which can be changed, developed, and improved with experience. It is flexible, unlike the single frontal assault required for an A-to-Z conversion of fifteen million entries.
3. It would not only give LC a tremendous impetus in its total systems effort but would also make possible a parallel development for the entire research library community by removing the chief bottleneck: conversion of the retrospective file.
4. The cost and effort of keyboarding a bibliographical entry would only occur once and in a favorable environment.
5. The funding of this single but segmented effort might be facilitated because the subject approach would create interest and enthusiasm among the various segments of the research community including user groups as well as funding agencies.
   The E-F classes would interest historians, scientists would be eager to see the Q class done, etc.
6. The strategy and techniques could be inexpensively and meaningfully tested and costed in one or more pilot projects, such as the conversion of the Physics or Geology subdivision of the Science class, and a segment of a history or literature class. A decision to proceed with, modify, or abandon the strategy could be made on the basis of the experience and information generated in these pilot efforts.
7. There is no reason why, after suitable pilot projects, several classes could not be converted simultaneously. The work could be geographically decentralized by duplicating portions of the LC shelflist by photography and having the conversion work done outside of Washington, where space and personnel might be more readily available.
8. Useful lists of all kinds, such as shelflists, classed catalogs, subject bibliographies, chronological, alphabetical, and language listings, etc., could be created as each portion of the list is completed. There is no need to wait until the entire Library of Congress shelflist has been converted and augmented to obtain products of this kind.
9. Eventually the complex of cross references that tie a catalog together could be reproduced and all classes merged by computer into a single dictionary catalog in machine form.

The conversion of the present NUC or the re-creation of it in a new form is obviously an extremely complex and costly undertaking and one which has tremendous implications for the future development of libraries. This brief paper is not meant to give pat answers as to how it should be done, nor does it pretend to be a detailed and carefully constructed master plan. The most that can be said for it is that it offers an idea for a strategy which may be worth considering along with others that are being discussed.

Whatever the strategy, the job of converting the massive retrospective record can and should be done, but it need be done only once in a standard format providing for full access. These millions of bibliographical entries were keyboarded several times before they came to rest as printed LC cards, and it is not unreasonable to suggest that they be keyboarded once more in machineable form to put the nation's research libraries firmly into the computer age.