College and Research Libraries

By DAVID C. WEBER

A Quagmire of Scientific Literature?

Mr. Weber is assistant to the director, Harvard University Library.

EVER SINCE JULY 1945, when Vannevar Bush described the quandary of scientists who are swamped by the literature of their field, men working in pure science or technology have been worrying about bibliographical control over the flood of their publications which threatens to interrupt their own research.1 John E. Burchard, writing four years after Bush, thought that the sheer bulk of published writings and the difficulties of quick and explicit accessibility were causing a literary "Waterloo of Science."2 In 1953, Maierson and Howell stated that "for a number of years it has been apparent that conventional methods of indexing and classifying technical literature can no longer cope with the ever increasing flood. It is frequently more economical to repeat work of the past than to search the technical literature for the desired item...."3 And Mitchell clearly argued that "the tremendous increase in the volume of technical literature of all kinds and fields is presenting the librarian with an almost impossible reference task. The sheer volume of these documents is creating a filing problem of the first magnitude. When this volume is combined with the fact that many documents cut across classification lines, the problem of providing reference bibliographies is made that much more difficult."4 Librarians as a group have been slow to realize that scientists are truly worried about their literature situation.

1 Vannevar Bush, "As We May Think," Atlantic Monthly, CLXXVI (1945), 101-08.
2 John E. Burchard, "The Waterloo of Science," Revue de la Documentation, XVI (1949), 94-97.
3 Alvin T. Maierson and W. W. Howell, "Application of Standard Business Machine Punched-card Equipment to Metallurgical Literature References," American Documentation, IV (1953), 3.
4 Herbert F. Mitchell, Jr., "The Use of the UNIVAC Fac-tronic System in the Library Reference Field," American Documentation, IV (1953), 16.

An analysis of this literature problem shows that in the last fifteen years the scientist has become a publisher in similar quantity to the humanists and social scientists of the last several centuries; and, in the field of science, the unit needing classification and housing and retrieval "has changed from macroscopic masses embodied in books to microscopic units embodied in articles."5 A comparison of publishing method in different disciplines may reveal the cause of the scientists' dilemma. In the humanities and social sciences, publication is primarily divided between periodicals, which describe the results of new research, and monographs, which provide the more fully documented statements. For both of these, there is adequate listing and suitable indexing. In science, on the other hand, the publishing scheme is a complex one made up of the technical report, the pre-print, the periodical, and finally the monograph. There is little control bibliographically over the technical report, none over the pre-print, and only delayed control over the periodical. However, when the scientist is asked his information-gathering habits, he replies as follows, in this approximate order: his direct sources are advanced publications, research periodicals, technical reports, and handbooks, and his indirect sources are conversations, regular perusal of periodicals, references cited in books and papers, abstracts and indexes.6

5 S. R. Ranganathan, Philosophy of Library Classification (Copenhagen: Munksgaard, 1951), p. 13.
6 Saul Herner, in a paper entitled "The Information Gathering Habits of Johns Hopkins Scientists" which was reported by Marjorie R. Hyslop in her "Documentalists Consider Machine Techniques," Special Libraries, XLIV (1953), 197-98.
It should be evident, therefore, that unified bibliographical control over this variety of publishing forms is really the problem, and the difficulty is not caused by any form of informational freakishness which should force librarians or scientists to turn to machine storage in order to gain access to the material they need. It is all too often that the scientist or documentation expert starts his argument with the thesis that scientific literature is flooding the laboratories and proceeds to argue for the development of the Memex, Ultrafax, Rapid Selector, Avakian's AMFIS, Minicard, and other complex and expensive devices for storage and retrieval of information.

It can often be seen, through hindsight, that a problem has not been tackled by a slight adjustment but by a wholly new process or device which many times proves less suitable than the old process in its greatest development. Battelle Memorial Institute reported one typical instance in a recent evaluation of techniques commonly used for literature collection and analysis. "It became quite evident during preliminary investigations that the old-fashioned manual systems had not previously been thoroughly evaluated and that these techniques, thought to be outdated, seemed not to have been fully exploited in the past. It was concluded that the time had come for re-evaluating manual systems or combinations of manual-machine methods before proceeding exclusively to the evaluation and development of machine systems."7 The result at Battelle was a completely manual system.

7 Robert W. Gibson, Jr., and Ben-Ami Lipetz, "New Look in Manual Methods," Special Libraries, XLVII (1956), 108.

The handling of difficult collections of materials, be they pamphlets, reprints, serials, documents, or monographs, has been the long-standing business of the library profession. If the librarian in the disciplines of pure science and technology professes inability to handle these materials and produce the information desired by the scientists, it may well be that the librarian's approach is wrong, that the library is understaffed, or that there is not enough money put into the bibliographical apparatus, an expense which is not so glamorous a way to spend money as would be some unorthodox machine. To put it another way, if the indexing and abstracting services in science do not provide the information which is needed, the librarian should make every effort to do this listing and indexing for reports, pre-prints, or periodical articles, whenever needed by his clients, just as he now does for monographic materials. It is a simple problem, and the solutions are also simple, though they may be moderately expensive.

Subject analysis of material (and its corollary, the location of material from a subject approach) is a separate and distinct problem from that of author and title listing. The latter is but a temporal problem needing concerted attention. But the subject approach to one's library is ordinarily fragmentary indeed, as compared to the relative comprehensiveness for the author approach, so it should ipso facto be part of a system of indexes designed to reveal what exists anywhere in print on the particular subject of concern. However, take the scientist who professes interest in the subject content of only his own library, perhaps because he can assume his library is all but comprehensive within his interests. Even here, library methods of an orthodox type can do practically everything a machine can, and can generally do it faster. Shaw has said that, "depending upon the type of search, it is even doubtful whether the fastest electronic machine that we can postulate will ever be able to search for a series of author entries as rapidly or as economically as ... can be done in a conventional card catalog."
And he goes on to say that "when large files have to be maintained and when they have to be searched repeatedly for subject information, great reduction in space requirements and in searching time and in copying time may be achieved by mechanization."8 Even this qualified statement, by a person who is adept at machine application, suggests more for the machine than should be expected. The most important factor which is usually overlooked is that the machines contribute substantially only to the consumption end, not the production end; because human cataloging or encoding is the essential preliminary to any mechanized storage and consultation. Vannevar Bush is at his most imaginative when he outlines how machines might hurdle this biggest of problems: "When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard."9 Note that the human being must "name" the subject before the machine can store and return it for use; machines cannot yet replace traditional library methods in this analysis. And even on the consumption end, Dr. Bush reminds us that "the prime action of use is selection, and here [machines] are halting indeed."10

8 Ralph Shaw, "Mechanical and Electronic Aids for Bibliography," Library Trends, II (1954), 530-31. Italics mine.
9 Bush, op. cit., 107.
10 Ibid., 105.

MARCH 1957

Let us turn to more minute concerns. Discussion as to the relative merits of card catalogs and storage machines frequently boils down to two capacities: high subject specificity and multiple subject approach. Specificity refers to subject access at the particular level rather than the general. It is one thing to put a book on female cat diseases under a subject heading MEDICINE. It is more specific to put it under the heading MEDICINE-ANIMALS, or, even more specific, under MEDICINE-CATS-FEMALE. Although librarians have always aimed at placing a book under its most specific heading, it has been understood that this would never be taken to extremes. On the other hand, scientists want headings that regularly place the information under the most specific heading possible. Taxonomic classification, based on family relationships, would theoretically satisfy everyone; but neither for machines nor for a classed catalog has a universally acceptable taxonomic classification for the entire range of knowledge been developed. Under any condition, therefore, the card catalog can do as well as the machine on specificity.

As for multiple subject approach, classification of books on the shelf provides single access, and this does not suffice for adequate subject approach in the sciences, nor even in the humanities and social sciences. However, card catalogs, and particularly classed card catalogs, can satisfy this need. A book that is listed in the catalog under the headings CATS and VETERINARY MEDICINE and ANIMAL DISEASE will be given three approaches. Here again, the card catalog is theoretically as versatile as the machine.

To see where machines run into their basic trouble, one has only to consider the mathematical structure of language. Language, as analyzed by symbolic logic, presents extreme complications to the coding process and the subsequent retrieval; for every language has built-in entropy (electronics' "noise"), in phonetics, semantics, inflection, and syntactical construction. However, definition in terms of probabilities goes far to point out a solution, even allowing full weight to redundancy (whether it is the "K" of key and the "K" of cool, or "page" as a messenger or leaf of a book); but it is still only a theory, which will not come to practical application for many years.
In his discussion of machine translation, which involves coding followed by decoding, Whatmough explains this small but as yet unsurmounted barrier:

A human translator has the necessary circulatory pathways established already as patterns of neural activity by virtue of being bilingual. It appears likely, simply in terms of regional examination of the human living brain and its functioning that speech and "thought" are very much connectible. Language to a tremendous extent is a matter of habit - if it were not, communication would be impossible; but the areas of association on the basis of which most of our linguistic and non-linguistic behavior is to be accounted for, the socio-personal areas, are so closely linked, that cerebration, if done symbolically, with both the outside universe and inner "experience" as a unified frame of reference, is done with linguistic symbolism, or at least within a system of operations based on linguistic symbolism.11

11 Joshua Whatmough, Language, a Modern Synthesis (New York: St. Martin's Press, 1956), 213-14.

Machines imitate the human brain which is based on the neuron's binary action and which handles morphemes (words or independently significant parts) rather than phonemes (parts of words which are minimum speech sounds). But, and this is the crux of the matter, the machine must now be provided with a statistical distribution law for the relative frequency of occurrence of the units and constructions of language, the "circulatory pathways" using "linguistic symbolism," in order for it to be an information system independent of restrictions of subject matter, size of vocabulary, human pre-editing or post-editing, and the amount of text. Such a law is not yet within sight. Taube and his associates found that a "dictionary of associations" would be necessary to solve many of the semantic problems still faced by their system of coordinate indexing.12 And, most recently, Perry and his associates have spent years working on machine literature searching before finding that the coding system for machines would have to use symbolism for "semantic factors" and "analytic relationships" and that a "code dictionary" would have to be constructed so as to deal with language problems.13 The conclusion to be drawn is that the use of machines for storage and retrieval of information is likely to be practicable only through a man-machine partnership, and is not going to be commonly feasible for many years to come.

12 See page 7 and passim in Mortimer Taube and Associates, "Storage and Retrieval of Information by Means of the Association of Ideas," American Documentation, VI (1955).

If financial costs can be left out of the question, and if specificity and multiple approach are not critical determinants, under what conditions may storage machines be superior to the card catalog? It is here contended that the machine will be the better choice only when all of the following conditions prevail:

1. A single subject is being covered.
2. There is a high concentration of publications in this subject area.
3. There is a continuing high intake rate of publications.
4. Adequate subject access is unavailable in published form.
5. Use is made by people having several different approaches or uses in mind.
6. There is high urgency in the location of every pertinent publication.

In such a case, there is a probability that some unorthodox method of storing and retrieving information may be required. (The Uniterm system of coordinate indexing seems suitable only when the above conditions apply and when the collection indexed is not to reach 100,000 items.)
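The six conditions above form a strict conjunction, which may be easier to see written out as a checklist. The following is a minimal sketch and not a procedure given in the article: the `CollectionProfile` field names, the two function names, and the reduction of each condition to a yes/no judgment are all assumptions made for illustration, and financial cost is left out, as the text stipulates.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CollectionProfile:
    """Yes/no judgments on the six conditions (field names are hypothetical)."""

    single_subject: bool               # 1. a single subject is covered
    high_concentration: bool           # 2. many publications in the subject area
    high_intake_rate: bool             # 3. continuing high intake of publications
    no_published_subject_access: bool  # 4. no adequate published subject access
    varied_user_approaches: bool       # 5. users bring several different approaches
    high_urgency: bool                 # 6. every pertinent item is urgently needed


def machine_preferred(profile):
    """True only when all six conditions hold at once."""
    return all(getattr(profile, f.name) for f in fields(profile))


# The parenthetical remark on Uniterm adds a size limit: the collection
# indexed "is not to reach 100,000 items".
UNITERM_ITEM_CEILING = 100_000


def uniterm_suitable(profile, expected_items):
    """All six conditions must hold and the collection must stay below the ceiling."""
    return machine_preferred(profile) and expected_items < UNITERM_ITEM_CEILING
```

A collection that fails even one of the six tests falls back, on this reading, to the card catalog; for example, a profile with every field true except `high_urgency` makes `machine_preferred` return `False`.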
Shera says that the use of machines "seems likely to be limited to the more complex problems of bibliographical searching, and therefore, they may not be applicable to the entire range of bibliothecal operations."14

13 James W. Perry, Allen Kent and Madeline M. Berry, Machine Literature Searching (New York: Interscience Publishers, 1956), p. 84.
14 Jesse H. Shera, "Effect of Machine Methods on the Organization of Knowledge," American Documentation, III (1952), 16.

It is nevertheless unquestioned that libraries in science and technology must improve in order to cope with the growth of their diverse literature. Comprehensive current subject bibliographies are a primary need. Tauber has stated that "it is almost certain that more selective subject catalogs and more extensively used subject bibliographies will characterize subject analysis in the immediate future."15 A secondary need is for comprehensive indexing of serial publications, where the situation is distinctly unsatisfactory.
Librarians have been ineffectual in eliminating wasteful overlapping of services and in obtaining inclusive indexing; this is a critical situation into which must be put much more effort.16 It is logical to expect that a great increase in extremely brief subject entries, arranged in chronological order, will characterize the future subject indexes to scientific materials, with the older material being indexed merely by an author file, and with subject cards thrown out after a period of time.

15 Maurice F. Tauber and Associates, Technical Services in Libraries (New York: Columbia University Press, 1954), p. 175.
16 On the indexing situation, see Verner W. Clapp and Kathrine O. Murra, "The Improvement of Bibliographic Organization," Library Quarterly, XXV (1955), 107.

It can be said with complete assurance that scientific libraries have somewhat different problems from libraries in other disciplines, that they are still far from having satisfactory bibliographical control over scientific literature, and that existing library methods if fully exploited can bring firm ground out of the quagmire that now seems to be threatening.