College and Research Libraries, November 1973

MARTIN DILLON

The Impact of Automation on the Content of Libraries and Information Centers

Information needs are growing more rapidly than the abilities of research libraries and information centers to meet them. Two reasons and their influence on information systems are discussed: a shift in scientific endeavor from basic science to applied, leading to the emergence of programmatic research; and the technology of science itself.

Mr. Dillon is in the School of Library Science and the Department of Computer Science at the University of North Carolina, Chapel Hill.

INTRODUCTION

It has been generally agreed in recent years that information needs are growing more rapidly than the abilities of research libraries and information centers to meet them. Most often, the reason given for this phenomenon has been the growth in the amount of published literature. Two additional reasons, which are little noted though perhaps more important, will be discussed here, along with their consequences for information centers: a shift in scientific endeavor from basic science to applied, leading to the emergence of programmatic research; and the technology of science itself. These sources of difficulty will be discussed briefly before considering their influence on information systems.

BASIC TO APPLIED TO PROGRAMMATIC RESEARCH

Scientific research is traditionally divided into basic and applied, where the former is described as an activity directed toward "a fuller knowledge or understanding of the subject under study, rather than a practical application thereof," and the latter is "directed toward practical application of knowledge."1 No great insight is necessary to see that stock in knowledge for its own sake has taken a tumble in recent years.2 A more business-oriented federal government, a greater consciousness of the ill effects of socially undisciplined research (the Vietnam war), and a heightened awareness of short-term social needs compared with long-term benefits from basic research: all of these contribute to the disenchantment with basic research and a consequent shift in emphasis toward practical application in the ways we use knowledge and generate new knowledge.

In particular, a new form of applied science is emerging: programmatic research, the marshalling of technology and men to the achievement of some change in the world. As it is used here, a program specifies a sequence of actions organized and directed toward solving a specific problem, or system of related problems, in our physical environment, as contrasted with efforts which attempt to discover, describe, or add to knowledge. Urban redevelopment, environmental protection, population control: the pursuit of goals such as these circumscribes action-oriented disciplines with relatively specific, even short-lived goals. As organizations of technological and scientific effort, these efforts contrast with knowledge-oriented academic disciplines where goals are diffuse.

Since it is possible that these new disciplines are transitory, that as the problems which prodded them into existence disappear, they will also, it may be asked why we should consider developing approaches and techniques to cope with their special needs. There are two reasons. Such problem-focused disciplines are important enough on our current horizon, both in size and import, to elicit special attention from information scientists. Moreover, even if today's distinct forms disappear, the generic activity is likely to persist: other special forms will arise to take their place, organized around other large problems but with similar information needs.

How do the information needs of such activities differ from normal science?
This question is hard to answer, and a complete answer will not be attempted here. On the one hand, the information needs, both in sheer bulk and in management tools, are usually greater: normal science can often advance fragmentarily, through the efforts of individual scientists and without integrative paradigms. Spaceships or antiballistic missile systems cannot, generally, be built that way. Usually the new sciences are conducted by large multidisciplinary teams organized in an explicit way. There are delineated lines of authority and responsibility, with subgoals and divisions of labor clearly specified. Increased organization increases the burden of communication. The increased demand for communication often leads to informal channels and an overall drop in system efficiency. So far, little is understood about how best to organize information for such efforts.

THE TECHNOLOGY OF SCIENCE

A second source of trouble for libraries and information centers, contributing to the gap mentioned above, and one which in the long run may be more significant, comes from the product of science: technological development. The advent of the computer and the subsequent growth in its use is rapidly redesigning most human efforts, and science is no exception. The degree of complexity which can be meaningfully managed, either for practical ends or in basic science, has grown enormously in the last decades.

One can appreciate the extent of the change by considering application areas where the techniques of operations research are suitable tools of analysis. Changes of degree, in terms of the size of problem which can be meaningfully tackled and the amounts of data which can be represented or analyzed, are so great as to produce changes in kind.
Linear programming models can employ thousands of constraints to model problems where fifty were excessive before computers were available to solve the resulting system of equations.3 Simulation models accounting for thousands of relationships likewise are beginning to be commonplace.4 In statistical modeling, to mention only one change, factor analysis of hundreds of variables is possible where twenty was arduous labor for the analyst using a hand calculator.

In similar ways, the computer is revolutionizing the technology of science, necessarily leading to changes in the conduct and conceptual structure of science. The paramount change to date has been in the role of data: the amount which can be easily manipulated has increased by orders of magnitude. Though often altering little more than the bookkeeping, this radically enlarged empirical basis will no doubt soon lead to qualitative changes beyond bookkeeping and affect the very nature of science.

Each of the above points has serious implications for the organization and operation of libraries and information centers. The discussion which follows attempts to highlight the more serious, organized around three central artifacts of science: (1) documentation: the detailed public exposition of research results, whether basic or applied; (2) theories and models: the construction of formal representations of research results which synthesize, integrate, or explain them; (3) data: the organized groups of symbols or numbers which are the results of scientific observations of the world and serve as the empirical roots of theory.

Though it will not be argued here, it is likely that all these artifacts are poorly understood, both in what they contain and in how they do or should function.
Automatic information retrieval systems have had very limited success in explicating documents and their use; philosophies of science have made little progress since Newton; and both macro and micro physics have their trouble with data (in giving exact meaning to very small or very large measurements). These fundamental difficulties at the root of scientific activity are a major source of uncertainty in designing information systems which function as faithful adjuncts to scientific research. The implicit emphasis here is not on science itself, but on systematic program planning, where the same uncertainties are ameliorated somewhat by explicit program goals. The systems design can be evaluated for such programs in ways unavailable in basic science.

DOCUMENTATION AS SYSTEMS DESCRIPTION: THE ORGANIZATION GAP

In research undertaken to support some facet of program planning, the achievement of program goals is paramount and serves as a focus to the research. Such an obvious observation would seem barely worth making were it not that the organization of its documentation rarely clarifies the role of such research in the overall planning effort. Access to research documentation is usually determined by channels and methods within the discipline in which the research was carried out, not through program-determined organizations of knowledge. Very rarely are attempts made to show relationships among interdisciplinary materials or to model document collections on program needs.

Some examples will help here. In the field of population activities, a fairly single-minded short-term goal can be cited: control of population. Related research and its documentation are growing at a remarkable rate, but as yet no substantial subject indexing system is available, much less one specifically organized to aid in the development of population programs.
Moreover, the research effort itself, which is presumably geared to aid specific aspects of program design or implementation, cannot by any reasonable method be connected to programming concepts. And the related research is being carried out all over the globe and by workers in many disciplines. This research area, due to its easily expressed global goals, is more organized than some which might be cited. Urban redevelopment has barely started on such efforts, and as a consequence information organization is solely along specialty divisions or ad hoc constructions of local libraries.5

Such defects in documentation control are not to be blamed necessarily on the libraries. At least part of the responsibility lies with current problem-solving techniques as they are embodied in science: they are fragmentary, rarely explicit, and probably incoherent. Even the health sciences, presumably guided by expressible goals, pursue them unsystematically, in an order and with an emphasis determined by the puffs of perfidious politics.6

As a consequence, organizations of knowledge specific to a discipline are inferred from existing literature; documentation of research undertaken to support programming sporadically borrows from these existing structures and attempts, in a makeshift manner at best, to relate them to program goals. For the multidisciplinary research teams which participate in program planning on a large scale, communication through a centralized document collection is as essential as such systems are rare in practice. The nature of the tasks currently under attack (waging war, exploring space, remaking cities) usually involves natural, social, and engineering scientists, where differences of viewpoint are almost cultural.

A slight alleviation of the organization gap is available through automated systems which allow some degree of user organization through formulation of specific queries.
Such systems, in conjunction with vocabularies developed in cognizance of these problems, go some way toward facilitating use of interdisciplinary document collections. When two or more disciplines have a common object for analysis or similar goals, their vocabularies usually express this intersection, though not systematically. Exactly how this occurs has not been investigated as yet, but since the practitioners of each discipline share a common world and a common natural language, it is easy to see why. Since the overlap is only rough, the individual scientist must interpret the details according to his own light and in conjunction with his own goals. Though such techniques are still primitive at best, their principles of operation shed some light on the defects of organization in science and its documentation.

COMPUTER PROGRAMS AS THEORY

A further problem, and one whose implications and consequences have been even less well perceived and attacked, is the growing tendency of computer programs to embody the essential properties of theory. The reasons for this development are fairly straightforward. First, programming languages possess many of the communicative advantages of formal languages used in mathematics: they say what they mean in a sense so literal that to translate their logic into other languages is not attractive. Second, the computer program is always a strictly formal entity: it is well defined and its parts have a clear meaning (to the initiated). Third, the program itself is available (the complete theory) and can always be used by someone wishing to explore the consequences of a particular formulation, or to trace the effects of specific assumptions.

This development is particularly noticeable in applied fields where large models are the vehicle for exploring system interactions.
The model can either have an explicit mathematical formulation, as do linear programming models, or there may be no alternative representation, as is so often the case with simulation models ("A simulation model is a theory describing the structure and interrelationships of a system").7 Both techniques are of increasing importance in program planning due to the complexity of the problems which must be solved. In either case, the computer model serves as a theory for the system which is being modeled.

A good example of this tendency is available in Jay W. Forrester's Urban Dynamics, where a simulation model of urban areas is described.8 Indeed, Appendix A is entitled "The Model - A Theory of Urban Interactions," and the language used to express system functions and equations is exactly the language accepted by a computer system for constructing and executing simulation models. The dependence on the computer formulation is understandable when one considers that the model contains some 150 equations which interact to describe the urban areas in complex ways, involving hundreds of parameters and variables. Equally interesting is the "world model" described in World Dynamics and developed in The Limits to Growth.9 The technical problems are similar: great complexity and detail; the solution the same: formulate the theory in terms of a computer simulation model.

How these developments affect the underlying assumptions and formal apparatus of science cannot be determined as yet, but sure to be altered is the shape and function of theory. Part of the reason for this is the speed with which developments are occurring. There was a time when today's theory was tomorrow's computer program. Increasingly, today's program is today's theory. For program planning especially, computer models replace theories, both as organizations of knowledge and predictors or determiners of the future.
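
To make the notion of a program serving as a theory concrete, the following is a minimal sketch of a simulation model written in a level-and-rate style. It is not Forrester's urban model: the single level, the function name `simulate`, the proportional form of the rate equations, and every parameter value are assumptions invented for illustration. It is meant to show only that the assumptions of such a model are stated literally in the code, so that a reader can alter one and trace the consequence.

```python
def simulate(initial_level, inflow_fraction, outflow_fraction, dt, steps):
    """Step one level forward in time using simple Euler integration.

    All names and the proportional form of the rates are illustrative
    assumptions, not taken from any published model.
    """
    level = initial_level
    history = [level]
    for _ in range(steps):
        inflow = inflow_fraction * level      # rate equation (assumed form)
        outflow = outflow_fraction * level    # rate equation (assumed form)
        level = level + dt * (inflow - outflow)  # level equation
        history.append(level)
    return history


if __name__ == "__main__":
    # Two runs that differ in a single assumption (the outflow fraction).
    baseline = simulate(1000.0, 0.03, 0.02, dt=1.0, steps=10)
    altered = simulate(1000.0, 0.03, 0.04, dt=1.0, steps=10)
    print("baseline final level:", round(baseline[-1], 1))
    print("altered final level:", round(altered[-1], 1))
```

Running the script compares two trajectories that differ in one parameter, which is the kind of exploration of "the effects of specific assumptions" that a prose statement of the same theory would not permit directly.
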
It is likely that we are at the periphery of such use, and that the future will see more and more of it. Whether this development is to be lauded or regretted may be debatable. Second class theories or not, computer models cannot easily be dismissed: their numbers are growing. In this context, the point to be made is that the computer program represents something essential about the theory, and the theory is often approached and understood by researchers through its computer representation. Libraries and information centers which support research, if they are to satisfy the information needs of their users, must get into the business of providing access to such programs. Exactly how this need should be met remains a mystery, though some ideas follow.

THE CHANGING ROLE OF DATA

It is obvious that as computer programs like those cited become the standard means of communicating results of research, the role of data in libraries and information centers will grow apace.

In many ways, scientific documentation is primarily processed data. The contents of research articles are often formed from samples of the data and fragmentary evidence in support of the author's conclusions; when the conclusions are questioned or, more often, when different questions need to be asked, the data is more valuable than the documentation. As a form of knowledge, the article becomes less attractive as the ways of processing data increase. As the variety of analytic techniques increases, the likelihood that an analyst will be satisfied with this or that particular analysis decreases. We now have automated procedures for data analysis: everyone becomes his own analyst and can perform his own analysis tailored to his own needs.

The same point can be made in relation to the simulation models cited: they are ways of processing data; they are a means for digesting pasts. Both developments increase the importance of raw data.
Libraries of data are becoming commonplace and, as a national asset, it can be argued that the Bureau of Standards with its data collections is more valuable than the Library of Congress. Certainly for science and technology this is true. As evidence of this changing role of raw data, two important examples can be cited, each of which incorporates extensive data bases and a means for selectively processing them. At the Bureau of Labor Statistics, U.S. Department of Labor, a system has been developed to provide analyses of a growing body of U.S. economic data (described in "The Computer and Economic Analysis at the Bureau of Labor Statistics").10 At the Bureau of the Census, a more elaborate system has been under development to handle demographic data, primarily the 1970 census data.11

Do these developments have implications for research libraries? It is after all an accident of technology that books and journals are the vehicles for storing and communicating research results. As data becomes a more dynamic part of an information system, continually reanalyzed from varying points of view, representations of the data in printed form are reduced in value, and the corresponding computer representation has increased value. If libraries and information centers are to continue as vital adjuncts to science, they must accommodate themselves to this shift from product to process.

INTEGRATED INFORMATION SYSTEMS

From the foregoing it would appear that scientific research efforts would be best supported by an information system of three major components: (1) A documentation component, which would include interactive text-editing facilities as well as retrieval capabilities. Question-answering systems as they are currently understood would derive from this component, insofar as they are based on natural languages such as English.
(2) The second component would include data bases, especially those which contributed to the technical papers in the document section. They would normally contain far more, even data which had not been documented, though unanalyzed data would require sufficient definition to be used by the community. (3) The third component would be the techniques for analysis, especially those which were actually used in the reported literature. The term "techniques" as used here is merely a euphemism for a computer program or a technique available through one.

It is likely that the long-term solution to these problems will be through integrated systems such as Project INTREX, where data, analytical procedures, and documentation each will have a place.12 In such systems, users will communicate with one another and with their programs and data through terminals in an interactive environment. With common file definitions and the facilities for working with them, the frame exists for an on-line community of scholars or research specialists: instant publication.

Unfortunately, such systems are far in the future. What can be done in the interim? For information centers already employing automated components, the hardware and software technology exists to eliminate or minimize some of the deficiencies noted above. Each of the three facets of scientific research (documentation, programs, and data) has been dealt with separately and fragmentarily by different approaches and procedures. Though no system exists which incorporates all three suitably, they can be had individually with less effort than might be supposed.

The first major task is to tie the production of research documentation more closely to the automated system.
This can best be done, within the constraints of current technology, through available text-editing systems.13 Text-editing systems allow alterations and corrections to be made to manuscript material through its computer representation. The advantage is that only a small portion usually need be changed, the fitting of the text to pages (including altered pagination, spacing, paragraphing, etc.) being done automatically in a subsequent reprinting of the text. Since no new errors are introduced, such systems usually reduce the overall labor and improve the product at the same time. In theory at least, text-editing systems can interface directly with computer-driven printers, removing the need for additional proofreading (especially valuable for texts heavy with formulas). Why they are not in more widespread use is a mystery, but one which will not persist for long: their advantages will soon make them commonplace.

In this context such systems are cited for making available machine-readable versions of documentation at their origin, increasing the amount of text so available as well as doing it more speedily. Thus eventual use of such text for retrieval or question-answering is assured.

The second problem, availability of analytical techniques or models in the form of computer programs, requires for its solution better management of research efforts and more cooperation (legislated or otherwise) among libraries and information systems. For example, we have not yet reached the stage where research designs contain, or in the case of federal funding are required to contain, explicit means for communicating, in addition to the conclusions of the research, the analytical techniques which were used and the data they were applied to.
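
One way to picture the communication requirement just described is as a catalog record that ties a report to the programs and data behind it. The sketch below is an illustration under stated assumptions, not a description of any existing system: the class names `Program`, `Dataset`, and `ResearchReport`, their fields, the helper `reports_using`, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Program:
    identifier: str   # hypothetical catalog number for the program
    technique: str    # analytical technique the program implements


@dataclass
class Dataset:
    identifier: str   # hypothetical catalog number for the data file
    description: str


@dataclass
class ResearchReport:
    title: str
    programs: List[Program] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def can_be_reanalyzed(self) -> bool:
        """True only if both the programs and the data are on record."""
        return bool(self.programs) and bool(self.datasets)


def reports_using(reports: List[ResearchReport], dataset_id: str) -> List[str]:
    """Return titles of reports whose records cite the given dataset."""
    return [r.title for r in reports
            if any(d.identifier == dataset_id for d in r.datasets)]


if __name__ == "__main__":
    survey = Dataset("D-001", "national survey extract (made-up example)")
    factor = Program("P-001", "factor analysis")
    full = ResearchReport("Report with tools", [factor], [survey])
    bare = ResearchReport("Report without tools")
    print(full.can_be_reanalyzed(), bare.can_be_reanalyzed())
    print(reports_using([full, bare], "D-001"))
```

The design choice worth noting is that the report record carries references to programs and data rather than copies of them, so the same data file can be reached from every report that drew on it.
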
To get some idea how this approach works out in a book medium, consult Cooley and Lohnes, Multivariate Data Analysis, where a national survey is referenced throughout as a source of examples.14 The book itself contains copies of the computer programs necessary to duplicate most of the analyses which were carried out on the data, and the data can be obtained through the authors. Such an exemplary practice will more and more be copied as its value to the research community becomes more obvious.

A concomitant responsibility falls on the information center to make such corroborative or subsidiary tools as computer programs and data available as well as the documentation: they will soon become as essential as, if not more essential than, the documentation. As analytical techniques become more standardized, it will become easier for information centers to provide them to users. Programming systems which include global file definition and a broad selection of statistical procedures have been commonplace for some time. The Biomedical Package (BMD) series and the Statistical Package for the Social Sciences are two of the more generally available examples.15

A more critical example, apt since it deals with information retrieval, is in the SMART efforts carried out at Cornell University over the last few years.16 Part of the burden of the research was the development of specialized files and procedures and their organization into a single system capable of achieving the research goals of the participants. The system itself is available from Cornell, and interested researchers are able to carry out duplicate tests, either on the original or their own data, as well as inventing their own experiments.

CONCLUSION

The final solution, like all final solutions, is far in the future. What can be done immediately?
Some action is required on the federal level: the specification of data interchange codes; the standardization of analysis techniques; the requirement that all federally funded research specify fully and in advance the form and ultimate end of any data; standardization of bibliographic records, publication standards, and far more.

On another level what is needed is a more thorough analysis of the relationships among the sciences, and the development of common tools of analysis and common languages. At the same time, more attention needs to be paid to the development of scientific planning methods: too often seat-of-the-pants decisions are based on seat-of-the-pants reasoning. From such studies should emerge better techniques for controlling the access to documentation for the purposes of improved planning, both in the use of scientific resources and in the development of social programs.

Even in the absence of these obviously worthwhile endeavors, information centers must develop ways of managing more than documents; they must develop a means of controlling data and programs as well, and understanding the use to which their users would have them put. As more information centers automate their services, they will more easily be able to extend them to include making available computer programs and data in electronic form.

REFERENCES

1. These definitions are taken from National Patterns of Research and Development Resources: 1953-71, NSF 70-46, p.24-25, and were (in part) used in the questionnaires to gather the data appearing there.
2. Ibid., passim, is available to aid insight. The document is a good summary of research and development expenditures in the U.S. The key figures are: a drop in total R & D spending from 3 percent of the GNP to 2.7 (Chart 3, p.3).
3. William Orchard-Hays, Advanced Linear-Programming Computing Techniques (New York: McGraw-Hill, 1968) presents the state of the art as of 1968.
4. Recently, the capability of formulating simulation models for computers has been extended to using the English language. See George E. Heidorn, "The Specification of Decoding and Encoding Processes for Natural Language Man-Machine Communication," paper given at the 10th Annual Meeting of the Association for Computational Linguistics, July 1972.
5. Brenda White, Sourcebook of Planning Information: A Discussion of Sources of Information for Use in Urban and Regional Planning and in Allied Fields (London: Bingley, 1971).
6. F. W. Lancaster, Evaluation of the MEDLARS Demand Search Service (Bethesda, Md.: National Library of Medicine, Jan. 1968) gives some insight into this problem.
7. Jay W. Forrester, Urban Dynamics (Cambridge, Mass.: The M.I.T. Press, 1968), p.112.
8. Ibid.
9. Jay W. Forrester, World Dynamics (Cambridge, Mass.: The M.I.T. Press, 1968); Donella H. Meadows, et al., The Limits to Growth (New York: Universe Books, 1972).
10. Rudolph C. Mendelssohn, "The Computer and Economic Analysis at the Bureau of Labor Statistics," The American Statistician 22 (April 1968).
11. Abbot L. Ferris, Research and the 1970 Census (Oak Ridge, Tenn.: 1971).
12. Report of a Planning Conference on Information Transfer Experiments (Cambridge, Mass.: 1965) remains the best overview of the system's design and major goals.
13. Andries Van Dam and David E. Rice, "On-Line Text Editing: A Survey," Computing Surveys (June 1972), p.65-79. Of particular interest are Text 360 Reference Manual Operation Guide (New York: IBM Corp., 1968) and Andries Van Dam, FRESS (File Retrieval and Editing System) User's Guide (Barrington, R.I.: Text Systems Inc., 1971), the former for its general availability, the latter for the originality of its facilities.
14. William W. Cooley and Paul R. Lohnes, Multivariate Data Analysis (New York: John Wiley, 1971).
15. W. R. Schucany, Paul D. Minton, and B. Stanley Shannon, Jr., "A Survey of Statistical Packages," Computing Surveys (June 1972), p.65-79.
16. Gerard Salton describes the system and its use in Automatic Information Organization and Retrieval (New York: McGraw-Hill, 1968).