NBS Special Publication 500-64

U.S. DEPARTMENT OF COMMERCE
National Bureau of Standards

NATIONAL BUREAU OF STANDARDS

The National Bureau of Standards 1 was established by an act of Congress on March 3, 1901. The Bureau's overall goal is to strengthen and advance the Nation's science and technology and facilitate their effective application for public benefit. To this end, the Bureau conducts research and provides: (1) a basis for the Nation's physical measurement system, (2) scientific and technological services for industry and government, (3) a technical basis for equity in trade, and (4) technical services to promote public safety. The Bureau's technical work is performed by the National Measurement Laboratory, the National Engineering Laboratory, and the Institute for Computer Sciences and Technology.

THE NATIONAL MEASUREMENT LABORATORY provides the national system of physical and chemical and materials measurement; coordinates the system with measurement systems of other nations and furnishes essential services leading to accurate and uniform physical and chemical measurement throughout the Nation's scientific community, industry, and commerce; conducts materials research leading to improved methods of measurement, standards, and data on the properties of materials needed by industry, commerce, educational institutions, and Government; provides advisory and research services to other Government agencies; develops, produces, and distributes Standard Reference Materials; and provides calibration services. The Laboratory consists of the following centers: Absolute Physical Quantities 2 — Radiation Research — Thermodynamics and Molecular Science — Analytical Chemistry — Materials Science.
THE NATIONAL ENGINEERING LABORATORY provides technology and technical services to the public and private sectors to address national needs and to solve national problems; conducts research in engineering and applied science in support of these efforts; builds and maintains competence in the necessary disciplines required to carry out this research and technical service; develops engineering data and measurement capabilities; provides engineering measurement traceability services; develops test methods and proposes engineering standards and code changes; develops and proposes new engineering practices; and develops and improves mechanisms to transfer results of its research to the ultimate user. The Laboratory consists of the following centers: Applied Mathematics — Electronics and Electrical Engineering 2 — Mechanical Engineering and Process Technology 2 — Building Technology — Fire Research — Consumer Product Technology — Field Methods.

THE INSTITUTE FOR COMPUTER SCIENCES AND TECHNOLOGY conducts research and provides scientific and technical services to aid Federal agencies in the selection, acquisition, application, and use of computer technology to improve effectiveness and economy in Government operations in accordance with Public Law 89-306 (40 U.S.C. 759), relevant Executive Orders, and other directives; carries out this mission by managing the Federal Information Processing Standards Program, developing Federal ADP standards guidelines, and managing Federal participation in ADP voluntary standardization activities; provides scientific and technological advisory services and assistance to Federal agencies; and provides the technical foundation for computer-related policies of the Federal Government. The Institute consists of the following centers: Programming Science and Technology — Computer Systems Engineering.

1 Headquarters and Laboratories at Gaithersburg, MD, unless otherwise noted; mailing address Washington, DC 20234.
2 Some divisions within the center are located at Boulder, CO 80303.

COMPUTER SCIENCE & TECHNOLOGY:

DATA BASE DIRECTIONS — The Conversion Problem

Proceedings of the Workshop of the National Bureau of Standards and the Association for Computing Machinery, held at Fort Lauderdale, Florida, November 1-3, 1977

John L. Berg, Editor
Center for Programming Science and Technology
Institute for Computer Sciences and Technology
National Bureau of Standards
Washington, D.C. 20234

Daniel B. Magraw, General Chairperson

Working Panel Chairpersons: Milt Bryce, James H. Burrows, James P. Fry, Richard L. Nolan

Sponsored by:
National Bureau of Standards
Association for Computing Machinery (ACM)

U.S. DEPARTMENT OF COMMERCE, Philip M. Klutznick, Secretary
Luther H. Hodges, Jr., Deputy Secretary
Jordan J. Baruch, Assistant Secretary for Productivity, Technology and Innovation

NATIONAL BUREAU OF STANDARDS, Ernest Ambler, Director

Issued September 1980

Reports on Computer Science and Technology

The National Bureau of Standards has a special responsibility within the Federal Government for computer science and technology activities. The programs of the NBS Institute for Computer Sciences and Technology are designed to provide ADP standards, guidelines, and technical advisory services to improve the effectiveness of computer utilization in the Federal sector, and to perform appropriate research and development efforts as foundation for such activities and programs. This publication series will report these NBS efforts to the Federal computer community as well as to interested specialists in the academic and private sectors. Those wishing to receive notices of publications in this series should complete and return the form at the end of this publication.

National Bureau of Standards Special Publication 500-64
Nat. Bur. Stand. (U.S.), Spec. Publ. 500-64, 178 pages (Sept. 1980)
CODEN: XNBSAV

Library of Congress Catalog Card Number: 80-600129

U.S.
GOVERNMENT PRINTING OFFICE
WASHINGTON: 1980

For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402 - Price $5.50

TABLE OF CONTENTS

                                                              Page
1. INTRODUCTION ........ 3
   1.1 THE FIRST DATA BASE DIRECTIONS WORKSHOP ........ 3
   1.2 PLANNING FOR SECOND CONFERENCE ........ 4
   1.3 DATA BASE DIRECTIONS II ........ 4
   1.4 CONCLUSION ........ 5
2. EVOLUTION IN COMPUTER SYSTEMS ........ 7
   2.1 QUESTIONS ........ 7
   2.2 HARDWARE CHANGES ........ 9
   2.3 SOFTWARE CHANGES ........ 10
   2.4 EVOLUTIONARY APPLICATION DEVELOPMENT ........ 11
   2.5 MIGRATION TO A NEW DBMS ........ 13
3. ESTABLISHING MANAGEMENT OBJECTIVES ........ 19
   3.1 OVERVIEW ........ 20
   3.2 CONVERSION TO A DATA BASE ENVIRONMENT ........ 24
       3.2.1 Impact On the Application Portfolio ........ 25
       3.2.2 Impact On the EDP Organization ........ 29
       3.2.3 Impact On Planning and Control Systems ........ 35
       3.2.4 Impact of Conversion On User Awareness ........ 38
   3.3 MINIMIZING THE IMPACT OF FUTURE CONVERSIONS ........ 39
       3.3.1 Institutionalization of the DBA Function ........ 39
       3.3.2 DBMS Independent Data Base Design ........ 40
       3.3.3 Insulate Programs From the DBMS ........ 40
   3.4 CONVERSION FROM ONE DBMS TO ANOTHER DBMS ........ 41
       3.4.1 Reasons for Conversion ........ 42
       3.4.2 Economic Considerations ........ 43
       3.4.3 Conversion Activities and Their Impact ........ 44
       3.4.4 Developing a Conversion Strategy ........ 48
   3.5 SUMMARY ........ 50
4. ACTUAL CONVERSION EXPERIENCES ........ 53
   4.1 INTRODUCTION ........ 54
   4.2 PERSPECTIVES ........ 55
   4.3 FINDINGS ........ 57
       4.3.1 Industrial/Governmental Practices ........ 58
       4.3.2 The First DBMS Installation ........ 60
       4.3.3 Desired Standards ........ 61
       4.3.4 Desired Technology ........ 61
   4.4 TOOLS TO AID IN THE CONVERSION PROCESS ........ 62
       4.4.1 Introduction ........ 62
       4.4.2 Changing From Non-DBMS To DBMS ........ 62
       4.4.3 Changing From One DBMS To Another ........ 64
       4.4.4 Changing Hardware Environment ........ 66
       4.4.5 Centralized Non-DBMS--Distributed DBMS ........ 66
       4.4.6 Centralized DBMS--Distributed DBMS ........ 67
   4.5 GUIDELINES FOR YOUR FUTURE CONVERSIONS ........ 67
       4.5.1 General Guidelines ........ 68
       4.5.2 Important Considerations ........ 68
       4.5.3 Tight Control ........ 69
       4.5.4 Precise Planning/Pre-planning ........ 69
       4.5.5 Important Actions ........ 70
   4.6 REPRISE ........ 73
   4.7 ANNEX: CONVERSION EXPERIENCES ........ 73
       4.7.1 Conversion: File To DBMS ........ 73
       4.7.2 Conversion: Manual Environment To DBMS ........ 79
       4.7.3 Conversion: Batch File System To a DBMS ........ 84
       4.7.4 Conversion: DBMS-1 To DBMS-2 ........ 87
5. STANDARDS ........ 93
   5.1 INTRODUCTION ........ 93
       5.1.1 Objectives ........ 93
       5.1.2 What Is a Standard? ........ 94
       5.1.3 Background ........ 94
   5.2 POTENTIAL BENEFITS THROUGH STANDARDIZATION ........ 97
   5.3 SOFTWARE COMPONENTS IN CONVERSION PROCESS ........ 98
       5.3.1 Scenario 1 ........ 98
       5.3.2 Scenario 2 ........ 99
       5.3.3 Scenario 3 ........ 100
       5.3.4 Scenario 4 ........ 100
       5.3.5 Miscellaneous Standards Necessary ........ 100
       5.3.6 Non-software Components Necessary ........ 100
   5.4 RECOMMENDATIONS ........ 101
       5.4.1 The Development of a Standard DBMS ........ 101
       5.4.2 Generalized Dictionary/Directory System ........ 101
   5.5 CONCLUSION ........ 103
   5.6 REFERENCES ........ 103
6. CONVERSION TECHNOLOGY--AN ASSESSMENT ........ 105
   6.1 INTRODUCTION ........ 106
       6.1.1 The Scope of the Conversion Problem ........ 106
       6.1.2 Components of the Conversion Process ........ 107
   6.2 CONVERSION TECHNOLOGY ........ 109
       6.2.1 Data Conversion Technology ........ 110
       6.2.2 Application Program Conversion ........ 120
       6.2.3 Prototype Conversion Systems Analysis ........ 129
   6.3 OTHER FACTORS AFFECTING CONVERSION ........ 138
       6.3.1 Lessening the Conversion Effort ........ 138
       6.3.2 Future Technologies/Standards Impact ........ 145
BIBLIOGRAPHY ........ 150
PARTICIPANTS ........ 161

PREFACE

In 1972 the National Bureau of Standards (NBS) and the Association for Computing Machinery (ACM) initiated a series of workshops and conferences which they jointly sponsored and which treated issues such as computer security, privacy, and data base systems. The three-day Workshop, DATA BASE DIRECTIONS--The Conversion Problem, reported herein continues that series. This Workshop was held in Fort Lauderdale, Florida, on November 1-3, 1977, and is the second in the DATA BASE DIRECTIONS series.
The first, DATA BASE DIRECTIONS--The Next Steps, received wide circulation and, in addition to publication by NBS, was published by ACM's Special Interest Group on Management of Data and Special Interest Group on Business Data Processing, the British Computer Society in Europe, and excerpted by IEEE and Auerbach.

The purpose of the latest Workshop was to bring together leading users, managers, designers, implementors, and researchers in database systems and conversion technology in order to provide useful information for managers on the possible assistance database management systems may give during a conversion resulting from an externally imposed system change.

We gratefully acknowledge the assistance of all those who made the Workshop's results possible.

Director, Center for Programming Science and Technology
Institute for Computer Sciences and Technology

A MANAGEMENT OVERVIEW

To a manager, conversion answers the question, "How do I preserve my investment in existing data and programs in the face of inevitable changes?" Selection of conversion as a solution depends directly on issues of cost, feasibility, and risk. Since change is inevitable, prudent managers must consider preparations that ease inevitable conversions. How does a manager choose a course of action?

When Mayford Roark, Executive Director of Systems for the Ford Motor Company, and Keynoter of the Workshop, sought an analogue for examining DBMS and the conversion problem, he aptly selected the idea of "mapping a jungle." Reporting on his experience, he noted that 90% of his computers were changed within three to five years and major software changes from the vendor occurred somewhere between five and ten years after acquisition. These forced changes coupled with the organization's changing requirements led Roark to a basic point: "evolutionary change is the natural state of computer systems." In short, the ADP manager's continual task is to "manage change."
Roark's managers experienced the classic benefits of DBMS: quicker response to changing requirements, easier new application development, and new capabilities not possible in the earlier systems, but Roark summarized his conversion experience within a DBMS environment in this way:

. hardware changes — having a DBMS was a moderate to major burden.

. software changes — dependent on the circumstances, having a DBMS ranged from negligible impact to a major burden.

. evolutionary application change — having a DBMS was a moderate boon. It proved effective, but very expensive.

Considering even the conversion to a DBMS, Roark scores this process a moderate burden because of the risks and costs associated with DBMS. He emphasized this point by describing the major DBMS need as "easy-to-use, easy-to-apply, and inexpensive approaches for upgrading" two decades of computer files to a data base environment.

Given this charge, how did the workshop respond? Consideration of the conversion to a DBMS led to several specific caveats intended to control the risk inherent in such a step. Though conversion to a DBMS requires careful preplanning and may not be appropriate for every application, managers should consider data base technology an inevitable development thrusting itself on future data processing installations. A manager will have no choice but to face this decision eventually.

The first DBMS application can make or break the success of the conversion. The new system's users must have a receptive disposition which results only from careful preparation, preplanning, and the application of basic management skills. The initial application plays an important tutorial role for everyone in the organization including such subtle lessons as:

--whether management is truly committed to the new system by supporting it with adequate resources, planning for the continuous support of the system, and applying the necessary managerial cross-department discipline.
--whether the installation technical staff truly appreciates user needs, can adjust to user changes, and has the necessary skills and backing to carry off the task.

--whether any of the DBMS conversion proponents have accurate estimates of the costs, the proper tools to use, and a feasible conversion plan expressed in terms that satisfy all risk sharers.

While the staff must know the new technology, they must not conclude that the new technology relieves them of the old project management controls that all new systems require. Tight planning, management control, cost monitoring, contingency approaches, user review, and step-wise justifications must be used. No "final conversion" exists--planning for the next one begins now! Prepare your system to permit evolutionary change to enhanced technology--including improvements to DBMS.

How will future technology help managers? Hardware development, particularly the proliferation of mini- and micro-computers, networks, and large mass storage, will increase the need for generalized conversion tools. On the other hand, dropping hardware costs will make conversion increasingly more acceptable. Additionally, special DBMS machines which promote logical level interfacing will simplify the conversion process. Major advances in improved data independence through software design will also simplify but never eliminate the conversion problem. Of special concern to managers: user demand for several data models will continue.

In the next five years, managers can expect to see more operational generalized conversion tools but certainly not full automation of the process. Significantly, standards design and acceptance by vendors will play a major role in the success of generalized conversion tools.
Commercially available tools for data base conversion seem likely in ten years but the conversion of application programs is not likely to have a generalized solution in the next five years.

Standards address several manager needs in the conversion process. A standard DBMS would considerably ease the future conversions involving a DBMS. A standard data dictionary/directory would facilitate all conversions. This latter point emphasizes that the data dictionary/directory can stand apart from data base systems and, therefore, can assist the conversion to a first DBMS.

A standard data interchange format would ease considerably the loading and unloading of data and thus facilitate the development of generalized conversion tools. Manufacturer acceptance of the standard format would permit their development of convertors from the standard form to their system, a boon to managers either forced to or desirous of considering a different system. Similarly, standardization of the terminology used in data base technology, convergence of existing DBMS to using common functions, and use by DBMS of a micro-language or standard set of atomic functions would assist managers in dealing with conversion from DBMS to DBMS.

In summary: in the next five to ten years managers must depend on existing good management practices rather than wait for automated conversion tools. Standards, as a management exerted discipline, will facilitate conversions but users can expect reluctant acceptance of standards.

DATA BASE DIRECTIONS — The Conversion Problem

John L. Berg, Editor

ABSTRACT

What information can help a manager assess the impact a conversion will have on a data base system, and of what aid will a data base system be during a conversion?
At a workshop on the data base conversion problem held in November 1977 under the sponsorship of the National Bureau of Standards and the Association for Computing Machinery, approximately seventy-five participants provided the decision makers with useful data. Patterned after the earlier Data Base Directions workshop, this workshop, Data Base Directions — The Conversion Problem, explores data base conversion from four perspectives: management, previous experience, standards, and system technology. Each perspective was covered by a workshop panel that produced a report included here. The management panel gave specific direction on such topics as planning for data base conversions, impacts on the EDP organization and applications, and minimizing the impact of the present and future conversions. The conversion experience panel drew upon ten conversion experiences to compile their report and prepared specific checklists of "do's and don'ts" for managers. The standards panel provided comments on standards needed to support or facilitate conversions and the system technology panel reports comprehensively on the systems and tools needed — with strong recommendations on future research.

Key words: Conversion; Data Base; Data Description; Data Dictionary; Data Directory; DBMS; Languages; Data Manipulation; Query.

1. INTRODUCTION

Daniel B. Magraw
GENERAL CHAIRMAN

Biographical Sketch

Daniel B. Magraw is Assistant Commissioner, Department of Administration, State of Minnesota. For nearly ten years he has been responsible for all aspects of the State of Minnesota information systems activities. His more than thirty years' experience in systems divides almost equally between the private and public sectors. A frequent contributor to professional activities, he was one of the founders and is a past president of the National Association for State Information Systems. He taught courses in Systems for 22 years in the University of Minnesota Extension Division, and he has been a frequent speaker on many matters relating to information systems and has been deeply involved with both Federal and state data security and privacy legislation. He was keynote speaker at the 1975 Data Base Directions Conference.

1.1 THE FIRST DATA BASE DIRECTIONS WORKSHOP

In late October, 1975, a Workshop entitled "Data Base Directions: The Next Steps" was held in Fort Lauderdale, Florida. Resulting from a proposal brought to Seymour Jeffery at the National Bureau of Standards by Richard Canning and Jack Minker, the workshop was sponsored jointly by the Association for Computing Machinery and NBS. The product of the intensive two and a half day effort was a series of panel reports which, as subsequently edited, were issued under the title of the workshop as NBS Special Publication 451.

As early as December, 1975, suggestions were made to ACM and NBS concerning the desirability of one or more future conferences on the same general topic. These suggestions were based on the belief that data base systems will grow increasingly in importance and pervasiveness and were supported by perceptions, even prior to issuance of NBS SP 451, that the workshop had more than met expectations. The report was issued in 1976; it was generally thought to be a valuable contribution to several audiences including top management, EDP management, data base managers, and the industry. It has had an unusually wide circulation for reports of this nature.

1.2 PLANNING FOR SECOND CONFERENCE

In February, 1977, ACM and NBS decided in favor of a second data base workshop and invited me to serve as General Chairman. An initial planning group was established consisting of Dick Canning and Jack Minker representing ACM, John Berg representing NBS, and the chairman.
The planning group entitled the Conference "Data Base Directions II--The Conversion Problem." Four working panel subjects were selected. Panel chairmen were recruited and became members of the planning group. Subject matter coverage for each panel was specified by the planning group. Each panel chairman then selected members of his working panel. In addition, the planning group specified that the format and procedure for the workshop should follow closely that of the first workshop. As in the 1975 workshop, attendance was by invitation only. Work was done by each panel prior to arriving at the workshop.

1.3 DATA BASE DIRECTIONS II

The workshop was held November 1-3, 1977, in Fort Lauderdale, Florida. There were approximately 75 in attendance. Mayford Roark, Ford Motor Company, gave an excellent keynote address in plenary session, reproduced herein. Final instructions were given as to objectives of the workshop, and all panels were in full operation by 10:30 a.m., November 1. From then until the closing plenary session, each of the four panels met separately with the ultimate purpose of developing a consensus among its members on which to base the panel report.

Each working panel chairman was expected to guide the group discussion and to assure preparation of a good rough draft report prior to the closing plenary session. Though each panel chairman organized his panel in a form best for his subject, each followed a general pattern. A recorder was selected from among each panel's members to maintain "minutes" in visual form on flip chart sheets. These were displayed so that the principal discussion and consensus points were visible. One of the panels had done extensive drafting of report segments prior to the beginning of the workshop. Each of the other three accomplished the objective of rough draft preparation by developing detailed outlines. Portions of the outlines were assigned to individual panel members or to two person teams for draft preparation.
When time permitted, drafts were reviewed by others on the same panel prior to their incorporation into the final draft. Following the completion of the workshop, the drafts were put into "final" form and circulated to panel members for final comment prior to submission to the proceedings editor.

Communication among the four panels was maintained mainly through the panel chairmen and members of the planning group who circulated among the panels. At the closing plenary session, each working panel chairman presented a twenty minute summary of his panel's findings and responded to questions or comments from the floor.

1.4 CONCLUSION

The mix of participants with differing perspectives, experiences, and technical expertise provides a base of knowledge in data base systems that is unparalleled. The output of that group should be of material value, especially to decision-makers across the country carrying the responsibility for decisions. It is hoped that this publication can have an even greater impact than did Data Base Directions I because it is based on two more years of experience and because it attempted to be more directly pointed to the decisions that users face. Data Base Directions II, however, will provide real value only in proportion to the extent that this document provides direct advice and assistance to its several classes of users.

2. EVOLUTION IN COMPUTER SYSTEMS

Mayford L. Roark
KEYNOTER

Biographical Sketch

Mayford L. Roark is Executive Director of Systems for the Ford Motor Company in Dearborn, Michigan. He has been in charge of the corporate systems function at Ford since 1965, as Assistant Controller and later as Director of the Systems Office before assuming his present position in 1973. He joined the Ford Division of Ford Motor Company in 1952 as Senior Financial Analyst, and managed various financial departments at Ford from 1955-1965. Mr. Roark was a Budget Examiner at the U.S. Bureau of the Budget from 1947 to 1952. Previously he was with the U.S. Weather Bureau and the Colorado Department of Revenue.

He studied at the University of Colorado, receiving the B.A. "Magna Cum Laude" degree (Economics), and the M.S. (Public Administration). He is a member of Phi Beta Kappa.

2.1 QUESTIONS

When I was asked to address this group, with its theme of "The Conversion Problem," I felt some puzzlement about the thrust of the meeting. Was there a concern that data base systems might be something of a special burden for the organization faced with a conversion problem? Or, rather, was there the hope that the data base system might be something like "Seven League Boots," for making otherwise tough conversions into happy and speedy journeys? Or, was there a lingering horror that the data base systems themselves might turn out to be conversion nightmares as improved DBMS resources become available?
Rather as frequent travel er through this jungle, my traveler's 'note may be of some help to your individual survey parties M comments will take the form of generalizations an -■■»--«- "^i^ niattuiflie. inis usual result in geographic history, fine the rough findings of ea s as arly As an initial generalization, let me conversion problems into four families of change Hardware Changes, Software Changes, Evolutionary Application Development, Migration to a Mew DBMS. categorize I shoul conversion their freque respect to d as specific DBMS in each "Boon or Bur help for a it as plus 2 1. If DBM shall score a discussion d like to discuss each of these families of problems, and to relate my own impressions as to ncy of occurrence and their significance with ata base management systems. In an effort to be as possible, I will try to rate the impact of a against a "BB seal e"-- that ' s shorthand for If a DBMS, in my view, is a major boon or of conversion problem, I shall score a moderate boon, that will be plus likely to be a burden than a boon, I 1 or minus 2. Now let me proceed to case den." q i v e n . If S i s it as class it is more minus of these families of conversion problems. 2.2 HARDWARE CHANGES One of othe a c q u i s i t rental for a do on about universe process i years fr mi nicomp a p p 1 i c a t s i g n i f i c computer us , 3 t whatever powerful new pro improvem think t h awhile. of my rs , a ion i n v of more zen yea 3,000 , I w ng comp om thei uters ions; ant ch s do ch o 5 yea comput We ducts e n t s i e 3 to chore " p r i o 1 v i n than rs no compu oul d uters r ac used these ange ange rs is er we are every n co 5 ye s at For or r e v i g a chan $10,000 w , d u r i n ter proj guess are 1 i k q u i s i t i o i n may r for a fairly r 1 ong en have an al so f ac few y st effec ar cycl d is to ew" of ge of a year g which ects . that 9 ely to n. Th d e d i c a t emai n decade a p i d 1 y , ough to d to mo e d with ears , t i v e n e s e i s unde eve proc . 
I time If Fo pe re w be i s ed, i n or mo howe satu ve on a da al way s. S 1 i k e 1 rtake , ry com essor have b I hav rd is rcent pi aced oul d i n d u s opera re . D ver . rate t to s z z 1 i n g s wit o , for y to wi t pute or een e s typi of wit not tri a tion ata For he c omet sue h s mos con h the r har an doing i g n e d cal o all hi n 3 hold 1 co wi proce mos a p a c i hi ng c e s s i ubsta t of t i n u e aid dware added this off f the data to 5 for ntrol thout s s i n g t of ty of more on of n t i a 1 us , I for The extent of conversion trauma from a hardware upgrade depends on the nature of the change. If the change involves shifting to another supplier, an event that occurred in something like 5 percent of our hardware changes, the conversion effort can be an extended affair requiring a major effort over a period of a year or more. In such cases, a DBMS is likely to be something of a burden. In all probability, the new equipment will require a shift to a different DBMS. In any event, the logic requiring conversion will be somewhat more complex and difficult to handle if a DBMS is present. One of our divisions recently completed a conversion from Honeywell to Burroughs, a change made necessary because of reorganization unrelated to the system function. Some of the application programs to be converted had been developed in Honeywell's IDS environment; they were converted to Burroughs DMS II. IDS is a network system; DMS II is a hierarchical structure, so one might guess that we would have had problems. In fact, I am assured, both by our own people and by the Burroughs conversion team, that this transition went fairly smoothly. On the basis of that once-in-a- lifetime experience, I would have to score the DBMS in this case as minus 1. Perhaps it would have been minus 2 if the application programs involved had been larger and more compl ex . 
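Roark's "BB scale" amounts to a five-point rating of DBMS impact on a given family of conversion, from plus 2 (major boon) down to minus 2 (major burden). As a sketch only (the labels and code are mine, not from the proceedings), the scale can be captured as a small lookup:

```python
# A sketch of Roark's "Boon or Burden" (BB) scale: the impact of having a
# DBMS during a given family of conversion, rated from +2 (major boon)
# down to -2 (major burden). Verbal labels are illustrative assumptions.
BB_SCALE = {
    +2: "major boon",
    +1: "moderate boon",
     0: "negligible impact",
    -1: "moderate burden",
    -2: "major burden",
}

def describe(score: int) -> str:
    """Translate a BB score into its verbal rating."""
    return BB_SCALE[score]

# Scores Roark reports for hardware conversions: a supplier change rates
# about minus 1; moving up within a compatible family rates zero.
print(describe(-1))  # moderate burden
print(describe(0))   # negligible impact
```

A lookup like this is just a mnemonic for the keynote's scoring shorthand; the substance lies in the case-by-case judgments Roark walks through in each section.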
Hardware conversion can also be a serious trauma when migrating to new products of an incumbent supplier. Suppliers do present us with new families of hardware from time to time that require structural changes in our application programs and supporting software. Hopefully, this kind of change will be less frequent in the future than it has been in the past. Even so, I suspect we are likely to be faced with something of this kind every 10 years or so. Already, one hears strange rumblings about the kinds of changes likely to unfold 2 or 3 years hence in a new product line called "The New Grad."

So far, DBMS has not figured importantly in any of our conversions within the hardware products of a given supplier. On the basis of guesswork and a skeptical disposition, I would have to assume that the presence of a DBMS for a major hardware family transition would be similar to that where a change in supplier is involved, possibly minus 1 or minus 2 on the BB scoring scale.

The remaining hardware conversions, which represent the vast majority of all hardware upgrades, involve moving up within a compatible family of products offered by a single supplier. In these cases, DBMS would not be much of a factor, so we might score its presence as zero on the BB scale.

2.3 SOFTWARE CHANGES

Despite IBM's assurance that MVS is here to stay, our experience would suggest that we can look for "major" software upgrades or enhancements from any supplier every 5 or 10 years. At Ford, we are somewhat preoccupied at the moment with the MVS problem at our large data center in Dearborn. The completion of this conversion will require a major effort extending over a two-year period. There are three substantial groups of data base systems that will be affected by the conversion; these involve, respectively, IMS, TOTAL, and SYSTEM 2000.
I have checked with each of the systems groups involved to see how they assess the impact of DBMS at this point, when we are about midway in the conversion effort. The managers working with IMS agree that its presence has added substantially to the difficulty and complexity of the conversion task--something close to twice as much effort as would have been involved with non-DBMS applications. The IMS conversion involves more than a new operating system, however. It requires the transition to a new DBMS called IMS-VS. One manager sees the new package as offering many attractive new features. Another manager sees no incremental benefits at all for his particular applications; but he has no choice, since his present IMS software will not be supported under MVS. The managers using TOTAL and SYSTEM 2000 see no special conversion problem at all in going to MVS.

This sampling of experience adds up to a very mixed bag. As the warnings say about some drugs, a major software conversion "may be hazardous to your health"--but again, it may not. I think we have to score the presence of a DBMS in such a situation with a range from 0 to minus 2 on the BB scale.

We do have numerous other software changes, of course, on an almost continuous basis--new releases to old operating systems, new software packages to handle new functions, and the like. There is nothing in our experience to indicate that DBMS is much of a factor one way or another in these minor changes.

2.4 EVOLUTIONARY APPLICATION DEVELOPMENT

Now we move into the territory of what DBMS is all about, and also what the systems function is all about. Before getting into the particulars about this family of change, we ought to be clear on one central point. "Evolutionary change" is the natural state of computer systems, just as it is for biological systems.
Perhaps I should not push this analogy too far because there is one dramatic difference--evolutionary change works more rapidly in computer systems, where even 5 years can produce dramatic changes in structure, outputs, and even objectives.

During the past 5 years, the workload at our major computer centers at Ford has been growing at close to 20 percent annually. I might explain here that we attempt to measure workload in BIPS (Billions of Instructions Processed) for purposes of capacity planning. So, when I say that growth has been close to 20 percent a year, I mean that the BIPS have been growing at close to 20 percent annually.

As best as I can judge, about half of this growth results from new applications and about half from new adaptations of existing applications.

The new adaptations, which are often described by the rather condescending term "maintenance," include a lot of things. One is simply the effect of volume. If we sell more cars, other things equal, we will process more BIPS. Another is changing requirements. In our industry, every model year is, in a sense, a new game. The products may change, terminologies may change, and data requirements may change. One of the biggest sources of changing requirements in the automotive industry is government regulation. The work of the regulators never ceases, nor does the growth in requirements for additional data elements in the burgeoning computer records that we must maintain as evidence of compliance with all sorts of directives from Washington. Some of these changing requirements are massive things, like OSHA regulations or like corporate fuel economy standards. Others seem almost trivial until they are examined in the context of required changes in computer files. The industry is currently being asked to adopt as an international standard a 16-character vehicle identification number.
Our existing identification numbers, with 11 characters, turn up in more than 50 separate computer files in North America alone. The cost of converting these files and their related application programs to the new international standard is estimated at $3 million.

In all fairness, the government is not the only source of changing requirements; our engineers and product planners are pretty good changers themselves. Our service parts files include roughly twice as many parts today as 10 years ago, when our basic inventory control system was developed. Our organization people contribute more than a fair share of changing requirements. Every time there is a corporate reorganization, we find it necessary to go through a restructuring of the hundreds, or even thousands, of computer files and application programs that are affected by the realignment.

And even systems people are contributors to the evolutionary process, only we usually call the changes "efficiency improvements." Almost every systems activity has its own more or less continuous effort aimed at cleaning up programs that run inefficiently, that are hard to maintain, or that otherwise need overhaul. To make another sweeping generalization, I would judge that each of our systems activities annually rewrites somewhere between 5 percent and 25 percent of its accumulated code.

What a fantastic opportunity for data base management systems! I would score the DBMS impact on this area of evolutionary application development as plus 2 and proceed to the next topic except for one problem. The DBMS benefits can be extremely difficult to realize in practice, and the price of the cure is sometimes worse than the pain of the ailment.
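The fan-out of a single changed data element, such as the widened identification number discussed above, can be sketched with a few lines of code. This is an illustrative sketch only: the file names and record layouts below are invented, not actual files, and the lengths simply follow the 11-character and 16-character figures cited in the text.

```python
# Hypothetical sketch: given assumed fixed-field record layouts, find every
# file that carries the widened data element and how much each record grows.
OLD_LEN, NEW_LEN = 11, 16  # identification-number widths from the text

# file name -> list of (field name, field length); all names are invented
layouts = {
    "VEH_MASTER":    [("vin", OLD_LEN), ("model_year", 2), ("plant", 3)],
    "WARRANTY_HIST": [("claim_no", 8), ("vin", OLD_LEN)],
    "RECALL_LOG":    [("vin", OLD_LEN), ("campaign", 5)],
    "PARTS_USAGE":   [("part_no", 10), ("qty", 6)],  # no vin: untouched
}

def impact(layouts, field, old_len, new_len):
    """Return {file: added bytes per record} for files carrying `field`."""
    hits = {}
    for fname, fields in layouts.items():
        widened = sum(new_len - old_len for name, length in fields
                      if name == field)
        if widened:
            hits[fname] = widened
    return hits

hits = impact(layouts, "vin", OLD_LEN, NEW_LEN)
for fname, delta in sorted(hits.items()):
    print(f"{fname}: record grows {delta} bytes; file and programs convert")
print(f"{len(hits)} of {len(layouts)} files affected")
```

Even this toy inventory shows why one standards change ripples through dozens of files and their programs: every layout holding the element, and every program reading that layout, must move together.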
Many of our files are huge affairs; some of these files run into the billions of bytes. For systems of this size, processing costs may be counted in the hundreds of thousands of dollars annually. An added overhead burden in a range of 30 percent to 40 percent can amount to several hundreds of thousands of dollars in added annual cost. One of our activities last year made an analysis of the overhead associated with one of its large data base systems and found that 70 percent of its processing resulted from DBMS overhead. This activity is restructuring its processing requirements with the objective of reducing the DBMS overhead by 40 percent.

Several years ago, another of our activities launched a large-scale data base system in which nearly half of the file requirements were required for pointers. Processing requirements in such a case can far exceed expectations based on any prior experience. This sort of overhead can mean a loss in processing productivity, something that has to be weighed against any benefits in programming productivity.

The best score I can give DBMS, therefore, for its impact on "evolutionary application development" is plus 1. It sometimes works well, but it can also be awfully expensive.

Somewhere in the deliberations of this group, or related groups, there ought to be some consideration of what can be done to simplify DBMS technology. If this is not possible, perhaps there could be some guides or standards to protect the systems designer with a modest problem from unwittingly stumbling into a huge solution out of all proportion to his need.

Sometime in the future, I would like to go through an exercise like this again and conclude, without any reservation, that DBMS is an unqualified boon for the evolving system regardless of its size and complexity. We are still in the very early stages of the DBMS art.
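The two overheads described above, added processing cost and pointer storage, are simple to put on the back of an envelope. The dollar figure and the per-record byte counts below are assumptions chosen only to match the shape of the text's claims (a 30 to 40 percent processing overhead, and a file where pointers approach half the bytes); they are not actual cost data.

```python
# Back-of-the-envelope sketch of the two DBMS overheads discussed above.
# All figures are illustrative assumptions, not actual data.

base_processing_cost = 300_000  # assumed annual processing cost, dollars
overhead_low, overhead_high = 0.30, 0.40

added_low = base_processing_cost * overhead_low
added_high = base_processing_cost * overhead_high
print(f"added annual cost: ${added_low:,.0f} to ${added_high:,.0f}")

# Pointer storage: a network-style record whose links rival its data.
data_bytes_per_record = 120     # assumed payload per record
pointer_bytes_per_record = 112  # e.g. 14 chain/owner pointers x 8 bytes
pointer_share = pointer_bytes_per_record / (
    data_bytes_per_record + pointer_bytes_per_record)
print(f"pointers consume {pointer_share:.0%} of the file")
```

On these assumed numbers the added processing cost lands in the hundreds of thousands of dollars and the pointers consume close to half the file, which is exactly the order of magnitude the text warns can "far exceed expectations based on any prior experience."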
2.5 MIGRATION TO A NEW DBMS

I have saved this family of conversion problems until last. In a sense, it overlaps all three of the categories I discussed earlier. Migration to a new DBMS may be prompted by a change of hardware; it may be forced by a change of software; or it may be undertaken with a view to realizing the benefits we discussed under "evolutionary applications development." Still, the migration to a new DBMS is a major event that deserves consideration in its own right, whether the migration is from a non-DBMS environment or is the rare change from one DBMS to another.

There is a logical inconsistency in assigning any rating at all to this family of conversion problems. It is like asking, "What impact does DBMS have upon itself?" Yet, at the risk of sounding completely illogical, I am going to assign a BB rating of minus 1 to this category of conversion problems on the ground that a DBMS is a high-risk undertaking entirely apart from whether it is related to changes in hardware, software, or applications.

Moving to a new DBMS is not unlike the process of getting married; it takes a lot of desire, commitment, sacrifice, and investment. It involves moving to a wholly new lifestyle. Once there, the return to the old lifestyle may be difficult or impossible without wrenching adjustment problems.

I have several times had the experience of hearing a spokesman for some other organization announce, "We've made the policy decision that all future development in our organization will be based on DBMS." This makes me shudder. It is a little like saying, "We have decided that everyone should get married." I must add, however, that in every instance cross examination has established that, policy or no policy, the organization still does a lot of its computer business outside the DBMS environment. I expect this will be true of many of us for many, many years to come, or at least until the DBMS technology can be simplified to the point where it no longer requires total desire, total commitment, total preparation, and heavy investment.

Not too long ago, we compiled a catalog of all the computer systems in use in our North American divisions and affiliates. The listing came to something like 50,000 application programs. Some of these were major projects requiring many man-years to develop. Others were relatively simple. The applications represent a wide spectrum of data needs and complexity. DBMS technology today does not really address this whole range of development requirements. Perhaps it never should. In any event, migration to a new DBMS must be taken as one of life's climactic events by most systems people. We need more guidance about when to undertake such a migration and how to stay out of trouble when we do.

These, then, are the major problems to be found in this jungle of systems conversions. To summarize, my BB scorings have suggested that a DBMS can constitute a moderate-to-heavy burden for major hardware conversions and major software conversions. The presence or availability of DBMS, however, can be a moderate-to-strong boon for evolutionary application development. Finally, the migration to a DBMS system can be a long and tedious journey, especially where existing systems of great size and complexity are involved.
DBMS, in short, still offers attractive visions of a world in which systems might respond quickly to changing requirements, in which new creative applications might be easily spawned from existing data files, and where data bases, in the words of Dan Magraw at the earlier conference of two years ago, can "move the DBM's into the area of decision making."

These visions ought not to be taken lightly. In preparing for this meeting, I checked back with our managers who have been in the DBMS mode for 5 years or more. What benefits did they really get? Their consensus might be summarized as follows:

1. They, indeed, have been able to respond more quickly to changing requirements, although not always as easily as they once hoped.

2. New applications development has been easier. The managers see a productivity improvement factor in a range of 10 percent to 20 percent.

3. Most important, the managers believe they have capabilities that previously did not exist at all.

Let me expand on these capabilities for a moment. Those 50,000 computer programs I mentioned are a long-time accumulation in response to numerous problems and opportunities that were perceived by our divisions and staffs over a period of two decades. Our typical division, which might correspond to a moderate-size company, has its own accumulation, usually in a range of 1,500 to 3,000 programs, and, in a DBMS environment, its own set of files. The accessibility of the data in these files is greatly influenced by the concepts of "top-down design" and "structured programming." For systems of recent vintage, these approaches may have provided a logical structure that would give some accessibility to the data. For the older systems, what our people call "spaghetti programs," accessibility is something else--the access keys are fragmented, scattered, and hidden in "spaghetti code."

As functional entities for the purposes they were first created, these old programs may serve well. Even where we want to launch a new application that will need to draw on the data in these files, the problem is not overwhelming. We can, and do, write programs to get at the needed data files, wherever they exist.

But suppose we do not want a new system. All we want is an in-depth analysis of a difficult problem on which 5 or 10 different files have something useful to say. This is the sort of problem that gives computer people a bad reputation as being slow-moving and unresponsive. It can be extremely difficult to extract data from 5 or 10 different sources, all with different maintenance cycles and data control procedures, and produce anything but a mess.

We do a lot of computer-based analysis at Ford because we have found that the data resources in our computer files can open up all sorts of insights and understanding that would otherwise be lost. I wish we could do even more, but this form of analysis can be very time consuming and frustrating in the non-DBMS environment. We have been working on one such exercise involving non-DBMS file sources for more than three months, all because of the relative inaccessibility of data. Last week we decided to skip a promising analysis altogether because it would have taken more than a month to extract and organize the needed data from a variety of source files.
So, when our managers talk about new capabilities from DBMS, they are talking about one of the computer's most important potentials--the power to carry analysis to entirely new levels of understanding.

The benefits cited by our managers are impressive testimonials. Why, then, has the data processing world been so slow in converting to DBMS? I have seen no studies that would provide an accurate measure as to the proportion of data processing oriented to DBMS. In the absence of any sure data, I am going to offer the opinion that the proportion is no greater than 10 percent to 20 percent. I have never expected that all computer systems would, or should, move to DBMS. In the early Seventies, however, as we first began to realize the potential of this technology, most of us, even the conservatives, I believe, would have expected that something approaching half of all computer files would acquire a DBMS format by the end of 1977. Even at Ford, where I believe the impact of DBMS has probably been greater than average, the progress has seemed slower than I would have expected. This painfully slow progress points up perhaps the greatest conversion problem of all--how can we move to a DBMS environment with existing systems?

Most of our DBMS user divisions made their first moves to data base at least five years ago. A couple of these divisions went through a conversion trauma, eventually recovered, and never undertook a subsequent DBMS project. Once was enough, they concluded. The other divisions have continued to extend their data bases in areas of new systems development. But this leaves a very large accumulation of computer files more or less untouched by the new technology. One manager, possibly our most enthusiastic data base advocate, believes that about 40 percent of his divisional data is now in DBMS format after some five years of development. He guesses that this figure may reach 70 to 80 percent in another five years.
We have still another group of divisions with heavy maintenance workloads who have elected not to try the DBMS approach at all.

The problem is not unlike that of our Detroit skyline. We recently completed a magnificent new Renaissance Center along the riverfront, with some of the most beautiful hotel and office structures to be found anywhere. This is exciting, but there is still a long way to go to bring the rest of the city up to Renaissance Center standards. The accumulation of history is still a huge obstacle to those who want the best of things right now. Omar Khayyam, who summed up so many things in language that a sophomore can understand, expressed this frustration perfectly:

    ... could thou and I with Fate conspire
    To grasp this sorry Scheme of Things entire,
    Would we not shatter it to bits--and then
    Re-mould it nearer to the Heart's Desire!

All of us, however, have to find ways of working with what we have inherited. We might all wish we could somehow get rid of the old mess and start all over again. Unfortunately, that old mess represents an investment in the hundreds of millions of dollars for my company and in many tens of billions of dollars for all of us collectively.

One of our divisions several years ago tackled this rebuilding problem in what seemed to me an innovative kind of way. This division had identified 15 overlapping files that had evolved over the years, with inventories and other data related to parts. As might be expected, there was much redundancy, and it was difficult to reconcile one file to another. A complete overhaul of the applications programs did not seem feasible, but the division hit on the idea of a DBMS master file to serve as a sort of front end to these application programs. There was a saving of more than $100,000 annually in data preparation and data control.
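The front-end idea just described can be sketched in miniature: fold several overlapping part files into one master keyed by part number, and surface the disagreements between sources instead of reconciling them by hand in each file. Everything below is hypothetical--the file names, field names, and records are invented to illustrate the structure, not the division's actual data.

```python
# Minimal sketch of a master file acting as a front end to overlapping
# source files. All names and contents are invented for illustration.
overlapping_files = {
    "PLANT_INV":   {"P-100": {"desc": "bolt", "on_hand": 40},
                    "P-200": {"desc": "bracket", "on_hand": 12}},
    "SERVICE_INV": {"P-100": {"desc": "bolt", "on_hand": 38},
                    "P-300": {"desc": "gasket", "on_hand": 7}},
}

def build_master(files):
    """Merge per-source records into one master; flag field conflicts."""
    master, conflicts = {}, []
    for src, records in files.items():
        for part_no, rec in records.items():
            if part_no not in master:
                master[part_no] = dict(rec, sources=[src])
            else:
                master[part_no]["sources"].append(src)
                for field, value in rec.items():
                    if master[part_no].get(field) != value:
                        # record the discrepancy for data-control review
                        conflicts.append((part_no, field, src, value))
    return master, conflicts

master, conflicts = build_master(overlapping_files)
print(f"{len(master)} master records, {len(conflicts)} discrepancies")
```

The design point is that the redundancy does not disappear overnight; the master simply becomes the one place where it is visible and controllable, which is where the data preparation and data control savings come from.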
This limited effort brought the division into the DBMS environment and laid the basis for solid evolutionary development, including subsequent application revisions to exploit more fully the potential of the data base environment.

If there is one thing I would particularly want to see come out of this conference, then, it would be some easy-to-use, easy-to-apply, and inexpensive approaches for upgrading this great accumulation of computer files we have all been working on for the last two decades or so. We have all had tantalizing but too-brief experiences with data bases as they can be. The question before us now is, what can we do to make those benefits available wherever we need them? Where does an organization with an accumulation of 1,500 to 3,000 programs begin, without going out of business for a year or two? What tools can it use to map out its data resources? How can it go about restructuring these files without scrapping its investment in application programs? And, even if the files can be rebuilt, what about the necessary remodeling of the interface points within the old programs? I hear that some of you have come up with answers to these questions. Perhaps you can point the way to this new data-rational world we all seek.

The realization of this promise is still a long way off. The real world of tomorrow will come when the DBMS can contribute to the evolutionary process of applications development and to the full analytical use of computer resources, without exacting an extortionate price; when the DBMS can aid in the evolutionary process of hardware and software change; and when the decision to go to a DBMS is no longer a high-risk affair requiring an all-out commitment through one of the most difficult conversion problems to be found in systems.
I have no doubt that something very much like this world of tomorrow will appear one of these days, in great part because groups like this one pressed on relentlessly in the definition of problems and the search for creative solutions.

3. ESTABLISHING MANAGEMENT OBJECTIVES

Richard L. Nolan
CHAIRMAN

Biographical Sketch

Richard L. Nolan is a researcher, author, and consultant in the management of information systems. As Chairman of Nolan, Norton & Company, Inc., and a former Associate Professor at the Harvard Business School, Dr. Nolan has contributed to improving the management of data processing in complex organizations. His contributions include major publications in the areas of:

The four stages of EDP growth
Management accounting and control of data processing
Managing the data resource function
Computer data bases

Dr. Nolan's experience with the management of computer systems includes earlier associations with the Boeing Company, the Department of Defense, and numerous other large U.S. and European public corporations.

PARTICIPANTS

Marty Aronoff
Richard Canning
Larry Espe
Gordon Everest
Robert Gerritsen
Richard Godlove
Samuel Kahn
Gene Lockhart
John Lyon
Thomas Murray
Jack Newcomb
T. William Olle
Michael Samek
Steven Schindler
Richard Secrest
Edgar Sibley

3.1 OVERVIEW

"Assimilation of computer technology into organizations is a process that has unique characteristics which management does not have a substantial base of experience to draw upon for guidance. Perhaps the most important unique characteristic is the pace of penetration of the technology in the operations and conventional information systems." [Richard L. Nolan, "Thoughts About the Fifth Stage," Data Base, Fall 1975.]

The pace of the assimilation of computer technology into the data processing organization is represented by the S-shaped "Data Processing Learning Curve."
The Data Processing Learning Curve is approximated by the growth of the data processing budget and reflects the staged evolution of the data processing environment along four growth processes:

Growth process #1. The portfolio of computer applications. The programs and procedures which are used by the organization in its business activities. The Applications Portfolio represents the cumulative end product of the data processing organization.

Growth process #2. The data processing organization and technical capabilities. The organization structures and technical capabilities found within the data processing department which are required to develop and operate application systems. These include:

Data Processing Management Structure
Hardware and Software Resources
Systems Development and Operations Organizations

Growth process #3. Data processing planning and management control systems. The set of organization practices used to direct, coordinate and control those involved in the development and operation of application systems, including:

Data Processing Planning
Project Management
Top Management Steering Committees
Chargeout
Performance Measurement

[Figure 2-1. The Data Processing Learning Curve. Assimilation of computer technology occurs in four stages--Stage I: Initiation, Stage II: Contagion, Stage III: Control, Stage IV: Integration--across four growth processes: building the applications portfolio, building the DP organization, building DP management planning and control, and developing user awareness.]

Growth process #4. The user.
The members of line and staff departments who must use the applications systems in order to perform their jobs.

The nature of Data Base Management Systems (DBMS) dictates that they interface with each of the four growth areas. First, the DBMS acts as the data manager for all types of application systems. Second, the DBMS introduces a new technology that must be assimilated by the data processing organization. The management considerations for conversions identified by the panel can be summarized into four key concepts:

KEY CONCEPT #1: DBMS CONVERSIONS ARE A MATTER OF "WHEN," NOT "WHETHER"

Conversion from a non-data base to a data base environment is a part of the natural evolution of data processing within an organization. A data processing department that has matured and progressed to a Stage III environment is typically faced with high maintenance costs and an inability to respond to ad hoc inquiries and requests from user management for integrated reports. This apparent crisis forces the data processing organization to use data base technology to restructure the applications portfolio. In other words, conversion to a DBMS is primarily a question of how soon the data base environment should begin to be constructed, not whether a data base environment should be implemented.

KEY CONCEPT #2: CHOOSE THE DBMS CONVERSION APPLICATION CAREFULLY

The initial application used for conversion to data
As such, the initial application should: -be a non-trivial application -demonstrate the "power" of the DBMS facilities -be simple to avoid overextension caused by attempting to do too much, too fast KEY CONCEPT #3: TREAT THE INITIAL AND SUBSEQUENT DBMS CONVERSIONS SIMILAR TO OTHER SYSTEMS PROJECTS Al tho techn i n v o 1 the r be mi 1 arge and j same exerc Becau conve coord ccmmi ugh a ology vemen i sk e n i m i z pro u s t i f pro i sed se o r s i o n i n a t e ttees data to t of xposu ed by ject i c a t i ject thro f th , sp CO , sen base the all ar re of manag woul d on pro mana ughout e maj e c i a 1 n v e r s i i o r ma con orga eas thi s ing be cedu geme th or i eff on nage versi o n i z a t i in the conve the co manage res sh nt m e pr mpac t orts a c t i v ment , n i on sys rsio nver d. oul d echa ojec caus shou i t i e and ntroduce and req terns env n effort s i o n as The same be us ni sms s t life ed by a 1 d be s with user are s a u i r e s i ronm can any o pi an ed. houl d cy data made stee as . new the ent , best ther ni ng The be cl e. base to ring . KE_Y_ CONCEPT #4 : PLAN AND STRUCTURE FOR FUTURE DBMS CONVERSIONS NOW; DBMS CONVERSIONS WILL BE_ A WAY; OF rTFT Certainly, once a DBMS is successfully installed, the conversion of applications to that DBMS will continue. However, the mature organization should also plan on converting to another DBMS at some point in time. The second DBMS may be just an enhanced version of the first DBMS, or it may be a totally new software package. In either case, it is certain that a mature data processing organization will want to take advantage of new DBMS facilities and efficiencies; therefore, DBMS conversions will become a way of 1 i f e . To prepare for these continued conversions, the data processing organization can take several steps to minimize their impact: minimize the application system processing logic, program code and data base design dependencies on the features of a particular DBMS ■23- . 
institutionalize the Data Administration function . fully document all business system functions on an integrated dictionary Es as The overall DBMS conversion philosophy developed by follows" 9 Management Objectives panel can be summari the zed Appreciate the technology , but recog nize that DBMS conversion j_s a management problem: — — — H£1 ^- The panel approached the topic of data base conversion in a chronological manner. As such, the following sections thP ?!:? anize ? ^ reflect management considerations du ?nq the life-cycle of data base conversion efforts: • CONVERSION TO A DATA BASE ENVIRONMENT • MINIMIZING THE IMPACT OF FUTURE CONVERSIONS . CONVERSION FROM ONE DBMS TO ANOTHER DBMS 3.2 CONVERSION TO A DATA BASE ENVIRONMENT A certain level of maturity is necessary before conversion to a data base environment is feasible In general, data base technology is not appropriate for dat, conversi on . Th nn,!? llow1n9 sectl '° n s discuss the impact of con to a DBMS on the four data processing growth pro namel y : 3 b v u versi on cesses , Applications Portfolio Data Processing Organization User Awareness . Data Processing Planning and Management Control Sy stems -24- 3.2.1 Impact On the Application P ent data subst portf the D conve evol u a p p 1 i techn devel are c parti prior a p p 1 i entry above folio base a n t i a 1 olio t BMS. T r t i n g tionary cation ology , opment onverte cul ar ity fo cation - 1 evel topi wi ng se en rest o ta he the ap sy to i s d to appr r c P vi ronm ructur ke adv severa app proach stems a re suspen the n oach onvers ortf ol i c a t i o cs ctio are ns . wTTl ing of the antage of t 1 potentia 1 i c a t i o n , in whi are dev vol u tionary d e d until e ew environ sel ected , ion must i o . With n is of Tscussed i ortfo l io . era I gen organ he en 1 ap portf ch el ope app x i s t i ment . 
an o be i n c r i t i i zat hanc proa olio new d roac ng a R rder dev thi s cal n greater detail Convers ly nece ion's a ed capab ches th range or r using h , in p p 1 i c a t i ega rdl es ing i n d el oped order i m p o r t a r^ ion to a s s i t a t e a p p 1 i c a t i o n i 1 i t i e s of at permit from an epl acement data base which new on systems s of the i c a t i n q a for the ing, the nee ! The — i n the Approaches To A p p 1 i c a t i o n Conversion Two basic approaches exist for conversion of the application portfolio to a data base environment: revol utionary and evolutionary . Actually, these two approaches represent opposite ends of a continuum of approaches. No single approach is universally best. In fact, more than one approach may be operable within a given conversion; i.e., certain application systems may be converted on a revolutionary basis while other systems are converted in an evolutionary manner. In the revolutionary approach to conversion, sometimes called resystemization , one rewrites and restructures existing appl i cation systems as necessary to operate under the new DBMS. Generally, one should avoid exclusive use of this approach for the following reasons: Risks overextension caused by attempting too fast. too much, Delays in one sub-project may impact others. Insufficient resources may be available for developing new systems during the conversion. At the opposite extreme of the revolutionary approach is the evolutionary approach in which all new systems are developed under the new environment. Existing systems are not converted but rather are replaced at the end of their normal life cycle. This approach reduces the risk of overextension and the impact of delays in sub-projects. However, there are disadvantages to an evolutionary approach: -25- Complex interfaces with existing, conventional systems are generally entailed. Local inefficiencies and redundancy typically result. Current organizational deficiencies and may be perpetuated. 
constraints Just as the conversion portfolio may be approa revolutionary manner, so may existing application system application system may be con at one time, or theconve The latter approach, in whi functions are converted gr advantage of early availabil data and the new features of flexibility in scheduling However, the gradual appro redundant development and increased management to pro bridges. of ched the I verte r s i o n ch t adual ity the D the ach data vide the i n conver n oth d to t may t he re ly, us of bo BMS. conver has t stora and co e n t i r an e si on er wo he n ake p p o r t i i ng b th c Furth si on he d ge, ntrol e a vol u of rds , ew 1 ace ng r i dg ross ermo i s i sad and the ppl i tion a the envi i n and es, -fun re, pr vant r con cations ary or single, entire ronment phases . update has the c t i o n a 1 greater o v i d e d . age of e q u i r e s version One temporary measure that may be employed to avoid or postpone conversion is to extract data from existing master files in order to build a transient, integrated data base. This data base is not maintained but instead is recreated on a cyclic basis. The data base is used for cross-functional reporting and analysis. This approach provides early availability of c ross- functional data and lends itself to a specialized interrogation language. At the same time, there are certain disadvantages to this approach: Availability of data is achieved at the expense redundancy and reloading. of Problems of timeliness and consistency may be created. Basic application limitations are perpetuated. Maintenance Moratoriums. There __ never sufficient resources, nor is it appropriate, to permit continued maintenance and enhancements of application systems during conversion to the data base environment. Moreover, requires a relatively stationary target. Thus, a on maintenance (or more accurately on may be declared during conversion. 
The declaration of a moratorium must be the result of agreement among senior, user, and data processing management. Senior management support is necessary for the conversion; if senior management is the primary motivator behind the moratorium, user resistance will be softened somewhat. On the other hand, a moratorium driven by user or data processing management will be agreed to by senior management only so long as it does not interfere with normal business functions. In either case, user and data processing management must jointly determine the scope of the maintenance moratorium, and under some circumstances the moratorium may be modified or cancelled.

A common device for invoking moratoriums is a steering or priorities committee. Composed of data processing and user management, the steering committee is responsible for approving projects and establishing priorities. The steering committee does not manage, nor does it relieve management of its business responsibilities. Rather, it provides a forum for discussion and has power derived from its membership and sponsorship.

Analysis of Opportunities. Certain application systems indicate better opportunities for conversion than others. The following types of applications represent good opportunities for conversion:

An application system using many different master files and/or many internal sorts, indicating the need to represent complex data structures and to support multiple paths between data.

An application with a requirement for on-line inquiry and/or update of interrelated data. A DBMS would still be applicable, although not required, if the data were not interrelated.
An application system with chronically heavy maintenance backlogs, suggesting redundant data and/or inflexibility with respect to its data structures.

An application system requiring a broader view of data (either more detail or greater cross-functional breadth).

An application which crosses functional or organizational boundaries (e.g., project control).

An application which cannot support basic business needs.

An application which provides data used by other systems.

Certain types of applications represent poor opportunities for conversion. For example:

A purchased application which is maintained by a third party supplier.

An application which uses historical data and is processed infrequently.

A recently installed application system which is effective in satisfying user needs.

An analysis of the characteristics of the existing applications based on the above considerations will yield a preliminary ordering for conversion of the application portfolio. As the conversion is planned in more depth, the preliminary ordering will be revised and refined to reflect such factors as precedence relationships regarding conversion, level of effort required, and the availability of resources.

Selecting the Entry-level Application. In converting to a data base environment, the key decision is selecting the initial, entry-level application. In an ideal world, the initial application would be selected as the vehicle for making mistakes and learning how to convert and how to manage the conversion. It would have a low profile and not present any risk to the business. However, the realities of the world will force initial conversion of a real system which has visibility, contains some element of risk, and which must be completed quickly. The factors listed below should be considered in identifying the best opportunity for developing technical competence while simultaneously reducing risk and visibility:

The application should be representative and non-trivial.

It should be a good DBMS application (though not necessarily the best).

It represents a relatively low risk to the business.

It provides sufficient opportunity for learning.

It is either an old system or technically obsolete.

It provides eventual visibility as a vehicle for management controls.

It is "owned" by a vocal, important, but neglected (by data processing) segment of the business.

3.2.2 Impact On the EDP Organization. This section discusses the following topics relating to the impact of conversion on the data processing organization:

Organizational considerations.

Technical aspects: tools and methodologies.

Data processing personnel skill requirements.

Organizational Considerations. Converting to a data base environment generally entails reorganization of the data processing function in order to provide the technical and administrative means for managing data as a resource. A key organizational consideration is the need to establish a Data Base Administration (DBA) function within data processing. Conversion will also impact the applications development and computer operations functions within data processing.

Data base administration. The Data Base Administration (DBA) function is responsible for defining, controlling, and administering the data resources of an organization. The many responsibilities of the DBA function are not discussed in detail here since they are covered extensively in the literature. However, some of the major responsibilities include the following:

Data base definition/redefinition.
DBA must have primary responsibility for defining the logical and physical structure of the data base, not merely a consulting responsibility.

Data base integrity. DBA is responsible for protecting the physical existence of the data base and for preventing unauthorized or accidental access to the data base.

Performance monitoring. DBA must monitor usage of the data base and collect statistics to determine the efficiency and effectiveness of the data base in satisfying the needs of the user community.

Conflict mediation. DBA must mediate the conflicting needs and preferences of diverse user groups that arise because of data sharing.

Many alternatives exist for locating the DBA function within the overall corporate structure. Three such alternatives include the following:

Within the data processing organization. In order to avoid an application orientation or an emphasis on computer efficiency, DBA should, in general, not report to Applications Development or Computer Operations, respectively. Rather, DBA should report to the highest full-time data processing executive.

Corporate level. When located at the highest corporate level, DBA can take a broad view of data as a corporate resource. Furthermore, DBA is in a position to resolve conflicts between user areas. When DBA resides at this location, some of the more technical aspects of the DBA function are typically performed within the data processing organization.

Matrix organization. This structure is patterned after the aerospace industry, where a given project draws upon all functional areas. In this case, the DBA staff would report functionally to DBA but would also report directly to a project manager. This organizational strategy has the advantages of recognizing the integration required for a data base, puts DBA at an equal level with other functional areas, and serves to increase communication during application development.
Within the DBA function, the two basic organizational strategies are functional specialization versus application area specialization.

Functional Specialization. This strategy organizes DBA according to functions performed, such as data base design, performance monitoring, data dictionary, and so on. This approach has the disadvantage of ensuring that no one person is knowledgeable about all aspects of DBA support for a particular application system.

Application Area Specialization. In this approach, one person within DBA is responsible for performing all DBA functions for a particular application area, including both application development and operation. This approach has the disadvantage of developing expertise within functional areas of DBA more slowly. Furthermore, unless controlled, activities within DBA may become fragmented. However, this approach results in an interesting and challenging job and facilitates attracting and keeping capable personnel.

Applications development. Conversion to a data base environment will affect the applications development function within the data processing organization in several ways. The most fundamental impact upon applications development will be the change from an applications orientation to a data orientation.

Conversion to a data base environment should also broaden the scope of the application developers. Specifically, the developers need to understand the basic business processes and to develop application systems that cross organizational boundaries.

The application development methodology will have to be modified by delimiting the relative responsibilities of both DBA and applications development. Moreover, the basic approach to application development may be revolutionized as a result of conversion to a DBMS. Specifically, instead of a rigorous approach to application development, the DBMS may permit an iterative or convergence approach.
With this approach, user requirements are not defined in detail before developing the application system. Rather, user requirements are defined at a more general level and a system is quickly built using the DBMS. When presented with the system outputs, the user specifies any required changes, which are then incorporated into the system. This process is repeated until the application system satisfies user needs. Note that this approach to application development requires a DBMS in which data base definition, creation, and redefinition, as well as report writing, are quickly and easily accomplished.

Computer operations. Conversion to a data base environment will impact the computer operations function in two ways. First, many of the responsibilities of the computer operations function will be transferred to the newly established DBA function. Second, the characteristics of the application systems may change. Specifically, the DBMS may facilitate the development and operation of on-line applications as opposed to the more traditional batch systems. Consequently, the computer operations function may have to reorganize to operate within this more dynamic environment.

Technical Aspects -- Tools and Methodologies. Because the subject of DBMS selection has been covered adequately in the literature, it was not addressed by this panel. However, the following are some of the tools and methodologies typically required in making effective use of the DBMS after its installation:

Data dictionary/directory. A tool for organizing, documenting, inventorying, and controlling data. It provides for a more comprehensive definition of data than is possible in the DDL facility of most commercial DBMS's. As such, it is essential for management of data as a resource.

Data base design and validation tools. Used to facilitate the design process and to validate the resultant design prior to programming.
Included in this category are such tools as hashing algorithm analyzers and data base simulation techniques.

Performance monitoring tools. Useful in analyzing and tuning the physical data base structure. These tools provide statistics on data base usage and operation.

Application development tools. Used to facilitate the development of application systems, including such tools as terminal simulators which operate in batch mode and test data base generators.

Data base storage structure validation utilities. Used to verify that a stored data base conforms to its definition or to assess the extent of damage to a damaged data base. Examples include a "chain walker" utility.

Query/report writer facility. Enables users to access the data base and extract data without having to write a procedural program in a conventional programming language.

Data base design methodology. Needed to standardize the approach to data base design and to provide guidance in using the data base design, modeling, and monitoring tools.

Application development methodology. Specifies the standardized approach to developing application systems; i.e., the activities to be performed during the development process and the corresponding roles and responsibilities of each of the various project participants. Of particular importance is the need to define the points in the development process at which the DBA and applications development functions must interface, and the relative responsibilities of each with respect to application development.

Documentation methodologies. Needed by DBA to document data definitions uniformly and to document data base design decisions.

Data Processing Personnel Skill Requirements. The impact of conversion to a data base environment on skill requirements will be considered in this section.

Types of skills. The following types of skills are needed in a data base environment:

Data Base Administration.
DBA should be staffed with individuals who are strong technically, interface well with people, and collectively are knowledgeable about the DBMS itself, the tools necessary to support it, the application development process, and the corporation and its data.

Logical data base design. Within DBA there is a need for individuals possessing the ability to recognize and catalog data elements, to group related data elements, to identify relationships between groups, and to use the data description language.

Physical data base design. Within DBA there is a need for individuals knowledgeable with respect to organization techniques, data compression, trade-offs in data base design, simulation, and modeling techniques.

DML programming. DBA should include individuals with knowledge of the DML and its associated host language, data base navigation, and the currency concept.

Acquisition and training. Obviously, the required skills may be developed internally or acquired externally. Hiring the required personnel has the advantage of bringing experience and new ideas into the data processing organization. However, individuals knowledgeable with respect to DBMS are scarce and hence expensive. Moreover, individuals brought in from the outside typically have little, if any, knowledge of the business.

Developing skills internally has the advantage of building DBMS skills on top of knowledge of the business. Furthermore, control can be exercised over what is learned and when. Finally, it is generally less expensive and disruptive than hiring.

When skills are developed internally, there are several possible approaches to training:

In-house. In this approach, staff personnel possessing the necessary skills teach these skills to others by means of courses or joint projects. This approach may fit well with initial application development and has no cash cost. Moreover, the mere act of having to teach their skills to others enhances the knowledge and understanding of the teachers themselves.
There are, however, several disadvantages to this approach: it requires the time of the most capable personnel when they may be more effectively used elsewhere; it cannot be used where the required skills do not exist internally; and, as a closed system, it excludes differing points of view.

Vendor. This approach utilizes the courses offered by DBMS and support software vendors. Vendor courses may be a relatively inexpensive approach, particularly when courses are bundled as part of the purchase/lease price. Furthermore, the internal staff are likely to benefit from the expertise of the vendor. However, the courses may be only a thinly-disguised sales pitch.

Other approaches. Additional approaches to training include:

-independent educational organizations

-colleges or universities

-videotape/cassette courses

Turnover. Conversion to a data base environment may result in employee turnover. The new DBMS may be perceived by the staff as being threatening and, hence, may be resisted. This resistance to change may be overcome somewhat by involving the staff in the series of decisions leading to the acquisition of a DBMS. If required skills are obtained through hiring, the existing employees are likely to resent the high salaries paid to the new employees. Finally, as the skills of the staff increase, so does their market value, and it becomes increasingly expensive to retain the staff. These three factors -- resistance to change, resentment of new hires, and increased employee market value -- tend to increase turnover following conversion to a DBMS environment.

On the other hand, certain factors tend to decrease turnover.
Specifically, conversion to a data base environment involves new opportunities for individual growth and excitement such as new technology, new hardware and software, and major development efforts. Properly exploited, these factors can increase job satisfaction and correspondingly decrease turnover.

3.2.3 Impact On Planning and Control Systems. With the exception of the chargeout mechanism, conversion to a data base environment will not affect the basic mechanisms for planning and control. However, management must recognize that conversion is itself a process to be managed. This entails applying justification procedures for the conversion, planning the conversion, establishing review and approval checkpoints, and monitoring progress.

Planning the Conversion. Planning for the conversion requires the involvement of senior, user, and data processing management:

Attempts to convert to a data base environment without senior management support run a high risk of failure. If senior management has not formally authorized DBMS studies or incorporated DBMS planning into corporate plans, the probability of successful conversion is remote.

Conversion will have a significant impact on user departments in the form of disruption of normal data processing services, restructuring of application systems, and a change in orientation on the part of users from ownership to sharing of data. Consequently, user involvement in planning the conversion is critical.

Conversion to a DBMS generally affects the structure, system development methodology, personnel skill requirements, and hardware/software configuration of the data processing function.
The lead time necessary to develop the appropriate infrastructure for operating in a data base environment must be appreciated and planned for accordingly.

Given senior management support for the conversion, one strategy for obtaining the required involvement in the planning process is to establish a steering committee for the data base, as mentioned in the earlier section on maintenance moratoriums. This steering committee contains representatives from both user departments and data processing and is responsible for controlling the evolution of the data base. As such, the Data Base Steering Committee is subordinate to the data processing Strategic Steering Committee, which is concerned with the evolution of the entire data processing function within the enterprise.

Given the appropriate participation, a necessary first step in converting from a non-data base to a data base environment is the development of an architectural plan for the data base. This plan describes the intended structure of the target data base. Conceptually, a data base represents a model or image of the organization which it serves. In order for the data base to represent an accurate image of the organization, it is necessary for the data base structure to reflect the fundamental business processes performed in the organization. Consequently, the designers of the data base must first understand the key decisions and activities required to manage and administer the resources and operations of the enterprise. This typically entails a cross-functional study of the enterprise in order to identify the business processes and information needs of the various user departments.

The architectural plan permits planning and scheduling the migration of application programs, manual procedures, and people to a data base environment. This implementation plan must incorporate review and approval checkpoints that enable management to control and monitor the conversion process.

Controlling the Conversion.
The actual conversion to a data base environment is effected by a project team composed of representatives from user departments, applications development, and data base administration. At formally established checkpoints during the conversion, the data base steering committee reviews the progress of the project. Items reviewed and analyzed include the following:

Projected benefits vs. actual benefits

Data quality (i.e., completeness, timeliness, and availability)

Projected operating and development costs vs. actual costs

Actual costs of collecting, maintaining, and storing data vs. benefits realized

Project performance (i.e., performance of the project team against the conversion schedule and budget)

Based on the review, the data base steering committee takes the appropriate approval action (e.g., go/no go) with respect to the conversion activity.

Chargeback Considerations. The costs of operating in a data base (i.e., shared data) environment are extremely difficult to charge back to individual users in an equitable manner. At best, complex job accounting systems can only approximate actual resource usage. Moreover, the chargeback algorithm must not be dysfunctional with respect to its impact on the various user departments. Conversion to a data base environment frequently requires that a user department supply data which it does not itself use. The chargeback algorithm must reward, not penalize, such behavior on the part of the user department.

Some considerations in developing an appropriate chargeback algorithm include the following:

Consider capitalization of the costs of conversion instead of treating such costs as current expense, in order to avoid inhibiting user departments from undergoing the conversion.

User departments typically have little control over the costs of conversion. Consequently, consider treating such costs as unallocated overhead, since allocation will have little effect on the decisions or efficiency of the user departments.
Because the ongoing costs of collecting, maintaining, and storing data are difficult to associate with individual users, consider developing percentage allocation factors for these costs based on periodic reviews of data base usage. Alternatively, consider treating these costs as overhead.

Resource usage for retrieval and processing purposes is easier to approximate, and such costs should be charged directly to the users.

Consider incorporating a reverse charging mechanism into the chargeback algorithm in order to compensate users who supply data which they do not use.

3.2.4 Impact of Conversion On User Awareness. The most fundamental impact of conversion to a data base environment is the required change in orientation on the part of users. No longer are files and applications "owned" by a particular user department. Rather, data must be viewed as a corporate resource to be shared by all user departments. The requirement for sharing constrains the freedom of a user to change arbitrarily and unilaterally the definition of the data.

Sharing of data will impact users in a second way. Following conversion to a data base environment, users may be required to supply data which they themselves do not use. As already discussed, the chargeback algorithm must be structured to reward such behavior. Furthermore, suppliers of data in general must be infused with a sense of responsibility (not ownership) for the data in order to maintain data quality.

Conversion to a data base environment may impact the user community in other ways:

Planning the conversion. User participation in developing both the architectural plan and the implementation plan is necessary in order to obtain user commitment and to ensure that the resultant data base satisfies user needs.

Disruption of operations. Normal data processing services are likely to be severely disrupted as a result of such factors as limited availability of personnel and maintenance moratoriums.
Furthermore, the conversion may disrupt and strain user operations as new and old applications are operated in parallel.

Resolution of inconsistencies. Creation of the data base typically entails merging of application-oriented files. During this process, inconsistencies in both data definitions and data values are identified. These inconsistencies must then be resolved by the relevant user departments.

Structure of user department. The structure of a user department may no longer be effective following conversion; e.g., the user department may be designed around a particular application system. Restructuring application systems during conversion may precipitate user reorganization.

New organizational roles. Conversion may cause new organizational roles to evolve in user departments. For example, in order to provide coordination between Data Base Administration and the user departments, a "user data administrator" may evolve. The user data administrator serves as the focal point for participation and involvement on the part of the user department both during and subsequent to the conversion.

Systems analysis. By providing such tools as a high-level query language and/or a generalized report writer, the DBMS enables non-technical users to access the data base directly; i.e., users are less dependent upon programmers to satisfy simple requests for information. This increased availability of data may result in the migration of the systems analysis function from data processing to user departments.

Personnel skill requirements. Conversion may impact the skill requirements of user personnel. For example, conversion to a data base environment may result in operating certain application systems on-line, requiring that the user department acquire or develop terminal operations skills.
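The chargeback considerations discussed in section 3.2.3 above -- direct charges for retrieval usage, overhead treatment of shared costs, and a reverse charging mechanism to reward data suppliers -- can be sketched in a few lines. All rates, department names, and figures below are invented for illustration and do not come from the panel's discussion:

```python
# Hypothetical chargeback sketch (all rates and figures invented):
# retrieval usage is charged directly, shared storage costs are spread
# as unallocated overhead, and departments receive a reverse-charging
# credit for data they supply but do not themselves use.

RETRIEVAL_RATE = 0.05   # charge per retrieval unit (illustrative)
SUPPLY_CREDIT = 0.02    # credit per record supplied for others' use

def monthly_charge(dept, shared_overhead, n_departments):
    direct = dept["retrieval_units"] * RETRIEVAL_RATE
    overhead = shared_overhead / n_departments   # deliberately not usage-based
    credit = dept["records_supplied_unused"] * SUPPLY_CREDIT
    return direct + overhead - credit

payroll = {"retrieval_units": 1000, "records_supplied_unused": 0}
personnel = {"retrieval_units": 200, "records_supplied_unused": 5000}

print(round(monthly_charge(payroll, 300.0, 2), 2))    # heavy retriever pays more
print(round(monthly_charge(personnel, 300.0, 2), 2))  # data supplier is rewarded
```

The point of the sketch is the sign of the last term: a department that supplies heavily used data sees its bill reduced, so the algorithm rewards rather than penalizes such behavior.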
3.3 MINIMIZING THE IMPACT OF FUTURE CONVERSIONS

The initial conversion to a data base environment is not likely to be the only data base-related conversion that an enterprise will undergo. Rather, conversions of one form or another are likely to be a way of life. However, there are certain measures that the data processing organization can take to minimize the impact of future conversions, including:

Institutionalization of the Data Base Administration function.

Insulation of programs from a particular DBMS.

DBMS-independent data base design.

3.3.1 Institutionalization of the DBA Function. A well-established DBA function will minimize the impact of future data base conversions. Specifically, the DBA function should take action as follows:

Maintain data definitions and relationships in an up-to-date data dictionary.

Document the structure and contents of all data bases independently of the data description language of the DBMS.

Develop methodologies and standards for data base design which are independent of any particular DBMS.

3.3.2 DBMS-Independent Data Base Design. The recent work of the ANSI/X3/SPARC Study Group on DBMS introduced the idea of a conceptual data model. The topic has also been pursued extensively in research papers. In practical terms, it amounts to designing and documenting the data base in a form independent of any particular DBMS; a translation of the conceptual model to the data description language of the DBMS selected for implementation is then performed. In this way, the design decisions which are predicated on the characteristics of the DBMS are clearly distinguishable from the natural structure of the data. Should the DBMS be changed subsequent to implementation, the structural conversions required of the data base and the applications are more clearly bounded, and the conceptual data base remains a stable point of reference, whether a DBMS or conventional files are to be used.
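The separation described in 3.3.1 and 3.3.2 -- a conceptual description of the data kept independently of any DBMS, translated into a particular data description language only at implementation time -- might be sketched as follows. The neutral schema, the record and field names, and the SQL-like target dialect are all invented for illustration:

```python
# Sketch: a conceptual schema held in DBMS-independent form, with a
# small translator emitting the DDL of whichever DBMS is selected.
# Record/field names and the target dialect are illustrative only.

conceptual_schema = {
    "EMPLOYEE": [("EMP_NO", "integer"), ("NAME", "char(30)")],
    "DEPT":     [("DEPT_NO", "integer"), ("DEPT_NAME", "char(20)")],
}

def to_ddl(schema):
    """Translate the neutral description into one target DDL dialect."""
    statements = []
    for record, fields in schema.items():
        columns = ", ".join(f"{name} {ftype}" for name, ftype in fields)
        statements.append(f"CREATE TABLE {record} ({columns});")
    return statements

for stmt in to_ddl(conceptual_schema):
    print(stmt)
```

Should the DBMS change, only the translator (and the physical design decisions embedded in it) needs rewriting; the conceptual schema itself is untouched.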
A a model s are to b recent ced th been t i c a 1 ta ba transl nguage d e c i s i DBMS tructu subseq cl ear base s a is ap e used work of e idea of pursued terms it s e in a a t i n g the of the o n s which are more re of the uent ly on and point to the the of propriate 3.3.3 Insulate Programs From the DBMS. Two sets of circumstances may motivate a~n organization to attempt to insulate its application programs from any one DBMS. On the one hand, an organization may be unwilling to commit completely to the use of a particular DBMS. Rather, it may desire to keep its options open with respect to converting to a different DBMS at a later date. Alternatively, a large multi-division corporation may desire to develop common application systems for the divisions; yet it may find that the data processing organizations within autonomous divisions have installed different DBMS's. It is possible to build an interface between application programs and the DBMS in order to isolate the programs from the DBMS. That is, the programs do not interact directly with the DBMS. Rather, standard program requests for DBMS services are translated into the required DML statements either at compilation time or at execution time. Thus, application programs are insulated from changes in the DBMS as long as an interface module can be developed to translate program requests into the DML statements of the new DBMS. Similarly, a single application will execute under any number of DBMS's as long as the appropriate interface modules exist. Furthermore, the multi-division corporation retains the flexibility of standardizing on a -40- single DBMS at a future date. The negative aspects of this approach include reduced system efficiency and the possibility of ending up with a pseudo-DBMS whose capabilities represent the lowest common denominator of the various DBMS's for which interfaces are built or planned. Nevertheless, a number of corporations worldwide have adopted or are adopting this approach. 
3.4 CONVERSION FROM ONE DBMS TO ANOTHER DBMS

As a data processing organization goes through the experiential learning necessary to assimilate data base technology, the functions and features of the data base management system package currently installed will tend to be more highly utilized. Users will have positive experiences with the facilities offered by the DBMS and will subsequently place greater burdens on those facilities. Also, the technical capabilities of the DBMS will be increasingly utilized by the data processing staff in order to meet user requirements. In short, the tendency to use the full functions of the DBMS over time will place a strain on the capabilities of the DBMS. This is manifested by either decreasing systems processing efficiency or increasing effort necessary to develop systems which meet user needs. These increased costs are recognized by both users and data processing personnel, who then initiate a search for increased DBMS capabilities and thus begin a data base conversion effort.

This second type of data base conversion can be characterized by either a complete change in data base management system packages or an upgrade in the version of the DBMS currently installed. This section discusses the impact of the conversion from one data base system environment to another on each of the four growth processes previously discussed. The section is organized as follows:

Reasons to go through the conversion.

Economic considerations of the conversion effort.

Conversion activities and their impacts.

Developing a strategy for the conversion.

3.4.1 Reasons for Conversion. As has been previously discussed, the most prevalent reason to undertake a conversion from one DBMS to another (a DBMS-1 to DBMS-2 conversion) is to install a better DBMS. A better DBMS is usually defined as having:

Improved functions (more complete).

Better performance.

Improved query capability.

Development of richer data structures.
More efficient usage of the computer resource through decreased cycles and/or space. Improved or added communication functions. Availability of transaction processing. Distributed processing capability. Another major reason to undertake a DBMS-1 to DBMS-2 conversion is to standardize DBMS usage within the company. Many large corporations are finding that the DBMS selections made several years ago to meet specific application needs have resulted in the installation of several DBMS packages within the data processing organization. The impact of multi-DBMS usage in a single data processing environment is major. For example: Application programs are constrained to the design and processing characteristics unique to each DBMS. Data files are structured to be accessed by a single DBMS. Design and programming personnel develop the skills necessary to implement systems associated with a single DBMS. The multi-DBMS environment results in a substantial investment in data processing personnel technical skills and reduces the potential for integrating applications that operate on different DBMS's. For these reasons many companies are now developing standards for data base management system usage. Those standards usually require application systems to be developed under a single DBMS. Exceptions may exist where the application to be developed is stand-alone in nature with a low potential for integration with other systems. The last major reason for DBMS-1 to DBMS-2 conversion is that such a conversion is dictated by a hardware change. Many of the commercially available DBMS's are offered by large mainframe vendors. As such, a move from one hardware vendor to another will necessitate a change in DBMS usage. This can become quite a complex effort in that the source code and data base storage structures of all programs will require changes.
If there is a history of hardware conversions in the company, the wise data processing manager should select a DBMS that is not hardware dependent. 3.4.2 Economic Considerations. As introduced in the previous section, a DBMS-1 to DBMS-2 conversion effort should be analyzed and managed like any other high-risk systems development project. The same concept of economic justification that applied to the initial conversion to data base technology applies to a subsequent data base conversion: an economic justification consisting of costs and benefits should be developed for the conversion effort. The major reason for the conversion--whether it is to install a better DBMS, to standardize DBMS usage, or to accommodate a hardware change--should be included as a key item in that justification. The economic justification for a DBMS-1 to DBMS-2 conversion should be based on a succinct articulation of the COSTS and BENEFITS directly associated with the conversion effort. Costs should be identified on an incremental basis and be classified into three categories: One-time conversion costs. Incremental costs for each planned application to be converted to the new DBMS. On-going DBMS support costs. Benefits should likewise be identified on an incremental basis within the same time frames as the associated costs. Benefits are divided into two categories: Discernible/definable cost savings in development, maintenance, and operations. Intangible cost savings.
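As a sketch of how the cost and benefit categories above might be tallied on an incremental basis, consider the following arithmetic. Every figure is invented for illustration; a real justification would use the organization's own estimates and time frames.

```python
# Illustrative tally of the three cost categories and the tangible benefit
# category described in the text. All dollar amounts are invented.

one_time_conversion = 150_000                          # one-time conversion costs
per_application_increment = [20_000, 35_000, 15_000]   # each planned application
ongoing_support_per_year = 30_000                      # on-going DBMS support
tangible_savings_per_year = 90_000                     # discernible/definable savings
years = 3                                              # common time frame

total_cost = (one_time_conversion
              + sum(per_application_increment)
              + ongoing_support_per_year * years)
total_tangible_benefit = tangible_savings_per_year * years

# Intangible cost savings are argued qualitatively, outside this arithmetic.
net = total_tangible_benefit - total_cost
```

With these invented numbers the tangible benefits alone do not recover the cost within three years (net is negative), which is exactly the situation in which the intangible benefits and the strategic reason for the conversion must carry the justification.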
A more complete description of the types of economic considerations to be addressed is contained in DATA BASE DIRECTIONS: THE NEXT STEPS, National Bureau of Standards Special Publication 451. Above all, the justification for a data base conversion should be developed and communicated to management in the same manner that any other project is justified. 3.4.3 Conversion Activities and Their Impact. The impact of a DBMS-1 to DBMS-2 conversion effort can be felt on each of the four growth processes previously discussed. Many of the types of impacts are the same as those previously identified in a conversion to data base technology. Users and data processing personnel should recognize that many of the same experiential learning processes occur in subsequent data base conversions as they do in the initial effort. Impact On Application Portfolio. The impact of subsequent data base conversions on application portfolios occurs in three areas: Application programs. Data bases. Catalogued modules. Application programs. Because application programs are buffered from actual data storage structures by the DBMS, the unique characteristics of each DBMS will have a major impact on application programs in the following areas: . DBMS "call" structures. . Program's view of data and mappings (model, structure, content). . Application program logic. . Data communications. Data bases. Physical data storage structures and logical data relationships are implemented via unique DBMS utilities and are patterned after distinct DBMS requirements. As such, data bases developed under one DBMS are not readily accessible by other DBMS packages. Specifically, data bases are impacted by the vagaries of data base management systems in the following ways: . Data base definitions in both the DBMS and in the Data Dictionary. . Data content and storage format. . Use of data base design and simulation aids. . Conversion aids. Catalogued modules.
Though processed just as any other program, catalogued modules differ from application programs in their function and method of development. The specific types of catalogued modules which are impacted by a change in DBMS are: Catalogued queries. Catalogued report definitions. Catalogued transaction definitions. Impact On the Data Processing Organization. As previously discussed, the major impact of the data base environment on the data processing organization is the implementation of the Data Base Administration structure. Because this organization is already integrated into the data processing environment as a result of the initial conversion to data base technology, the conversion from one DBMS to another will not have a major impact on it. Only procedural fine-tuning will be required as the functions of the DBMS change. However, it should be recognized that a substantial learning curve will likely exist as the new DBMS technology is assimilated by DBA personnel. Other organizational structure changes in the data processing environment as a result of the DBMS-1 to DBMS-2 conversion will be minimal. The major organizational impact throughout both data processing and user areas is likely to be in the technical and functional education required before, during, and after the conversion effort. Data processing and user personnel in all areas of systems development and operation will have to be trained on the new aspects of the DBMS. Training programs for all people should be identified and initiated in advance of the conversion implementation. Another major area of impact on the data processing organization from a data base conversion is the modifications in documentation necessary to accommodate the new DBMS environment.
Changes in documentation will occur in the following areas: . DBMS functional and technical support (reference) documentation. . Functional and technical descriptions of any application systems converted onto the new DBMS. . Physical and logical descriptions of any data bases converted. . User-oriented descriptions of application systems processing characteristics. . System development methodology documentation that references particular aspects of data base or application development. Impact On Data Processing Management Control Systems. As previously discussed, Data Processing Management Control Systems comprise those sets of procedures regularly used to control both systems development and operations functions. The conversion from one DBMS to another is not going to modify the conceptual framework used to control the data processing environment. However, specific changes will affect the mechanics of control: Data processing budgeting and user chargeback. The chargeback algorithm used to charge users for data processing services is likely to change because of modifications in: DBMS overhead (cycles). DBMS space requirements. Methods of implementing logical relationships. Ownership of data items. Differences in efforts required to design and implement application systems. Differences in methods used to structure ad-hoc queries and periodic reports. Methods of charging end-user cost centers for the one-time costs of conversion. The time-frame of allocating these charges can also be important (one lump sum vs. periodic payments). Systems development methodology. There will be changes in the time frame and types of effort required in systems development. The conceptual approach to developing systems may change due to the total effort or time frame required to generate sample reports on test data bases. Design procedures in the methodology are not likely to change if DBMS facilities are similar; only jargon should change in documentation.
Data processing performance measurement. Systems development and operations standards by which data processing personnel are regularly measured should change due to new functions and especially to a new learning curve. Computer operations performance standards will change due to new processing technology. Integrity control. Adequate data base backup should be carefully analyzed and managed during the conversion process. Operational restart/recovery procedures will change due to new DBMS functions or utilities. Processing of data exceptions may differ. Security control. Differences in methods of data access security should be recognized. Data manipulation restrictions may vary from DBMS-1 to DBMS-2. Privacy control. Where appropriate, special care should be taken that all privacy disclosures are logged during a data base conversion per recent government regulations. Impact On User Areas. A key note of data base conversions is that the conversion should be as transparent as possible to user areas. This axiom holds that the processing impact on user areas should be held to a minimum and that the necessary technical capabilities to support the conversion should reside in the data processing area. The impact of the data base conversion effort may, however, be readily apparent to users regarding: Functional improvements (e.g., new query language). Increased data content (e.g., "while we are changing, let's add ..."). Increase in sharing of data, which will highlight data inconsistencies, validations, and format errors. Possible planned disruption of services during conversion period. Data ownership changes. User mental images or expectations may change. Archival data capabilities may change (e.g., meeting the needs of IRS, EEO, etc.). 3.4.4 Developing a Conversion Strategy. The well-managed data processing installation should carefully articulate a data base conversion strategy and plan before initiating any conversion effort.
Specifically, the ttJ u o s- O. c o •r- ro to I S- tO CD > QJ C J- o 3 (_> •r- rtj Li- -P (T3 O +-> O) o -111- Since this step seeks to create a data structure of the source data without system-dependent information, one can consider the mapping between the input and the output of the reformat process to be generally one-to-one. While this step looks simple functionally, its actual application and implementation can be quite complex. For example, an application program may use the high order bits of a zoned decimal number for its own purposes, knowing that these bits are not used by the system. Such specifications of nonstandard item encodings present a difficult problem in data conversion. of the unload Note, however, that the use of a common data form provides additional benefits, such as easing the portability problem. The load process is the counterpart process and needs no further clarification. The restructuring process undoubtedly represents the most complex process of a generalized data conversion system. The languages for this mapping process can differ widely (for example, some procedural and other nonprocedural) and the models used to represent the data in the conversion system are also quite divergent. (For example, some use network structures; others use hierarchical structures). More will be said on this topic later in this section. Let us now turn to discuss the issue of implementation briefly. Generally, there are two techniques: an interpretive approach or the generative approach. In the interpretive approach, the action of the system will be driven by the descriptions written in the system's languages via the general interpreter implemented for the particular system. In the generative approach, the data and mapping descriptions are fed into the compiler(s) which generates a set of customized programs executable on a certain machine. Later in this section we'll discuss the merits of each of these approaches. 
Turning our attention to the tools that have been developed for data conversion, we shall first discuss currently available tools and then the research and development work in progress. Available Conversion Tools. Currently available tools have limited capabilities. Because it is impossible in this short report to provide an exhaustive survey of all the vendor-developed conversion tools, we will highlight the spectrum of capabilities available to the user by providing examples from specific vendor shops. The repertoire of vendor conversion tools begins at the character encoding level of data conversion with the provision of hardware/firmware options and continues through the software aids for conversion and restructuring of data bases. Depending on a diversity of file conditions, the vendors have developed software tools that vary from vendor to vendor. Probably the most prevalent type of facility is the direct reading or writing of a particular class of foreign files, such as EBCDIC tapes or Honeywell Series 200/2000 files, within COBOL. Although this type of facility allows widespread file conversion, its capabilities are nevertheless limited. For example, some do not handle unlabeled tapes, while others cannot process mixed mode data types. Aside from the work of Behymer and Bakkom [DT11], which was aimed toward achieving a general conversion bridge with a particular vendor, to our knowledge there are no vendor-supported generalized file translation tools. In contrast to the above, tools have been developed to aid in migrating a whole data base environment. One example is the I-D-S/II migration aid provided by Honeywell. Because of the large volumes of data involved in an application and the fact that user shops cannot afford to shut down data processing during a migration, a co-existence approach was adopted. The first step, "Preformat", applies the necessary data type conversions and adaptions (e.g., the I-D-S/II chain pointers) to allow a data base to be processed in the I-D-S/II mode, but not optimally. Additional restructuring steps provide the more sophisticated capabilities necessary to complete the migration to I-D-S/II. Some data base restructuring tools specific to a particular DBMS have been developed by DBMS users. One example of this type of tool is REORG [R11], a system developed at Bell Laboratories for reorganization of UNIVAC DMS-1100 data bases. REORG provides capabilities for logical and physical reorganization of a data base using a set of commands independent of DMS-1100 data management commands. A similar capability has been developed at the Allstate Insurance Company. In addition to the above, there are also software companies and vendors who will do a customized conversion task on a contractual basis. Data Conversion Prototypes and Models. Over the past seven years, a great deal of research on the conversion problem has been performed, with the results summarized in Figure 6-4.
The University of Michigan, the University of Pennsylvania, IBM, SDC, and Bell Laboratories initiated projects, as well as a task group of the CODASYL Systems Committee. In many cases, interaction and cross-fertilization between these groups led to some consensus on appropriate architectures for data conversion. The individual achievements of these groups are discussed below: The CODASYL Stored-Data Description and Translation Task Group. In 1970, the CODASYL Systems Committee formed a task group (originally called the Stored Structure Description Language Task Group) to study the problem of data translation. The group presented its initial investigation of the area in the 1970 SIGMOD (then SIGFIDET) annual Workshop in Houston [SL1]. In 1972, the group was reformulated as the Stored-Data Description and Translation Task Group and presented a general approach to the development of a detailed model for describing data at all levels of implementation [SL4,DT2]. The most recent work of the group specifies the data conversion model and presents an example language for describing and translating a wide class of logical and physical structures [SL8]. The stored-data definition language allows data to be described at the access path, encoding, and device levels. The University of Michigan. The nonprocedural approach to stored-data definition set forth by Taylor and Sibley [SL3,6] provided one of the major foundations for the development at the University of Michigan (see Figure 6-4) of data translators. In concert with Taylor's language, Fry, et al. [DT1] initiated a model and design for a generalized translator.
The translation model was tested in a prototype implementation of the Michigan Data Translator in 1972 [UT2,4], and the results of the next implementation, Version I, were reported by Merten and Fry [DT4]. In 1974, the work of the Data Translation Project of the University of Michigan focused on the data base restructuring problem. Navathe and Fry investigated the hierarchical restructuring problem by developing several levels of abstractions, ranging from basic restructuring types to low level operations [R6].

[Figure 6-4: Historic Context of Data Conversion Efforts]
[Figure 6-4 (continued): Historic Context of Data Conversion Efforts]

Later, Navathe proposed a methodology to accomplish these operations using a relational normal form for the internal data representation [DT12]. Version II of the Michigan Translator was designed to perform hierarchical restructuring transformations, but the project did not implement it. Instead, the research was directed into the complex problem of restructuring network type data bases. To address this problem, Deppe developed a dynamic data model--the Relational Interface model--which simultaneously allowed a relational and network view of the data base [UR3]. This model formed the basis of the Version IIA design and implementation of generalized restructuring capabilities [UT8,9,10]. Another component necessary for the development of a restructurer was the formulation of a language in which to express the source to target data transformations. This language, termed Translation Definition Language (TDL), evolved through each translator version, beginning with a source-to-target data item "equate list" in the Version I Translator to the network restructuring specifications of Version IIA. While the initial version of the TDL was quite simplistic, the current version, the Access Path Specification Language [DT16,R9], provides powerful capabilities for transforming network data bases.
The University of Pennsylvania. Concurrent with the work at Michigan, the University of Pennsylvania (see Figure 6-4) also developed a stored-data description approach, including a stored-data definition language (SDDL) for defining storage structures and secondary storage devices, and a translation description language (TDL) [SL2,DT2]. Three levels of data base description--the logical, storage, and physical--are described using the SDDL. In order to describe the source-to-target mappings, a first order calculus language was used. Following from this work, Ramirez [DT3,6] implemented a language-driven "generative" translator which created PL/1 programs to perform the conversion. One of the first reports on the utilization of generalized translation tools was provided by Winters and Dickey [TA1]. Using the translator developed by Ramirez, they installed it on their IBM system and applied it to converting IBM 7080 files. IBM Research, San Jose. In 1973, another major data translation research endeavor was initiated at the IBM Research Laboratory in San Jose, California. Researchers in this project--initially Housel, Lum, and Shu, later joined by Ghosh and Taylor--adopted the general model as specified in Figure 6-1 but made several innovations. First, in the belief that programmers know well the structure of the data in a buffer being passed from a DBMS to the application program, the group concentrated its effort on designing a data description language appropriate for describing data at this stage. Second, they observed that regardless of the data model underlying any DBMS, the data structure at the time it appears in the buffer of an application program will be hierarchical.
The general architecture, methodology, and languages reflecting these beliefs are reported in Lum et al. [DT14]. In addition, the group in San Jose felt that, while it is desirable to have a file with homogeneous record types, it is a fact of life that many of today's data are still in COBOL files in which multiple record types frequently exist within the same file. As a result, the group concentrated on designing a data description language which can describe not only hierarchical records (of which a relational structure is a special case) but also most of the commonly used sequential file structures. This language, DEFINE, is described by Housel et al. [SL7]. The philosophy of restructuring hierarchies is further reflected in the development of the translation definition language CONVERT, as reported by Shu et al. [R2]. This language, algebraic in structure, consists of a dozen operators, each of which restructures one or more hierarchical files into another file. The language possesses the capability of selecting records and record components, combining data from different files, built-in functions (e.g., SUM and COUNT), and the ability to create fields and vary selection on the basis of a record's content (a CASE statement). A symmetric process occurs at the output end of the translation system. Sequential files are created to match the need of the target loading facility. The specification of this structure is again made in DEFINE. A prototype implementation, originally called EXPRESS but renamed XPRS, is reported in [DT15]. System Development Corporation. Another restructuring project, reported by Shoshani [R3,4], was performed at the System Development Corporation in 1974-1975. In order to avoid the complexities of storage structure specification (i.e., pointer chains, inverted tables and the like), they chose to use existing facilities of the systems involved. In particular, they advocated the use of the query (generate) and load facilities of data base management systems. However, when such facilities do not exist, reformatters from the source (e.g., index sequential file) to a standard form and from the standard form to the target file had to be used. Given that data bases can be reformatted to and from a standard form, they concentrated on the problem of logical restructuring of hierarchical data bases in this form. The language used in the above project for specifying the restructuring functions (called CDTL--Common Data Translation Language) was designed to be conceptually simple. For the most part, it provides functions for specifying a mapping from a single field (or combination of fields) of the source to a single field of the target. For example, a DIRECT would specify a one-to-one mapping of source items to target items, while a REPEAT would specify the repetition of a source item for all instances of a lower level in the target hierarchy. In both cases, only the source and target fields need to be mentioned as parameters. In addition, there are more global operations, such as the INVERSION operator, which causes parent/dependent record relationships to be reversed. The system also supported extensive individual field restructuring operators, where field values could be manipulated according to prescribed language specifications. Since most of these operators are local, there is the possibility that they could be used in combinations that do not make sense globally. Therefore, a further component of the system was built to perform "semantic analysis," which checks for possible inconsistencies before proceeding to generate the target data base. Bell Laboratories. The Bell Laboratories data translation system (ADAPT), currently under development, is a generalized translation system driven by two high-level languages [DT17]. The first language describes the physical and logical format and structure of the source data, while the second language is used to describe the transformations which are to be applied to the source data to produce the target data. Extensive validation criteria can be specified to apply to the source and target data.
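The flavor of the field-level restructuring operators discussed above (a DIRECT-style one-to-one mapping, and an aggregate built-in such as SUM applied to a repeating group) can be sketched as ordinary functions over nested records. This is not CDTL or CONVERT syntax; the data layout and function names are invented, with hierarchical records modeled as Python dicts containing lists.

```python
# Hypothetical sketch of two restructuring operators in the CDTL/CONVERT
# style. A "hierarchical file" is modeled as a list of parent records, each
# with a nested repeating group. All names and data are illustrative.

source = [
    {"dept": "D1", "emps": [{"name": "A", "sal": 10}, {"name": "B", "sal": 20}]},
    {"dept": "D2", "emps": [{"name": "C", "sal": 30}]},
]

def direct(records, src_field, tgt_field):
    # DIRECT-style operator: one-to-one mapping of a source item to a
    # target item; only the source and target fields are named.
    return [{tgt_field: r[src_field]} for r in records]

def sum_over(records, group, field):
    # SUM-style built-in: derive a new parent-level field by aggregating
    # over the instances of a repeating group.
    return [dict(r, total=sum(e[field] for e in r[group])) for r in records]

restructured = sum_over(source, "emps", "sal")
```

Note that each operator is local, exactly as the text observes: nothing in direct or sum_over prevents a globally nonsensical composition, which is why a separate "semantic analysis" pass over the whole mapping specification was needed.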
, such as the ndent record Iso supported e individual to prescribed perators are Id be used in Therefore, a to perform r possi bl e te the target Bell Laboratories Th ADAPT under drive 1 angu struc compu the the t data c r i t e data . (A D deve n by age on ture t a t i o n target ransfo to p ria ca At a Parsi 1 opment , two hig e d e s c r i b of the s whi 1 e p data . rm a t i o n s roduce t n be spec ng an i s h-1 ev es th data a r s i n The which he t i f i ed e Bell d Tran a gen el 1 an e phys and g the second are t arget to ap Labs sf orm eral i guage i cal to p sourc lang o be data ply t data a t i o n zed s [DT and 1 rovid e da uage appl i E o the tran syst trans 17]. o g i c a e var ta a is us ed t xtens sour si ati em) , 1 a t i o With 1 fo ious nd g ed to o th i ve v ce an on sy curre n sy the f rmat tests enera desc e so a 1 i d a d ta stem ntly stem i rst and and ting ribe urce ti on rget Two processing paths are available within the ADAPT system: a file translation path and a data base translation path (see Figure 6-3). A separate path for responds to real-world considerations: conversions do not require the capabilities high overhead involved in using a data path . file translation many types of and associated base translation Rel ated Work Additional research effort examines the development and acceptance of a standard interchange form. An interchange form would increase the sharing of data bases and provide a basis for development of generalized data translators. The Energy Research and Development -119- Administration (ERDA) has been supporting the Interl aboratory Working Group for Data Exchange (IWGDE) in an effort to develop a proposed data interchange form. The proposed interchange form [GG2] has been used by several ERDA laboratories for transporting data between the laboratories. Additional work on development of interchange forms has been pursued by the Data Base Systems Research Group at the University of Michigan [UT14]. 
Navathe [R10] has recently reported a technique for analyzing the logical and physical structure of data bases with a view to facilitating the restructuring specification. Data relationships are divided into identifying and nonidentifying types in order to draw an explicit schema diagram. The physical implementation of the relationships in the schema diagram is represented by means of a schema realization diagram. These diagrammatic representations of the source and target data bases could prove to be very useful to a restructuring user. 6.2.2 Application Program Conversion. So far, we have concentrated on the data aspects of the conversion problem; it is necessary to deal as well with the problems of converting the application programs which operate on the data bases. Program conversion, in general, may be motivated by many different circumstances, such as hardware migration, new processing requirements, or a decision to adopt a new programming language. Considerable effort has been devoted to special tools, such as those to assist migration among different vendors' COBOL compilers, and general purpose "decompilers" that have been developed to translate assembly language programs to equivalent software in a high level language. While progress has been made in developing special purpose tools for limited program conversion situations, little progress has been made in obtaining a solution to the general problem of program conversion. With this fact in mind, this section focuses on the modifications to application programs that arise as a consequence of data restructuring/conversion. Problem Statement. There are three types of data base changes which can affect application programs: alterations to the data base physical structure, for example, the format and encoding of data, or the arrangement of items within records changes to the data base logical structure--either: a. the deletion or addition of access paths to accommodate new performance requirements, or b.
changes to the semantics of data, for example, modification of defined relationships between record types or the addition or deletion of items within records

migration to a new DBMS, perhaps encompassing a data model and/or data manipulation language different from the one currently in use

The actual impact of a data base change on application programs is a function of the amount of data independence provided by the Data Base Management System. Data independence and its relationship to the conversion problem are discussed elsewhere [GG3]. We assume here that incomplete data independence is found and that therefore some degree of program conversion is required in response to data base schema changes. In fact, whereas most data base management systems provide application programs with some insulation from a variety of modifications to the physical data base, protection from logical changes--particularly changes at the semantic level--is minimal. Examples of semantic changes likely to have an effect on application programs include:

Changes in relationships between record types, such as changing a one-to-many association to a many-to-many association or vice-versa.

Deletion or addition of data items, record types, or record relationships.

Changing derivable information ("virtual items") to explicit information ("actual items") or vice-versa.

Changes in integrity, authorization or deletion rules.

Various properties of data base application programs greatly complicate the conversion problem. For instance, many data base management systems do not require that the record types of interest (or possibly even the data base of interest) be declared at compile time in the program; rather, these names can be supplied at run time. Consequently, at compile time, incomplete information exists about what data the program acts on.
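One of the semantic changes listed above--turning a derivable "virtual item" into a stored "actual item"--can be illustrated with a toy record (the schema below is invented): application logic that derives the value must be replaced by logic that reads it, or the program will silently ignore the stored item.

```python
# Illustrative sketch (invented schema) of a "virtual item" becoming an
# "actual item": before the change, programs derive `total` at run time;
# after the change, `total` is stored in the record itself.

order_v1 = {"qty": 3, "unit_price": 10}               # total is virtual
order_v2 = {"qty": 3, "unit_price": 10, "total": 30}  # total is actual

def total_before(order):
    # Application logic written against the old schema: derive the item.
    return order["qty"] * order["unit_price"]

def total_after(order):
    # Converted logic: read the now-explicit item instead of deriving it.
    return order["total"]

assert total_before(order_v1) == total_after(order_v2) == 30
```

The two functions agree only as long as the stored item is kept consistent with its old derivation rule, which is exactly the kind of semantic knowledge a conversion system must capture.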
Other troublesome problems occur when programs implicitly use characteristics of the data which have not been explicitly declared (e.g., a COBOL program executes a paragraph exactly ten times because the programmer knows that a certain repeating group occurs only ten times in each record instance). Complexity is introduced whenever a data manipulation language is intricately embedded in a host language such as COBOL. The interdependence between the semantics of the data base accesses and the surrounding software greatly complicates the program analysis stage of conversion.

Because of these considerations, substantial research has been devoted to alternatives to the literal translation of programs. In particular, some currently operational tools utilize source program emulation or source data emulation at run time to handle the problem of incomplete specification of semantics and yet still yield the effects of program conversion.

Current Approaches. In this section, we discuss two main techniques currently employed in the industry. These techniques are commonly used but unfortunately not documented in the form of publications.

DML Statement Substitution. The DML substitution conversion technique, which can be considered an emulation approach, preserves the semantics of the original code by intercepting individual DML statement calls at execution time and substituting new DML statement calls which are correct for the new logical structure of the data base. Two IBM software examples which provide this type of conversion methodology are 1) the ISAM compatibility interface within VSAM (this allows programs using ISAM calls to operate on VSAM data sets), and 2) the BOMP/DBOMP emulation interface to IMS. This program conversion approach becomes extremely complicated when the program operates on a complex data base structure.
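A minimal sketch of the DML substitution idea follows, with an invented record layout and API (real interfaces such as VSAM's ISAM compatibility interface operate at the access-method level): calls issued against the old interface are intercepted and answered by rebuilding the old view from the restructured data base.

```python
# Sketch of DML statement substitution: old programs issue calls against
# the old interface; an interception layer translates each call into
# operations on the restructured data base. The API and record shapes
# here are invented for illustration.

# New structure: customer orders have been moved out of the customer
# record into their own keyed store.
customers = {"C1": {"name": "Acme"}}
orders = {"C1": ["O1", "O2"]}

class OldDMLEmulator:
    """Intercepts old-style calls that expect orders embedded in the
    customer record, and rebuilds that view from the new structure."""

    def get_customer(self, key):
        rec = dict(customers[key])                 # copy the base record
        rec["orders"] = list(orders.get(key, []))  # re-embed on the fly
        return rec

# Unconverted application code keeps running unchanged:
db = OldDMLEmulator()
assert db.get_customer("C1") == {"name": "Acme", "orders": ["O1", "O2"]}
```

The efficiency drawback discussed below is visible even here: every old-style call pays for rebuilding a view the new structure no longer stores.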
Such a situation may require the conversion software to evaluate each DML operation against the source structure to determine status values (e.g., currency) in order to perform the equivalent DML operation on the new data base. Generalization of this approach requires the development of emulation code for the following cases: maintain the run time descriptions and tables for both the original and new data base organizations, intercept all original DML calls, and utilize an old-to-new data base access path mapping description (human input) and rules to determine dynamically what set of DML operations on the new data base is equivalent to each specific operation on the source data base.

Although a conceptually straightforward approach, it has several drawbacks. The drawbacks can be categorized as degraded efficiency and restrictiveness. Efficiency is degraded primarily because each source DML statement must be mapped into a target emulation program, which uses the new DBMS to achieve the same results. The increased overhead in program size and/or processing requirements can be significant.

The drawback of restrictiveness comes about because the emulation approach inhibits the utilization of the capabilities of the new DBMS and/or data structures. Additionally, the modeling of old program semantics upon the new data structures limits the set of permissible changes: the new data structures must support all of the semantics of the source program if the source program is to continue to execute in the same manner. Note that the support task can be quite complex, even for some of the limited situations in which the changes in data structure preserve semantic equivalence. Therefore, in some instances, just the limited task of determining whether a change in data structure (given no change in data model) will support a set of source programs will be an extensive task.

Bridge Program Method. The second method in use today is sometimes referred to as the Bridge Program Method. In this technique, the support requirement is the reconstruction, from the target data base, of the portion of the source data base needed by the application program's accesses. Data reconstruction is done by means of "bridge programs." The source program is then allowed to operate upon this reconstructed portion of the source data base to effect the same results that would occur if the source data base were not modified. Of course, a reverse mapping is required to reflect each update, and a simulated source data base must be prepared before it is needed by the application program.

This approach suffers from the same types of disadvantages inherent in the emulation approach. Efficiency problems for complex/extensive data bases and for programs performing extensive data accessing can make this method prohibitively expensive for practical utilization. This technique is generally found as a "specific software package" developed at a computer installation rather than as a standard vendor supplied package.

Current Research. Differing from the emulation and bridge program approaches, current research aims toward developing more generalized tools to automatically or semi-automatically modify or rewrite application programs.
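The bridge program method discussed above can be sketched with an invented flat-file format: before an unconverted program runs, a bridge routine materializes the old source format from the target data base.

```python
# Sketch of the bridge-program method (record formats invented): a bridge
# program reconstructs the portion of the old source file an unconverted
# program needs from the target data base; a reverse bridge would map any
# updates back.

# Target data base: the old flat file has been normalized into two tables.
people = {1: "smith", 2: "jones"}
depts = {1: "D10", 2: "D20"}

def bridge() -> list:
    """Reconstruct the old flat file (one 'id|name|dept' line per record)."""
    return [f"{pid}|{people[pid]}|{depts[pid]}" for pid in sorted(people)]

# The unconverted program still parses the old flat format:
old_file = bridge()
assert old_file == ["1|smith|D10", "2|jones|D20"]
```

As the section notes, the cost of this reconstruction on every run, plus the reverse mapping needed for updates, is what makes the method expensive for large data bases.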
The drawbacks of the existing approaches described above can be avoided by rewriting the application programs to take advantage of the new structure and semantics of a converted data base, and by using a general system to do the conversion rather than using ad hoc emulation packages and bridge programs.

Research on application program conversion is still in its infancy. Consequently, very few published papers on this subject exist. This section describes a handful of works in the order of their dates of publication.

Mehl and Wang [PT6] presented a method to intercept and interpret DL/1 statements to account for some order transformations of hierarchical structures in the context of the IMS system. Algorithms involving command substitution rules for various structural changes have been derived to allow the correct execution of the old application programs. This approach works only for a limited number of order transformations of segments in a logical IMS data base. Since it is basically an emulation approach, it has the drawbacks discussed in the previous section.

A paper by Su [PT12] gives an analysis of application program conversion resulting from data base translation. An attempt was made to identify the automatic or semi-automatic program changes due to data base changes. The paper stresses two main points: 1) application program conversion needs an extensive analysis of program logic, including program structure, program-subprogram relations, the use of data base variables, execution profile, etc.; and 2) the translation operators required to account for the effects of common data base transformation operations can be identified. The idea of using a common language to describe both the source program's data operations and the data translation operators, together with a generalized model of data translation, is also proposed.

An approach to the transformation of DBTG-like programs in response to data base restructuring was proposed by Schindler [PT10]. The approach is based on the concept of code templates, which are predefined sequences of host language code (roughly analogous to assembly language macros). Application programs can be written as nested code templates, each of which corresponds to an operation in a relational algebra expression. An application program is thus written as a relational expression; transformations are performed on the expression to accommodate the data base restructuring, and a new program is generated by mapping the transformed expression back into code templates. The approach suggests that a level of logical data independence may be achieved through current programming technology.

The work by Su and Reynolds [PT15] studied the problem of high-level sublanguage query conversion using the relational model with SEQUEL [Z5] as the sublanguage, DEFINE [SL7] as the data description language, and CONVERT [R2] as the translation language. Algorithms for rewriting the source query were derived and hand simulated. In this study, query transformation is dictated by the data translation operators which have been applied to the source data base. The purpose of this work was to study the effects of the CONVERT operators on high-level queries. Only restricted types of SEQUEL queries were considered. This work demonstrates that a general program conversion system should separate the data model and schema dependent factors from the data model and schema independent factors; abstract representations of program semantics and of the semantics of data translation operators need to be sought so that data conversions at the logical level (especially the type which changes the data base semantics) and the DBMS level can be attempted.

Two independent works carried out at about the same time by Su and Liu [PT13] and Housel [PT14] take a more general approach to the application program conversion problem. The former work is based on the idea that the same data semantics (a conceptual model) can be modelled externally by various existing data models (relational, hierarchical and network) using different schemas. Application programs are mapped into an abstract representation which represents program semantics in terms of the primitive operations (called access patterns) that can be performed on data entities and associations. Transformation rules are then applied to the abstract representation based on the types of changes introduced by the data translation operators. The transformed representation is mapped into another intermediate representation (called access path graphs), which is dictated by the external model and the specific schema used for the target data base. This representation is then modified by an optimization component and used for the generation of target programs. This work stresses that the semantics of both the source and target data bases be made explicit to the conversion system and be used as a basis for program analysis and transformation. The program conversion methodology described is for conversion at the logical level as well as the DBMS level.

Housel extends the work on application program migration undertaken at the IBM San Jose Laboratory.
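The access-pattern idea in the Su and Liu work can be caricatured in a few lines. The pattern encoding and the transformation rule below are invented; they show how one schema change (a one-to-many association becoming many-to-many through a link record) can be expressed as a rewrite on an abstract representation of the program's data accesses.

```python
# Hedged sketch of the abstract-representation approach: program data
# accesses are reduced to primitive "access patterns," and transformation
# rules rewrite those patterns when the schema changes. The encoding and
# the rule below are invented for illustration.

# Old schema: DEPT -(one-to-many)-> EMP, so the program's access pattern
# walks directly from a department occurrence to its employees.
old_pattern = [("DEPT", "d42"), ("EMP", "*")]

def many_to_many_rule(pattern):
    """Rule for a 1:N -> M:N change: a direct DEPT->EMP step must now
    pass through the new ASSIGNMENT link record."""
    out = []
    for i, step in enumerate(pattern):
        if step[0] == "EMP" and i > 0 and pattern[i - 1][0] == "DEPT":
            out.append(("ASSIGNMENT", "*"))  # traverse link records first
        out.append(step)
    return out

new_pattern = many_to_many_rule(old_pattern)
assert new_pattern == [("DEPT", "d42"), ("ASSIGNMENT", "*"), ("EMP", "*")]
```

The rewritten pattern would then be mapped into target DML by a code-generation step, corresponding to the access path graph stage described above.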
This work uses a common language for specifying the abstract representation of source programs as well as for specifying the data translation operations. The language is a subset of CONVERT with some of Codd's relational operators [GG4]. The operators of the language are designed to have simple semantics and convenient algebraic properties to facilitate program transformation. They are designed to handle data manipulation in a general hierarchical structure called a "form" as well as in relational tables. In this system, program transformation is dictated by the data mapping operations applied to the source data base. The proposed model assumes that the inverse of these data mapping operators exists; i.e., the source data base can be reconstructed from the target data base by applying some inverse operators to the target data base. More precisely, it is assumed that M'(T) = S, where S is the source data base, T is the target data base, M is the mapping function, and M' is its inverse. Thus, program conversion is done by substituting the inverse expression M'(T) into the specification language statements (the abstract representation of the source program) for each reference to the source data base. This process is followed by a simplification procedure to simplify the resulting statements (the target abstract representation of the program). The author points out that the assumption on the existence of M'(T) restricts the scope of the conversion problem handled by the proposed approach.

Presently, the Data Base Program Conversion Task Group (DPCTG) of the CODASYL Systems Committee is investigating the application program conversion problem. The group is looking into various aspects of the problem, including decompilation of COBOL application programs, semantic changes of data bases and their effects on application programs, program conversion techniques and methodologies, etc.

To date, the work on application program conversion is still very much in the research stage, and much progress has to be made before actual implementation of program conversion systems can start. The problems underlying application program conversion are multitudinous and extremely complex. Current research indicates that automatic conversion is possible for some types of program conversion, but the complexity of the conversion depends on how drastically the data has been modified. Further research needs to be undertaken to determine what can be done automatically, what can be done semi-automatically, and what cannot be done at all. A fully automatic tool is hard to achieve. Building semi-automatic tools or systems which provide aids for manual conversion would be a more realistic goal.

Current Research Directions. The current research has uncovered several problems which need to be investigated further before the implementation of a generalized conversion tool can be attempted. The following issues are believed to be important for future research:

Semantic Description of Data Base and Application Programs. Based on the work by Su and Liu [PT13] and the study of the DPCTG group, it is quite clear that a program conversion system would need more information about the semantic properties of the source and target data bases than the information provided by the schemas of the data bases and the DBMS. Semantic information of the data bases is an important source for determining the semantics of the application programs, which is the real bottleneck of the application program conversion problem. Future research needs to be conducted to 1) model and describe the meaningful semantics of data bases and application programs, 2) study the meaningful semantic changes to data bases and their effects on application program conversion, and 3) derive transformation rules for program conversion which account for the meaningful changes. Several existing works on data base semantics [M11,SM1,2,3,DL29] may provide a good basis for future works.
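The inverse-mapping assumption M'(T) = S described above can be made concrete with a toy mapping (the record formats are invented): the forward mapping M splits a file, the inverse M' rejoins it, and the conversion is only well-defined where M' actually reconstructs S.

```python
# A toy rendering of the inverse-mapping assumption M'(T) = S: the source
# base S must be reconstructible from the target base T, so each program
# reference to S can be replaced by the expression M'(T). The mappings
# here are invented one-liners.

S = [("e1", "smith", "D10"), ("e2", "jones", "D10")]

def M(source):
    """Forward data mapping: split each record into two target files."""
    return ([(eid, name) for eid, name, _ in source],
            [(eid, dept) for eid, _, dept in source])

def M_inv(target):
    """Inverse mapping: rejoin the two target files on the shared key."""
    names, depts = dict(target[0]), dict(target[1])
    return [(eid, names[eid], depts[eid]) for eid in sorted(names)]

T = M(S)
# The conversion is only valid where the inverse actually exists:
assert M_inv(T) == S
```

A mapping that, say, discarded a field would have no such inverse, which is exactly the restriction on the approach's scope that the author points out.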
Equivalency of Source and Target Programs. A converted program may or may not preserve the semantic contents of the original source program. Naturally, the target program operates on the target data base, and the target data may not be identical to the source data: in data conversion, some records may have been deleted or changed, data relations may have been altered, and occurrences of data may differ. For example, a retrieval or update operation on the target data may not generate the same results as the same operation on the source data. It is not clear at present how we can prove that a program generated by a conversion system still preserves the semantics of the original source program. A way is needed to establish the equivalence of the source and target programs based on the semantics of the source and target data.

Decompilation. Program conversion via decompilation is a technique whereby a data base application program is first transformed into an abstract representation at a higher order language level and then returned to a form appropriate to the new system. The transformation to the higher level is a decompilation process, and the return to a language level appropriate to the new system is a compilation process. The concept is that the decompilation process can produce a functionally equivalent program which does not contain the environmental dependencies (the DBMS, the data model, host language conventions) of the original program. That is, the decompiled program preserves the program content while being unrelated to the environmental conditions; those conditions can be changed, and the program can then easily be compiled back into a form appropriate to the new environment.

Some researchers think that this would be the preferred method to effect DML/host language program conversion. It should avoid many of the efficiency/restriction drawbacks inherent in current automated methods, while being more cost effective and less error prone than current manual methods (e.g., program rewrite).

One likely disadvantage to this method is that in order to use it, existing data base application programs may have to be manually altered first to place DML related code in a structured format. This disadvantage is to be expected because of the ambiguity inherent in the organization of DML/host language programs. However, the development of structured programming templates designed for DML related code should provide a means for creating programs that are convertible by the decompilation method. Structured templates might also provide the needed insight toward the development/selection of an appropriate high level language into which programs can be compiled. Some initial concepts of data base program templates have been proposed by the University of Michigan [PT10].

Conversion Aids. A system which provides assistance to conversion analysts would seem to be a practical tool and a feasible task.
Given the inf semantics of the data, a application programs to 1) segments which are affected inefficient code in the pro execution profile [GG1] wh computation time required at and 4) detect, in some cases, on the programmer's assumptio records, record or file size together with some on-line ed speed up the manual convers in 2 and 3 would be useful target programs and the da conversion analyst to el imina programs which makes the pro automatic) extremely diffic referencing than that usuall can assist the conversi o ramifications of changes to product is the Data Correlat produced by PSI-TRAN Corpora used during a conversion proc a data base structure cha effected data base items in t generated by the compiler t needing changes. A more comp would be a much better tool, which provides assistance to em to be a practical tool and a ormation about data changes and system can be built to analyze identify and isolate program by the data changes, 2) detect grams, 3) produce a program ich gives an estimate of the different parts of the program, the program code which depends n of data values, ordering of , etc. The data obtained in 1, iting and debugging aids, would ion process. The data obtained for producing more efficient ta obtained in 4 would help the te the "implicit semantics in gram conversion task (manual or ult. A more complete cross- y produced by today's compilers n analysts in identifying programs. An example of such a ion and Documentation system tion. One technique, sometimes ess that has been initiated by nge, is to alter the names of he DDL only and use errors o locate those program segments lete cross-referencing system if it were available. -128- Opt i mi zation of Target conversi on , occur. This new access conversion . have the ch program. 
Optimization of Target Program. As the result of data conversion, multiple access paths to the same data may occur. This is because redundant data may be introduced or new access paths may be added in the course of data conversion. In this situation, a conversion system will have the choice of selecting a path to generate the target program. The efficiency of the program during execution may depend on the selection of an optimized access path during program conversion. Also, for reasons of achieving generality, some program conversion techniques proposed [PT13,14] convert small segments of programs or the equivalent of DML statements separately. It is necessary to do a global optimization or simplification to improve the converted program. Techniques for program optimization related to program conversion need to be investigated.

6.2.3 Prototype Conversion Systems Analysis. This section analyzes the state of the art of generalized data conversion systems. It summarizes what has been learned in the various prototypes. The prototypes have yielded encouraging results, but some weak points have also emerged. A section below lists some questions that remain to be answered and comments on additional features that will be necessary to enhance usability. The following section on Analysis of Architecture analyzes some implementation issues which can affect the cases where a generalized conversion system can be applied.

Where Do We Stand. The prototype systems described in Section 5.2.2 have been used in a few conversions. While some of these tests were made on "toy files," a few of the tests involved data volumes from which realistic performance estimates can be extrapolated. This section will summarize the major tests that were done with each of the prototypes.

The Penn Translator. The translator developed by Ramirez at the University of Pennsylvania [DT3,6] processes single sequential files to produce single sequential target files.
Facilities exist for redefining the structure of source file records, reformatting and converting accordingly. Conversion of the file can be done selectively using user-defined selection criteria. Block size, record size, and character code set can be changed, and some useful data manipulation can be included. The translator was used in several test runs on an IBM/370 Model 165. The DDL-to-generated-PL/1-code expansion ratio was 1:4, so coding time was reduced.

A further test of the Penn Translator was conducted by Winters and Dickey [TA1]. An experiment was conducted comparing a conventional conversion effort against the Penn Translator (slightly modified). Two source files stored on IBM 1301 disks under a system written for the IBM 7080 using the AUTOCODER language were converted to two target files suitable for loading into IMS/VS. Much of the data was modified from one internal coding scheme to another. The conversion required changing multiple source files to multiple target files. The conventional conversion took seven months versus five months for the generalized approach, a productivity improvement of roughly thirty percent. Time for adapting the translator, learning the DDL, and adapting to a new operating system is included in the five month figure. Without these, an estimate of three months was made for the conversion using the generalized approach.

The SDC Translator. The translator described in [R3,4] was implemented during 1975-1976. The translator could handle single, hierarchical files from any of three local systems--TDMS, a hierarchical system which fully inverts files; DS/2, a system which partially inverts files; and ORBIT, a bibliographic system which maintains keys and abstracts. Data bases were converted from TDMS to ORBIT, from TDMS to DS/2 and vice-versa, and from sequential files to ORBIT. TDMS files were unloaded using an unload utility. Target data bases were loaded by target system load utilities.
The total effort for design and implementation was about three man-years. The system was implemented in assembly language on an IBM/370 Model 168 and occupied about 40 K-bytes, not including buffer space, which could be varied. The largest file tested was on the order of 5 million characters, and the total conversion time was about 1 minute of CPU time per 2.5 megabytes of data. The work was discontinued in 1976.

The Honeywell Translator. The prototype translator developed at Honeywell by Bakkom and Behymer [DT11] performed file conversions (one file to one file) among files from IBM, Honeywell 6000, Honeywell 2000, and Honeywell 8200 sequential and indexed sequential files. Data types of fields could be changed, as well as field justification and alignment. New fields could be added to a record, and fields could be permuted within a record. File record format (fixed, variable, blocked, etc.) could be changed. A compare utility was available for checking the consistency of files with different field organizations and encodings.

Tests of up to 10,000 records were run. Performance of 15 milliseconds per record was typical (Honeywell Series 6000 Model 6080 computer). The prototype has been used in a conversion/benchmark environment but has not been offered commercially.

The Michigan Translator. Version IIB, Release 1.1 of the Michigan Translator was completed for the Defense Communications Agency in October 1977 [UT16]. It offers complete conversion/restructuring facilities for users of Honeywell sequential, ISP, or I-D-S/I files. Up to five source data bases of any type may be merged, restructured or otherwise reorganized into as many as five target data bases, all within a single translation. Data base description is accomplished by minor extensions to existing I-D-S DDL statements. Restructuring specification is easily indicated via a high level language.
Tests performed to date included a conversion of a 150,000 record I-D-S/I data base with a total elapsed time of 24 hours (500 milliseconds per record). A given translation can be broken off at any point to permit efficient utilization of limited resources and also to protect against system failures. The user is provided with the capability of monitoring translation progress in real time.

XPRS. Test cases with the XPRS system have focussed on functionally duplicating earlier real conversions done by conventional methods. Several cases have been programmed. Each case involved at least two input files. Generally, there was a requirement to select some instances from one file, match with instances in another file, eliminate some redundant or unwanted data, and build up a new hierarchical structure in the output. In several cases there was a need for conditional actions based on flags within the data. In all cases, the XPRS languages were found to be functionally adequate to replicate the conversion. A productivity gain of at least fifty percent in total analysis, coding, and debugging time was achieved. Test runs were conducted on several thousand record files. Performance was deemed adequate in that XPRS can restructure data as fast as it can be delivered from direct access storage. No detailed performance comparisons were made comparing XPRS-generated programs with custom written programs.

Questions Remaining To Be Answered. Given that several prototype data translation systems are operational in a laboratory environment, there is little question concerning the technical feasibility of building generalized systems. The remaining questions pertain to the use of a generalized system in "real world" data conversions involving a wide variety of data structures, very large data volumes, and significant numbers of people. Three major questions to be resolved are:

1.
Are the generalized systems functionally complete enough to be used in real conversions, and if not, what will it take to make them functionally complete?

2. Can the people involved in data conversions use the languages? What additional features are necessary to enhance usability?

3. Overall, what is the productivity gain available with the generalized approach?

Within the next year, prototype systems will be exercised on a variety of real-world problems in data translation, and concrete answers to these questions should be available. The systems being further tested for cost-effectiveness are the Michigan Data Translator, the IBM XPRS system, and the Bell Laboratories ADAPT system. To date, preliminary results have been promising. A significant sample size on which to do analysis of productivity gain should be available at the end of the year of testing.

A number of factors must be taken into account in measuring the cost-effectiveness of the generalized data translator versus the conventional conversion approach. These factors include:

ease of learning and using the higher level languages which drive the generalized translators;

availability of functional capability to accomplish real-world data conversion applications within the generalized translators;

overall machine efficiency;

correctness of results from the conversion;

ability to respond in timely fashion to changes in conversion requirements (conversion program "maintenance");

debugging costs;

ability to provide "bridge back" of converted data to old applications;

ability to provide verification of correctness of data conversion;

capabilities for detection and control of data errors.

The languages used to drive generalized data translators are high-level and non-procedural; they provide a "user-friendly" interface to the translators. Since the languages are high-level, programs written in them have a better chance of being correct.
Experience to date with DEFINE and CONVERT, the languages which drive XPRS, has shown that users can learn these languages within a week; it has also shown that some practice is necessary before users start thinking about their conversion problem in non-procedural rather than procedural terms.

In early test cases, the languages which drive generalized data translators have been found to be functionally adequate for many common cases. In those cases lacking a feature, a "user hook" facility is often provided. However, forcing a user to revert to a programming language hook defeats the purpose of the high level approach, and interfacing the hook to the system requires at least some knowledge of system interfaces. Thus, high level languages must cover the vast majority of the cases in order to succeed; otherwise, users will perceive little difference over conventional approaches.

Facilities for detecting and controlling data errors in the generalized systems are very important, and most of the prototypes do not yet do a complete job in this area. However, the generalized packages offer an opportunity for generalized, high level methods for dealing with data errors during conversion, and it could well be that once these error packages are developed, they will contribute to even larger productivity gains than have been experienced to date.

The high-level language approach to driving generalized translators should provide the ability to respond to changes in conversion requirements with relative ease. Since large conversions often take one or more years, it is not unusual for the target data base design to change or for new requirements to be placed on the conversion system. In other words, in a large conversion effort, the programs are not as "one shot" as is commonly believed. In large conversions, the savings in conversion program maintenance could be significant.

Generalized systems can also be used to map target data back to the old source data form, assuming the original conversion was information-preserving. This capability provides a means for verifying the correctness of the data conversion. In addition, this capability can be used as a "bridge back" to allow users to continue to run programs which have not yet been converted against the data in the old format. Using a generalized system in this way allows phased conversion of programs without impacting user needs during the conversion period.

In an environment where a generalized translator is used regularly as a tool for conversion, costs associated with the debugging phase should be decreased. Common, debugged functional capabilities will be utilized, whereas it is unusual in the conventional approach for common conversion modules to be developed. Thus, in the conventional approach, each new conversion system requires debugging.

Usability. The usability of generalized data translation systems must also be evaluated. Experience to date indicates that the languages are easy to learn and use. However, it would be wrong to think that these prototypes are mature software products or that they can be used in all conversions. This section discusses some of the unanswered questions with respect to usability of the current data conversion systems.

One question concerns the level of users of the generalized languages. Current prototypes have been used by application specialists and/or members of a data base support group. The systems have not yet been used by programmers, and the question remains whether programmers (as opposed to more senior application specialists and analysts) will be able to use the systems productively. There is no negative data on this point; the systems simply have not been used widely enough.
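The verification and "bridge back" ideas discussed above amount to a round trip: convert forward, map back, and compare with the source. A minimal sketch, assuming an information-preserving forward mapping (the record formats here are hypothetical):

```python
# Sketch of "bridge back" verification for an information-preserving
# conversion. The old and new formats are invented for illustration.
def forward(old_rec):
    # old format: "LASTNAME|FIRSTNAME" -> new format: a keyed record
    last, first = old_rec.split("|")
    return {"first": first, "last": last}

def backward(new_rec):
    # Inverse mapping, used both to verify the conversion and as a
    # bridge so unconverted programs can still see the old format.
    return new_rec["last"] + "|" + new_rec["first"]

old_data = ["Smith|John", "Jones|Mary"]
new_data = [forward(r) for r in old_data]

# Verification: mapping back must reproduce the source exactly.
assert [backward(r) for r in new_data] == old_data
```

The same `backward` mapping that proves correctness can serve unconverted programs during a phased migration, which is the dual use the text describes.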
At present, all the systems require a user to describe explicitly the source data to be read by the read step using a special data description language. These data description languages are generally easy to learn and use; they resemble statements in the COBOL Data Division. However, the writing of the description is a manual process which can be tedious because a person may have to describe a file with hundreds of fields. Ideally, a data conversion system should be able to make use of an existing data description, such as those existing in a data dictionary or a system COBOL macro library. As evidenced by the Michigan Data Translator [DT16], it is reasonable to expect that such an interface will be available as data conversion systems evolve.

Note, however, that a data dictionary or COBOL macro library link may not necessarily solve the problem. Data in current systems is not always fully enough defined to be converted. This is especially true with non-data base files. In these files, data definition often is embedded in the record structures of the programs, and a full definition depends on a knowledge of the procedural program logic. Even with existing data bases, some fields and associations may not be fully defined within the system data base description. Thus, the user can expect a certain amount of manual effort in developing data definitions. If existing documentation is incomplete, this can be a time consuming task, though it probably must be done regardless of whether a generalized package is used or not.

Another area where a user may have to expend effort is in the unload step of the data conversion process. The data description languages used to drive the read step have a limited ability to deal with data at a level close to the hardware (e.g., pointers, storage allocation bit maps, etc.).
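The description-driven read step discussed above can be sketched as follows. This is an illustrative miniature (Python; the three-field layout is invented, standing in for the hundreds of fields a real description might hold):

```python
# A hypothetical data description, playing the role of a COBOL Data
# Division entry: field name, offset, length, and type for one
# fixed-length record layout.
DESCRIPTION = [
    ("EMPNO", 0, 5, str),
    ("DEPT",  5, 3, str),
    ("SAL",   8, 6, int),
]

def read_step(raw_record, description):
    """Generic read step: turns a raw byte record into named, typed
    fields by interpreting the description, not hand-written code."""
    rec = {}
    for name, offset, length, typ in description:
        field = raw_record[offset:offset + length].decode("ascii").strip()
        rec[name] = typ(field)
    return rec

rec = read_step(b"00042R&D090000", DESCRIPTION)
# rec == {"EMPNO": "00042", "DEPT": "R&D", "SAL": 90000}
```

Because the description is data rather than code, it could in principle be imported from a data dictionary instead of being typed in by hand, which is exactly the interface the text anticipates.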
Generally one assumes that a system utility program can be used to unload source data and remove the more complex internal structures. Another alternative is to run the read step on top of an existing access method or data base management system, with the accessing software removing the more complex, machine dependent structures. While these are acceptable alternatives in a great many environments, including most COBOL environments, some cases may exist where neither approach will work. For example, a load/unload utility may not exist, or a file with embedded pointers which was accessed directly by an assembly language program might not be under the control of an access method. For these cases, the user is faced with complexity during the unload step. The complexity associated with accessing the data would appear to be a factor for either the conventional methods or for the generalized approach. However, in cases such as those above, some special purpose software may have to be developed. Note that some research [SL8] has examined the difficulty of extending data description languages to deal directly with these complex cases and has concluded that providing the description language with capabilities to deal with complex data structures greatly complicates implementation and has an adverse effect on usability. Thus, special purpose unload programs will continue to be required to deal with some files.

Analysis of Architectures. This section discusses some of the different approaches that have been taken in implementing the prototype data conversion systems. The objective is to analyze some of the performance and usability issues raised by the prototypes.

Two approaches have been used in the prototypes: a generative approach in the Penn Translator and XPRS, and an interpretive approach in the Michigan Data Translator. In the generative approach, a description of the input files, output files, and restructuring operations is fed to a program generator. From these descriptions, special purpose programs are generated to accomplish the described conversion. In both the Penn Translator and XPRS, PL/1 is the target language for the generator. The generated PL/1 programs are then compiled and run. In the interpretive approach, tables are built from the data and/or restructuring description. These tables are then interpreted to carry out the data conversion.

In data conversion systems, as in other software, an implementation based on interpretation can be expected to run considerably more slowly than one based on generation and compilation. Initial experience with prototype data translators has shown that there is much repetitive work, strategies for which can be decided at program compilation/generation time. Also, there is a good deal of low level data handling, such as item type conversions. Thus, those implementations based largely on an interpretive approach run more slowly, and the ability to vary bindings at run time does not appear to be necessary. Interpretation was chosen in the prototypes for ease of implementation, and in the future it can be expected that a compilation-based approach or a mixture of compilation with interpretation will be the dominant implementation architecture. However, for medium scale data bases, the machine requirements of the interpretive data conversion prototypes are not unreasonable, and overall productivity gains are still possible.

Performance measurements with conversion systems based on the generative approach indicate that generalized systems can be quite competitive with customized programs. In one case, the program generated by the data conversion system ran slightly faster than a "customized" program which had been written to do the same job. However, this example could well be the exception and it would be naive to expect this in general. The reason generalized packages can be competitive is that they often have internal algorithms which can plan access strategies to minimize I/O transfer and/or multiple passes over the source data. "Customized" conversion programs written in a conventional programming language often are not carefully optimized, since the expectation is that the programs will be discarded when the conversion is done.

A second architectural difference involves whether or not an underlying DBMS is used. In both the Penn Translator and XPRS, the generated PL/1 program, when executing, accesses sequential files, performs the restructuring, and writes sequential files. On the other hand, the Michigan Data Translator functions as an application program running on a network structured data base management system. Thus, the interpreter makes calls to the underlying DBMS to retrieve data during restructuring and puts restructured data into the new data base.

The two approaches offer different tradeoffs. For example, the Michigan Data Translator can make use of the existing extraction capabilities of a DBMS and perform partial translations easily. In addition, since it operates directly within the network data model, a user does not have to think of "unloading" data to a file model and then reloading it back; rather, the user describes a network to network restructuring much more directly.

On the other hand, the use of an underlying DBMS requires that the data be converted into the data model of the DBMS upon which the conversion system is based, which may or may not be difficult. For example, when the data to be converted differs significantly from the data model of the underlying DBMS, the conversion system can be more difficult to use, whereas the file oriented conversion systems can be made to run tape-to-tape. The on-line storage required for the converted data can also be important in very large data base conversions.

In the future, one can expect that data conversion systems will offer a variety of interfaces to accommodate various kinds of conversion situations. For example, it is possible to interface the "file-oriented" conversion systems to run as application programs on top of existing data base management systems. It is also possible to develop "reader programs" to load non-data base data into conversion systems based on a DBMS. In addition, more automated interfaces to data dictionary packages can be expected in order to improve usability and obviate the need for multiple data definitions.

One possible performance problem with generalized conversion systems lies in the unload phase. For reasons of usability, generalized conversion systems usually rely on an unload utility program to access the source data, thus isolating the conversion package from highly system specific data.
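The generative/interpretive contrast can be miniaturized as follows. This sketch is illustrative only (Python; the mapping table and the generated code stand in for the table interpretation and PL/1 generation described above, not for the actual systems):

```python
# One toy field-mapping table drives both architectures.
TABLE = [("name", "NM"), ("city", "CTY")]   # target field <- source field

# Interpretive: the table is consulted for every record at run time.
def interpret(records, table):
    return [{dst: rec[src] for dst, src in table} for rec in records]

# Generative: the table is compiled once into straight-line code, so
# the per-record table lookups disappear (as with PL/1 generation).
def generate(table):
    body = ", ".join('"%s": rec["%s"]' % (dst, src) for dst, src in table)
    src_code = ("def convert(records):\n"
                "    return [{%s} for rec in records]" % body)
    ns = {}
    exec(src_code, ns)       # compile the generated program
    return ns["convert"]

data = [{"NM": "Ada", "CTY": "London"}]
assert interpret(data, TABLE) == generate(TABLE)(data) == [{"name": "Ada", "city": "London"}]
```

Both routes produce identical output; the difference is when the mapping decisions are bound, which is the performance distinction the text draws.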
A potential problem with this approach is that the unload package may not make good use of existing access paths or may tend to access the source data in a fashion which assumes that the data has recently been reorganized (with respect to overflow areas, etc.). In cases where the data is badly disorganized, a customized unload program which accessed the data at a lower level might run considerably faster, and for very large data bases might be the only feasible way to unload the data. It is not clear how common this case is, and one can usually make the argument that the "special" unload software could be interfaced to the generalized package. However, from a practical standpoint, the unloading phase on a very large, badly disorganized data base is a performance unknown, and more sophisticated unload utilities may have to be developed as part of the generalized packages.

Summary. Detailed performance and productivity figures for major conversions should be available in about one year. Expectations are that machine efficiency of the generalized packages based on a generation/compilation approach will be acceptable (no worse than a factor of 2) when compared with conventional conversion programs. Additional enhancements to improve usability can be expected, especially in the areas of data error detection and control and interfaces to data dictionary software. If the savings in conversion program analysis and coding times (often fifty percent or more) are confirmed, then the generalized conversion systems will be ready for extensive use.

6.3 OTHER FACTORS AFFECTING CONVERSION

In this section we look at the conversion problem from two aspects. First, we address the question: What can we do today to lessen the impact of a future conversion? Second, we look to the future to see what effects future technology and standards will have on the conversion process.

6.3.1 Lessening the Conversion Effort.
In order to identify guidelines both for reducing the need for conversion and for simplifying conversions which are required, one must consider the entire application software development cycle, because poor application design, poor logical data base design, and inadequate use of or inappropriate selection of a DBMS could each lead to an environment which may prematurely require an application upgrade or redesign. This redesign could, in many cases, require a major data base conversion effort.

The set of guidelines specified below is not intended as a panacea. Instead, it is meant to make designers aware of strategies which make intelligent use of current technology. It is doubtful that all conversions could be avoided if a project adhered strictly to these proposed guidelines. However, adherence to the principles set forth by these guidelines could certainly reduce the probability of conversion and, more importantly, simplify the conversions that are required.

With respect to application design and implementation, the more the application is shielded from system software and hardware implementation details, the easier it becomes for a conversion to take place. For example, a good sequential access method hides the difference between tapes, disks, and drums from the application programs which use the access method.

The logical data base design should be specified with a clear understanding of the information environment. A good logical data base design reduces the need to restructure because it actually models the environment it is meant to serve. Introduction of data dependencies in the data structure should, if possible, be kept to a minimum. An analysis of the tradeoffs between system performance and likelihood of conversion should definitely be made.

Selecting the wrong or non-optimal data base management system, given the application requirements, is also a key problem which can lead to unnecessary and large conversion efforts.
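The shielding principle behind the sequential access method example can be sketched as follows (Python; the class names and two-medium setup are invented for illustration):

```python
import io

# A sequential "access method" interface that hides the storage medium
# from application code, in the spirit of the tapes/disks/drums example.
class SequentialFile:
    def records(self):
        raise NotImplementedError

class TapeFile(SequentialFile):          # stand-in for a tape-resident file
    def __init__(self, blob):
        self.blob = blob
    def records(self):
        return self.blob.splitlines()

class DiskFile(SequentialFile):          # stand-in for a disk-resident file
    def __init__(self, stream):
        self.stream = stream
    def records(self):
        return [line.rstrip("\n") for line in self.stream]

def application(seq_file):
    # The application never learns which device holds the data, so a
    # change of medium requires no program conversion.
    return [r.upper() for r in seq_file.records()]

assert application(TapeFile("a\nb")) == application(DiskFile(io.StringIO("a\nb"))) == ["A", "B"]
```

Only the two device-specific classes know about media; replacing one medium with another leaves `application` untouched, which is the conversion saving the guideline is after.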
The prospective user of a DBMS should, for example, carefully evaluate the data independence characteristics of a proposed DBMS.

The underlying principle of the guidelines which follow is that decisions can be made at the system design and implementation stages which are crucial to the stability of the applications.

Application Design Guidelines.

Requirements Analysis. Many of the decisions made at the requirements analysis stage of system development affect the long-term effectiveness of the application system (the data base design as well as the application programs). Questions such as what functions does the data base serve, who are the possible users of the data and how will they use the data, and what are the performance constraints of the data base are answered at this stage. It is essential that the designer understand the information environment at the outset in order to lessen the probability that frequent conversions will be necessary.

Requirements analysis should focus on information needs and should minimize constraints being imposed by the physical environment, since it can distort the designer's view of the application system's true objectives. The influence of the physical environment should be considered secondarily, in order that the designer be fully aware of the resulting compromises to the logical requirements. This is not intended to imply that consideration of the physical environment is unimportant. Indeed, if the physical environment is ignored, the effect could be development of a set of requirements that are impossible to meet within existing physical and cost constraints.

Program Design Guidelines. Three underlying principles motivate this discussion of application program design. They are:
design for maintainability;

design for the application;

data independence.

Keeping sight of all of these during the design of the application program will lessen conversion effects by rendering the application as free as possible from physical considerations.

Designing for maintainability implies that the application should be written in a high-level language with a syntax that permits good program structure. Structured programming techniques such as top-down program design and implementation should be used throughout. The system should be modular with relatively small, functionally oriented programs. The programs should all be well commented and organized for readability. Design reviews and program walkthroughs also help to expose errors in the overall design and "holes" in the application logic at an early stage. It has been well documented that these steps help ease making program modifications.

One error which is often made in designing programs in a DBMS environment is to let the capabilities of the DBMS drive the design rather than the application. This design error can yield programs which are unnecessarily dependent upon the features of a specific DBMS. For example, in System 2000 one can use a tree to represent a many-to-many relationship instead of using the LINK feature. The parent/child dichotomy that results is an efficient but arbitrary contrivance that cannot easily be undone later on. The key principle here is to concentrate on what results are desired rather than on the implementation details of achieving these results. Simplicity and generalization of the design will provide a very high level of interface to the application programmer which will, in turn, minimize the total amount of software and provide the greatest degree of portability, maintainability, device independence, and data independence.
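The data independence principle listed above can be sketched with a small mapping layer (Python; the layouts and field names are invented for illustration):

```python
# A clean program/data interface: the program addresses fields by
# logical name, and one small layer absorbs changes in stored layout.
V1_LAYOUT = {"name": 0, "salary": 1}             # original stored order
V2_LAYOUT = {"salary": 0, "dept": 1, "name": 2}  # reorganized data base

def view(stored_record, layout):
    """The only place that knows the physical ordering of fields."""
    return lambda field: stored_record[layout[field]]

def payroll(record_view):
    # Application logic: written once against logical names, and left
    # unchanged when the stored layout is reorganized.
    return record_view("salary") * 12

assert payroll(view(("Ada", 1000), V1_LAYOUT)) == 12000
assert payroll(view((1000, "R&D", "Ada"), V2_LAYOUT)) == 12000
```

The reorganization from the first layout to the second touches only the layout table, not `payroll`; scaled up, that is the stability the three principles are meant to buy.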
Of extreme importance in program design is the notion of data independence; i.e., insulating the application program from the way the data is physically stored.

Layered Design. One approach to achieving this insulation is layered design; that is, designing the application as a series of layers, each of which communicates with the system at a different level of abstraction. One can visualize this as an "onion," with hardware as its core and layers of successively more sophisticated software at the outer layers. The user interacts with the outermost skin of the onion, at the highest level of abstraction.

If application programs are written at the outermost layers of the onion, then these programs are smaller, easier to understand and, therefore, easier to modify or convert than programs written at lower layers. For example, introduction of a new mainframe will require conversion of the software which references the particulars of the mainframe. However, since the layers are constructed so that physical machine and device independence is realized above some level, only the software below that level is subject to modification. To the extent that application programs stay at the outermost layers (i.e., above the critical layer) reduced conversion effects can be achieved.

We can thus summarize the goals of program design as follows:

to provide the highest possible level of interface to the application program;

to maximize application program independence from the characteristics of the mainframe, peripherals, and data base organization;

to maximize portability of the application program through the use of high-level languages;

to maintain a clean program/data interface.

Programming Techniques. The previous sections of this chapter have focused on the design decisions which should be made to alleviate the conversion problem.
However, regardless of how noble these goals are, poor implementation decisions can go a long way towards diminishing the returns of a good design. Equally important to intelligent design is a set of programming techniques and standards which prohibit programmers from introducing dependencies in code. For example, a "clever" programmer may introduce a word size dependency in a program by using right and left shifts to effect multiplication and division. Of course, there are no hard and fast safeguards against using tricky coding techniques; an effort must be made to make the programmer conscious of the consequences of this kind of coding. In particular, a programmer should not be allowed to jump across layers of the onion, such as by using an access method to read or write data bases directly.

Data Base Design. Perhaps the most costly mistake a designer can make is an error in the data base design, because it has a direct effect on the information that is derivable and the application programs that are created. Incorrect or unanticipated requirements can lead either to information deficient data bases or to an overly complex and general design. An inadequate logical design has the potential for complex user interfaces or extremely long access times. A poor physical design can lead to high maintenance and performance costs.

Unfortunately, data base design is still an art at the present time. Two surveys report the results in the area to date. Novak and Fry [DL26] survey the current logical data base design methodology and Chen and Yao [DL34] review data base design in general. The work of Bubenko [DL31] in the development of the CADIS system and the abstraction and generalization techniques of Smith and Smith [DL29,30] show promise.

An accurate logical design can still be unnecessarily data dependent. Dependencies are inadvertently or deliberately introduced in the interest of improving system performance. In essence, "purity" is compromised to gain processing efficiencies.
Since optimization is a worthwhile goal, insisting on absolute purity may be unreasonable. However, the data base designer should at least be aware of contrivances and, therefore, be in a position to evaluate the relative effects a design decision may have. Designers should become sensitive to their decisions by asking: "How will the data model be affected by a future change in performance requirements? Have I done a reasonable job in insulating applications from data structure elements that are motivated strictly by performance considerations?"

Some examples of induced data dependencies in logical data base design which may impact upon conversion are:

The use of owner-coupled sets in DBTG to implement performance-oriented index structures or orderings on records.

Storing physical pointers (or data base keys) in an information field of a record.

Combining segment types (in DL/1) to reduce the amount of I/O required to traverse a data base.

DBMS Utilization and Selection. Selection of a DBMS product can have a major impact on conversion requirements. Of importance in evaluating a DBMS is to consider whether it exhibits the highest level user interface. A high level DBMS is characterized by both a set of functions and a high degree of data independence from the point of view of the application. With respect to the functions, that is, the DML, the distinction between "high level" and "low level" has traditionally centered on whether the DBMS provides user operations on sets of records (select, retrieve, update, or summarize all the records or tuples which satisfy some conditions) or whether one is restricted to record-at-a-time processing ("navigation"). The DBMS with the "high-level" set operation approach is significantly more desirable than the navigational record-by-record approach.

DBMS prospects should evaluate the data independence characteristics of a proposed product.
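The set-operation versus navigation distinction can be sketched as follows (Python; the employee data and the selection predicate are illustrative only):

```python
# The same query in "high level" and "low level" DML style.
employees = [
    {"name": "Ada",  "dept": "R&D", "salary": 1200},
    {"name": "Bob",  "dept": "OPS", "salary": 900},
    {"name": "Cora", "dept": "R&D", "salary": 1100},
]

# High level: one set-oriented operation selects and summarizes all
# qualifying records in a single statement.
rd_total = sum(e["salary"] for e in employees if e["dept"] == "R&D")

# Low level: record-at-a-time "navigation", testing each record in turn.
total = 0
cursor = iter(employees)
while True:
    try:
        e = next(cursor)          # fetch the next record
    except StopIteration:
        break                     # end of set reached
    if e["dept"] == "R&D":
        total += e["salary"]

assert rd_total == total == 2300
```

Both forms compute the same answer; the set-oriented form states the condition once and leaves the iteration to the system, which is why the text prefers it.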
Systems are preferred which support an "external schema" or "subschema" feature which permits the record image in the application program (the user work area) to differ significantly from the data base format. However, the subschema concept is only one aspect of data independence. In general, it is necessary to determine in what ways and to what extent the application interface is insulated from performance or internal format options. For instance, will programs have to be modified if:

a decision is made to add or delete an index?

the amount of space allocated to an item is increased or decreased?

chains are replaced by pointer arrays?

Other conversion related questions about DBMS products include the following:

Are there adequate performance and formatting alternatives? Are there too many (i.e., unproductive or incomprehensible) tuning options? Are there adequate performance measurement techniques and tools to guide the exercise of these choices?

Does the system automatically convert a populated data base when a new format option is selected?

Aside from tuning, does the DBMS gracefully accommodate at least simple external changes such as adding or deleting a record or item type?

Are there other useful high level facilities associated with or based on the DBMS, such as a report writer, query processor, data dictionary, transaction monitor, accounting system, payroll system, etc.?

Is there a utility for translating the data base into an "interchange form;" i.e., a machine independent, serial stream of characters?

Is the vendor committed to maintaining the product across new operating system and hardware releases/upgrades? Conversely, is the vendor prepared to support the product on older releases of the operating system, so the user will not be forced to upgrade?

What hardware environments are currently supported and what is the vendor's policy regarding conversion to another manufacturer's mainframe?

What programming language interfaces are available?
Can the same DBMS features be used if there is a migration, say, from COBOL to PL/1?

How intelligent is the system's technique for organizing data on the media? Specifically, will performance deteriorate at an inordinate rate as updating proceeds? How often will reorganization (cleanup) be required? Does the DBMS have a built-in reorganization utility? How does the user determine the optimal time to reorganize?

Are the language facilities and data modeling facilities of the DBMS adequate for the anticipated long term requirements of the enterprise? What is the risk of having to convert to a new DBMS? Likewise, are the performance characteristics and internal storage structure limitations adequate to meet the long term requirements (response times, data base sizes) of the enterprise?

Are there facilities to assist the user in converting data from a non-DBMS environment or from another DBMS? For instance, can a data base be loaded from one or more user defined files?

6.3.2 Future Technologies/Standards Impact. In this section we discuss trends in computer hardware technologies, DBMS software directions, and standards development, and consider their impact on data and program conversion. We intend to make the reader aware of what to expect in terms of conversion problems rather than give a complete assessment of future technologies. Therefore we discuss only technologies and standards that will impact conversion problems. The first three parts discuss the areas of hardware, software, and standards and their impact on conversion in some detail. The last part summarizes the major points of our assessment without going into detailed reasoning.

Hardware and Architectural Technologies. The cost and performance of processor logic and main memory continue to improve at a fast rate. As a result, overhead costs are more acceptable, especially when such costs save people's time and work, and provide user oriented functions that do not require a computer expert.
In particular, one can now think about using generalized conversion tools not only when conversion is required as a result of hardware or software changes but also as a result of a changing application that requires a new, more efficient data base organization. What could have been a prohibitive cost for a data base conversion in the past may not be a major factor in the future.

At the same time, the cost/performance improvement contributes to the proliferation of data bases and therefore accentuates the need for generalized conversion tools. The more cost effective the process of accessing and maintaining data becomes, the more data is collected on computers. Improvements in hardware (as well as software) technologies create more need for data and program conversion.

In addition, the emergence of new technologies, such as communication networks, adds another level of sophistication to the way that data can be organized and used. Distributed data bases, where multiple data bases (or subsets of data bases) may reside on different machines, require tools for the integration and the correlation of data. Invariably, data will need to move from system to system dynamically, possibly moving between different hardware/software systems. In this environment, generalized tools for dynamic conversion will become a necessity.

In recent years, two promising approaches to data management hardware technologies have been pursued. One is the specialized data management machine and the other is the backend data management machine. As will be explained next, both approaches can help simplify the conversion problem.

The specialized data management hardware is based on the idea of using some kind of an associative memory device, a device that can perform a parallel access to the data based on its content.
Such a device eliminates the necessity for organizing the internal structure of a data base using indexes, hash tables, pointer structures, etc., which are primarily used for fast access. As a result, the data can be essentially stored in its external logical form, and the data management system can use a high level language based on the logical data structure only. The conversion process is simplified since data is readily available in its logical organization. Referring to the terminology used in previous sections, the functions of unloading and loading of the data base can be greatly simplified. Also, no restructuring will be required because of a change in data base use, since the physical data base organization can be to a large degree independent of its intended use. In addition, the program conversion problem is simplified as a result of the program interfacing to the DBMS using a high level logical language.

Similar benefits can be achieved if backend machines are used. A backend machine is a special processor dedicated to managing storage and data bases on behalf of a host computer. The primary motive for the backend machine is to off-load the data management function from the host to a specialized machine that can execute this function at much lower cost. From a conversion standpoint, the separation of data management functions from the host promotes the need for a high level logical interface that provides the advantages discussed above. Another advantage is that it is possible to migrate from one host machine to another without affecting the data bases and their management, alleviating the need for data conversion if the same backend machine is used with the new host.

Mass storage devices, such as video disks, make storing very large data bases, on the order of 10 to the 10th power characters, cost effective. Converting data bases of this size compounds the cost considerations merely by the processing of this large amount of data.
As a result, such data bases will tend to stay in the same environment for longer periods of time. The use of specialized data management machines or dedicated backend machines in conjunction with these mass storage devices can help postpone the need for data base conversion.

Finally, we should mention the growing use of minicomputers supporting data management functions. DBMSs now exist on many minicomputers, with more forthcoming. The proliferation of minicomputers which support data bases can only increase the need for generalized conversion tools.

Software Development Trends. Much of the work over the last years in the data management area has concentrated on techniques that clearly separate the logical structure of the data base from its physical organization. This concept, called "data independence," was introduced to emphasize that users need not be exposed to the details of the physical organization of the data base, but only to its logical relationships. This led to the development of data access and manipulation languages that depend on the logical data model only. The effect of this trend is similar to that of the data management machines and backend machines discussed previously; namely, the simplification of the unload and load functions, since the interface to the DBMS is provided at the logical level only, and the simplification in program conversion for similar reasons.

At the user end of the spectrum, it seems reasonable to assume that the diversity of data models (network, relational, hierarchies, and other views that may be developed in the future) will be required for many more decades. This is especially true since there are problem areas that seem to map more naturally into a certain model. Furthermore, it is often the case that users do not agree on the same model for a given problem area.
Obviously, this state of affairs only accentuates the need for generalized conversion tools that can restructure data bases from one model to another. Even with the development of large scale associative memories, data structures will likely provide economic rationales for their continued use. Another possibility is the use of a common underlying data model that can accommodate any of the user views. However, this approach will still require some type of dynamic conversion process between the common view and each of the possible user views.

Standards Development. There is much work and controversy in developing standards for DBMS. Standards that are oriented to determine the nature of the DBMS are hard to bring about even in a highly controlled environment because of previous investment in application software and data base development, and because of disagreement. For example, there is still much controversy whether the network model proposed by the CODASYL committee is a proper one. It seems reasonable to assume that there will always be non-standard DBMSs. Further, even if such a standard can be adopted, different DBMS implementations will still exist, resulting in different physical data bases for the same logical data base. In addition, one can safely assume that restructuring because of application needs will still be necessary, and that changes in the standard itself may require conversion.

A standard that is more likely to be accepted is one that affects only the way of interfacing to a DBMS. In particular, from a conversion standpoint, a standard interchange data form (SIDF) will be most useful. A SIDF is a format not unlike a load format for DBMSs. Any advanced DBMS has a load utility that requires a sequential data stream in a pre-specified format. If a standard for this format can be agreed upon, and if all DBMSs can load and unload from and to this format, then the need for reformatting (as described earlier) is eliminated.
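The SIDF idea can be illustrated with a small sketch. The record layout, field separators, and function names below are invented for illustration only; the actual IWGDE proposal's format differs, and any real DBMS load utility has its own stream format.

```python
# Sketch of the SIDF idea: every DBMS unloads to, and loads from, one
# agreed-upon sequential stream, so moving a data base between systems
# reduces to restructuring alone.  The concrete layout here (one record
# per line, ';'-separated name=value fields) is hypothetical.

def unload(records):
    """Unload logical records (dicts) into a sequential SIDF-style stream."""
    lines = []
    for rec in records:
        # Fields are written in a fixed (sorted) order so any two systems
        # produce the same stream for the same logical record.
        lines.append(";".join(f"{name}={value}" for name, value in sorted(rec.items())))
    return "\n".join(lines)

def load(stream):
    """Load a SIDF-style stream back into logical records."""
    records = []
    for line in stream.splitlines():
        rec = {}
        for field in line.split(";"):
            name, _, value = field.partition("=")
            rec[name] = value
        records.append(rec)
    return records

# Any system that implements both halves can exchange data with any other:
employees = [{"EMP": "SMITH", "DEPT": "D21"}, {"EMP": "JONES", "DEPT": "D30"}]
assert load(unload(employees)) == employees
```

Because every system speaks the same sequential form, an N-system environment needs only N unload/load pairs rather than N x (N-1) pairwise converters; any remaining work is restructuring between logical models.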
The conversion process can then be reduced to essentially restructuring only, given that unload and load are part of the DBMS function. A preliminary proposal for such a standard was developed by the ERDA Interlaboratory Working Group on Data Exchange (IWGDE) [GG2]. However, it is only designed to accommodate hierarchical structures. Consideration is now being given to the extension of the standard to accommodate more general structures (i.e., networks and relations). We believe that there are no technical barriers to the development of a SIDF, and that putting such a standard to use would alleviate a major part of the data conversion process.

Summary. The rationale for the points summarized below appears in the previous parts of this section. We will only state here our assessment of the impact on conversion problems.

Hardware development (in particular, computer networks, the proliferation of minicomputers, and mass storage devices) will increase the need for generalized conversion tools.

Improved hardware cost/performance will make the costs of conversion more acceptable.

Special hardware DBMS machines will simplify the conversion process (in particular, the load and unload functions, and program conversion) because they promote interfacing at the logical level.

Software advances will not eliminate the need for conversion but can simplify the conversion process in a way similar to DBMS machines.

A multiplicity of logical models is likely to exist, leading to the need for conversion tools between models.

Even with a DBMS standard, implementations would differ, non-standard DBMSs will likely exist, and the conversion problem will not disappear.

A standard interchange data form, together with DBMS load and unload utilities, would greatly simplify conversion by reducing it to restructuring only.

BIBLIOGRAPHY

(DL) LOGICAL DATA BASE DESIGN

DL26 NOVAK, D., and FRY, J., "The State of the Art of Logical Data Base Design," Proceedings of the Fifth Texas Conference on Computing Systems, IEEE, Long Beach, 1976.

DL29 SMITH, J. M., and SMITH, D. C.
P., "Data Base Abstractions: Aggregation and Generalization," ACM Transactions on Data Base Systems 2,2 (1977): 105-133.

DL30 SMITH, J. M., and SMITH, D. C. P., "Data Base Abstractions: Aggregation," Communications of the ACM 20,6 (1977): 405-413.

DL31 BUBENKO, J. A., "IAM: An Inferential Abstract Modeling Approach to Design of Conceptual Schema," Proceedings of the ACM-SIGMOD International Conference on Management of Data, ACM, N.Y., 1977, pp. 62-74.

DL34 CHEN, P. P., and YAO, S. B., "Design and Performance Tools for Data Base Systems," Proceedings of the Third International Conference on Very Large Data Bases, ACM, N.Y., 1977.

(DT) DATA TRANSLATION

DT1 FRY, J. P., FRANK, R. L., and HERSHEY, E. A., III, "A Developmental Model for Data Translation," Proc. 1972 ACM SIGFIDET Workshop on Data Description, Access and Control, A. L. Dean (ed.), ACM, N.Y., pp. 77-106.

DT2 SMITH, D. C. P., "A Method for Data Translation Using the Stored-Data Definition and Translation Task Group Languages," Proc. of the 1972 ACM SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y.

DT3 RAMIREZ, J. A., "Automatic Generation of Data Conversion Programs Using a Data Description Language (DDL)," Ph.D. dissertation, University of Pennsylvania, 1973.

DT4 MERTEN, A. G., and FRY, J. P., "A Data Description Approach to File Translation," Proc. 1974 ACM SIGMOD Workshop on Data Description, Access and Control, ACM, N.Y., pp. 191-205.

DT5 HOUSEL, B., LUM, V., and SHU, N., "Architecture to an Interactive Migration System (AIMS)," Proc. 1974 ACM SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y., pp. 157-169.

DT6 RAMIREZ, J. A., RIN, N. A., and PRYWES, N. S., "Automatic Conversion of Data Conversion Programs Using a Data Description Language," Proc. 1974 ACM SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y.

DT7 FRANK, R.
L., and YAMAGUCHI, K., "A Model for a Generalized Data Access Method," Proc. of the 1974 National Computer Conference, AFIPS Press, Montvale, N.J., pp. 437-444.

DT8 TAYLOR, R. W., "Generalized Data Structures for Data Translation," Proc. Third Texas Conference on Computing Systems, Austin, Texas, 1974.

DT9 UNIVAC, UNIVAC 1100 Series Data File Converter Programmer Reference, UP-8070, Sperry Rand Corporation, March, 1974.

DT10 YAMAGUCHI, K., "An Approach to Data Compatibility: A Generalized Access Method," Ph.D. dissertation, The University of Michigan, 1975.

DT11 BAKKOM, D. E., and BEHYMER, J. A., "Implementation of a Prototype Generalized File Translator," Proc. 1975 ACM SIGMOD International Conf. on Management of Data, W. F. King (ed.), ACM, N.Y., pp. 99-110.

DT12 NAVATHE, S. B., and MERTEN, A. G., "Investigations into the Application of the Relational Model of Data to Data Translation," Proc. 1975 ACM SIGMOD International Conf. on Management of Data, W. F. King (ed.), ACM, N.Y., pp. 123-138.

DT13 BIRSS, E. W., and FRY, J. P., "Generalized Software for Translating Data," Proc. of the 1976 National Computer Conference, Vol. 45, AFIPS Press, Montvale, N.J., pp. 889-899.

DT14 LUM, V. Y., SHU, N. C., and HOUSEL, B. C., "A General Methodology for Data Conversion and Restructuring," IBM Journal of Research and Development, Vol. 20, No. 5, 1976, pp. 483-497.

DT15 HOUSEL, B. C., et al., "EXPRESS: A Data Extraction, Processing, and Restructuring System," ACM Transactions on Data Base Systems 2,2 (1977): 134-174.

DT16 SWARTWOUT, D. E., DEPPE, M. E., and FRY, J. P., "Operational Software for Restructuring Network Data Bases," Proc. of the 1977 National Computer Conference, Vol. 46, AFIPS Press, Montvale, N.J., pp. 499-508.

DT17 GOGUEN, N. H., and KAPLAN, M. M., "An Approach to Generalized Data Translation: The ADAPT System," Bell Telephone Laboratories Internal Report, October 5, 1977.
(GG) GENERAL

GG1 INGALLS, D., "The Execution Time Profile as a Programming Tool," in Design and Optimization of Compilers, ed. by R. Rustin, Prentice-Hall, 1972, pp. 108-128.

GG2 ERDA Interlaboratory Working Group for Data Exchange (IWGDE), Annual Report for Fiscal Year 1976, NTIS LBL-5329.

GG3 DATE, C. J., An Introduction to Database Systems, Addison-Wesley, 1975.

GG4 CODD, E. F., "Relational Completeness of Data Base Sublanguages," in Data Base Systems, Courant Computer Science Symposia Series, Vol. 6, Prentice-Hall, 1972.

(M) MODELS-THEORY

M11 CHEN, P. P. S., "The Entity-Relationship Model - Toward a Unified View of Data," ACM Transactions on Data Base Systems 1,1 (1976): 9-36.

(PT) PROGRAM TRANSLATION

PT1 SHARE AD-HOC COMMITTEE ON UNIVERSAL LANGUAGES, "The Problem of Programming Communication with Changing Machines: A Proposed Solution," Comm. ACM, Aug., 1958, pp. 12-18.

PT2 SHARE AD-HOC COMMITTEE ON UNIVERSAL LANGUAGES, "The Problem of Programming Communication with Changing Machines: A Proposed Solution, Part 2," Comm. ACM, Sept., 1958, pp. 9-16.

PT3 SIBLEY, E. H., and MERTEN, A. G., "Transferability and Translation of Programs and Data," Information Systems, COINS IV, Plenum Press, N.Y., 1972.

PT4 YAMAGUCHI, K., and MERTEN, A. G., "Methodology for Transferring Programs and Data," Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y.

PT5 HOUSEL, B. C., LUM, V. Y., and SHU, N., "Architecture to an Interactive Migration System (AIMS)," Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y., pp. 157-170.

PT6 MEHL, J. W., and WANG, C. P., "A Study of Order Transformation of Hierarchical Structures in IMS Data Bases," Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y.

PT7 HOUSEL, B. C., and HALSTEAD, M.
H., "A Methodology for Machine Language Decompilation," Proc. of the 1974 ACM Annual Conference, ACM, N.Y., pp. 254-260.

PT8 HONEYWELL INFORMATION SYSTEMS, "Functional Specification Task 609 Data Base Interface Package," Defense Communications Agency Contract DCA 100-73-C-0055, 1975.

PT9 "Migrating Data Base Procedures," Proc. 1975 ACM National Conference, ACM, N.Y., pp. 359-367.

PT10 SCHINDLER, S., "An Approach to Data Base Application Restructuring," Working Paper 76 DT 2.3, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Mich., 1976.

PT11 DALE, A. G., and DALE, N. B., "Schema and Occurrence Structure Transformations in Hierarchical Systems," Proc. 1976 ACM SIGMOD International Conference on Management of Data.

PT12 SU, S. Y. W., "Application Program Conversion Due to Data Base Changes," Proc. of the 2nd International Conference on VLDB, Brussels, Sept. 8-10, 1976, pp. 143-157.

PT13 SU, S. Y. W., and LIU, B. J., "A Methodology of Application Program Analysis and Conversion Based on Data Base Semantics," Proceedings of the International Conference on Management of Data, 1977, pp. 75-87.

PT14 HOUSEL, B. C., "A Unified Approach to Program and Data Conversion," Proceedings of the Third International Conference on Very Large Data Bases, ACM, N.Y., 1977.

PT15 SU, S. Y. W., and REYNOLDS, M. J., "Conversion of High-Level Sublanguage Queries to Account for Data Base Changes," Proc. of NCC, 1978, pp. 857-875.

(R) RESTRUCTURING

R1 FRY, J. P., and JERIS, D., "Towards a Formulation of Data Reorganization," Proc. 1974 ACM/SIGMOD Workshop on Data Description, Access and Control, ed. by R. Rustin, ACM, N.Y.

R2 SHU, N. C., HOUSEL, B. C., and LUM, V. Y., "CONVERT: A High-Level Translation Definition Language for Data Conversion," Comm. ACM 18,10 (1975), pp. 557-567.

R3 SHOSHANI, A., "A Logical-Level Approach to Data Base Conversion," Proc.
1975 ACM/SIGMOD International Conf. on Management of Data, ACM, N.Y., pp. 112-122.

R4 SHOSHANI, A., and BRANDON, K., "On the Implementation of a Logical Data Base Converter," Proc. International Conference on Very Large Data Bases, ACM, N.Y., 1975, pp. 529-531.

R5 HOUSEL, B. C., and SHU, N. C., "A High-Level Data Manipulation Language for Hierarchical Data Structures," Proc. of the 1976 Conference on Data Abstraction, Definition and Structure, Salt Lake City, Utah, pp. 155-169.

R6 NAVATHE, S. B., and FRY, J. P., "Restructuring for Large Data Bases: Three Levels of Abstraction," ACM Transactions on Data Base Systems 1,2 (1976), pp. 138-158.

R7 NAVATHE, S. B., "A Methodology for Generalized Data Base Restructuring," Ph.D. dissertation, The University of Michigan, 1976.

R8 GERRITSEN, ROB, and MORGAN, HOWARD, "Dynamic Restructuring of Data Bases with Generation Data Structures," Proc. of the 1976 ACM Conference, ACM, N.Y., pp. 281-286.

R9 SWARTWOUT, D., "An Access Path Specification Language for Restructuring Network Data Bases," Proc. of the 1977 SIGMOD Conference, ACM, N.Y., pp. 88-101.

R10 NAVATHE, S. B., "Schema Analysis for Data Base Restructuring," Proc. 3rd International Conference on Very Large Data Bases, 1977; to appear in TODS.

R11 EDELMAN, J. A., JONES, E. E., LIAW, Y. S., NAZIF, Z. A., and SCHEIDT, D. L., "REORG - A Data Base Reorganizer," Bell Laboratories Internal Technical Report, April, 1976.

(SL) STORED-DATA DEFINITION

SL1 STORAGE STRUCTURE DEFINITION LANGUAGE TASK GROUP (SSDLTG) OF CODASYL SYSTEMS COMMITTEE, "Introduction to Storage Structure Definition" (by J. P. Fry); "Informal Definitions for the Development of a Storage Structure Definition Language" (by W. C. McGee); "A Procedural Approach to File Translation" (by J. W. Young, Jr.); "Preliminary Discussion of a General Data to Storage Structure Mapping Language" (by E. H. Sibley and R. W. Taylor), Proc.
1970 ACM-SIGFIDET Workshop on Data Description, Access and Control, ed. by E. F. Codd, Houston, Tex., Nov. 1970, pp. 368-380.

SL2 SMITH, D. C. P., "An Approach to Data Description and Conversion," Ph.D. dissertation, Moore School Report 72-20, University of Pennsylvania, Philadelphia, Pa., 1972.

SL3 TAYLOR, R. W., "Generalized Data Base Management System Data Structures and Their Mapping to Physical Storage," Ph.D. dissertation, The University of Michigan, Ann Arbor, Mich., 1971.

SL4 FRY, J. P., SMITH, D. C. P., and TAYLOR, R. W., "An Approach to Stored-Data Definition and Translation," Proc. 1972 ACM-SIGFIDET Workshop on Data Description, Access and Control, ed. by A. L. Dean, Denver, Colo., Nov. 1972, pp. 13-55.

SL5 BACHMAN, C. W., "The Evolution of Storage Structures," Comm. ACM 15,7 (July 1972), pp. 628-634.

SL6 SIBLEY, E. H., and TAYLOR, R. W., "A Data Definition and Mapping Language," Comm. ACM 16,12 (Dec. 1973), pp. 750-759.

SL7 HOUSEL, B., SMITH, D., SHU, N., and LUM, V., "DEFINE: A Non-Procedural Data Description Language for Defining Information Easily," Proc. of 1975 ACM Pacific Conference, San Francisco, CA, April 1975, pp. 62-70.

SL8 The Stored-Data Definition and Translation Task Group, "Stored-Data Description and Data Translation: A Model and Language," Information Systems 2,3 (1977): 95-148.

(SM) DATA SEMANTICS

SM1 SCHMID, H. A., and SWENSON, J. R., "On the Semantics of the Relational Model," Proc. ACM-SIGMOD 1975 Conference, May 1975.

SM2 ROUSSOPOULOS, N., and MYLOPOULOS, J., "Using Semantic Networks for Data Base Management," Proc. Very Large Data Base Conference, Framingham, Mass., Sept. 1975, pp. 144-172.

SM3 SU, STANLEY Y. W., and LO, D. H., "A Multi-level Semantic Data Model," CAASM Project, Technical Report No. 9, Electrical Engineering Dept., University of Florida, June 1976, pp. 1-29.

(TA) TRANSLATION APPLICATIONS

TA1 WINTERS, E. W., and DICKEY, A.
F., "A Business Application of Data Translation," Proceedings of the 1976 SIGMOD International Conference on Management of Data, ed. by J. B. Rothnie, Washington, D.C., June 1976, pp. 189-196.

(UR) UM RESTRUCTURING

UR1 LEWIS, K., DRIVER, B., and DEPPE, M., "A Translation Definition Language for the Version II Translator," Working Paper 809, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1975.

UR2 LEWIS, K., and FRY, J., "A Comparison of Three Translation Definition Languages," Working Paper DT 5.1, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1975.

UR3 DEPPE, M. E., "A Relational Interface Model for Data Base Restructuring," Technical Report 76 DT 3, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UR4 DEPPE, M. E., LEWIS, K. H., and SWARTWOUT, D. E., "Restructuring Network Data Bases: An Overview," Technical Report 76 DT 5, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UR5 DEPPE, M. E., and LEWIS, K. H., "Data Translation Definition Language Reference Manual for Version IIA Release 1," Working Paper 76 DT 5.2, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UR6 SWARTWOUT, D. E., MARINE, A. M., and BAKKOM, D. E., "Partial Restructuring Approach to Data Translation," Working Paper 76 DT 8.1, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UR7 SWARTWOUT, D. E., WOLFE, G. J., and BURPEE, C. E., "Translation Definition Language Reference Manual for Version IIA Translator, Release 3," Working Paper 77 DT 5.3, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1977.

(US) UM STORED-DATA DEFINITION

US1 DATA TRANSLATION PROJECT, "Stored-Data Definition Language Reference Manual," The University of Michigan, Ann Arbor, Michigan, 1972.
US2 DATA TRANSLATION PROJECT, "Revised Stored-Data Definition Language Reference Manual," The University of Michigan, Ann Arbor, Michigan, 1974.

US3 DATA TRANSLATION PROJECT, "University of Michigan Stored-Data Definition Language Reference Manual for Version II Translator," The University of Michigan, Ann Arbor, Michigan, 1975.

US4 BIRSS, E. W., and FRY, J. P., "A Comparison of Two Languages for Describing Stored Data," Technical Report 76 DT 1, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

(UT) UM TRANSLATION

UT1 "Functional Design Requirements for a Prototype Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1972.

UT2 "Design Specifications of a Prototype Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1972.

UT3 "Program Logic Manual for the University of Michigan Prototype Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1973.

UT4 "Users Manuals for the University of Michigan Prototype Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1973.

UT5 "Functional Design Requirements of the Version I Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1973.

UT6 "Program Logic Manual for the University of Michigan Version I Data Translator," Working Paper 306, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1974.

UT7 "Design Specifications: Version II Data Translator," Working Paper 307, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1975.

UT8 BIRSS, E., DEPPE, M., and FRY, J., "Research and Data Reorganization Capabilities for the Version IIA Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1975.

UT9 BIRSS, E., et al.
, "Program Logic Manual for the Version IIA Data Translator," Working Paper 76 DT 3.1, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UT10 BODWIN, J., et al., "Data Translator Version IIA Release 1 User Manual," Working Paper 76 DT 3.2, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UT11 BODWIN, J., et al., "Data Translator Version IIA Release 2 User Manual," Working Paper 76 DT 3.4, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1976.

UT12 KINTZER, E., et al., "Michigan Data Translator Version IIB Release 1 User Manual," Technical Paper 77 DT 8, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT13 BURPEE, C. E., et al., "Michigan Translator Program Logic Manual Version IIB Release 1," Working Paper 77 DT 3.7, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT14 BAKKOM, D., et al., "Specifications for a Generalized Reader and Interchange Form," Working Paper 77 DT 6.2, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT15 DeSMITH, D., and HUTCHINS, L., "Michigan Data Translator Design Specifications Version IIB," Working Paper 77 DT 3.8, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT16 KINTZER, E., et al., "Michigan Data Translator Version IIB Release 1.1 User Manual," Technical Paper 77 DT 8.1, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT17 BAKKOM, D. E., and SCHINDLER, S. J., "Operational Capabilities for Data Base Conversion and Restructuring," Technical Report 77 DT 6, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Mich., 1977.

(Z) RELATIONAL SYSTEM

Z5 CHAMBERLIN, D. D., and BOYCE, R.
F., "SEQUEL: A Structured English Query Language," Proceedings of the ACM-SIGMOD Workshop on Data Description, Access and Control, ACM, N.Y., 1974.

7. PARTICIPANTS

The following is a list of attendees, participants, and contributors to the workshop.

Edward Arvel (Conversion Experiences), Data Sciences Group, 890 National Press Building, Washington, D.C. 20045
Marty Aronoff (Management Objectives), National Bureau of Standards, Tech B258, Washington, D.C. 20234
Robert Bemer (Standards), Honeywell Information Systems, P.O. Box 6000, Phoenix, AZ 85005
John Berg (Proceedings Editor), National Bureau of Standards, Tech A259, Washington, D.C. 20234
Edward Birss (Conversion Technology), Hewlett Packard, General Systems Division, 5303 Stevens Creek Blvd., Bldg. 498-3, Santa Clara, CA 95050
Don Branch (Standards), Advisory Bureau for Computing, Room 828, Lord Elgin Plaza, 66 Slater Street, Ottawa, Ontario K1A 0T5, CANADA
Jean Bryce (Standards), M. Bryce & Associates, Inc., 1248 Springfield Pike, Cincinnati, OH 45215
Milt Bryce (Chairman, Standards), M. Bryce & Associates, Inc., 1248 Springfield Pike, Cincinnati, OH 45215
Jim Burrows (Chairman, Conversion Experiences), Director, Institute for Computer Sciences and Technology, National Bureau of Standards, Administration Bldg., Room A200, Washington, D.C. 20234
Richard G. Canning (Management Objectives), Canning Publications, Inc., 925 Anza Avenue, Vista, CA 92083
Lt. Michael Carter (Conversion Experiences), Air Force Data Systems Design Ctr., AFDSDC/SDDA, Building 857, Gunter AFB, AL 36114
Joseph Collica (Conversion Experiences), National Bureau of Standards, Tech A254, Washington, D.C. 20234
Elizabeth Courte (Conversion Experiences), Bell Laboratories, 3B210, Six Corporate Plaza, Piscataway, NJ 08854
Ahron Davidi (Conversion Experiences), Blue Cross of Massachusetts, 100 Summer Street, 12th Floor, Boston, Massachusetts 02106
Peter Dressen (Conversion Technology)
Ruth F. Dyke (Conversion Experiences)
Larry Espe (Management Objectives)
Honeywell Information Systems, P.O.
Box 6000, Phoenix, Arizona 85005 (Dressen)
U.S. Civil Service Commission, 1900 E Street, N.W., Room 6410, Washington, D.C. 20415 (Dyke)
Nolan, Norton and Company, One Forbes Road, Lexington, Massachusetts 02173 (Espe)
Gordon Everest (Management Objectives), University of Minnesota, 271 19th Avenue South, Minneapolis, MN 55455
Elizabeth Fong (Standards), National Bureau of Standards, Tech B212, Washington, D.C. 20234
James P. Fry (Chairman, Conversion Technology), 276 Business Administration, University of Michigan, Ann Arbor, MI 48109
Al Gaboriault (Standards), Sperry Univac, P.O. Box 500, Mail Station C1NW-12, Blue Bell, PA 19424
Rob Gerritsen (Management Objectives), Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19174
Richard Godlove (Management Objectives), Monsanto Company, 800 North Lindbergh Boulevard, St. Louis, Missouri 63166
Nancy Goguen (Conversion Technology), Bell Laboratories, 6 Corporate Place, Piscataway, NJ 08854
Seymour Jeffery (Host), Director, Center for Programming Science and Technology, National Bureau of Standards, Tech A247, Washington, D.C. 20234
Samuel C. Kahn (Management Objectives), Information System Dep. - Planning, E. I. duPont de Nemours & Co., Wilmington, Delaware 19899
Mike Kaplan (Conversion Technology), Bell Laboratories, 8 Corporate Place, Piscataway, NJ 08854
Anthony Klug (Standards), Computer Sciences Department, University of Wisconsin, Madison, Wisconsin 53706
Henry Lefkovits (Standards), H. C. Lefkovits & Associates, Inc., P.O. Box 297, Harvard, MA 01451
H. Eugene Lockhart (Management Objectives), Nolan, Norton & Company, One Forbes Road, Lexington, Massachusetts 02173
Thomas Lowe (Host), Chief, Operations Engineering Division, Center for Programming Science and Technology, National Bureau of Standards, Tech A265, Washington, D.C. 20234
Gene Lowenthal (Conversion Technology), MRI Systems Corporation, P.O. Box 9968, Austin, Texas 78766
Vincent Lum (Conversion Technology), IBM Research Corp., K55/282, 5600 Cottle Road, San Jose, CA 95103
John Lyon (Management Objectives), Colonial Penn Group Data Corporation, 5 Penn Center Plaza, Philadelphia, PA 19103
Halaine Maccabee (Conversion Experiences), Northeastern Illinois Planning Commission (NIPC), 400 West Madison St., Chicago, Illinois 60606
William Madison (Contributor), Universal Systems, Inc., 2341 Jefferson Davis Highway, Arlington, Virginia 22202
Daniel B. Magraw (General Chairman), State of Minnesota, Department of Administration, 208 State Administration Building, St. Paul, MN 55155
Robert Marion (Conversion Technology), Defense Communications Agency, 11440 Isaac Newton Square North, Reston, VA 22090
Steven Merritt (Conversion Experiences), GAO, FGMS-ADP, Room 6011, 441 G Street, N.W., Washington, D.C. 20548
Jack Minker (ACM Liaison Chairman), Department of Computer Science, University of Maryland, College Park, Maryland 20742
Thomas E. Murray (Management Objectives), Del Monte Corp., P.O. Box 3575, San Francisco, CA 94119
Shamkant Navathe (Conversion Technology), New York University, Grad. School of Business, 600 Tisch Hall, 40 West 4th Street, New York, New York 10003
Jack Newcomb (Management Objectives), Department of Finance & Admin., 326 Andrew Jackson State Office Bldg., Nashville, TN 37219
Richard Nolan (Chairman, Management Objectives), Nolan, Norton and Company, One Forbes Road, Lexington, Massachusetts 02173
T. William Olle (Management Objectives), Consultant, 27 Blackwood Close, West Byfleet, Surrey KT14 6PP, ENGLAND
Mayford Roark (Keynoter), Ford Motor Company, The American Road, Dearborn, MI 48121
Box 591 Tulsa, OK 74102 Philip Shaw Standards IBM Corporation 555 Bailey Avenue San Jose, CA 95150 Arie Shoshani Conversion Technology Lawrence Berkeley Laboratory University of California Berkeley, CA 94720 Edgar Sibley Management Objectives Dept. of Info. Systems Mgmt. University of Maryland Colleqe Park, MD 20742 Prof. Diane Smith Conversion Technology University of Utah Merrill Engineering Bldg. Salt Lake City, UT 84112 Alfred Sorkowitz Conversion Experiences Department of Housing & Urban Devel 451 7 Street, S.W. , Room 4152 Washington, D.C. 20410 Prof. Stanley Su Conversion Technology Electrical Engineering University of Florida Gainesvil 1 e . FL 32611 ■166- Donal d Swa rtwout Conversion Technology 276 Business Administration University of Michigan Ann Arbor, MI 48109 Robert W. Taylor Conversion Technology IBM Corporation IBM Research. K55/282 5600 Cottle Road San Jose, CA 95193 Jay Thomas Standards Al 1 ied Van Lines 2501 West Roosevel t Broadview, IL 60153 Road Ewart Willey Standards Prudential Assurance Co 142 Holborn Bars London EC1N 2NH ENGLAND Major Jerry Winkler Standards U.S.A.F. -AFDSC/GKD Room 3A153, Pentagon Washington, D.C. 20030 Beatrice Yormark Conversion Technology Interactive Systems Corporation 1526 Cloverfield Blvd. Santa Monica, CA 90404 A Contributor is one who could not attend but either submitted a paper or added to the Workshop in a manner that deserves our acknowledgement and thanks *U S GOVERNMENT PRINTING OFFICE: 1980 328-117/6522 167- NBS-114A (REV. 9-76) U.S. DEPT. OF COMM. BIBLIOGRAPHIC DATA SHEET 1. PUBLICATION OR REPORT NO. NBS SP 500-64 .2. Gov't Accession No, 4. TITLE AND SUBTITLE DATA BASE DIRECTIONS - THE CONVERSION PROBLEM Proceedings of a Workhop held in Ft. Lauderdale, FL, Nov. 1-3, 1977 3. fecijjtertt's Accession Ho, 5. Publication Date September 1980 $. tag Otgahization Code 7. AUTHOR(S) John L. Berg, Editor 8. Performing Organ. Report No. 9. 
PERFORMING ORGANIZATION NAME AND ADDRESS: National Bureau of Standards, Department of Commerce, Washington, DC 20234
10. Project/Task/Work Unit No.:
11. Contract/Grant No.:
12. SPONSORING ORGANIZATION NAME AND COMPLETE ADDRESS (Street, City, State, ZIP): National Bureau of Standards, Department of Commerce, Washington, DC 20234; Association for Computing Machinery, 1133 Ave. of Americas, New York, NY 10036
13. Type of Report & Period Covered: Final
14. Sponsoring Agency Code:
15. SUPPLEMENTARY NOTES: [ ] Document describes a computer program; SF-185, FIPS Software Summary, is attached.
16. ABSTRACT (A 200-word or less factual summary of most significant information. If document includes a significant bibliography or literature survey, mention it here.)

What information can help a manager assess the impact a conversion will have on a data base system, and of what aid will a data base system be during a conversion? At a workshop on the data base conversion problem held in November 1977 under the sponsorship of the National Bureau of Standards and the Association for Computing Machinery, approximately seventy-five participants provided the decision makers with useful data. Patterned after the earlier Data Base Directions Workshop, this workshop, Data Base Directions - The Conversion Problem, explores data base conversion from four perspectives: management, previous experience, standards, and system technology. Each perspective was covered by a workshop panel that produced a report included here. The management panel gave specific direction on such topics as planning for data base conversions, impacts on the EDP organization and applications, and minimizing the impact of present and future conversions. The conversion experience panel drew upon ten conversion experiences to compile its report and prepared specific checklists of "do's and don'ts" for managers.
The standards panel provided comments on standards needed to support or facilitate conversions, and the system technology panel reports comprehensively on the systems and tools needed, with strong recommendations on future research.

17. KEY WORDS (six to twelve entries; alphabetical order; capitalize only the first letter of the first key word unless a proper name; separated by semicolons): Conversion; data base; data-description; data-dictionary; data-directory; data-manipulation; DBMS; languages; query
18. AVAILABILITY: [X] Unlimited. [ ] For Official Distribution. Do Not Release to NTIS. [X] Order From Sup. of Doc., U.S. Government Printing Office, Washington, DC 20402. [ ] Order From National Technical Information Service (NTIS), Springfield, VA 22161.
19. SECURITY CLASS (THIS REPORT): UNCLASSIFIED
20. SECURITY CLASS (THIS PAGE): UNCLASSIFIED
21. NO. OF PRINTED PAGES: 178
22. Price: $5.50
USCOMM-DC

CHECK THEM OUT

How do those automated checkout counters, gas pumps, credit offices, and banking facilities work? What every consumer should know about the modern electronic systems now used in everyday transactions is explained in a 12-page booklet published by the Commerce Department's National Bureau of Standards.

Automation in the Marketplace (NBS Consumer Information Series No. 10) is for sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402. Price: 90 cents. Stock No.
003-003-01969-1.

[Illustration: cover of "Automation in the Marketplace," a consumer's guide, showing a sample computer-checkout receipt annotated with the booklet's topics: the Universal Product Code, laser scanners, electronic cash registers, handling of uncoded items, electronic scales, weights and measures enforcement, special features of computer checkout systems, bank teller machines, computer terminals, and consumer issues.]

ORDER FORM (please detach along dotted line)

PLEASE SEND ME ___ COPIES OF Automation in the Marketplace at $.90 per copy. S.D. Stock No. 003-003-01969-1. I enclose $___ (check or money order) or charge my Deposit Account No. ___. Make check or money order payable to Superintendent of Documents. MAIL ORDER FORM WITH PAYMENT TO: Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402, or any U.S. Department of Commerce district office. NAME, ADDRESS, CITY, STATE, ZIP CODE (please type or print).

NBS TECHNICAL PUBLICATIONS

PERIODICALS

JOURNAL OF RESEARCH—The Journal of Research of the National Bureau of Standards reports NBS research and development in those disciplines of the physical and engineering sciences in which the Bureau is active. These include physics, chemistry, engineering, mathematics, and computer sciences. Papers cover a broad range of subjects, with major emphasis on measurement methodology and the basic technology underlying standardization. Also included from time to time are survey articles on topics closely related to the Bureau's technical and scientific programs. As a special service to subscribers each issue contains complete citations to all recent Bureau publications in both NBS and non-NBS media.
Issued six times a year. Annual subscription: domestic $13; foreign $16.25. Single copy, $3 domestic; $3.75 foreign. NOTE: The Journal was formerly published in two sections: Section A "Physics and Chemistry" and Section B "Mathematical Sciences."

DIMENSIONS/NBS—This monthly magazine is published to inform scientists, engineers, business and industry leaders, teachers, students, and consumers of the latest advances in science and technology, with primary emphasis on work at NBS. The magazine highlights and reviews such issues as energy research, fire protection, building technology, metric conversion, pollution abatement, health and safety, and consumer product performance. In addition, it reports the results of Bureau programs in measurement standards and techniques, properties of matter and materials, engineering standards and services, instrumentation, and automatic data processing. Annual subscription: domestic $11; foreign $13.75.

NONPERIODICALS

Monographs—Major contributions to the technical literature on various subjects related to the Bureau's scientific and technical activities.

Handbooks—Recommended codes of engineering and industrial practice (including safety codes) developed in cooperation with interested industries, professional organizations, and regulatory bodies.

Special Publications—Include proceedings of conferences sponsored by NBS, NBS annual reports, and other special publications appropriate to this grouping such as wall charts, pocket cards, and bibliographies.

Applied Mathematics Series—Mathematical tables, manuals, and studies of special interest to physicists, engineers, chemists, biologists, mathematicians, computer programmers, and others engaged in scientific and technical work.

National Standard Reference Data Series—Provides quantitative data on the physical and chemical properties of materials, compiled from the world's literature and critically evaluated.
Developed under a worldwide program coordinated by NBS under the authority of the National Standard Data Act (Public Law 90-396). NOTE: The principal publication outlet for the foregoing data is the Journal of Physical and Chemical Reference Data (JPCRD), published quarterly for NBS by the American Chemical Society (ACS) and the American Institute of Physics (AIP). Subscriptions, reprints, and supplements available from ACS, 1155 Sixteenth St., NW, Washington, DC 20036.

Building Science Series—Disseminates technical information developed at the Bureau on building materials, components, systems, and whole structures. The series presents research results, test methods, and performance criteria related to the structural and environmental functions and the durability and safety characteristics of building elements and systems.

Technical Notes—Studies or reports which are complete in themselves but restrictive in their treatment of a subject. Analogous to monographs but not so comprehensive in scope or definitive in treatment of the subject area. Often serve as a vehicle for final reports of work performed at NBS under the sponsorship of other government agencies.

Voluntary Product Standards—Developed under procedures published by the Department of Commerce in Part 10, Title 15, of the Code of Federal Regulations. The standards establish nationally recognized requirements for products, and provide all concerned interests with a basis for common understanding of the characteristics of the products. NBS administers this program as a supplement to the activities of the private sector standardizing organizations.

Consumer Information Series—Practical information, based on NBS research and experience, covering areas of interest to the consumer. Easily understandable language and illustrations provide useful background knowledge for shopping in today's technological marketplace.
Order the above NBS publications from: Superintendent of Documents, Government Printing Office, Washington, DC 20402. Order the following NBS publications—FIPS and NBSIR's—from the National Technical Information Services, Springfield, VA 22161.

Federal Information Processing Standards Publications (FIPS PUB)—Publications in this series collectively constitute the Federal Information Processing Standards Register. The Register serves as the official source of information in the Federal Government regarding standards issued by NBS pursuant to the Federal Property and Administrative Services Act of 1949 as amended, Public Law 89-306 (79 Stat. 1127), and as implemented by Executive Order 11717 (38 FR 12315, dated May 11, 1973) and Part 6 of Title 15 CFR (Code of Federal Regulations).

NBS Interagency Reports (NBSIR)—A special series of interim or final reports on work performed by NBS for outside sponsors (both government and non-government). In general, initial distribution is handled by the sponsor; public distribution is by the National Technical Information Services, Springfield, VA 22161, in paper copy or microfiche form.

BIBLIOGRAPHIC SUBSCRIPTION SERVICES

The following current-awareness and literature-survey bibliographies are issued periodically by the Bureau:

Cryogenic Data Center Current Awareness Service. A literature survey issued biweekly. Annual subscription: domestic $35; foreign $45.

Liquefied Natural Gas. A literature survey issued quarterly. Annual subscription: $30.

Superconducting Devices and Materials. A literature survey issued quarterly. Annual subscription: $45.

Please send subscription orders and remittances for the preceding bibliographic services to the National Bureau of Standards, Cryogenic Data Center (736), Boulder, CO 80303.

U.S. DEPARTMENT OF COMMERCE
National Bureau of Standards
Washington, DC 20234

OFFICIAL BUSINESS
Penalty for Private Use, $300

SPECIAL FOURTH-CLASS RATE BOOK