Issues in Science and Technology Librarianship
Summer 2003
DOI:10.5062/F4ZC80TC

Digital Archiving: Journey from Books to Analytical Informatics

Marie Scandone
Database Product Manager
marie_scandone@bio-rad.com

Deborah Kernan
Marketing Communications Manager
deborah_kernan@bio-rad.com

Bio-Rad Laboratories, Inc.
Informatics Division
3316 Spring Garden Street
Philadelphia, PA 19104-2596

Abstract

It can be an enormous undertaking to obtain spectral information from a number of different analytical instruments, present it in a digital format, and archive the data. Sadtler Research Laboratories has been producing quality spectral information for the analytical laboratory since 1947. The history is fascinating and the process is unusual. The journey that Sadtler Research Laboratories has taken to become Bio-Rad Laboratories, Informatics Division is a part of the history of the evolution of chemical information. Along the way, Bio-Rad changed its method of spectral data delivery but retained its focus on the quality of the analytical information. This article examines the transition from print to digital media.

History

Sadtler Research Laboratories started publishing its spectral collections in 1955. Over the years, libraries all over the world built a reference resource using the "green books" for their spectral databases. In 1980, the company produced the first of its digital databases while continuing to distribute the books. In 1996, however, Bio-Rad ceased all hard-copy publication, recognizing that the digital revolution in science and technology was under way. For reasons of speed and delivery, industry demanded digital media as the norm for data. This journey from paper to digital data has not always been easy, but the rewards for the library and the user are enormous.

Benefits of Digital Archiving

Some of the benefits of the migration to digital data are:

- Conservation of shelf space, a valued commodity in constant demand in the library
- Improved access to reference data
- Easy cross-referencing or cross-linking of data
- Easy-to-use reference resource that includes on-line training tutorials
- Easy-to-upgrade resource
- Standard format for spectral data
- Greater searching capability via a quick "searcher friendly" system
- Intuitive user interface
- Ability to incorporate laboratory data generated by students and faculty
- Simplification of teaching and research

Barriers in the Digital Age

Some of the barriers that still exist, even for digital data, are:

- Diverse needs of different user types (academic, forensic, polymer); "one size doesn't fit all"
- Content available, but at moderate to high cost; generating new spectral content is costly
- Scaling the product to variable funding sources
- Migration issues for users with old or "heritage" data
- Wide variations in computing skill, the software learning curve, and the time the consumer has available to learn the software
- Constant changes in instrument operating systems

As information sharing has evolved, there has been great pressure to develop beyond a laboratory information management system (LIMS) and bring diverse analytical information, consisting of multiple spectroscopic data types, to a variety of workgroups. Traditionally, there has been no common interface to make using and sharing spectral data easy. For example, an infrared laboratory might share information with the NMR workgroup, but only as reported results. As the amount of available information has increased, however, accessing, processing, and examining spectral, structural, and chemical information has become a necessity.

Knowledge Management System

At Bio-Rad, as the spectral database grew, so did the problems of managing that data.
With an enormous database of various types of spectral data, chemical structures, and chemical and physical properties, a system was needed to manage all that disparate data. In the 2000s and beyond, digital data has led to efforts to integrate software with knowledge bases and smart algorithms to offer advanced "smart systems" to users. This has ushered in the concept of management beyond data, to knowledge management. A knowledge management system for spectral data must allow the user to:

- Build analytical databases
- Analyze analytical databases
- Access analytical databases
- Create knowledge
- Manage knowledge
- Communicate knowledge
- Report information
- Archive instrument data files
- Share instrument data files
- Compare instrument data files
- Search instrument data files

An informatics system was needed to address the need for knowledge management. Therefore, Bio-Rad developed its present software, the KnowItAll® Informatics System. It is the culmination of 55 years in the field of spectroscopy. It is a software package that provides increasingly efficient solutions for the analytical informatics consumer, combining the means to build, analyze, and access analytical databases with the ability to create, manage, and communicate knowledge from those databases. The KnowItAll Informatics System can manage all the data produced in the laboratory, from spectral data to structure data to text and web links. It allows archiving of all digital information generated in the analytical laboratory, and it is not dependent on instrumentation or operating systems. It examines data created using a variety of software packages and manages it in one unified system.

Cornerstone of Spectral Data Management

The cornerstone of any management of spectral data is the spectral database itself. Spectral data are valuable tools in confirming the identity and quality of a compound. Spectral data are expensive to create, so they become a valuable asset of any laboratory.
Putting all these data into one place, with a single, integrated user interface, can aid product development, provide new ways of analyzing samples, and make use of predictive models. By making real data and predictive data readily available and easy to use, a user can exploit new scientific developments. Management of the data can reduce the time and cost of bringing products to market. It can meet regulatory requirements for archiving and tracking results. It can improve laboratory workflow and convert data and information into knowledge. It can accelerate decision-making and help to properly evaluate leads and new substances before they advance in the development process. The data may be internally generated or externally created. The KnowItAll Informatics System allows the user to archive, search, and manage all the data generated in the lab, and it gives access to high-quality reference databases.

Once spectral data are archived, there must be a mechanism to search the data. Various methods of searching should be available. One search method is by identification. The identifier may be a barcode, a name, a lot number, or any unique identifier used to mark the spectrum. The next method of searching is the spectral search. This allows the comparison of similar data in a database and, when using external databases, provides a means to match or locate spectra. A peak search provides a quick way to match points in a spectrum with spectra in the database. It attempts to match each peak in an unknown spectrum with each peak in a reference spectrum. A property search gives the user the flexibility of using additional information collected on a chemical. A record may contain chemical and physical information. Melting points, boiling points, physical descriptions, source of data, etc. should be part of the permanent record of a chemical, and in a management system all of this information should be searchable and usable.
Finally, if a chemical structure is defined, there should be the ability to search the database for exact matches or, at the very least, substructures. Therefore, the user needs the ability to edit and/or create chemical structures and to import files in different formats.

The software can handle multiple techniques, so it can extend the ability to predict results from NMR instrumentation to everyday chemists, beyond the expertise of a few skilled spectroscopists. Prediction may be performed by algorithms that estimate shift values from additivity rules and are refined against a large database of peak values, creating an expert system.

A fully integrated environment for analytical techniques allows a user to transfer information from workgroup to workgroup and permits researchers to have access to all information in one place. That means creating and searching IR, Raman, NIR, Vapor Phase IR, Condensed Phase IR, 13C NMR, 1H NMR, XNMR, Mass Spectrometry, UV/Vis, and GC data, ideally through a single interface. Laboratory instruments may come with spectral processing tools, but these normally concentrate on one analytical technique at a time. With an integrated environment, multiple users can readily share data while each user processes the data on their particular instrument.

A new, elegant system architecture has been designed to increase access to information and to share that information with less effort. With the combined power of a high-quality data set and the KnowItAll environment, researchers can search by spectrum, peak, structure, or property; access reference spectra; make predictions; build databases with spectra, structures, and chemical information; cross-reference data with other analytical techniques, such as IR, Raman, NIR, Vapor Phase IR, Condensed Phase IR, 13C NMR, 1H NMR, XNMR, Mass Spectrometry, UV/Vis, and GC; and generate high-quality reports and laboratory forms.
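
As an illustration only, the peak search described earlier (matching each peak in an unknown spectrum against the peaks of each reference spectrum) can be sketched in a few lines of Python. This is not the KnowItAll algorithm, which the article does not describe. The class and function names, the greedy matching strategy, the 5-unit tolerance, and the peak positions are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReferenceSpectrum:
    """Hypothetical library record: a label plus a list of peak positions."""
    name: str
    peaks: List[float]  # e.g. wavenumbers; values below are invented


def count_matched_peaks(unknown: List[float], reference: List[float],
                        tolerance: float) -> int:
    """Count unknown peaks that pair with a distinct reference peak.

    Greedy strategy (an assumption): each unknown peak takes the closest
    still-unused reference peak that lies within `tolerance`.
    """
    unused = sorted(reference)
    matched = 0
    for peak in sorted(unknown):
        best_index = None
        for i, ref in enumerate(unused):
            distance = abs(ref - peak)
            if distance <= tolerance and (
                best_index is None or distance < abs(unused[best_index] - peak)
            ):
                best_index = i
        if best_index is not None:
            unused.pop(best_index)  # a reference peak is used at most once
            matched += 1
    return matched


def peak_search(unknown: List[float], library: List[ReferenceSpectrum],
                tolerance: float = 5.0) -> List[Tuple[str, float]]:
    """Rank library spectra by the fraction of unknown peaks they match."""
    scored = []
    for spectrum in library:
        matched = count_matched_peaks(unknown, spectrum.peaks, tolerance)
        score = matched / len(unknown) if unknown else 0.0
        scored.append((spectrum.name, score))
    # Best score first; ties broken alphabetically by name.
    scored.sort(key=lambda hit: (-hit[1], hit[0]))
    return scored


# Invented data for demonstration; not taken from any real database.
library = [
    ReferenceSpectrum("ref-A", [1712.0, 2948.0, 1225.0, 1450.0]),
    ReferenceSpectrum("ref-B", [1600.0, 2952.0, 3300.0]),
    ReferenceSpectrum("ref-C", [800.0, 1000.0]),
]
unknown_peaks = [1715.0, 2950.0, 1220.0]

for name, score in peak_search(unknown_peaks, library):
    print(f"{name}: {score:.2f}")
```

Here the hit list is ordered by the fraction of the unknown's peaks that find a partner in each reference. A production system would presumably also weigh peak intensities and combine this ranking with the identifier, property, and structure searches, but those details are outside what the article states.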

Combined with a large reference collection of spectra prepared using IR, NMR, and Mass Spectrometry analytical techniques, this unique software facilitates the use of multiple techniques in the laboratory. KnowItAll has the ability to manage data from multiple sources and helps users keep track of various types of spectral data. More importantly, it facilitates the knowledge management of spectral data by using the power of the data and making it accessible to all users.

Conclusion

Knowledge management of spectral data can assist in preserving valuable knowledge, and identification of that knowledge can contribute to an organization's success. With new and powerful informatics tools, scientists can now accomplish the goal of enterprise-wide information access.