Issues in Science and Technology Librarianship, Summer 2000
DOI:10.5062/F4RV0KPJ

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

ThermoDex: A Tool for Mining a Science Library Collection for Thermodynamic Information

David Flaxbart
Chemistry Librarian
University of Texas at Austin
Austin TX 78712
flaxbart@uts.cc.utexas.edu

Abstract

Searching print reference sources for thermodynamic data can be a tedious and often frustrating task for librarians and scientists alike. This article describes ThermoDex, a web-based finding aid developed at the University of Texas at Austin that indexes over 200 thermodynamic data collections and handbooks. ThermoDex allows a user to identify specific resources that might contain particular types of data, and offers a way to rediscover underutilized sources that might otherwise be overlooked. It strives to serve both end users and reference librarians as a link between traditional print resources and the web tools preferred by most users.

"I need to find the eutectic point for a mixture of naphthalene and diphenylamine. Where would I find that?"

Many science librarians dread questions like this, and for good reason. The search for data in the physical sciences can be time-consuming and frustrating for librarians and researchers alike, and the failure rate is probably quite high, especially for patrons who choose not to seek help. The importance of accurate property data in science and engineering has been well documented over the years (Arny 1984; Lide 1981; Maizell 1998). If people who need data cannot find them readily, or if the data they do find are erroneous, inaccurate, or of dubious origin, potentially costly and dangerous mistakes can be made both in the laboratory and in the real world.
A survey of American Chemical Society members in 1965 showed that scientists did not think highly of print data compilations at the time (Weisman 1967). One respondent neatly summarized the problems inherent in published data compilations:

1) They are never current.
2) One can't be sure of the reliability of data collected from primary sources.
3) Data are difficult to locate without knowing in advance which compilations contain which data.
4) Researchers often prefer more expedient but less effective ways of hunting for data (e.g., searching in a bibliographic index).

These observations certainly hold true today. Yet despite the added presence of various kinds of digital data resources, print compilations remain important sources of this information.

The rapid acceptance of remote access to digital library resources has made many researchers more reluctant than ever to visit the library and to consult librarians. In addition, researchers accustomed to split-second search results from online tools are much less likely than previous generations to spend long periods hunting for printed data in their library. Paradoxically, as the volume of scientific data expands, the patience and mobility of researchers are diminishing. These factors tend to marginalize the print collections that librarians have spent decades developing and maintaining. Highly useful tools that exist only in print are unknown to new generations of potential users, and their "old timer" users are gradually disappearing.

What can libraries do to keep useful print resources on the radar screen? Librarians have created innumerable pathfinders, reference guides, bibliographies, card files, and other finding aids over the years. Some have been published and proved useful for a time (e.g., Northup & Cromer 1993). But ultimately a bibliography is hobbled by being frozen in time, and the frequent revisions needed to keep it current are very time-consuming.
Finally, print guides, whether published or used as local library handouts, reach a fairly limited audience.

ThermoDex: Background

{ThermoDex} was developed at the University of Texas at Austin (UT) with these problems in mind. It began as a microcomputer database created to mimic and automate that old reference desk standby, the ready-reference file. Since many of the most difficult questions involved physical property data, it was decided early on to limit the project to that type of data. Its first incarnation, around 1994, was a FileMaker database on the chemistry librarian's computer, used as a reference aid. In 1995 the library migrated it to the World Wide Web, and ThermoDex was born. Steadily developed and expanded since then, it now contains records for about 200 monographic and series compilations of physical property data.

ThermoDex does not contain the actual property data. It is rather a "meta-index," or finding aid, in which a searcher can locate books that might contain the data sought. For the eutectic question above, for example, one would select "eutectic temperature" from a list of Properties and, optionally, "organic" and/or "liquids" from a list of Compound types. The search engine queries the database and returns a list of possible sources of these data. The sources are listed with title and library call number, and the searcher can then consult those books individually in the library, or e-mail the librarian for help in doing so.

ThermoDex serves three primary purposes:

1) It helps librarians answer patron reference questions, and saves them time.
2) It keeps print collections used and useful. Data sources that might be overlooked or forgotten, often because they are scattered within the library, come to light quickly, helping both patrons and librarians remember that such resources still have value.
3) It facilitates the remote use of library print collections, regardless of patron location.
ThermoDex also helps researchers and librarians who do not have large reference collections close at hand to "use" the library in a virtual sense. Whether a chemist is down the hall or across an ocean, these data compilations are accessible, albeit indirectly, to everyone, with the mediation of reference librarians and document delivery methods. Although the books indexed by ThermoDex primarily represent the holdings of one research library, it can be used as a point of access to any library that owns those books.

Description

Content

ThermoDex contains bibliographic records for handbooks and other data compilations that contain significant amounts of thermodynamic, thermochemical, or thermophysical property data for chemical compounds, elements, mixtures, and other substances in solid, liquid, and gas phases. The meaning of "thermodynamic" is left deliberately broad, and the file contains many kinds of data that are not strictly thermodynamic, such as electrochemical, solubility, and kinetic data. But since many of these properties are temperature-dependent, they can be treated as "thermodynamic" for the purposes of library searching, if not in a strict scientific sense.[1]

The database includes both well-known and obscure sources, but they have generally been selected for their usefulness and availability, and they represent a wide variety of properties and compounds. There is definitely overlap among the sources: some data are common and relatively easy to find in many places, while other data are much more difficult to locate. One criterion for inclusion is that a book must consist predominantly of data tables or graphs, rather than text and theory. Most data-intensive books include some explanatory and theoretical background, but this should be minimal. Most indexed sources are also secondary sources, i.e., those that gather and organize data originally reported elsewhere, usually in the primary journal literature.
Some compilations are critically evaluated; however, most are not.[2]

ThermoDex is not meant to be the first, last, or only resort in the quest for hard-to-find data. Users are advised to consult well-known handbooks first, because a great many questions can be answered with standard tools such as the CRC Handbook of Chemistry and Physics or Lange's Handbook of Chemistry. Consequently, these resources are generally excluded from ThermoDex. At the other end of the spectrum, large chemistry series such as the Beilstein and Gmelin Handbooks (and their online equivalents, Beilstein/Gmelin Crossfire) contain so much data at the compound level that they should also be consulted separately. A third major handbook series, Landolt-Börnstein, is selectively represented in ThermoDex, as is the old National Standard Reference Data Series (NSRDS) published by the National Bureau of Standards. Fee-based online datafiles, such as those found on STN, are excluded.

New books are selected for addition to ThermoDex as the library acquires them, and anyone is encouraged to suggest additions to the database. Since the database has reached critical mass, it is no longer growing very rapidly: perhaps 10-20 new records are added each year. While most of the indexed titles are in the Chemistry Library's reference collection, items from other science branches on campus also appear. Expanding the content beyond UT's collection has always been a goal, and for a time Penn State University contributed indexing for additional titles not held at Texas.

Searching

The search interface to ThermoDex was designed to be as simple as possible. The search page presents two categories of pre-selected search terms: one section for Properties and one for Compounds. (An earlier version included a free-text search box, but it was removed because keyword searching significantly reduced precision and returned many null results.)
The searcher can select, via checkboxes and a scrolling list, one or more Properties, along with one or more Compound identifiers. ThermoDex combines the selected terms within each section with a Boolean OR, and then combines the two sections with a Boolean AND:

(Property 1 OR Property 2 OR Property 3 ...) AND (Compound 1 OR Compound 2 OR Compound 3 ...)

The user does not need to select both a property and a compound. If terms are selected in only one section, the other section places no restriction on the results; a search on a property alone, for example, retrieves the books indexed under that property whatever compounds they cover. The checkboxes represent the most commonly sought terms within each section, while the scrolling box offers a much more extensive list of terms to choose from. Multiple items within a scrolling box can be selected by holding down the Control key (Windows) or the Apple key (Macintosh) while clicking on them.

Bibliographic Data

The list of results shows title, Library of Congress call number, and library location. The user can click on a title in the results list to see the full record for that item. The bibliographic data include the book's title, publisher, year of publication, ISBN, and OCLC number. A brief abstract describes the content, arrangement, and indexing of the work, occasionally with a remark about the book's quality, authority, or usefulness. (Some handbooks, as librarians well know, are straightforward to use. Others are simply awful, lacking any kind of sensible layout or indexes.) The Property and Compound indexing terms assigned to the book also appear in the full record.

[Figure: Sample Results List]
[Figure: Sample Full Record]

Technical Details

ThermoDex has undergone a number of transformations over the years. From its origins as a FileMaker database, it was converted to an Oracle database with a JavaScript interface, and finally to its current form, an LDAP database searched through a PHP interface. The Digital Library Services Division (DLSD) of the UT-Austin General Libraries carries out the database programming.
The DLSD programmers find ThermoDex useful as a test-bed file for experimenting with new data formats and search engines, so it has evolved along with those technologies.

Management of the Database

The chemistry librarian manages the content of ThermoDex through a simple administrative web interface. New records can be added, and existing records can be modified or deleted easily. To maintain consistent indexing, the Property and Compound terms are selected from a programmed list. Updates to the database occur in real time; there is no test or pre-production version.

Issues and Problems

Maintaining a database like ThermoDex does raise some thorny problems.

Inconsistency of chemical nomenclature

The fact that any given chemical substance can have an almost infinite variety of names (common names, systematic names from various conventions, trade names, acronyms, etc.) makes working with chemical information very complicated. Thermodynamic handbooks are prepared by compilers who use chemical names as they themselves see fit, and there is little or no standardization. Some recent compilations contain Chemical Abstracts Service (CAS) Registry Number indexes to reduce confusion, but earlier books lack this feature. In any case, users are often unfamiliar with the CAS Registry system and will not have a registry number in hand. Many books have molecular formula indexes, but if the formula is not known in advance, these are of little use. Most disturbing is the number of handbooks, mainly older ones, that contain no indexes at all.

The Compounds list in ThermoDex uses common names for substances that appear frequently in handbooks (examples: methane, argon, ethanol). More complex compounds are lumped under a general heading describing the type of compound (examples: hydrocarbons, organic).
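The CAS Registry Number approach mentioned above can be made concrete. A Registry Number ends in a check digit: multiply each preceding digit by its position counted from the right (1, 2, 3, ...), sum the products, and take the result modulo 10. The Python sketch below validates that digit and collates synonyms under a single number; `SynonymIndex`, its methods, and the sample entries are illustrative assumptions, not part of ThermoDex.

```python
import re
from typing import Optional

# CAS RN format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CASRN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_casrn(casrn: str) -> bool:
    """Validate the format and check digit of a CAS Registry Number."""
    m = CASRN.fullmatch(casrn.strip())
    if not m:
        return False
    body = m.group(1) + m.group(2)
    # Rightmost body digit has weight 1, the next weight 2, and so on.
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))


class SynonymIndex:
    """Hypothetical helper: collates chemical names under one CAS RN."""

    def __init__(self):
        self._rn_by_name = {}   # case-folded name -> CAS RN
        self._names_by_rn = {}  # CAS RN -> set of names as entered

    def add(self, casrn: str, *names: str) -> None:
        if not is_valid_casrn(casrn):
            raise ValueError(f"invalid CAS Registry Number: {casrn}")
        for name in names:
            self._rn_by_name[name.casefold()] = casrn
            self._names_by_rn.setdefault(casrn, set()).add(name)

    def lookup(self, name: str) -> Optional[str]:
        return self._rn_by_name.get(name.casefold())

    def synonyms(self, name: str) -> set:
        rn = self.lookup(name)
        return set(self._names_by_rn.get(rn, set())) if rn else set()


if __name__ == "__main__":
    index = SynonymIndex()
    index.add("64-17-5", "ethanol", "ethyl alcohol", "EtOH")
    index.add("106-46-7", "1,4-dichlorobenzene", "p-dichlorobenzene")
    print(index.lookup("Ethyl Alcohol"))  # 64-17-5
    print(sorted(index.synonyms("p-dichlorobenzene")))  # ['1,4-dichlorobenzene', 'p-dichlorobenzene']
    print(is_valid_casrn("7732-18-5"), is_valid_casrn("7732-18-4"))  # True False
```

The check digit only catches typing errors; the hard part the article describes, assembling and maintaining the synonym lists themselves, is not addressed by a sketch like this.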
Indexing to the Compound Level

For handbooks that cover hundreds or thousands of chemical compounds, it is not feasible to index each and every compound, so general compound-type headings are used instead. Creating compound-specific indexing for all the handbooks in ThermoDex would require thousands of hours of expert labor, and would have to be based on a system that collated standard synonyms under CAS Registry Numbers, with reference to the actual data points included for each. This impossibility is unfortunate, given that most requests for thermodynamic data are indeed compound-specific, and the general headings such as "organic" and "inorganic" that must be assigned to books for the sake of brevity are often too broad to be useful.

Inconsistency of thermodynamic terms

As with chemical nomenclature, there is often little agreement on what to call certain thermodynamic properties. Non-specialists, including the creator of ThermoDex, can be bewildered by the array of confusing and overlapping terminology, symbols, and units (Northup & Cromer 1993, p.61). Without specialized knowledge, it is often difficult to interpret a table filled with numbers, Greek characters, subscripts, and cryptic notations. (Even specialists can have trouble with them, and this can usually be blamed on the compiler, who may be the only person to whom a data table is actually clear! User-friendliness does not seem to be a consideration for some compilers, editors, and publishers of data handbooks.)

This makes indexing handbooks a challenging task, and the rule of thumb is to keep it simple. The goal of ThermoDex is to point users to potential sources of data; interpreting and using the data found is left mainly to the user. Some basic decisions on terminology were made early on (e.g., using "heat of X" in place of "enthalpy of X" in all cases), but there is definitely some overlap and inconsistency among the Properties headings.
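The terminology convention just described amounts to mapping a user's wording onto one controlled Property heading. A minimal Python sketch, assuming a hypothetical `to_heading` function, a small hypothetical `PROPERTY_HEADINGS` subset, and an alias table; only the "enthalpy of X" to "heat of X" rule comes from the article, and the "eutectic point" alias echoes the opening reference question.

```python
import re
from typing import Optional

# Hypothetical subset of a controlled Properties list.
PROPERTY_HEADINGS = {
    "eutectic temperature",
    "heat of formation",
    "heat of fusion",
    "heat of vaporization",
    "vapor pressure",
}

# Hypothetical alias table mapping user wording to a heading.
ALIASES = {"eutectic point": "eutectic temperature"}

# The article's convention: "heat of X" is used in place of "enthalpy of X".
ENTHALPY_OF = re.compile(r"^enthalpy of\b")


def to_heading(term: str) -> Optional[str]:
    """Map a user's wording to a controlled Property heading, or None if unknown."""
    normalized = " ".join(term.lower().split())  # ignore case and extra whitespace
    normalized = ENTHALPY_OF.sub("heat of", normalized)
    normalized = ALIASES.get(normalized, normalized)
    return normalized if normalized in PROPERTY_HEADINGS else None


if __name__ == "__main__":
    print(to_heading("Enthalpy of  Vaporization"))  # heat of vaporization
    print(to_heading("eutectic point"))  # eutectic temperature
```

Returning None for unrecognized wording, instead of passing it through, mirrors the article's reason for dropping free-text search: terms outside the controlled list tend to produce null results.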
Experts who use ThermoDex are encouraged to send in corrections and clarifications at any time.

Hierarchical headings

If a handbook contains data for only one compound, such as ethanol, it makes sense to assign the compound term "ethanol". Ideally the more general terms "alcohols" and "organic" would also be assigned, and in some cases this has been done. But inconsistencies accumulate over time, and it is difficult to review the database systematically to locate and correct them. The current architecture of the database does not allow global subject authority revisions, so a heading cannot be changed once and have the change apply to every record.

Searching Tips

After several years of user feedback, it is clear that the principal obstacle users encounter in ThermoDex is overly specific queries. For example, suppose that a researcher is looking for heat of formation data for 1,4-dichlorobenzene. There is, predictably, no specific compound entry in ThermoDex for 1,4-dichlorobenzene. But a search for "heat of formation" with "organic" would retrieve several handbooks, some of which would likely contain these data. Unfortunately, some searchers give up before making this generalization. The Help page attempts to advise users of the best search techniques, and a related {Thermodynamics page} offers additional commentary for those hunting for this kind of data. But as librarians know, most people don't consult help screens.

Future Directions

Over the years, users have sent in suggestions for expanding and enhancing ThermoDex. A common wish is that the database include the actual thermodynamic data. This simply isn't feasible, primarily because license and copyright restrictions prohibit the open redistribution of data. The costs involved would also be substantial.

Another requested enhancement is to include links to Web-based resources, such as the NIST Chemistry WebBook. At present ThermoDex is limited to printed materials, but it could be programmed to include electronic tools in the same way. This is a priority for further development.
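The hierarchical-heading problem and the overly specific queries described under Hierarchical headings and Searching Tips both come down to a missing broader-term relation between headings. The Python sketch below assumes a hypothetical `BROADER` map from each heading to its parent; ThermoDex's actual data structure is not described in the article. Applied when headings are assigned, `with_broader_terms` would make assignment consistent; applied at search time, `broader_chain` suggests the kind of generalization the Searching Tips recommend.

```python
# Hypothetical fragment of a Compounds hierarchy: narrower term -> broader term.
BROADER = {
    "ethanol": "alcohols",
    "alcohols": "organic",
    "methane": "hydrocarbons",
    "hydrocarbons": "organic",
}


def with_broader_terms(terms):
    """Return the terms plus every ancestor, e.g. 'ethanol' adds 'alcohols' and 'organic'."""
    expanded = set()
    for term in terms:
        # Stop when the top is reached or the ancestor is already present (also avoids cycles).
        while term is not None and term not in expanded:
            expanded.add(term)
            term = BROADER.get(term)
    return expanded


def broader_chain(term):
    """Ancestors from most specific to most general, as fallback search terms."""
    chain = []
    term = BROADER.get(term)
    while term is not None and term not in chain:
        chain.append(term)
        term = BROADER.get(term)
    return chain


if __name__ == "__main__":
    print(sorted(with_broader_terms(["ethanol"])))  # ['alcohols', 'ethanol', 'organic']
    print(broader_chain("ethanol"))  # ['alcohols', 'organic']
```

Keeping the hierarchy in one table, separate from the records, is also what would allow the global revisions the current architecture lacks: changing a parent in `BROADER` changes the expansion for every record at once.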
ThermoDex has proven to be an interesting and positive experiment in opening up library collections to all kinds of users. It addresses one of the most difficult and time-consuming areas of reference work. Perhaps most importantly, it attempts to use digital library technology, however simply, to link remote patrons with print collections that are underutilized but still highly useful.

Footnotes

1. Thermodynamics is defined as "the study of the transformation of energy." Thermochemistry refers to "the transformations of energy associated with chemical reactions" (Young 2000). Thermophysical properties concern energy transformations in substances and systems that do not involve a change in chemical composition.

2. Critical evaluation of data is the systematic evaluation of reported data for accuracy, consistency, and clarity of presentation. In the U.S., university, private, and governmental data evaluation centers carry out this work. Principal among them is the Standard Reference Data Program at the National Institute of Standards and Technology (NIST; URL: http://www.nist.gov/srd/intro.htm). Uncritical compilations gather and republish reported data without thorough evaluation for accuracy (see Arny 1984, pp. 17-25).

References

Arny, Linda Ray. 1984. The Search for Data in the Physical and Chemical Sciences. Special Libraries Association, New York.

Lide, David R. 1981. "Critical data for critical needs." Science 212(4501): 1343-1349.

Maizell, Robert E. 1998. How to Find Chemical Information. 3rd ed. Wiley, New York. pp. 403-440.

NIST Chemistry WebBook. [Online] Available: http://webbook.nist.gov/ [August 10, 2000]

Northup, Diana, and Cromer, Donna. 1993. "Thermodynamic properties of substances: a selected annotated guide to the printed literature." Science & Technology Libraries 14(1): 57-95.

Weisman, Herman. 1967. "Needs of American Chemical Society members for property data." Journal of Chemical Documentation 7(1): 9-14.

Young, Robyn V., ed. 2000. World of Chemistry. Gale Group, Detroit. p. 1084.

This article is based on poster sessions presented at the American Chemical Society Spring National Meeting in 1998, and at the ACRL-STS program at the ALA Annual Conference in Chicago in 2000. The author owes a debt to current and past staff in the UT General Libraries Digital Libraries Services Division (DLSD) for their ongoing assistance and expertise in making ThermoDex work: Audrey Templeton, Erik Grostic, Ladd Hanson, and Mark McFarland (Division Head).