Issues in Science and Technology Librarianship
Fall 2005
DOI:10.5062/F4R78C4Q

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

Biology Article Retrieval from Various Databases: Making Good Choices with Limited Resources

D. Yvonne Jones
Reference Librarian
Rollins College
Winter Park, Florida
djones@rollins.edu

Abstract

If the top tier indexing resource in a discipline is not available on an ongoing basis for undergraduate student use, can acceptable article retrieval be obtained using other available resources or combinations of resources? This paper examines a single search term comparison of the biological research literature retrieved from BIOSIS Previews as well as eight other databases: BasicBIOSIS, ArticleFirst, ECO, ProQuest, WilsonWeb, SciFinder Scholar, HighWire, and MEDLINE. Comparison is also made with results from the recently available beta version of Google Scholar. A combination of databases yielded a result rate of 59% of the most recent eight years of BIOSIS results. A 51% return rate compared with BIOSIS could be obtained from searching only two major interfaces, the FirstSearch provider for ArticleFirst and ECO along with SciFinder Scholar. Google Scholar provided a 56% retrieval in comparison with the most recent BIOSIS results. A significant benefit of the Google Scholar search was the sizable number (N=69) of additional articles retrieved that were not found in BIOSIS.

Introduction

The best resource for identifying articles in a particular discipline is often the indexing and abstracting service provided by the primary professional organization in that discipline.
In chemistry, the primary resource is CAS (Chemical Abstracts Service), a division of the American Chemical Society, providing access to the chemical literature since 1907 (CAS 2004a). In mathematics, the MathSciNet database, created by the American Mathematical Society, is the primary key to the mathematical literature since 1940 (American Mathematical Society 2004). In medicine, the primary source is PubMed, created and maintained by the National Institutes of Health through the National Library of Medicine and providing access to the literature from the mid-1960s (NCBI 2004). In biology, the primary source for researching general biological information is BIOSIS (BioSciences Information Services) and its products: BIOSIS Previews, Biological Abstracts, and Biological Abstracts/RRM (Reports, Reviews, Meetings). BIOSIS began in 1926, when the Union of American Biological Societies merged its Abstracts of Bacteriology and Botanical Abstracts to create an overall biology abstracting service (OCLC 2003). In their recent book, Using the Biological Literature: A Practical Guide, Schmidt et al. (2002) describe BIOSIS as "...the largest, most comprehensive biological abstracting service in the English language." Unlike the previously mentioned examples of primary resource databases, BIOSIS is no longer associated with a biological professional society, having been purchased by Thomson Corporation in 2004 (Blake 2004).

In an ideal world, BIOSIS would always be the chosen resource for researching the biological literature. In the real world, where resources are available within financial constraints, the choice of BIOSIS is not as obvious. Initial costs to acquire the BIOSIS Previews database are over $4,000 per year for the current year, plus up to $26,000 for complete backfile purchases (Thomson 2004a). For small liberal arts colleges, these costs are a definite factor.
The task of choosing the "best" resource transforms into the challenge of choosing the best available resource or collection of resources, given the existing library collection of databases. The database currently recommended by my college biology faculty for undergraduate student use is the ArticleFirst database. Is this recommendation indeed the best search strategy available?

This paper presents article retrieval on a particular biology research question from a variety of databases currently available at a small liberal arts college in comparison with the results obtained from a search of BIOSIS. In addition, comparison is made with article retrieval available using Google Scholar. The intent is to determine whether acceptable article retrieval can be obtained without accessing the primary biology research database, BIOSIS.

Methods

For this study, article retrieval is compared between BIOSIS Previews (available through DialogWeb and currently requiring a librarian-mediated search at my college) and eight other databases available at the college to undergraduate students: BasicBIOSIS, ArticleFirst, ECO, ProQuest, WilsonWeb, SciFinder Scholar, HighWire, and MEDLINE. Seven of these eight comparison databases are available directly to undergraduate students without password verification or library mediation, although SciFinder Scholar does have restrictions on simultaneous use. The BasicBIOSIS database requires a password to access, but is included in this analysis because of its unique content, marketed as a "student version" of BIOSIS (Schmidt et al. 2002). Although there are several more databases available for student use at the college, these eight were selected for comparison because of their significant inclusion of biological information.

ArticleFirst and ECO are both provided through OCLC's FirstSearch interface. ArticleFirst provides indexing of articles from the contents pages of over 15,000 journals from 1990 to the present.
ECO (Electronic Collections Online) is an OCLC collection of over 4,800 scholarly journals. Coverage for ECO is from 1995 to the present, with significant full text access. BasicBIOSIS is also accessed using FirstSearch, but under a different subscription, requiring a password-access screen. BasicBIOSIS covers about 350 popular and scholarly science journals, considered to be a "basic core of life science journals". The dates of coverage include the current and last four years.

ProQuest and WilsonWeb are general research databases providing coverage of many disciplines for student use. ProQuest Direct provides access to more than 5,000 scholarly journals, popular magazines, and newspapers. There is a specific ProQuest Biology Journals module of the database that accesses 319 full text life science journals, with average date coverage from 1998 to the present, although this varies from journal to journal. WilsonWeb OmniFile searches over 3,500 publications in various fields, including all the journals indexed in the Biological & Agricultural Index. Dates of coverage for the Biological & Agricultural Index journals begin with 1983 and go to the present, with varying full text access.

SciFinder Scholar is a specialized chemistry literature database (CAS 2004b) that also includes significant numbers of articles in other science areas, such as biology. SciFinder Scholar provides access to over 9,000 chemistry-related (very broadly defined) scientific journals, with coverage from 1907 to the present.

HighWire and MEDLINE are both free resources available on the Internet. HighWire Press is a product of Stanford University Libraries and "hosts the largest repository of free full text life science articles in the world" (HighWire Press 2003). The date coverage for HighWire access varies widely, with some journals providing archival access and others limiting access to their most recent issues while allowing access to older ones.
Most of the literature is recent, however, from 1995 forward as online availability became more common. MEDLINE citations and abstracts are the major part of the National Library of Medicine's PubMed database (NLM 2004) and are available free on the Internet as well as frequently being included along with other database offerings. SciFinder Scholar and HighWire both provide searches of MEDLINE in addition to their own journal access. The NLM has provided MEDLINE access to the scientific literature since the 1960s.

In addition to these databases, a final comparison of article retrieval was carried out using the beta version of the web search engine Google Scholar, which became available Nov. 18, 2004 (Acharya 2004). This free online service promises to bring the strength of the Google search engine to the arena of scholarly publications. The current research question offered a chance to get an idea of how successful this might be.

For the research analysis presented here, a single search term was used, the genus name for a type of periwinkle, "Nodilittorina." This search was an actual reference request that started the path of enquiry leading to the comparison presented here. Using this specific single term produced limited results, which enabled comparison of individual articles across the databases.

Clearly, the choice of search term will influence the article retrieval results obtained. By confining this paper to results based on searching with a single zoological term, extrapolations about the most useful databases for other biology searches in comparison with BIOSIS are limited. It is very probable that a search term targeted at ecology, botany, medicine, or another area of biology would produce varying results. However, given the general multi-disciplinary nature of most of the databases examined, it is considered unlikely that the general trend of results in comparing the databases as a group versus BIOSIS would change significantly.
Only in the case of the SciFinder Scholar database, particularly designed for chemistry information, would one expect major changes in results with the choice of a search term more or less chemistry-focused.

Results and Discussion

The results matrix obtained from using the single search term "Nodilittorina" in BIOSIS and the eight comparison databases is shown in Table 1. (For clarity, only half of the matrix is shown, as it is perfectly mirrored in the opposing half.) The first column of Table 1 displays the number of articles in each database that were also found in BIOSIS. Each successive column provides the articles found in the database listed at the bottom of the column and indicates any overlap of those articles with the other databases. Reading across the bottom row, labeled "Totals", provides the total number of articles retrieved from each database.

Table 1. Number of articles retrieved from various databases using search term "Nodilittorina".

Basic BIOSIS             5
ArticleFirst            11             1
ECO                     10             2             4
ProQuest                 5             4             2    3
WilsonWeb                -             -             -    -         -
SciFinder Scholar        6             -             -    -         -          -
HighWire                10             4             2    3        10          -                  4
MEDLINE                  4             -             -    -         -          -                  4         4
TOTALS                 110             5            11   10        13          2                  9        17        4
Databases           BIOSIS  Basic BIOSIS  ArticleFirst  ECO  ProQuest  WilsonWeb  SciFinder Scholar  HighWire  MEDLINE

A first reaction to the results in Table 1 might be the obvious maxim that "you get what you pay for." The results from BIOSIS clearly overshadow all other databases examined. However, as in everything statistical, closer examination is necessary. One of the reasons for the primacy of BIOSIS is that the database provides data from 1969 to the present day (Thomson 2004b). There is no doubt that if one is seeking a complete search of information in the biology area, BIOSIS is the hands-down favorite. Frequently, however, a student is only interested in the most recent findings, having already obtained sufficient past background research.
A more accurate comparison of the abilities of the various databases to respond to the needs of a student focused on recent research findings would be found by restricting the BIOSIS results to those within the last few years.

In order to select a more recent date range for comparison with the BIOSIS results, the earliest research articles found in the comparison databases were determined. The oldest article found by any of the competing databases was a 1991 article obtained from ArticleFirst. ArticleFirst also provided two articles published in 1994. The other databases did not retrieve any articles until a 1996 publication date. Restricting the BIOSIS findings to those from 1991 to present, the number of articles retrieved decreases from 110 to 62. Restricting to the years from 1996 to present, when all the databases provided results, limits the BIOSIS results to only 39 articles.

Table 2 presents a comparison of these three different year groupings of the BIOSIS results with the results from the other databases combined. In addition to the articles found in both BIOSIS and the combined databases, these databases also retrieved some articles not found in BIOSIS. The third column in Table 2 provides the total number of articles provided by the comparison databases in each date range. Because the databases overlap in coverage, the individual articles retrieved must be compared across the various databases in order to eliminate duplicates correctly. After examining the individual articles retrieved, it was determined that an additional fifteen articles were retrieved from four of the databases: ProQuest, WilsonWeb, SciFinder Scholar, and HighWire. When these additional articles were counted, the total number of articles retrieved using the combined databases came to 38 for the 1996+ year grouping and 41 for the 1991+ and 1969+ year groupings (Table 2).

Table 2. Number of articles retrieved from three date ranges searched in BIOSIS and combined comparison databases.*

Date Ranges   Articles in BIOSIS   Articles in BIOSIS and Combined Databases*   Articles in Combined Databases*
1969+         110                  26 (24%)**                                   41
1991+         62                   26 (42%)                                     41
1996+         39                   23 (59%)                                     38

* BasicBIOSIS, ArticleFirst, ECO, ProQuest, WilsonWeb, SciFinder Scholar, HighWire, MEDLINE
** Percentage of BIOSIS results

Table 2 demonstrates that if the most recent eight years of research are targeted, using a combination of the existing resource databases provided 59% of the BIOSIS results, a much more "acceptable" figure than the 24% retrieval when compared with the entire BIOSIS database results (1969+). Not only did the combined databases provide 59% of the content available through BIOSIS, but they also yielded additional articles.

Was the content of the additional fifteen articles provided by the combined databases of comparable value to those retrieved by BIOSIS? A closer examination of the additional articles found showed that the primary reason for their retrieval was the availability of full text searching in these databases, whereas BIOSIS searches on citation and abstracts alone. An exception was the SciFinder Scholar database, where articles were retrieved from journals not searched in BIOSIS. The articles retrieved due to full text searching were typically focused on research in related molluscan species, where the references to Nodilittorina provided comparisons on various characteristics. These comparisons were often a simple listing of habitat or distribution patterns relative to the primary research species. New research findings relating to Nodilittorina were not found in these papers. The SciFinder Scholar articles retrieved because of journals indexed uniquely in that database also tended to be of marginal interest. These articles examined various radioactive isotope activities or lysozyme activity in marine species.
Such a focus on measurement of various activity levels and isotope contents is consistent with a chemistry indexing database, but unlikely to be of value for general undergraduate student research. Thus, while comparable numbers of articles were retrieved by using a combination of the available resources (38 compared with 39 in BIOSIS), the additional articles retrieved outside BIOSIS were typically of lower value, due to less primary focus on the search term or narrow focus on analytical measurement and content.

In comparing the usefulness of searching the combined databases, there is the additional concern that the burden of searching several databases without a single search screen is likely to cause student resistance. Is there a less burdensome strategy of searching only a few of the alternate databases, which might still provide acceptable results? If one focuses on trying to maximize the BIOSIS articles retrieved, a practical strategy can be proposed by choosing the database with the highest number of articles retrieved and then moving to each database in turn based on its additional retrieval capacity.

In the search results obtained here, Table 1 shows that the ArticleFirst database had the highest number of articles also in BIOSIS (N=11). Here is the answer to why the biology faculty recommended this database to their students! It is a pleasing result that the biology department's "historical wisdom" coincided with the search results from this study. Although it is certainly possible that given different search terms differing results might be obtained, the consistency with the departmental recommendation is encouraging in this instance. It must be noted that there is a difference of only one article between the ArticleFirst results obtained and those from ECO or HighWire.
It is quite likely that the biology faculty have not been recommending HighWire to their students as it was only created in 1995 and fairly recently has reached a level of content that provides significant searching benefits. ECO covers about a third as many journals as ArticleFirst and is thus probably considered less valuable.

After ArticleFirst, one could choose either ECO or HighWire, each returning ten articles found in BIOSIS. Looking at Table 1, HighWire would be preferred as it showed less overlap with ArticleFirst. Here practicalities would intervene in a recommended search strategy. ECO and ArticleFirst are both provided at my college by the OCLC FirstSearch interface, making it more convenient to combine a search of both these databases in one sitting. In this study, after removing articles previously retrieved from ArticleFirst, searching ECO added another six BIOSIS articles, bringing the total to seventeen. The next highest retrieval was from SciFinder Scholar. The six articles provided from SciFinder Scholar had no overlap with the other two databases and brought the total retrieval number from this search strategy to 23 when compared with the 1969+ or the 1991+ BIOSIS results and 20 if counting only the most recent eight years (1996+). All four of the MEDLINE articles were provided by SciFinder Scholar, so there was no benefit in adding MEDLINE to the search strategy. HighWire only provided two additional articles after the other databases had been searched and BasicBIOSIS yielded only one distinctive article, while ProQuest and WilsonWeb added no BIOSIS articles.

Based on these results, a practical biology search strategy could be proposed as a two-step process: first utilizing the OCLC FirstSearch interface to search the ArticleFirst and ECO databases, followed by searching in SciFinder Scholar. In this paper, such a search strategy yielded 20 articles, 51% of the BIOSIS results for the most recent eight-year period.
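
The stepwise tally behind this strategy (11 articles, then 6 more, then 6 more, with MEDLINE adding nothing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article IDs are invented, and only the set sizes and overlaps follow the counts reported above for articles also found in BIOSIS (1969+). The function name `marginal_gains` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical article IDs: only the set sizes and overlaps follow the counts
# reported in the text; the IDs themselves are invented for illustration.
biosis_hits: Dict[str, Set[int]] = {
    "ArticleFirst": set(range(1, 12)),        # 11 articles also in BIOSIS
    "ECO": set(range(8, 18)),                 # 10 articles, 4 shared with ArticleFirst
    "SciFinder Scholar": set(range(18, 24)),  # 6 articles, none shared with the two above
    "MEDLINE": set(range(18, 22)),            # 4 articles, all also in SciFinder Scholar
}


def marginal_gains(order: List[str], hits: Dict[str, Set[int]]) -> List[Tuple[str, int, int]]:
    """Return (database, newly added articles, running total) for each search step."""
    seen: Set[int] = set()
    steps = []
    for name in order:
        new = hits[name] - seen  # articles not already retrieved in an earlier step
        seen |= new
        steps.append((name, len(new), len(seen)))
    return steps


steps = marginal_gains(["ArticleFirst", "ECO", "SciFinder Scholar", "MEDLINE"], biosis_hits)
for name, added, total in steps:
    print(f"{name}: +{added} (running total {total})")
```

Tracking only the newly added articles at each step is what makes the redundancy of MEDLINE visible: its four articles are already covered once SciFinder Scholar has been searched.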
A slight improvement in results could be made by searching additional databases, but 87% (20/23) of the results provided by the combined eight databases were retrievable using this search strategy.

Previous research comparing BIOSIS with other databases for their value in a specific area has used journal coverage as the comparison variable (Kawasaki 2004; McDonald et al. 1999). In this paper, the variable of comparison is the actual number of articles retrieved using a specific search term. Although results could be expected to be quite similar with these two approaches, the use of the actual search results did highlight the difference in value of databases due to their capabilities in regard to full-text searching versus only citation and abstract.

After completion of these database comparisons, a final search strategy was examined comparing BIOSIS results with those obtained using the recently available beta test version of the search engine Google Scholar (Acharya 2004). Table 3 presents the results of this search. Using the single search term "Nodilittorina" in Google Scholar yielded 115 results. Examination of these results revealed five duplicates and eight obviously off-target items (journal subscription prices, government general environmental reports, conference outlines, and similar items). Of the 102 remaining results, 30 were also in BIOSIS. When these results were categorized using the same three date ranges previously examined in the combined database comparison, Google Scholar provided a 27% (30/110) retrieval rate compared with BIOSIS for the years 1969 to present, a 39% (24/62) return for the years from 1991 to present, and a 56% (22/39) return for the most recent eight years. These results were quite similar in number to those obtained by using the other databases examined in combination (Table 2).

Table 3. Number of articles retrieved from BIOSIS and Google Scholar in three date ranges.

Date Ranges   Articles in BIOSIS   Articles in BIOSIS and Google Scholar   Articles in Google Scholar
1969+         110                  30 (27%)*                               102
1991+         62                   24 (39%)                                96
1996+         39                   22 (56%)                                91

* Percentage of BIOSIS results

In contrast with the results from the combined comparison databases, however, Google Scholar retrieved almost five times more additional articles. Google Scholar provided 72 other links to articles and papers not picked up in BIOSIS. Two distinct reasons for this became apparent in reviewing the Google Scholar results. Significant material from non-English journals is provided in the Google results -- Germany, Chile, Korea, and Japan, among others. The other apparent factor is the high number of relatively recent articles: 49 are from the last five years. The backlog effect seen in indexing services is evident for the BIOSIS results. Google can be much more immediate as long as the article is posted on the Internet, as increasingly more articles are. The older articles that were provided by Google Scholar were generally from citation searches, another nice feature of this search engine. Thus, a more current and truly international representation of results was provided by this search engine.

A total of 102 focused scientific results for Google Scholar clearly makes this a strategy of choice. The ease of a single search will also make this a significant tool in academic research. At present, despite the good numbers and citation link feature, the results are messy. They are not arranged in chronological order and the relevance order is unclear. Links to full text are not made seamlessly, as they are in many subscription databases, and the search capacities are still rough.
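
The percentages in Tables 2 and 3, and the counts of additional articles not found in BIOSIS, follow from simple arithmetic on the reported figures. The short Python sketch below recomputes them; the numbers are taken from the two tables, while the function names `retrieval_rate` and `summarize` are illustrative and not part of the original study.

```python
from typing import List, Tuple

Row = Tuple[str, int, int, int]


def retrieval_rate(shared: int, biosis_total: int) -> int:
    """Share of BIOSIS results also found elsewhere, as a whole-number percentage."""
    return round(100 * shared / biosis_total)


# (date range, articles in BIOSIS, articles shared with BIOSIS, total articles retrieved)
combined: List[Row] = [
    ("1969+", 110, 26, 41),
    ("1991+", 62, 26, 41),
    ("1996+", 39, 23, 38),
]
google_scholar: List[Row] = [
    ("1969+", 110, 30, 102),
    ("1991+", 62, 24, 96),
    ("1996+", 39, 22, 91),
]


def summarize(rows: List[Row]) -> List[Tuple[str, int, int]]:
    """Return (date range, retrieval rate in %, articles not found in BIOSIS) per row."""
    return [(rng, retrieval_rate(shared, biosis), total - shared)
            for rng, biosis, shared, total in rows]


combined_summary = summarize(combined)
scholar_summary = summarize(google_scholar)
for label, summary in (("Combined databases", combined_summary),
                       ("Google Scholar", scholar_summary)):
    for rng, rate, extra in summary:
        print(f"{label} {rng}: {rate}% of BIOSIS, {extra} additional articles")
```

The additional-article counts (15 for the combined databases against 72 for Google Scholar over the full date range) are the basis for the "almost five times" comparison.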
The quality of the articles found runs the entire gamut from entirely focused and pertinent (often the same articles retrieved from BIOSIS, although including more current material than BIOSIS as well) to marginal mentions as comparison species and the same chemistry analytical articles picked up in SciFinder Scholar. Nevertheless, the sheer weight of numbers returned, along with the elegant Google search screen, makes this an appealing tool.

Conclusion

A reasonable search strategy for biology article retrieval using existing resources was formulated, using the OCLC FirstSearch interface for the databases ArticleFirst and ECO and then adding a search of SciFinder Scholar if literature related to chemistry was desired. Using this search strategy with the search term "Nodilittorina" yielded 51% of the number of articles provided by the primary biology literature database, BIOSIS. While the breadth of research literature provided by BIOSIS (online access from 1969+) could not be matched by the existing database resources, the strategy developed did yield reasonable results for the more current research.

Significant additional coverage was provided using Google Scholar. The beta version of this search engine yielded 56% coverage of the BIOSIS articles retrieved and over twice the total number of articles provided by BIOSIS from its most recent research years (1996+). The Google Scholar search engine offers a valuable tool, particularly for the science community, where increasingly the research literature is being made openly accessible on the Internet.

Acknowledgement

The author would like to thank Eileen Gregory for raising the initial question regarding the best biology search strategy, Jim Small for providing the Nodilittorina query, and the Olin Library staff for their support and encouragement.

References

Acharya, A. 2004. Thursday, October 18, 2004: Scholarly pursuits. GoogleBlog. [Online]. Available: http://googleblog.blogspot.com/2004/10/scholarly-pursuits.html [November 24, 2004].
American Mathematical Society. 2004. About MathSciNet. [Online]. Available: {http://www.ams.org/mathscinet/help/about.html} [November 24, 2004].
Blake, M. 2004. Thomson acquires BIOSIS publishing assets. The Electronic Library 22(2): 210. [Online]. Available: {http://proquest.umi.com} [September 29, 2004].
CAS (Chemical Abstracts Service). 2004a. Overview for press and media. CAS: the world's most comprehensive resource -- for chemical and related scientific information. [Online]. Available: {http://www.cas.org/New1/casinfo.html} [November 24, 2004].
CAS (Chemical Abstracts Service). 2004b. SciFinder Scholar. [Online]. Available: http://www.cas.org/SCIFINDER/SCHOLAR/ [November 24, 2004].
HighWire Press. 2003. About HighWire Press. [Online]. Available: http://highwire.stanford.edu/about/ [November 24, 2004].
Kawasaki, J. 2004. Agriculture journal literature indexed in life sciences databases. [Online]. Available: http://www.istl.org/04-summer/article4.html [November 30, 2004].
McDonald, S., et al. 1999. Searching the right database. A comparison of four databases for psychiatry journals. Health Libraries Review 16(3): 151-156.
NCBI (National Center for Biotechnology Information). 2004. PubMed overview. [Online]. Available: {http://www.ncbi.nlm.nih.gov/entrez/query/static/overview.html} [November 24, 2004].
NLM (National Library of Medicine). 2004. Fact sheet: What's the difference between MEDLINE and PubMed? [Online]. Available: http://www.nlm.nih.gov/pubs/factsheets/dif_med_pub.html [November 24, 2004].
OCLC. 2003. Biological abstracts. WorldCat Detailed Record. [Online]. Available: http://newfirstsearch.oclc.org [November 24, 2004].
Schmidt, D., et al. 2002. Using the Biological Literature: A Practical Guide. Third Edition, Revised and Expanded. New York: Marcel Dekker, Inc.
Thomson. 2004a. Email correspondence from University Account Manager, Thomson Corp. 30 September 2004. [Online].
Thomson. 2004b. BIOSIS Previews. [Online]. Available: {http://wokinfo.com/products_tools/specialized/bp/} [November 24, 2004].