Chemical Information in Scirus and BASE (Bielefeld Academic Search Engine)

Issues in Science and Technology Librarianship
Summer 2009
DOI:10.5062/F4M906KW

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

Regina B. Bendig
Sciences Liaison Librarian
H.G. Thode Library of Science and Engineering
McMaster University
Hamilton, Ontario
bendigr@mcmaster.ca

Copyright 2009, Regina B. Bendig. Used with permission.

Abstract

The author sought to determine to what extent two search engines, Scirus and BASE (Bielefeld Academic Search Engine), would be useful to first-year university students as a first point of searching for chemical information. Five topics were searched, and the first ten records of each search result were evaluated for document type, relevance, and full-text access. Results show that both search engines retrieve some useful information as a starting point for research in chemistry and that BASE is better at excluding commercial web sites. BASE also tends to index more theses, which would be of interest to upper-level and graduate students. Both search engines returned limited results when searching by chemical formula, and users need at least some familiarity with advanced search features when a search yields few or no results.

Introduction

The use of the Internet as the first source of information among university students is ubiquitous (De Rosa 2005). Even researchers turn to search engines to find information because of their simplicity, both in searching and in the display of results. Yet results are not always satisfactory with regard to the reliability of sources and the precision of results.
The sheer amount of information retrieved can also be overwhelming. The two search engines under consideration, Scirus and BASE, provide some control over the type of information included and allow more focused searching, which can increase precision without requiring the searcher to learn much about search strategies. This investigation sought to determine whether BASE and Scirus could be recommended to first-year university students as the first point of research in finding chemical information.

A number of studies have discussed, some of them in depth, the content and search features of BASE (Lohmeyer 2005; Pieper and Summann 2006; Summann and Wolf 2005; Summann and Wolf 2006; Pieper and Wolf 2007) and Scirus (McKiernan 2005; Tompson 2007; Fitzpatrick 2008; Jasco 2008). The features most relevant to the current investigation are briefly described below.

BASE (Bielefeld Academic Search Engine)
http://www.base-search.net/

BASE, developed by the Bielefeld University Library and launched in 2004, is a free, multidisciplinary search engine for scholarly resources. Its emphasis is on documents held in digital repositories that make their metadata available through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). BASE also includes selected web sites and locally digitized holdings of the Bielefeld University Library. It uses software from FAST Search and Transfer, a Norwegian company, to index the harvested documents. As of May 12, 2009, it contained 19,629,947 documents, many of them freely available as full-text PDF or HTML files. The search interface is available in English, German, Polish, and Spanish, with French planned.

Both a Basic Search and an Advanced Search option are available. The Basic Search page (see Figure 1) displays an uncluttered search box, familiar from Google.
It is meant for a quick search without any knowledge of search strategies, although such strategies can also be used. After a search has been executed, the user can narrow the results by choosing from a number of refining options, or facets.

Figure 1: BASE Basic Search Screen

The Advanced Search page (see Figure 2) offers a number of search options, such as field-specific searching and refining by geographic location, year of publication, and document type. Boolean operators, phrase searching, and truncation are available in both simple and advanced searching.

Figure 2: BASE Advanced Search Screen

BASE offers two features that are unique among freely available search engines: a thesaurus (EUROVOC) for selecting terms in up to 21 languages, and a Word Form option for searching alternative forms of a term, such as singular and plural. Both options are available from the simple and advanced search pages and serve to broaden the search query.

Results are displayed by relevance, with bibliographic information taken from the metadata and a brief summary. They can be sorted by author, title, document size, or date, in ascending or descending order. A number of refining options are available, such as keyword, year of publication, source, and document type. BASE can also pass an individual result, or the entire query, to Google Scholar to show citations and alternative versions of an item. Although searching is intuitive, some additional functions would enhance this search engine: a "clear" button, spelling suggestions, a link resolver, highlighting of search terms and, in the advanced search, a journal title field (a document title field is available).

Scirus
http://www.scirus.com/

Scirus is a free search engine for scientific information, developed by Elsevier and launched in 2001. It also uses FAST Search and Transfer technology.
Its index is much larger than that of BASE, at 485 million pages, although by far the largest share (over 417 million; Jasco 2008) consists of web sites. It covers both freely available and proprietary sources, such as journals (many from Elsevier's own ScienceDirect service), institutional repositories, preprints, patents, and university web sites. Both Scirus and BASE clearly list their content sources on their web sites (Scirus 2009; BASE 2009).

Scirus also offers simple and advanced search options. As in BASE, the Basic search page (see Figure 3) provides a Google-like search box for a quick search that can later be refined by facets.

Figure 3: Scirus Basic Search Screen

The Advanced search page (see Figure 4) offers field-specific searching and limiting by date, information type (e.g., abstract, patent, article), file format, and content source. Choices for content source include journals, preferred web sites, and the remaining scientific web. A user can also narrow a search by selecting a subject area, such as "chemistry and chemical engineering." Boolean operators, phrase searching, and truncation are available.

Figure 4: Scirus Advanced Search Screen

Results are displayed by relevance, which combines the proximity of the search terms with how frequently a page is linked to by other sites. Each record provides the title of the document, the date (if available), a brief summary, and a link to "similar results." Results can be sorted by date, but only in descending order, and only up to 1,000 records can be viewed. The date is not necessarily that of the document; it can be the date the site was last updated. Notable features include a link resolver, suggestions for alternative spellings, saving of up to 25 searches, and e-mailing and exporting of results. Like BASE, Scirus lacks a "clear" button.
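BASE, as described above, draws most of its content from repositories that expose their metadata through OAI-PMH. The following Python sketch illustrates what a single harvesting step involves: it builds a ListRecords request for simple Dublin Core (oai_dc) metadata and extracts the record titles and the resumptionToken a harvester would use to fetch the next page. The repository address (https://repository.example.org/oai) and the sample response are hypothetical, invented for illustration, and the sketch makes no network calls; it is not BASE's own harvesting software.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and its simple Dublin Core (oai_dc) format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for oai_dc metadata."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    titles = [
        el.text
        for el in root.iterfind(".//oai:record/oai:metadata/oai_dc:dc/dc:title", NS)
    ]
    token = root.find(".//oai:ListRecords/oai:resumptionToken", NS)
    # An absent or empty token means the list is complete.
    next_token = token.text if token is not None and token.text else None
    return titles, next_token


# Hypothetical endpoint and invented one-record response, for illustration only.
BASE_URL = "https://repository.example.org/oai"
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2009-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Bisphenol A in polycarbonate bottles</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL, from_date="2009-05-01"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2009-05-01
titles, token = parse_list_records(SAMPLE_RESPONSE)
print(titles, token)
# ['Bisphenol A in polycarbonate bottles'] page-2
print(list_records_url(BASE_URL, resumption_token=token))
# https://repository.example.org/oai?verb=ListRecords&resumptionToken=page-2
```

Passing the returned token back to list_records_url continues the harvest. A real harvester would loop until no token is returned and would also handle OAI-PMH error responses such as noRecordsMatch, which this sketch omits.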
Literature Review

A number of studies have evaluated search engines, usually comparing them with Google or Google Scholar to determine whether other search engines offer better alternatives for scientific or academic research.

Dan Lohmeyer's (2005) master's thesis examines BASE in relation to Google Scholar, Scirus, and Scopus as a means of discovering scholarly information. Although cautious about drawing definite conclusions from his limited searches, he states that BASE search results are impressive, particularly because most of the first ten hits are available as PDF documents. Scirus, by contrast, includes commercial web sites and links to its own journals, which require payment for full-text access if a library does not have a subscription. Scirus and the other search engines outweigh BASE in the sheer number of hits. His overall assessment is that the search engines overlap considerably and complement each other.

Dirk Pieper and Sebastian Wolf (2009) take a slightly different approach. Their paper compares Google and Yahoo with specialized search engines, such as BASE, Google Scholar, OAIster, Scientific Commons, and Scirus, to determine to what extent scholarly search engines are preferable for retrieving documents from the institutional repositories of research institutions. They state that each specialized search engine has its strengths and weaknesses, but that all are preferable to general search engines, which, while indexing scholarly documents, make them very difficult to find among all the other indexed sources.

In her discussion of CiteSeer, FirstGov for Science, and Scirus, Elizabeth Connor (2005) concludes that these resources can be regarded as supplements to subscription-based databases and are preferable to general search engines.
In his article on Google Scholar and Scirus, Greg Notess (2005) recommends either one for searching narrowly defined topics and for finding full-text articles that are otherwise not accessible. He also points out that neither Google Scholar nor Scirus comprehensively indexes freely available material, such as that from PubMed.

R. Chakravarty and S. Randhawa (2006) studied Google Scholar, Scirus, and Windows Live Academic. They concluded that Scirus was the most comprehensive for the five searches they conducted in each search engine, but that none of them can be said to be the best search engine for academic researchers.

Ford and O'Hara (2007) found Google Scholar preferable to Scirus because the former's coverage of science and technology articles was more comprehensive. They noted that Google Scholar displayed various versions of an article that in Scirus would only be accessible with payment.

The extent of overlap among search engines in the field of biotechnology is the focus of the paper by Rather, Lone, and Shah (2008). Among the five engines examined, Bioweb retrieved only unique items, followed by Scirus. There was more overlap among the general search engines, namely Google, Altavista, and Hotbot.

So far no study has sought to determine the extent of coverage of chemical information in Scirus or BASE, or to assess the value of these two search engines to first-year university students. This paper attempts to contribute some evidence in this area for this particular user group.

Methods

To investigate the usefulness of BASE and Scirus as sources of chemical information for first-year university students, a number of searches were conducted in May 2009 and evaluated for content (relevant/non-relevant), type of document, and full-text accessibility. In undertaking this study, the following assumptions were made.
First-year university students:
- use Google as the first source of information;
- search by simple keyword;
- use the default screen (simple search) to execute their searches;
- do not consult help pages to learn about specific search features;
- view results in the default order (relevance);
- view the first ten results;
- want full-text access.

Five searches were conducted in the simple search option of each search engine. The topics were selected from assignments and from questions asked at the research help desk. For comparison, the terms were entered with and without quotation marks (phrase searching) to determine whether a greater level of relevance would be achieved. Searches were conducted on:
- 2 chemical compounds: Bisphenol A; Fluorine Fluorosulfate
- 1 topic: Haber-Bosch Process
- 1 molecular formula: (CF3)3BCO, a boron carbonyl compound
- 1 name: Gerhard Ertl, 2007 Nobel Prize recipient

The terms were entered as shown in Table 1. The first ten records for each topic from BASE and Scirus, for a potential total of 200 records, were compared for relevance, type of document, and free full-text accessibility.

For this study, relevance or usefulness was defined as a document:
- being about the topic;
- covering the topic substantially;
- providing an explanation, overview, or definition sufficient to gain a basic understanding for further searching.

Discussion

The number of search results varied considerably by topic (see Table 1):
- in BASE, the highest number of hits was for Bisphenol A (913/868*) and the lowest for (CF3)3BCO (0/0*);
- in Scirus, the highest number of results was for Bisphenol A (248,552/227,331*), and no results were retrieved for (CF3)3BCO (19 when searched as a phrase).

(*The second number indicates the results from phrase searching.)
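Phrase searching is stricter than keyword searching, which helps explain why most quoted searches in Table 1 return fewer records than their unquoted counterparts. The Python sketch that follows illustrates the difference under two simplifying assumptions: that an unquoted query is treated as an implicit AND of its words (a common default, not confirmed here for BASE or Scirus), and that text is split into words at every non-alphanumeric character. The record titles are invented, and real engines tokenize, stem, and rank in more elaborate ways.

```python
import re


def tokenize(text):
    """Lower-case the text and split it on every non-alphanumeric character."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_match(record, query):
    """Unquoted query as an implicit AND: each term may appear anywhere."""
    words = set(tokenize(record))
    return all(term in words for term in tokenize(query))


def phrase_match(record, query):
    """Quoted query: the terms must appear next to each other, in order."""
    words, terms = tokenize(record), tokenize(query)
    n = len(terms)
    return n > 0 and any(words[i:i + n] == terms for i in range(len(words) - n + 1))


# Invented record titles, for illustration only.
records = [
    "Bisphenol A exposure in infants",
    "Polymers made from bisphenol and a catalyst",
    "Type A fluoropolymers without bisphenol",
]

query = "bisphenol a"
print(sum(keyword_match(r, query) for r in records))  # 3
print(sum(phrase_match(r, query) for r in records))   # 1
print(tokenize("(CF3)3BCO"))                          # ['cf3', '3bco']
```

The unquoted query bisphenol a matches any record that merely contains the word "a" somewhere. The same naive tokenizer also splits (CF3)3BCO into cf3 and 3bco, a reminder that the way an engine handles punctuation in a formula can decide whether a formula search finds anything at all.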
Table 1: Total Number of Retrieved Records per Searched Topic
(Terms in quotation marks were searched as phrases.)

Topic searched | BASE | Scirus
bisphenol a | 913 | 249,552
"bisphenol a" | 868 | 227,331
fluorine fluorosulphate | 4 | 516
"fluorine fluorosulphate" | 1 | 22
haber bosch process | 33 | 6,435
"haber bosch process" | 7 | 1,870
(CF3)3BCO | 0 | 0
"(CF3)3BCO" | 0 | 19
gerhard ertl | 121 | 7,306
"gerhard ertl" | 14 | 3,955

Some possible reasons for the large differences in results among the topics and between the search engines are: a recent surge of interest in a topic, as was the case with bisphenol A; how long a topic has been researched; the indexed content, as in the case of Scirus, which overwhelmingly indexes web sites; the problem of chemical nomenclature when searching by molecular formula or chemical name; and proprietary documents that are not accessible for indexing, such as journals published by the American Chemical Society, which, if accessible to search engines, would have provided more results for searches by molecular formula and chemical name. The larger result counts in Scirus could also be due, to some extent, to full-text indexing, which is currently much less extensive in BASE.

There were very few duplications between the two search engines among the first ten records viewed. Only one journal article, an Elsevier publication, was found in both BASE and Scirus, but in BASE the full-text author's version (edited for publication) was freely available; Scirus offered no link to that author version. Even though this is only one example, it supports the observation by Ford and O'Hara (2007) that Scirus links to its own publications without providing links to other versions. When the records from BASE were searched in Scirus, a number of them were found, but not among the first ten items. Some of these documents appeared in Scirus several times under the same title. There was only one record from BASE in Scirus, but not among the ten records viewed. None of the results were direct links to Wikipedia.
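The relevance judgments described in Methods amount to measuring precision over the first ten records of each search. As a sketch of that arithmetic, the Python snippet that follows takes the totals reported in Table 3 and expresses them as two proportions per engine and search mode: relevant records as a share of records viewed, and freely accessible records as a share of relevant ones. These percentages are simple ratios of the published totals, not separate findings of the study.

```python
# Totals from Table 3: (records viewed, relevant, relevant with free full text).
# "keyword" = terms entered without quotation marks; "phrase" = quoted terms.
TABLE3_TOTALS = {
    ("BASE", "keyword"): (34, 18, 14),
    ("BASE", "phrase"): (28, 15, 13),
    ("Scirus", "keyword"): (40, 21, 13),
    ("Scirus", "phrase"): (50, 28, 13),
}


def summarize(viewed, relevant, free):
    """Return (precision %, free-full-text share of relevant records %), 1 decimal."""
    precision = 100 * relevant / viewed
    free_share = 100 * free / relevant
    return round(precision, 1), round(free_share, 1)


for (engine, mode), counts in TABLE3_TOTALS.items():
    precision, free_share = summarize(*counts)
    print(f"{engine:<7} {mode:<8} precision {precision}%, free full text {free_share}%")
```

Pooled this way, both engines fall between roughly 52 and 56 percent precision, while the share of relevant records with free full text is higher for BASE (77.8% and 86.7%) than for Scirus (61.9% and 46.4%). Pooling weights each topic by the number of records actually viewed, so searches that returned fewer than ten records count for less; averaging per-topic precision instead would give different figures and would have to skip topics with no records viewed, such as the unquoted (CF3)3BCO search.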
The types of documents retrieved were journal articles, conference papers, theses, patents, and web sites. Web sites included anything that did not fall into the other categories (university, government, institutional, and commercial sites).

In BASE, the total number of hits for all topics within the first ten records viewed was 34 (28*) (see Table 2a). Most of the documents were journal articles (15/11*), followed by web sites (10/9*) and theses (8/8*). There was 1 (0*) conference paper and no patents. Of the total hits, 18 (15*) were relevant, and of these 14 (13*) were freely available as full text (see Table 3).

Table 2a: Types of Documents in BASE (number of records viewed in parentheses)

Topic searched | Journal articles | Conf. papers | Theses | Patents | Web sites
bisphenol a (10) | 6 | 0 | 1 | 0 | 3
"bisphenol a" (10) | 8 | 0 | 1 | 0 | 1
fluorine fluorosulphate (4) | 1 | 0 | 3 | 0 | 0
"fluorine fluorosulphate" (1) | 0 | 0 | 1 | 0 | 0
haber bosch process (10) | 0 | 0 | 4 | 0 | 6
"haber bosch process" (7) | 0 | 0 | 2 | 0 | 5
(CF3)3BCO (0) | 0 | 0 | 0 | 0 | 0
"(CF3)3BCO" (0) | 0 | 0 | 0 | 0 | 0
gerhard ertl (10) | 8 | 1 | 0 | 0 | 1
"gerhard ertl" (10) | 3 | 0 | 4 | 0 | 3
TOTAL (phrase in parentheses) | 15 (11) | 1 (0) | 8 (8) | 0 (0) | 10 (9)

Table 2b: Types of Documents in Scirus (number of records viewed in parentheses)

Topic searched | Journal articles | Conf. papers | Theses | Patents | Web sites
bisphenol a (10) | 2 | 0 | 0 | 0 | 8
"bisphenol a" (10) | 2 | 0 | 0 | 0 | 8
fluorine fluorosulfate (10) | 8 | 0 | 0 | 2 | 0
"fluorine fluorosulfate" (10) | 4 | 0 | 0 | 2 | 4
haber bosch process (10) | 1 | 0 | 0 | 0 | 9
"haber bosch process" (10) | 3 | 0 | 2 | 0 | 5
(CF3)3BCO (0) | 0 | 0 | 0 | 0 | 0
"(CF3)3BCO" (10) | 9 | 0 | 1 | 0 | 0
gerhard ertl (10) | 2 | 0 | 0 | 0 | 8
"gerhard ertl" (10) | 2 | 0 | 0 | 0 | 8
TOTAL (phrase in parentheses) | 13 (20) | 0 (0) | 0 (3) | 2 (2) | 25 (25)

Table 3: Number of Relevant Documents with Free Full-text Access

Topic searched | BASE viewed | BASE relevant | BASE free full text | Scirus viewed | Scirus relevant | Scirus free full text
bisphenol a | 10 | 7 | 6 | 10 | 7 | 5
"bisphenol a" | 10 | 9 | 7 | 10 | 7 | 5
fluorine fluorosulphate | 4 | 2 | 2 | 10 | 4 | 0
"fluorine fluorosulphate" | 1 | 1 | 1 | 10 | 6 | 1
haber bosch process | 10 | 1 | 1 | 10 | 7 | 6
"haber bosch process" | 7 | 1 | 1 | 10 | 7 | 4
(CF3)3BCO | 0 | 0 | 0 | 0 | 0 | 0
"(CF3)3BCO" | 0 | 0 | 0 | 10 | 3 | 0
gerhard ertl | 10 | 8 | 5 | 10 | 3 | 2
"gerhard ertl" | 10 | 4 | 4 | 10 | 5 | 3
TOTAL (phrase in parentheses) | 34 (28) | 18 (15) | 14 (13) | 40 (50) | 21 (28) | 13 (13)

In Scirus, the total number of records for all topics within the first ten records viewed was 40 (50*) (see Table 2b). Most were web sites (25/25*), followed by journal articles (13/20*), patents (2/2*), and theses (0/3*). Of the total 40 (50*) records, 21 (28*) were relevant, and of these 13 (13*) had free full-text access (see Table 3). The relevant journal articles without free access were Elsevier publications. One of the web sites, although relevant, showed obvious bias by listing only positive references for bisphenol A. Among the web sites there were no Wikipedia articles, but one site derived its information from Wikipedia. One other web site, not included in the relevant count, offered essays for sale.

Conclusion

With regard to the sheer number of sources it indexes, Scirus is superior to BASE. One has to remember, however, that BASE has a different purpose: it primarily indexes material that is OAI compliant.
As the number of those items grows, so will its coverage. BASE is, however, better at excluding sites of a purely commercial nature. If one compares the document types that tend to provide quality information (e.g., journals, patents, theses), Scirus and BASE appear not to be that dissimilar in coverage (27.8 million journal/proceedings items in Scirus (Jasco 2008) versus 19.7 million items in BASE, although the BASE figure covers all forms of information).

Although this investigation viewed too few results to draw definite conclusions about the usefulness of Scirus and BASE for chemical information, some general observations can nevertheless be made.

Scirus:
- can be useful to first-year students as a first step in searching a topic in chemistry, especially because results can easily be limited to scholarly sources from the results page following a simple search;
- can also prove useful if access to a subscription database with limited seats, such as SciFinder, is not available;
- is easy to use;
- offers a link resolver to determine whether a university holds the material;
- is an Elsevier product, so other relevant journals are either unavailable or available only if they are indexed in open-access databases, and then without full-text access;
- claims to be a search engine for scientific information, yet some records are of dubious value;
- features advertisements on the results page.

BASE:
- indexes primarily academic content, even though not all topics retrieved results;
- is more useful for some chemistry topics than for others;
- contains more theses than Scirus, which would be relevant to upper-level or graduate students;
- is easy to use, but lacks a link resolver, although many of the documents are free;
- is a multidisciplinary search engine, not specific to science.
Even though students will find relevant information in either of these search engines, learning about the advanced features can certainly improve results for some topics, as the phrase searches showed. It would therefore be worthwhile to include these search engines in library instruction sessions.

Acknowledgement

I would like to thank my colleagues, Janice Adlington, Barbara McDonald, Jennifer McKinnell and Andrea McLellan, for their valuable suggestions and their encouragement in pursuing this research. I would also like to acknowledge the helpful and prompt assistance I received from Sebastian Wolf, University of Bielefeld, in answering my inquiries about BASE.

References

BASE: Bielefeld Academic Search Engine. 2009. About BASE. Content Sources. [Online]. Available: http://base.ub.uni-bielefeld.de/en/about_sources_date_dn.php?menu=2 [Accessed: May 12, 2009].

Chakravarty, R. & Randhawa, S. 2006. Academic search engines: librarian's friend, researcher's delight. In: Kumar, K.M. & Rath, P., editors. Digital Preservation, Management and Access to Information in the Twenty First Century: Proceedings: Papers presented at the 4th Convention PLANNER-2006. Ahmedabad: INFLIBNET Centre. p. 496-517. [Online]. Available: {http://eprints.rclis.org/15534/1/Academic_Search_Engines.pdf} [Accessed: April 3, 2009].

Connor, E. 2005. Searching for science: a descriptive comparison of CiteSeer, FirstGov for Science, and Scirus. Journal of Electronic Resources in Medical Libraries 2(2): 35-47.

De Rosa, C. 2005. Perceptions of libraries and information resources: a report to the OCLC membership. Dublin, OH: OCLC Online Computer Library Center.

Fitzpatrick, R.B. 2008. Scirus: for scientific information only. Journal of Electronic Resources in Medical Libraries 5(3): 275-285.

Ford, L. & O'Hara, L.H. 2007. It's all academic: Google Scholar, Scirus, and Windows Live Academic Search. Journal of Library Administration 46(3/4): 43-52.

Jasco, P. 2008. Scirus. Peter's Digital Reference Shelf (June). [Online]. Available: {http://www.gale.cengage.com/reference/peter/200806/scirus.htm} [Accessed: April 3, 2009].

Lohmeyer, D. 2005. Einsatz intelligenter Suchmaschinentechnologie bei der Erschließung des wissenschaftlichen Internets am Beispiel von BASE (Bielefeld Academic Search Engine) unter Berücksichtigung alternativer Suchmaschinen (Master's thesis, Fachhochschule Köln).

McKiernan, G. 2005. E-profile: Scirus: for scientific information only. Library Hi Tech News (3): 18-25.

Notess, G.R. 2005. Scholarly web searching: Google Scholar and Scirus. Online (July/August): 39-41. [Online]. Available: http://onlineinc.com/online/jul05/OnTheNet.shtml [Accessed: April 3, 2009].

Pieper, D. & Summann, F. 2006. Bielefeld academic search engine (BASE): an end-user oriented institutional repository search service. Library Hi Tech 24(4): 614-619. [Online]. Available: {http://eprints.rclis.org/12746/} [Accessed: April 3, 2009].

Pieper, D. & Wolf, S. 2007. BASE - Eine Suchmaschine für OAI-Quellen und wissenschaftliche Webseiten. Information, Wissenschaft & Praxis 58(3): 179-182. [Online]. Available: http://base.ub.uni-bielefeld.de/download/base_iwp200703.pdf [Accessed: April 3, 2009].

Pieper, D. & Wolf, S. 2009. Wissenschaftliche Dokumente in Suchmaschinen. In: Lewandowski, D., editor. Handbuch Internet-Suchmaschinen. Heidelberg: Aka Verlag. p. 356-374. [Online]. Available: {http://eprints.rclis.org/15558/1/wissenschaftliche_Dokumente.pdf} [Accessed: April 3, 2009].

Rather, R.A., Lone, F.A. & Shah, G.J. 2008. Overlap in web search results: a study of five search engines. Library Philosophy and Practice. [Online]. Available: http://www.webpages.uidaho.edu/~mbolin/rather-lone-shah.pdf [Accessed: July 20, 2009].

Scirus. 2009. About Scirus. The Range of Scientific Content Scirus Covers. [Online]. Available: {http://www.scirus.com/srsapp/aboutus/#range} [Accessed: May 12, 2009].

Summann, F. & Wolf, S. 2005. BASE - Suchmaschinentechnologie für digitale Bibliotheken. Information, Wissenschaft & Praxis 56(1): 51-57. [Online]. Available: http://base.ub.uni-bielefeld.de/download/base_iwp200501.pdf [Accessed: April 3, 2009].

Summann, F. & Wolf, S. 2006. Suchmaschinentechnologie und wissenschaftliche Suchumgebung. Online-Mitteilungen 86(July): 3-18.

Tompson, S.R. 2007. Scirus – for scientific information. Issues in Science and Technology Librarianship (Winter). [Online]. Available: http://www.istl.org/07-winter/electronic3.html [Accessed: February 23, 2009].