Introducing Database Advisor

A new service that will make your research easier

by Christy Hightower, Jennifer Reiswig, and Susan S. Berteaux

Xena walks up to a library workstation at 9 p.m. with a new research topic in hand: waste-water reclamation projects. She's familiar with Current Contents, but wonders if there are other databases that might be better for this topic. The reference desk is closed, but there's a link from the library's Web site to the Database Advisor (DBA). She sees that this will tell her which of the more than 25 science databases will be best for her topic, and decides to try it.

She sees an input box for search terms and a Submit button. She's used to searching the Web, so she types in: +"waste water" +reclaim*. But then she remembers that not all search engines work the same way, so she scrolls down the Search page to see what options or help are available.

First, there is an option to include “backfiles” for the databases; she’s mainly interested in current articles, so she doesn’t select backfiles. There is an option to choose how long to wait for results; she’s not sure how long most searches take and leaves the default of one minute. There is a list of subject categories (e.g., biology, chemistry, mathematics). She follows a link to see which databases are included in each category and decides to leave the default setting of all categories.

Finally, she comes to help information. A table presents tips on constructing queries for DBA. She sees that truncation and Boolean operators are not supported, that synonyms should be avoided, and that queries should be kept simple. Noting this, she moves back to the input box, changes her query to water reclamation, and submits the search. The screen changes to “Results” with instructions to wait while results come in.
The entries begin to appear, and the list changes as each new database is added so that the database with the most “hits” is always at the top. After a minute, a “Results” banner appears at the top of the page, indicating the results are complete.

About the authors
Christy Hightower is Web coordinator at the Science & Engineering Library, University of California, San Diego; Jennifer Reiswig is electronic services librarian at the Biomedical Library, University of California, San Diego; Susan S. Berteaux is instruction coordinator and assistant head of the Scripps Institution of Oceanography Library, University of California, San Diego.

410 / C&RL News ■ June 1998

Xena scans the page, and clicks on a link to get help interpreting the results. She learns that the “hits” represent the number of articles in that database that would be retrieved by doing a keyword search on her terms. There are small icons indicating whether each database contains abstracts, full text, and/or full images for articles. There is also a one-line description for each database and a link to a more detailed profile on each database. For Xena’s search, GeoRef came up first, followed by MOFR (Marine, Oceanographic & Freshwater Resources). All 25-plus databases are listed in descending order by the number of hits.

At the end of the list of databases are a number of entries with “Time Out” where the results would be. One of these, Greenwire (covering governmental activities on the environment), catches her eye, and she realizes that if she had chosen a longer time to wait back on the search screen, she might have seen the results from this database. At the very bottom of the Results page she finds a form to Refine the search, featuring a table with her search filled in. She leaves the keywords as they are, decides to include the backfiles, and sets the waiting time to its maximum: five minutes.
This time Greenwire shows results, though only 11 articles. Interestingly, the “backfiles” of some databases have more hits than the current files, prompting her to consider using some older material from years when this was a hotter topic. Scanning down the list, she notices some databases she hadn't heard of that might have interesting angles on her topic: BIOSIS has over 200 hits, and the ABI/Inform database covering business and management topics has over 150. Based on the list she has, she clicks on GeoRef to start a search of that database.

Development process: How did we get here?

In 1996, the University of California, San Diego (UCSD) Library was providing online access to over 25 science and engineering databases via Web, Z39.50, and telnet. It was not always obvious to patrons or library staff which of these bibliographic and full-text databases should be used to research a topic. Many databases also appeared to be underutilized. The libraries needed a simple, Web-based tool to quickly advise users on the best science database for their topic.

The team was familiar with DIALOG’s DialIndex®, which ranks databases based on number of hits on terms supplied by the searcher. Science librarians proposed the concept of a Web-based front end to all the science and engineering bibliographic and full-text databases to which UCSD has remote access. A technical team consisting of two student programmers and a librarian/technical coordinator wrote the source code and scripts used to query databases. Initially, 50 hours of student programmer time was budgeted for the project. Through October 1997, the programmers logged approximately 250 hours on the DBA project.

An interface team of three science librarians (the authors) was established in February 1997 to develop the Web-based user interface, graphics, search strategies, and database profiles. Originally, basic and advanced search options were envisioned.
Three models with varying degrees of complexity were presented to the UCSD science librarians. Because of an overwhelming preference for the most basic search tool, the advanced search options were not pursued in the final product.

Each librarian on the team provided profiles for one-third of the databases. The team also researched and developed search strategies comparable to MELVYL® System keyword searches for all databases covered by DBA. Although most Web databases provide basic and advanced search help, in some cases the vendor/provider had to be contacted to provide the precise search strategy that would achieve consistent, comparable results.

In August 1997, interviews were conducted with students and public services staff to obtain suggestions for improving the functionality of the interface. Interviewees were asked about their level of Web and database experience and their expectations for a Web tool like DBA. Responses provided guidance for refining help directions, individual database profiles, navigation, and basic screen elements. The interviews, combined with input from the technical team, reinforced the logical concepts and confirmed the usefulness of the product.

After some minor improvements, the Database Advisor was rolled out to the UCSD community on September 15, 1997. Early user feedback indicates users are indeed discovering the existence of databases that were previously unknown to them.

Technical information

The “kernel” of DBA (the script that parses user input, forks off the queries, then displays the results) is simple and small, thus portable. To run DBA, a campus needs a UNIX computer with Internet connectivity, Perl version 5 (the language DBA was written in), and preferably the Apache Web server (which is common in academic settings). There are three types of scripts used to query databases: Web scripts, telnet scripts, and a Z39.50 script.
The Z39.50 interface was made using the client written by Harold Finkbeiner at Stanford University and is our preferred method of accessing databases. All the MELVYL databases are accessed via this Z39.50 script, which is very fast. Each of the databases or database groups (telnet, Z39.50) has a separate module. These modules can be added, modified, and removed at will, which makes DBA easy to customize.

DBA is quite fast, with the majority of databases returning results within one minute. (One of our cool features is that the user can control the timeout.) Those queries that take a long time (like Web) have separate processes; those that are quick (like Z39.50) are strung together. We use a small amount of JavaScript (just to select and deselect subject categories, and to warn off-campus users that they will not be able to connect to license-restricted databases from the results page). We kept the graphic load to a minimum; the few graphics we used were created in PaintShop Pro.

DBA uses the server push feature to reorder the search results so that the databases are listed in descending order of hits. DBA works with all graphical Web browsers, although we did encounter a problem with the way Microsoft’s Internet Explorer version 3 implemented server push, which required a work-around. Internet Explorer interpreted push to mean append the pages, while Netscape’s browsers interpreted push to mean replace the pages. The lesson here is not a new one: always test your programs on all browsers and all operating systems.

Another lesson is to always include a budget for maintenance; it’s more time-consuming than you expect.
In the first month of operation, both SilverPlatter and Cambridge Scientific significantly changed their search interfaces, another database changed its underlying search engine, another significantly changed its backfiles, and we subscribed to three new databases that required two new scripts. We also began to include databases with limited numbers of simultaneous users, which required a new error message (“Too Many Users”) and increased attention to the way DBA logs out so as not to tie up any ports.

Current status and future plans

When the sciences version of DBA went live in September, it was advertised via e-mail announcements, more than 50 links from the three science library Web sites (a link to DBA was placed next to each link for a database included in DBA), bulletin board displays, and DBA “business cards” handed out at the reference desk.

Currently DBA averages 75 searches a week. We expect that usage to increase, since we recently installed microcomputers with a Web interface in the library. We also collect information about the nature of the searches performed: the keywords and subjects used, whether the search is refined or not, what Web browser was used, etc. This will allow us to perform more detailed analysis of DBA usage patterns in the future. In addition, we link to a feedback form from every page, but after six months of operation, we have received only a single response this way (it was positive)!

Work on DBA will proceed on several fronts. First, twenty-five out of thirty social science and humanities databases have already been added to a new version of DBA. We are pursuing Z39.50 access for the remaining five databases and hope to be able to release the new version in summer 1998. At that time DBA will become relevant to every student and researcher at UCSD.
Second, in response to user input, we are pursuing the concept of “instant gratification,” which would take users directly to the results of their search when they decide on a database to use.

Instant gratification was not initially offered because DBA does very generic searches in each database, and it was felt that the user would be able to get better results by redoing the search using the unique features of each database. If a user obtained 10,000 results from DBA, would those 10,000 items really be useful? However, in our limited user testing, this was one feature that was repeatedly asked for. This function presents some technical challenges and may take some time to implement.

Interested in trying this yourself?

UCSD plans to make the DBA source code available to anyone who wants it under the terms of the GNU General Public License. For details, contact the authors. At other sites, the UCSD search scripts would have to be modified to send your user names and passwords, and scripts for any databases new to the system would have to be written using the existing scripts as models. Adding a new database in a family for which UCSD already has scripts (such as SilverPlatter or Cambridge Scientific) should require very little new programming.
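
For readers weighing a similar tool, the kernel behavior described under Technical information (one module per database, all queries sent at once, a user-chosen timeout, and a list ranked by descending hits with “Time Out” entries at the end) can be sketched roughly as follows. This is an illustrative Python sketch, not the DBA source, which is written in Perl. The module names, hit counts, delays, and the functions make_module and advise are all hypothetical, and threads stand in for the separate processes DBA forks.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_module(hits, delay):
    """Build a stand-in query module that reports `hits` after `delay` seconds."""
    def query(terms):
        time.sleep(delay)  # simulates connection and search time
        return hits
    return query


# Hypothetical registry, one entry per database. Names and counts are invented
# for illustration; a real module would contact a Web, telnet, or Z39.50 source.
MODULES = {
    "fast_db_1": make_module(hits=210, delay=0.0),
    "fast_db_2": make_module(hits=340, delay=0.0),
    "slow_db": make_module(hits=11, delay=0.5),
}


def advise(terms, modules, timeout):
    """Query every module at once and return (name, hits) rows ranked by hits.

    Modules that have not answered within `timeout` seconds are listed last,
    with the string "Time Out" in place of a hit count.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(modules)))
    futures = {pool.submit(query, terms): name for name, query in modules.items()}
    done, pending = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on queries that are still running

    # The stand-in modules never raise; a real module would need error handling
    # here (for example, a "Too Many Users" status).
    ranked = sorted(
        ((futures[f], f.result()) for f in done),
        key=lambda row: row[1],
        reverse=True,
    )
    timed_out = sorted((futures[f], "Time Out") for f in pending)
    return ranked + timed_out


if __name__ == "__main__":
    for name, hits in advise("water reclamation", MODULES, timeout=0.2):
        print(f"{name:10} {hits}")
```

With the short timeout, the slow stand-in database should appear at the bottom as “Time Out”; calling advise again with a longer timeout should show its hit count instead, much as Xena's refined search does for Greenwire. Adding a database at another site would then amount to registering one more entry in the hypothetical MODULES table.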