A Comparison of Updating Frequency Between Web of Science and Current Contents Connect

Issues in Science and Technology Librarianship, Winter 2004
DOI:10.5062/F40C4SQB
Database Reviews and Reports

Nancy J. Butkovich
Head, Physical Sciences Library
The Pennsylvania State University
njb2@psu.edu

Helen F. Smith
Agricultural Sciences Librarian
The Pennsylvania State University
hfs1@psu.edu

Claire E. Hoffman
Head Librarian, Abington College Library
The Pennsylvania State University
ceh8@psu.edu

Abstract

The Libraries at the Pennsylvania State University subscribe to the online databases Web of Science and Current Contents Connect. Concern was expressed regarding the great similarity in coverage between them. A comparison of title coverage found that Web of Science was more inclusive than Current Contents Connect across all disciplines. When updating frequency was compared, new science and social science journal issues appeared in both databases the same week approximately three quarters of the time. In the arts and humanities this is true only about half the time but these data are not as conclusive due to the small sample size. Each database has unique features. Web of Science has superior title coverage, while Current Contents Connect updates faster about 25% of the time. Unexpected significant problems were noted with updates to Current Contents Connect regarding timing of the updates and the definition of a "current week." The relative importance of the advantages and disadvantages of the two databases will vary depending on institutional needs.

Introduction

Traditionally, distinctions were made between "indexing and abstracting" sources and "current awareness" publications.
The former provided a wealth of access points to the intellectual content of journals, but this came at the expense of speed, since the indexing to the content of any particular article often did not appear for many months after the article was published (Bottle 1979). The advent of electronic publication has resulted in a blurring of the roles of these two types of indexes. Rapid indexing and publishing now appear to be standard for the "indexing and abstracting" sources as well as the "current awareness" publications.

The Institute for Scientific Information (ISI) in Philadelphia, PA, produces publications of both types. Its flagship print publication, Science Citation Index (SCI), falls into the first category. This index, along with its two sister publications, Social Sciences Citation Index (SSCI) and Arts and Humanities Citation Index (AHCI), uses what ISI calls Permuterm Subject Indexing. Keywords within an article title are combined with other keywords in the same title in a rudimentary form of Boolean logic. This allows the user to be more precise in identifying relevant articles in a subject search. Corporate and individual author searching are also available. What makes these publications unique is that they also index the references cited at the end of the source papers. As is typical of indexing publications, these three sacrifice speed of publication for the extra features. The paper AHCI comes out twice a year, SSCI three times a year, and SCI six times a year.

ISI's best-known current awareness publications are a septet known collectively as Current Contents (CC). Each part of this set is devoted to a particular group of subjects: Life Sciences; Agriculture, Biology, and Environmental Sciences; Physical, Chemical and Earth Sciences; Clinical Medicine; Engineering, Computing, and Technology; Social and Behavioral Sciences; and Arts and Humanities. These publications are largely reproductions of journal tables of contents.
They have minimal author and title indexing. The science and technology sections are published weekly; other sections are published at least every other week. In the case of the science sections, ISI claims a two-week lag between the time it receives a journal issue and the time the information is published in a CC issue (Institute for Scientific Information 2000).

Current Contents Connect, the electronic version of the print septet, is updated daily; Web of Science, the electronic incarnation of the three citation indexes, is updated every week. The weekly schedule is comparable in frequency to many print current awareness products, including Current Contents. Although separate subscriptions are available to each of the citation indexes and the seven Current Contents editions, the Pennsylvania State University has subscribed to all of them via the Web of Science (WOS) and Current Contents Connect (CCC) services (Table 1).

Table 1. Correspondence of sections between Web of Science and Current Contents Connect.
Web of Science | Current Contents Connect
Science Citation Index | Agriculture, Biology & Environmental Sciences; Clinical Medicine; Engineering, Computing & Technology; Life Sciences; Physical, Chemical & Earth Sciences
Social Sciences Citation Index | Social and Behavioral Sciences
Arts & Humanities Citation Index | Arts and Humanities

Anecdotal information suggested that the two databases were more obviously similar to each other than their print predecessors were. Title coverage was compared, and Table 2 shows the results of this comparison. With the exception of one arts and humanities title, all the titles in CCC were also indexed in WOS. The opposite was not true, in that an average of 10 percent of the titles indexed in WOS were not indexed in CCC. Some titles were indexed in more than one section, causing the total title number for "All Sections" to be less than the sum of the titles from each section.

Table 2. Title coverage in Web of Science and Current Contents Connect. Percentages have been rounded off to the nearest whole number.
Description | All Sections | Sciences | Social Sci. | Arts & Hum.
Total titles (number) | 8356 | 5829 | 1740 | 1139
Titles in both WOS & CCC | 90% | 86% | 92% | 98%
Titles in WOS only | 10% | 15% | 8% | 2%
Titles in CCC only | <0.1% | 0% | 0% | <0.1%

Searching and interface features aside, the other area of interest is the updating frequency of the databases. The purpose of this study was to compare the updating frequency of Current Contents Connect and Web of Science.

Methods

In order to compare the time of indexing for specific titles between the products, random samples were taken from the title lists in proportion to the number of titles in each subject section. The numbers of titles needed were determined using the following formula:

$$ n = \frac{N}{1 + N e^{2}} $$

as described by Yamane (1973). In this formula, n represents the total sample size, N is the size of the total population, and e represents the rate of error, which we chose to be 5 percent. The titles selected in this way formed the set chosen by the random method.

Several problems arose with the titles chosen by this random method:

- Some titles were not current; they had changed names, ceased publication, or merged into other titles.
- Particular entries could not be separated due to the abbreviated forms of some titles. This was particularly true in the case of titles that had multiple parts.
- In order to compare the two databases, the titles used had to be in both databases, and some of the titles on the random lists were not.

Although some extra titles had been selected, there were not enough for the arts and humanities or the science sections. Additional titles were added from a list of journals most frequently cited by authors at Penn State-University Park.
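As a rough check on the sampling arithmetic described in the Methods, the sketch below applies Yamane's simplified formula, n = N / (1 + N e^2), to the title counts in Table 2 and then splits the result across the three subject sections in proportion to their size. The function names, the rounding to the nearest whole title, and the use of the summed section counts (rather than the de-duplicated "All Sections" total) as the allocation base are assumptions made for illustration; the study does not state these details.

```python
def yamane_sample_size(population, error=0.05):
    """Yamane's simplified formula n = N / (1 + N * e**2).

    Rounding to the nearest whole title is an assumption.
    """
    return round(population / (1 + population * error ** 2))


def proportional_allocation(section_sizes, total_sample):
    """Split total_sample across sections in proportion to section size.

    Using the sum of the section counts as the base is an assumption.
    """
    base = sum(section_sizes.values())
    return {
        name: round(total_sample * size / base)
        for name, size in section_sizes.items()
    }


# Title counts taken from Table 2.
ALL_TITLES = 8356
SECTION_TITLES = {
    "Sciences": 5829,
    "Social Sciences": 1740,
    "Arts & Humanities": 1139,
}

n = yamane_sample_size(ALL_TITLES)          # total sample size
allocation = proportional_allocation(SECTION_TITLES, n)

print("Total sample size:", n)
for section, count in allocation.items():
    print(f"{section}: {count}")
```

Under these assumptions the total comes to 382 titles, split 256 / 76 / 50 across the sciences, social sciences, and arts and humanities, which agrees with the total sample sizes reported in Table 3.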
These additional titles were taken in order from most to least frequently cited, a ranking compiled by averaging the number of citations to each journal for the years 1997-1999, according to data from each of the three citation indexes. This resulted in a set of titles chosen via the citation method.

The social sciences list did not experience this problem, and at the end of the study, we actually had more titles than we needed. In order to retain the correct subject proportions, a few titles were randomly eliminated from the list to reduce it to the correct size. These titles were chosen for elimination using a random number table (Beyer 1987). The sample size for each method of choosing titles (random or citation) is listed in Table 3.

Table 3. Sample sizes for the study.
Description | Random Sample Size | Citation Sample Size | Total Sample Size
Number of science titles | 219 | 37 | 256
Number of social sciences titles | 76 | 0 | 76
Number of arts & humanities titles | 47 | 3 | 50

The study was conducted for a total of ten weeks between 28 July 2000 and 22 September 2000. Each Friday all the titles in the study were searched in both databases in order to determine whether or not any new issues of the title had been added to the database during the previous week. This process continued for the next eight weeks. The tenth week was a "wrap-up" week, in which a title was searched only if an issue had been added to one database but not to the other.

Results

Some journals did not publish an issue during the study period. Data regarding these are shown in Table 4. Although the science group had the largest number of titles that did not update, the arts and humanities group had the highest percentage, followed very closely by the social sciences. This severely restricted the number of issues for the updating analysis in these two groups. In fact, the arts and humanities group ended up with only about half the minimum desirable number of issues for the updating analysis.

Table 4. Titles not updated during study period.
Description | Random Titles Not Updated (#) | Random Titles Not Updated (%) | Citation Titles Not Updated (#) | Citation Titles Not Updated (%) | Total Not Updated (#) | Total Not Updated (%)
Number of science titles | 48 | 22 | 0 | 0 | 48 | 19
Number of social sciences titles | 33 | 43 | 0 | 0 | 33 | 43
Number of arts & humanities titles | 27 |  | 2 |  | 30 | 

Update Comparisons:

The percentages for the sciences and the social sciences were similar. Both showed almost three quarters of the issues appearing in both databases within the same calendar week (Tables 5 and 6). Essentially all of the remaining issues appeared in Current Contents Connect between one and two calendar weeks before they appeared in Web of Science. The percentages for the arts and humanities showed a much higher level of variability, with just over half of the issues appearing in both databases in the same week (Table 7). Sixty percent of the journals in the arts and humanities did not publish an issue during the study period, so the sample for this area is much smaller than for the sciences or social sciences. Because of this, the results for the arts and humanities are not conclusive.

Table 5. Update comparisons for the sciences.
Description | Random Titles Updated (#) | Random Titles Updated (%) | Citation Titles Updated (#) | Citation Titles Updated (%) | Total Titles Updated (#) | Total Titles Updated (%)
Number usable issues | 300 | 58 | 214 | 42 | 514 | 100
Number issues updated same time | 217 | 72 | 162 | 76 | 379 | 74
Number issues CCC updated first | 82 | 27 | 51 | 24 | 133 | 26
Number of issues WOS updated first | 1 | <1 | 1 | <1 | 2 | <1

Table 6. Update comparisons for social sciences.
Description | Random Titles Updated (#) | Random Titles Updated (%) | Citation Titles Updated (#) | Citation Titles Updated (%) | Total Titles Updated (#) | Total Titles Updated (%)
Number usable issues | 55 | 100 | 0 | 0 | 55 | 100
Number issues updated same time | 40 | 73 | 0 | 0 | 40 | 73
Number issues CCC updated first | 15 | 27 | 0 | 0 | 15 | 27
Number of issues WOS updated first | 0 | 0 | 0 | 0 | 0 | 0

Table 7. Update comparisons for arts and humanities.
Description | Random Titles Updated (#) | Random Titles Updated (%) | Citation Titles Updated (#) | Citation Titles Updated (%) | Total Titles Updated (#) | Total Titles Updated (%)
Number usable issues | 26 | 96 | 1 | 4 | 27 | 100
Number issues updated same time | 14 | 54 | 0 | 0 | 14 | 52
Number issues CCC updated first | 11 | 42 | 1 | 100 | 12 | 44
Number of issues WOS updated first | 1 | 4 | 0 | 0 | 1 | 4

Definition of Current Week:

In the course of this project, a significant and disturbing fact was noted regarding Current Contents Connect's definition of a current week. CCC allows users to limit their searches to selected date spans. Since it has traditionally been a print publication that appeared weekly, it is logical to assume that one of the limit periods in the electronic version would be a week's worth of data. CCC has a limit labeled "current week", which according to the internal database help "includes journal issues and Current Book Contents for the current week. The span of dates given in parentheses defines the current week. Because Current Contents data are updated daily, this date span changes daily." However, the phrase "current week" implies to a user that it represents a seven-day period. It was found that the "Current Week" could be anywhere from one to eight days. The date ranges defining "current week" were recorded when the searches were done. These are shown in Table 8.

Table 8. Current week definitions in Current Contents Connect by day.
Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
 |  |  |  | 28 July: 27th-27th | 29 July: 27th-29th
31 July: 27th-29th | 1 August: 27th-31st | 2 August: 27th-1st | 3 August: 27th-3rd | 4 August: 3rd-3rd | 5 August
7 August: 3rd-4th | 8 August: 3rd-4th | 9 August: 3rd-8th | 10 August: 3rd-10th | 11 August: 10th-10th | 12 August
14 August: 10th-11th | 15 August: 10th-14th | 16 August: 10th-15th | 17 August: 10th-15th | 18 August: 17th-17th | 19 August
21 August: 17th-18th | 22 August: 17th-21st | 23 August: 17th-22nd | 24 August: 17th-24th | 25 August: 24th-24th | 26 August
28 August: 24th-25th | 29 August: 24th-28th | 30 August: 24th-30th | 31 August: 31st-31st | 1 September: 31st-1st | 2 September
4 September: HOLIDAY | 5 September: 31st-1st | 6 September: 31st-6th | 7 September: 31st-6th | 8 September: 7th-7th | 9 September
11 September: 7th-8th | 12 September: 7th-11th | 13 September: 7th-12th | 14 September: 7th-12th | 15 September: 7:21am: 7th-14th; 11:48am: 14th-14th | 16 September
18 September: 14th-15th | 19 September: 14th-18th | 20 September: 7:37am: 14th-18th; 11:25am: 14th-18th; 11:26am: 14th-20th | 21 September: 14th-21st | 22 September: 7:30am: 14th-21st; 9:00am: 21st-21st | 23 September
25 September: 21st-22nd | 26 September: 21st-25th | 27 September: 21st-25th | 28 September: 21st-28th | 29 September: 21st-28th | 30 September

There are numerous days on which the database was not updated (16 Aug., 24 Aug., 6 Sept., 13 Sept., 26 Sept., 28 Sept.). There are also several days on which the "Current Week" consisted of one day. To further complicate the situation, there were several instances in which it appeared that the database was actually updated during the working day (Eastern Time) rather than at night (15 Sept., 20 Sept., 22 Sept.). There does not appear to be any day of the week on which one could consistently expect to obtain the previous seven days of data. For example, a search run on a Thursday could retrieve anywhere from one day of data (31 August) to eight days of data (24 August).
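The variability on a single weekday can be made concrete with a small calculation. The sketch below takes the Thursday entries transcribed from Table 8 and counts the calendar days in each "current week" span, including both endpoints. The variable and function names are hypothetical, and the inclusive counting convention is an assumption chosen because it reproduces the one-day and eight-day figures quoted in the text.

```python
from datetime import date

# "Current week" spans recorded on Thursdays, transcribed from Table 8.
# Each entry: (day the search was run, first day of span, last day of span).
THURSDAY_SPANS = [
    (date(2000, 8, 3), date(2000, 7, 27), date(2000, 8, 3)),
    (date(2000, 8, 10), date(2000, 8, 3), date(2000, 8, 10)),
    (date(2000, 8, 17), date(2000, 8, 10), date(2000, 8, 15)),
    (date(2000, 8, 24), date(2000, 8, 17), date(2000, 8, 24)),
    (date(2000, 8, 31), date(2000, 8, 31), date(2000, 8, 31)),
    (date(2000, 9, 7), date(2000, 8, 31), date(2000, 9, 6)),
    (date(2000, 9, 14), date(2000, 9, 7), date(2000, 9, 12)),
    (date(2000, 9, 21), date(2000, 9, 14), date(2000, 9, 21)),
    (date(2000, 9, 28), date(2000, 9, 21), date(2000, 9, 28)),
]


def span_length(start, end):
    """Number of calendar days covered, counting both endpoints (assumed convention)."""
    return (end - start).days + 1


# Map each Thursday search date to the length of the span it would retrieve.
lengths = {
    searched: span_length(start, end)
    for searched, start, end in THURSDAY_SPANS
}

for searched, days in lengths.items():
    print(f"{searched:%d %b %Y}: {days} day(s)")

print("Shortest span:", min(lengths.values()))
print("Longest span:", max(lengths.values()))
```

Counted this way, the Thursday spans range from one day (31 August) to eight days (24 August), so a standing weekly search run on the same weekday would cover very different amounts of material from one week to the next.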
Although the study was conducted in 2000, an examination of the time spans given for current weeks during September 2002 indicates that the situation still exists. The database is still apparently updated during the working day (Eastern Time), and the time period covered by a "current week" can be as little as one day or as much as seven. If a scholar regularly runs a search each week, the coverage of the material retrieved might show significant gaps, depending on the day of the week and the time of day that the search was conducted.

Unlike CCC, Web of Science is consistently updated once a week. No matter how early the checks were run on Fridays during the study period, Web of Science was already showing its new update.

Conclusion

The online Web of Science and Current Contents Connect are more obviously similar in terms of indexing updates than their print counterparts had been. Clearly, their unique functions still exist and are reflected in the interface and searching features of the online products. The print Current Contents sections were designed for researchers wanting to browse current issues of specific journals in their fields, and consequently Current Contents Connect allows relatively easy browsing of subject areas and of the tables of contents of specific journals. Is this feature necessary when users can now get journal contents emailed directly to them from many journal web sites and other internet resources? While there is an email notification feature in CCC, there is a restriction on the number of alerts allowed. The print citation indexes catered to scholars wanting to do subject or citation searching, and, logically, Web of Science offers unique access to articles via the references they cite.

In terms of title coverage, the Web of Science product is significantly superior. CCC does have some advantage over WOS with regard to the frequency of updating.
A major concern with Current Contents Connect is the way in which "current week" is defined. Because of the apparently erratic nature of the updates to this database, a scholar could potentially miss references to very important literature. It remains up to each individual institution to determine if the differences in the databases are important enough to subscribe to each product.

References

Beyer, W.H., ed. 1987. CRC Standard Mathematical Tables. 28th ed. Boca Raton, FL: CRC Press.

Bottle, R.T., ed. 1979. Use of Chemical Literature. 3rd ed. London: Butterworths.

Institute for Scientific Information. 2000. Current Contents: Physical, Chemical & Earth Sciences. 40(3): 1.

Yamane, T. 1973. Statistics, an Introductory Analysis. 3rd ed. New York: Harper and Row.

Acknowledgements

Soma Nag, graduate assistant in the Life Sciences Library, Penn State, for her assistance in comparing journal title lists.

Linda Musser, Head, Earth & Mineral Sciences Library, Penn State, for her comments and suggestions on the manuscript.