An Assessment of the Completeness of Scholarly Information on the Internet

College & Research Libraries, July 2009

Chuanfu Chen, Qiong Tang, Xuan Huang, Zhiqiang Wu, Haiying Hua, Yuan Yu, and Song Chen

Chuanfu Chen is Professor at the School of Information Management, Wuhan University, P.R. China; e-mail: cfchen@whu.edu.cn. Qiong Tang, Haiying Hua, Yuan Yu, and Song Chen are doctoral candidates at the School of Information Management, Wuhan University; e-mail: tangqiong01@163.com, hrice77@sina.com, yuyuan_1978@126.com, and songchen_cs@hotmail.com, respectively. Xuan Huang, PhD, is at the Shenzhen Institute of Standards and Technology; e-mail: yuccaer2000@163.com. Zhiqiang Wu, PhD, is Associate Professor at the School of Information Management, Wuhan University; e-mail: wuzhiqiang518@tom.com. This research was supported by the Specialized Research Fund for the Doctoral Program of Higher Education and by the Fund for Fast Sharing of Science Paper in Net Era by CSTD, Ministry of Education of China.

In this paper, the authors propose a framework for assessing the completeness of scholarly information on the Internet. They then obtained a list of Web pages by searching for 32 key terms in eight subjects through Google, Yahoo, and Altavista, and examined the resulting 2,814 sample pages according to the framework. The results reveal that the overall mean completeness score of online scholarly information was 2.92 (on a scale of 1 to 5), and only 11 percent of samples provided complete scholarly information. Completeness differed significantly (P<0.05) among Web pages with different domain names, resource types, and subjects. In conclusion, the completeness of scholarly information on the Internet is unsatisfactory and needs to be improved immediately. Furthermore, the evaluation framework and its application developed herein could be a useful instrument for librarians, researchers, students, and the public in selecting Internet resources.
Since information is increasingly being transmitted over the Internet, freely available Internet resources can provide unique content. In recent years, the Internet has quickly become an important supplementary source for academic research, and even an alternative research tool. An investigation conducted by the Online Computer Library Center (OCLC) reported that approximately 84 percent of respondents (more than 3,300 people in six countries) use search engines to begin an information search.1 According to Franz Barjak's survey of more than one thousand European scientists in 2003, the more productive the scientists were, the more they used the Internet for information retrieval, social communication, and dissemination.2 Moreover, the Pew Internet and American Life Project reported that around 73 percent of college students use the Internet more than the campus library for research.3 Other recent research has also shown that students and teachers prefer the Internet to campus libraries for academic research.4

However, unlike traditional printed resources, information on the Internet is subject to little quality oversight, so its overall quality declines rapidly as its quantity increases. Researchers waste too much time seeking reliable and valid information on the Internet.5 Moreover, many undergraduates are challenged by research tasks, especially selecting and evaluating information; they use unevaluated Internet resources that fall short of what instructors expect.6 Evaluating the quality of this type of information is therefore both particularly challenging and important. Academic libraries urgently need to select credible and worthwhile Internet resources to supplement their collections and attract users' attention.
Among existing studies of the quality of Internet resources, completeness (also called coverage, comprehensiveness, integrity, or scope) is consistently one of the crucial elements of quality assessment. In 1998, Jim Kapoun proposed coverage as one of five Web evaluation criteria that are still used extensively by libraries and scholars.7 Gunther Eysenbach et al. provided an overview of 79 full articles assessing the quality of health information on the Web.8 Shirlee-Ann Knight and Janice Burn summarized 12 widely accepted information quality frameworks collated from the last decade of information science research.9 They found that completeness is one of the most frequently used quality criteria. Additionally, according to Mohan Jyoti Dutta's study, "the sources of information were judged to be more credible when the information was complete;" and respondents "used the completeness cue to evaluate the credibility of the source of information in the target article."10 Academics also want to preserve the completeness (integrity) of their open-access research papers. The RoMEO (Rights Metadata for Open-archiving) project, funded by the UK Joint Information Systems Committee (JISC), found that the largest group of respondents (67%) wanted reproductions of their open-access research papers to be exact replicas of the original.11 This was one of the two principal restrictions that academics wished to place on their open-access works. Completeness is therefore an important element in assessing the quality and credibility of scholarly information on the Web, for both Internet consumers and academic authors.

In this study, we define completeness as the presence of the necessary and essential information for the issue in question.12 Scholarly information on the Internet primarily refers to online academic information that users can access freely, with no strings attached.
Literature Review

Since the late 1990s, a number of academic libraries, research institutions, and scholars have put forth criteria for evaluating the completeness of information on the Internet, and Internet users have applied some of them to evaluate and select online information. These criteria fall into five categories. (1) The first criterion measures whether reproduced information has been altered or abridged. Mary Ann Fitzgerald observed that removing information from its context was one of the most common types of misinformation (false information) on the Internet.13 In particular, if online scholarly information provides only an abstract, summary, or bibliography of the full publication, or omits data, graphics, facts, and so on, it is incomplete.14 (2) The second criterion examines whether information contains all of the required elements of an argument or topic. Mohan Jyoti Dutta suggested that complete information "presents all four elements of argument quality: claim, ground, warrant, and backing; explains how (the process/theory); presents method; contains scientific words and explains them."15 Yang W. Lee et al. proposed that complete information on the Internet should include all necessary values and have sufficient breadth and depth for the task at hand.16 (3) The third criterion examines whether the information covers the topic extensively, evaluated along the axes of time period and region as well as comprehensiveness of argument.17 (4) The fourth criterion measures the completeness of additional information, such as complete and varied references, introductions to information sources, or related topics.18 (5) The fifth and final criterion evaluates whether access to the information is limited by fees, browser technology, or software requirements.
If access to the information is restricted or limited, users are clearly unable to obtain complete information.19 However, few scholars or libraries apply this criterion to assess information completeness.

Several researchers have applied information quality frameworks to assess Internet information, and most have found that, although online information has proliferated at a remarkable rate, Web sites providing complete information account for only a very small portion of the total. In the medical area, for example, Muhammad Walji observed that scientific societies did not always provide complete information about the possible adverse events related to hormone replacement therapy; 145 sites (97%) omitted information.20 Nicole T. Ansani et al., investigating the quality of arthritis information on the Internet, concluded that only 25 percent of Web sites provided complete information.21 Heather Yeo et al. found that thyroid cancer surgery Web sites were incomplete: they failed to address important aspects of all levels of perioperative care, and fewer than 10 percent of them included information about the surgical procedures.22

Completeness is one of the critical standards of information quality; incomplete information undermines the accuracy and credibility of information.23 This inaccuracy may lead to wrong decisions and harm research projects. We find that most previous empirical studies have focused primarily on medical information, and few surveys have investigated the general state of the completeness of scholarly information on the Internet. Moreover, the completeness criteria proposed in some studies are not comprehensive, and several scholars ignore the differing priorities of the criteria.
In this paper, we establish a generally applicable framework for assessing the completeness of online scholarly information and then apply it to evaluate and describe the current state of that completeness. The assessment framework and its implementation can help academic librarians, researchers, students, and others better select scholarly information on the Internet.

Assessment Framework of Completeness for Online Scholarly Information

Criteria Selection

Based on our understanding of the concept of completeness, we examined the criteria used most frequently by research institutions, libraries, and academics, such as the American Public Health Association, the UC Berkeley Library, and the Sacramento State University Library. We assumed that information on the Internet is accessible with reasonable time, cost, and effort. We then developed a set of criteria to assess the completeness of scholarly information on the Internet:

(A) Information breadth. Information must have sufficient breadth for the task or topic. There are three indicators. First, the information covers the topic extensively in terms of time (A1): the resource describes the past, present, and future development of the selected topic. Second, the information covers the topic widely with respect to geography (A2): the resource discusses the topic locally, regionally, nationally, or internationally. Finally, the resource covers the topic comprehensively in terms of a range of viewpoints (A3): it must represent the pros and cons connected to the topic.

(B) Expatiation of topic. This criterion requires that all critical elements be included when discussing the given subject.
For example, if the source is a scientific thesis, it should include a contention, evidence, and argumentation. Eysenbach et al. found that, among 79 articles that assessed the quality of health information on the Web, "most authors calculated a proportion of priori-defined elements covered by a website or reported the proportion of websites that mentioned all key elements" when discussing information completeness.24

(C) Accurate reproduction of information. Reproduced information should be presented in accordance with the original source. For example, the content of an article, the authors' affiliations, references, and copyright information should not be altered or omitted in the process of reprinting or digitization.

(D) Description of method. Information describes the methodology in detail (for example, how the research was conducted, including research design, subjects, procedures, and replicability). Rational and accurate methodology is of great importance to scientific studies. Connie A. Korpan et al. observed that university students' needs for complete scientific information lay primarily in three areas: method, agent/theory, and data/statistics.25

(E) Provision of reasonable additional information. References and introductions to information sources can help Internet consumers understand the topic further and obtain more related information. This criterion includes two indicators: first, providing various types of references (E1); and second, giving enough information about those sources (E2).

Assignation of Weights to the Criteria

The weights of the five criteria were determined with the Analytic Hierarchy Process (AHP), a subjective method that derives weights for qualitative criteria from pairwise comparisons, and the priority of each sub-criterion was estimated with a five-item Likert scale. The procedure comprised six steps:26

1. Modeling the decision problem as a hierarchy (see figure 1).

Figure 1. Hierarchy for evaluation of the completeness of scholarly information on the Internet. Level 1 (goal): Completeness. Level 2 (criteria): Information breadth (A), Expatiation of topic (B), Accurate reproduction of information (C), Description of method (D), Provision of reasonable additional information (E). Level 3 (sub-criteria): A1 time, A2 geography, A3 range of viewpoints; E1 various types of references, E2 enough information about the sources.

2. Developing a judgment matrix by pairwise comparisons. We placed the five criteria in a pairwise matrix so that each criterion was compared with every other criterion to assess their relative priority.

3. Conducting a questionnaire survey. The survey was divided into three sections. Part A collected demographic information. Part B collected respondents' judgments about the priority of the five criteria of completeness; we invited respondents to rank these criteria in order of importance by completing the evaluation matrix included in the questionnaire. Part C collected respondents' judgments about the priority of the five sub-criteria of completeness, using a five-item Likert scale (see appendix 1). From January 2007 to March 2007, we sent e-mail questionnaires to scholars who had published academic articles in twelve selected journals, most of which are Science Citation Index journals. Journals in our sample pool included Nature (Nature Publishing Group), The Journal of Biochemistry (Oxford University Press), Scientometrics (Akadémiai Kiadó and Springer Science+Business Media B.V.), and Acta Physica Sinica (Chinese Physical Society). We selected issues of these journals published in 2004 and 2006, and found the scholars' contact information in them.
We sent 500 questionnaires in all and received 79 replies. Six questionnaires did not pass the consistency test, leaving 73 valid questionnaires. The respondents' disciplines included Physics, Chemistry, Biology, Medicine and Health, Computer and Information Science, Engineering, Mathematics, and Geography. The respondents comprised 19 professors, 29 associate professors, and 25 doctoral candidates.

4. Synthesizing priorities and measuring consistency. After the matrix was developed and all pairwise comparisons were obtained, the maximum eigenvalue ($\lambda_{\max}$) and the corresponding eigenvector of the matrix were calculated using Matlab software. Normalizing this eigenvector gives the weights of the criteria. We used a consistency index (CI) and a consistency ratio (CR) to verify the consistency of each comparison matrix:

$$CI = \frac{\lambda_{\max} - n}{n - 1} \qquad (1)$$

$$CR = \frac{CI}{RI} \qquad (2)$$

where $n$ is the order of the matrix and $RI$ denotes the average CI of numerous randomly generated reciprocal matrices of the same order. If $CR \le 0.1$, the estimate is accepted; otherwise, a new comparison matrix is solicited until $CR \le 0.1$.

5. Group decision making. The value of CR directly reflects how consistently each respondent scored: the smaller the value of CR, the higher the quality of the respondent's judgments.
Respondent weight coefficients can therefore be calculated from the CR values:

$$P_i = \frac{0.1 - CR_i}{\sum_{k=1}^{m} (0.1 - CR_k)}, \qquad m = 73 \qquad (3)$$

where $CR_i$ is the consistency ratio of the $i$th respondent's comparison matrix ($CR_i = CI_i / RI_i$, with $RI_i$ the corresponding average random index) and $m$ is the number of respondents. Having obtained the weight coefficients of all respondents, we form the vector $P = [P_1, P_2, \ldots, P_{73}]^T$. The respondents' priority weight vectors form the matrix $H = [W_1, W_2, \ldots, W_{73}]$, so the final criterion weights are

$$W = H \times P = [W_1, W_2, \ldots, W_{73}] \times [P_1, P_2, \ldots, P_{73}]^T \qquad (4)$$

6. Calculating the weights of the sub-criteria. The weights of the sub-criteria were calculated as arithmetic averages of the five-item Likert scale ratings that respondents gave in the questionnaire.

The evaluation framework of completeness is shown in table 1.

Table 1. Assessment Framework of Completeness for Online Scholarly Information (criteria with weights; sub-criteria with weights indented)

A  Information breadth (0.14)
   A1  Information covers the topic extensively in terms of time (0.32)
   A2  Information covers the topic extensively in terms of geography (0.31)
   A3  Information covers the topic extensively in terms of a range of viewpoints (0.37)
B  Expatiation of topic (0.25)
C  Accurate reproduction of information (0.23)
D  Description of method (0.18)
E  Provision of reasonable additional information (0.20)
   E1  Providing various types of references (0.46)
   E2  Giving enough information about the sources (0.54)
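The six AHP steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Matlab procedure: it swaps in power iteration to approximate the principal eigenvector, uses Saaty's commonly cited random-index values (an assumption, since the article does not list the RI values it used), and reads Eq. (3) as P_i = (0.1 − CR_i) / Σ_k (0.1 − CR_k). The sub-criterion step assumes the averaged Likert ratings are normalized to sum to 1, which matches the Table 1 values but is not stated explicitly. All function names are hypothetical.

```python
# Minimal AHP sketch (hypothetical helper names; not the authors' Matlab code).

# Saaty's random consistency index (RI) by matrix order. Assumed values: the
# article does not list the RI table it used.
RANDOM_INDEX = {2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=200):
    """Return (weights, lambda_max, CI, CR) for a reciprocal pairwise matrix."""
    n = len(matrix)
    if n not in RANDOM_INDEX or any(len(row) != n for row in matrix):
        raise ValueError("expected a square matrix of order 2..9")
    w = [1.0 / n] * n
    for _ in range(iterations):  # power iteration toward the principal eigenvector
        aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(aw)
        w = [x / total for x in aw]  # normalized so the weights sum to 1
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)  # Eq. (1)
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0  # Eq. (2); a 2x2 reciprocal matrix is always consistent
    return w, lambda_max, ci, cr


def group_weights(respondent_weights, crs):
    """Aggregate respondents' weight vectors, assuming P_i = (0.1 - CR_i) / sum_k (0.1 - CR_k)."""
    if any(cr > 0.1 for cr in crs):
        raise ValueError("matrices with CR > 0.1 must be re-solicited")
    slack = [0.1 - cr for cr in crs]
    total = sum(slack)
    if total <= 0:
        raise ValueError("cannot weight respondents when every CR equals 0.1")
    p = [s / total for s in slack]  # respondent coefficients, summing to 1
    n = len(respondent_weights[0])
    # W = H x P: each criterion weight is a P-weighted average across respondents.
    return [sum(w[c] * pk for w, pk in zip(respondent_weights, p)) for c in range(n)]


def likert_subweights(ratings):
    """Average each sub-criterion's Likert ratings, then normalize (assumed) to sum to 1."""
    k = len(ratings[0])
    means = [sum(r[j] for r in ratings) / len(ratings) for j in range(k)]
    total = sum(means)
    return [m / total for m in means]


if __name__ == "__main__":
    # Hypothetical, perfectly consistent judgments built from Table 1's criterion weights.
    target = [0.14, 0.25, 0.23, 0.18, 0.20]
    matrix = [[a / b for b in target] for a in target]
    weights, lam, ci, cr = ahp_weights(matrix)
    print([round(x, 2) for x in weights], round(lam, 3), "accepted:", cr <= 0.1)
```

For a genuinely elicited matrix, the estimated λ_max exceeds n and CR grows with the inconsistency of the judgments; respondents whose matrices fail the CR ≤ 0.1 check are rejected before aggregation, as six questionnaires were in the study.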
Methods

Search Methodology

To examine the completeness of online scholarly information comprehensively, we chose 32 search terms from Mathematics and Physics (n=3), Earth Science (n=3), Material and Engineering Science (n=3), Computer and Information Science (n=3), Medicine and Health (n=5), Chemistry and Biotechnology (n=5), Humanities (n=3), and Social Science (n=7). Examples included "Tamiflu and bird flu," "Genetically modified food and safety," "Neural selection and human being," and "Teflon and health." These topics were selected according to the following principles: (1) they are general knowledge closely connected to people's lives; (2) they are comprehensible with ordinary knowledge; and (3) they are covered in peer-reviewed periodicals, textbooks, or encyclopedias. The keyword search was conducted from January 4, 2007, to January 11, 2007. Since the vast majority of Internet consumers use search engines to begin an information search,27 we chose three search engines, Google, Yahoo, and Altavista, to conduct the keyword search; these represent the most common options for general consumers.

Our literature review found that most studies evaluating Internet information selected the first several (for example, 20, 30, or 50) search engine results as samples.28 Additionally, because most Internet consumers rarely look beyond the first 50 search results,29 we chose the first 50 results from each of the three search engines, giving a ceiling of 4,800 Web pages (32 terms × 3 engines × 50 results). Of these, 1,986 (41.38%) were discarded as duplicates, dead links, or irrelevant pages, leaving a sample of 2,814 unique Web pages for analysis.

Evaluation of Web Pages

Fifteen doctoral students from Wuhan University participated in the evaluation from February 20, 2007, to May 30, 2007.
The students' disciplinary areas included Chemistry and Biotechnology, Mathematics and Physics, Medicine, Computer and Information Science, Material and Engineering Science, Sociology, Humanities, and Earth Science. To keep the results objective, each Web page was assessed by two reviewers with background knowledge in the subject. For example, 408 Web pages about Chemistry and Biotechnology key terms were evaluated by two doctoral students majoring in Chemistry and Biotechnology.

In this study, completeness was assessed against relevant documents and information we found in peer-reviewed periodicals or reports, textbooks, and Encyclopedia Britannica Online. Reviewers were also given the following standards:

1. To evaluate whether information is sufficiently broad, examine whether it covers the topic widely in terms of time, geography, and viewpoints. For example, consider the topic "Genetically modified food and safety." If the Internet information conveys the different attitudes and views towards it in different periods, it can be recognized as satisfying the criterion "information breadth."

2. Information completeness is a relatively subjective measure that depends on context, specific domain, or subject.30 Even within a topic, a wide range of scores can be obtained for the criterion "expatiation of topic" because of different types of sources.

3. For the criterion "accurate reproduction of information," compliance is determined by the extent to which reproduced information matches the original information (for example, an article first published in a journal or on another Web page).

4. If a research paper explains how the research was conducted (that is, the design, procedures, sampling, reliability, validity, and replicability are discussed),31 it receives a high score on the "description of method" criterion.

5. If the information provides sufficient references and information about its sources (such as which institution owns it, links to background, and introductions to related knowledge), the document conforms to the criterion "provision of reasonable additional information."

Each criterion was scored on a five-point scale: 5 = completely satisfies, 4 = mostly satisfies, 3 = basically satisfies, 2 = partially satisfies, 1 = fails to satisfy. If a criterion was not applicable, it was recorded as "N/A." The mean completeness score ($Z$) was calculated as a weighted arithmetic average:

$$Z = \sum_{i=1}^{5} Z_i \qquad (5)$$

$$Z_i = W_i \sum_{j} W_{ij} X_{ij} \qquad (6)$$

where $W_i$ is the weight of the $i$th criterion, $W_{ij}$ is the weight of the $j$th attribute (sub-criterion) under the $i$th criterion, and $X_{ij}$ is the score assigned to the $j$th attribute under the $i$th criterion ($i = 1, 2, \ldots, 5$; $j = 0, 1, \ldots, 3$; $X_{ij} = 1, 2, \ldots, 5$). The case $j = 0$ denotes a criterion with no sub-criteria, whose score is weighted by $W_i$ alone. Because the criterion weights, and the sub-criterion weights within each criterion, each sum to 1, the total score ranges from 1 to 5. Analysis was conducted in SPSS 12.0 using two-tailed P values, and P < 0.05 was considered statistically significant.

Results

The internal consistency of the assessment framework was Cronbach's α = 0.89 (a reliability coefficient of 0.70 or higher is considered "acceptable" in most social science research situations), indicating that the framework is consistent and reliable. The overall mean completeness score of the 2,814 samples was 2.92 (N/A entries were treated as missing values and replaced by the mean of the corresponding criterion). According to their mean scores, samples were classified as excellent (Z = 5), good (4 ≤ Z < 5), fair (3 ≤ Z < 4), weak (2 ≤ Z < 3), or very weak (1 ≤ Z < 2).

Figure 2. Rating of completeness of scholarly information on the Internet (very weak 11%, weak 45%, fair 33%, good 8%, excellent 3%).
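Equations (5) and (6), the mean imputation of N/A entries, the five rating bands, and the Cronbach's α reliability check can be sketched together in Python. The weights are those of Table 1; the data layout (one dict of scores per Web page, with None standing for N/A) and all function names are hypothetical. The α function uses the standard formula; because the item variances and the total variance are computed the same way, using population rather than sample variances does not change α.

```python
from statistics import mean, pvariance

# Table 1 weights: criterion weight, then sub-criterion weights (None = no sub-criteria).
CRITERIA = {
    "A": (0.14, {"A1": 0.32, "A2": 0.31, "A3": 0.37}),
    "B": (0.25, None),
    "C": (0.23, None),
    "D": (0.18, None),
    "E": (0.20, {"E1": 0.46, "E2": 0.54}),
}
ITEMS = ["A1", "A2", "A3", "B", "C", "D", "E1", "E2"]


def impute_na(samples):
    """Replace None (N/A) with the mean of that item across all samples."""
    means = {item: mean(s[item] for s in samples if s[item] is not None) for item in ITEMS}
    return [{item: (s[item] if s[item] is not None else means[item]) for item in ITEMS}
            for s in samples]


def completeness_score(scores):
    """Eq. (5)-(6): Z = sum_i W_i * sum_j W_ij * X_ij."""
    z = 0.0
    for name, (w_i, subs) in CRITERIA.items():
        if subs is None:  # j = 0: criterion scored directly
            z += w_i * scores[name]
        else:
            z += w_i * sum(w_ij * scores[sub] for sub, w_ij in subs.items())
    return z


def rating(z):
    """Map a score in [1, 5] to the article's five rating bands."""
    z = round(z, 9)  # absorb floating-point error such as 4.999999999999999
    if z >= 5:
        return "excellent"
    if z >= 4:
        return "good"
    if z >= 3:
        return "fair"
    if z >= 2:
        return "weak"
    return "very weak"


def cronbach_alpha(rows):
    """rows: one list of item scores per sample."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    page = {"A1": 4, "A2": 3, "A3": 2, "B": 3, "C": 2, "D": 1, "E1": 4, "E2": 3}  # hypothetical
    z = completeness_score(page)
    print(round(z, 3), rating(z))  # 2.495 weak
```

Applying `impute_na` before `completeness_score` mirrors the article's treatment of N/A as missing values filled with the criterion mean; the overall mean of 2.92 would be the average of these per-page scores.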
The results are detailed in figure 2. Of the 2,814 samples, 11 percent were rated "very weak" (scoring less than 2), and the largest share of Web pages (45%) was rated "weak." More than half of the samples did not even satisfy the basic requirements of completeness. One third of samples received mean scores between 3 and 4 (rated "fair"), while only 11 percent were rated "excellent" or "good." The number of samples receiving a score of 4 or 5 was also counted for each criterion; figure 3 shows the details.

Comparison of the Completeness of Online Scholarly Information by Domain Name

The domain names of the samples were classified into six types to determine whether online scholarly information differed significantly across domains. This analysis of variance was conducted using the Kruskal-Wallis test for K independent samples. Since the P value was less than 0.05, there was a statistically significant difference among the mean scores of the various domains. Figure 4 shows that samples with .org/.int and .gov domain names performed comparatively well on completeness, with mean scores greater than 3 (3.05 and 3.04, respectively), and were consequently rated "fair." Web pages with these two domain names were also more often rated "excellent" or "good": approximately 15 percent of samples with .org/.int and 11 percent of samples with .gov (see table 2). Web pages with .edu/.ac and "other" domains (such as .fr, .ca, .mil, and .us) scored below those with .org/.int and .gov domains, close to the "fair" level (mean values near 3 points).
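The Kruskal-Wallis test used for these comparisons pools all scores, ranks them, and asks whether the groups' mean ranks differ more than chance would allow. The sketch below is a pure-Python illustration (the authors used SPSS 12.0): it applies the usual correction for tied ranks and approximates the P value with a chi-square distribution on k − 1 degrees of freedom. The function names and toy data are hypothetical; in practice, `scipy.stats.kruskal` performs the same test.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the lower incomplete gamma series.

    Adequate for illustration; loses relative precision for very small P values.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > total * 1e-16 and n < 10000:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Return (H, df, P) for k independent samples of ordinal scores."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks, tie_sizes = average_ranks(values)
    h, start = 0.0, 0
    for g in groups:  # sum of (rank sum)^2 / group size
        r = sum(ranks[start:start + len(g)])
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are tied")
    h /= correction
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


if __name__ == "__main__":
    # Hypothetical completeness scores for three domain groups.
    com = [2.1, 2.6, 2.9, 3.0, 2.4]
    gov = [3.1, 2.9, 3.4, 3.2, 2.8]
    org = [3.3, 3.0, 2.7, 3.6, 3.1]
    h, df, p = kruskal_wallis(com, gov, org)
    print(f"H = {h:.3f}, df = {df}, P = {p:.4f}")
```

A rank-based test suits these data because the underlying scores come from an ordinal five-point scale and the group sizes are very unequal (from 91 to 1,163 pages in table 2).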
Figure 3. Performance on each criterion of completeness (percentage of samples scoring 4 or 5 on A1, A2, A3, B, C, D, E1, and E2).

Figure 4. Mean values of the completeness of online scholarly information by domain name (.com, .net/.info, other, .edu/.ac, .gov, .org/.int).

The scores of Web pages with .com, .net, and .info domain names were 2.78, 2.78, and 2.85, respectively, lower than the overall mean score (2.92). Table 2 shows that more than 60 percent of them were rated "very weak" or "weak."

Comparison of the Completeness of Online Scholarly Information by Resource Type

In the analysis of variance, the P value was close to 0, well below the significance level, showing that the completeness of online scholarly information differed significantly among resource types. Mean completeness values by resource type appear in figure 5. The completeness of open access literature was significantly higher than that of the other types of resources (mean value 3.24, P<0.05); one quarter of these resources were rated "excellent" or "good" (see table 3). We divided open access literature into open access journals, institutional repositories, and subject repositories. Among the samples examined, open access journals exhibited the best completeness (mean score 3.44). Information from NPO Web sites came second to open access literature, with a mean score of 3.01. Virtual community resources received the lowest completeness score (2.52); table 3 shows that approximately 80 percent of them were rated "very weak" or "weak." When we classified these resources into four kinds (discussion groups, BBSs, newsgroups, and wikis), we found that their completeness varied widely.
The mean value of wiki pages was 3.01, discussion groups 2.62, newsgroups 2.55, and BBSs 2.25. Blogs received a mean value of 2.63.

Comparison of the Completeness of Online Scholarly Information by Subject

Applying the same method to online scholarly information by subject, the P value was less than 0.05, showing a significant difference in completeness among subjects. Figure 6 depicts these differences.

Figure 5. Mean values of the completeness of online scholarly information by resource type (1 = virtual community resources, 2 = blog, 3 = portal Web sites, 4 = profit organization Web sites, 5 = personal Web pages, 6 = NPO Web sites, 7 = open-access literature).

Table 2. Rating of Completeness of Online Scholarly Information by Domain Name

Domain (n)            Very weak   Weak   Fair   Good   Excellent
.com (n=1,163)           15%       47%    31%     6%      1%
.net/.info (n=91)        12%       56%    20%     8%      4%
Other (n=170)             9%       44%    36%     7%      4%
.edu/.ac (n=491)          7%       44%    39%     8%      2%
.gov (n=213)              4%       50%    35%     8%      3%
.org/.int (n=686)         9%       42%    34%    10%      5%

Figure 6 shows that Web pages involving Earth Sciences, Humanities, and Mathematics and Physics received scores at the "fair" level (mean scores of 3.14, 3.09, and 3.02, respectively). The mean scores of Web pages involving Computer and Information Science and Social Science were close to one another; they were only marginally higher than the overall mean of the 2,814 samples and did not reach the "fair" level. Web pages about Medicine and Health, Materials and Engineering, and Chemistry and Biotechnology received even lower scores.
Many scholars have assessed the completeness of information on various Medicine and Health topics, and their results likewise show poor completeness, which matches our findings.32 Table 4 shows that Web pages about the Humanities were most frequently rated "excellent" or "good" (23%). More than 60 percent of Web pages on Medicine and Health, Computer and Information Science, and Chemistry and Biotechnology were rated "very weak" or "weak."

Figure 6. Mean values of the completeness of online scholarly information by subject (a = Medicine and Health, b = Material and Engineering, c = Chemistry and Biotechnology, d = Social Science, e = Computer and Information Science, f = Mathematics and Physics, g = Humanities, h = Earth Science).

Table 3. Rating of Completeness of Online Scholarly Information by Resource Type

Resource type (n)                        Very weak   Weak   Fair   Good   Excellent
Virtual community resources (n=34)          21%       59%    15%     3%      3%
Blog (n=33)                                 15%       58%    21%     3%      3%
Portal Web sites (n=16)                     25%       50%    13%    13%      0%
Profit organization Web sites (n=866)       16%       49%    29%     5%      1%
Personal Web pages (n=135)                   7%       42%    44%     4%      3%
NPO Web sites (n=1,598)                      8%       44%    36%     9%      3%
Open-access literature (n=132)               7%       37%    31%    20%      5%

Discussion and Conclusions

In this study, we developed a comprehensive framework for evaluating the completeness of scholarly information on the Internet, with criterion weights assigned by AHP and sub-criterion weights by a five-item Likert scale. The 2,814 Web pages relating to eight subjects were evaluated by fifteen reviewers from different disciplinary areas.

The General Characteristics of the Completeness of Online Scholarly Information Is "Weak"

The results of our rating assessment indicate that the completeness of scholarly information on the Internet is unsatisfactory.
The overall mean completeness score was 2.92 (n=2,814), indicating that very few Web pages satisfied the assessment framework we developed. Only 11 percent of samples provided comparatively complete information; these were rated "excellent" (Z=5) or "good" (4≤Z<5). A total of 33 percent of the samples were at the "fair" level (3≤Z<4), while 56 percent did not satisfy the basic criteria of completeness (Z<3).

We found that more than 70 percent of Web pages lacked essential elements of a topic. For example, when discussing whether the ozone hole can be repaired, several Web pages contained only the authors' viewpoints without providing important building blocks of argument structure, such as warrant, statistics, or backing.

Fewer than 20 percent of samples covered the given topic comprehensively in terms of a range of viewpoints. Consider "Teflon and health," for example. Many Web pages provided only information about the advantages of Teflon and ignored claims that perfluorooctanoic acid (PFOA) included in Teflon may have harmful effects on humans. According to our statistics, fewer than 15 percent of for-profit organization Web sites performed well on this sub-criterion, a lower proportion than for any other resource type except personal Web pages.

Only 12 percent of the scholarly information published on Web pages was in accordance with its original sources; most Web pages altered or omitted information when reproducing or digitizing original sources. The fewest samples (less than 10%) satisfied the sub-criterion "giving enough information about the sources," which makes it harder for Internet consumers to acquire background and original information.
A description of methods lets researchers see how previous studies were carried out and is vital for testing their reliability, validity, and replicability. Unfortunately, less than 12 percent of the online scholarly information we examined performed well on this sub-criterion.

In sum, because completeness is one of the core evaluative features of information quality, and because scholarly information on the Internet plays a vital role in scientific research, the completeness of online scholarly information needs to be improved urgently.

TABLE 4
Rating of Completeness of Online Scholarly Information by Subject

Subject                                      Very weak      Weak      Fair      Good Excellent
Medicine and Health (n=511)                        17%       47%       31%        4%        2%
Material and Engineering (n=259)                   11%       45%       35%        7%        2%
Chemistry and Biotechnology (n=408)                 7%       58%       28%        6%        1%
Social Science (n=696)                             14%       41%       33%        9%        3%
Computer and Information Science (n=269)           10%       56%       24%        7%        3%
Mathematics and Physics (n=268)                     4%       38%       50%        4%        3%
Humanities (n=188)                                 12%       41%       24%       18%        5%
Earth Science (n=215)                               3%       35%       48%       10%        4%

The Completeness of Online Scholarly Information Differs Significantly by Domain Name, Resource Type, and Subject

1. Web pages with .org/.int and .gov domain names scored above average for completeness and satisfied the basic criteria of the assessment framework, whereas samples with .com domain names scored lower than all others. Most of the samples with .org/.int domain names belonged to research institutions, which generally publish higher-quality scholarly information than other sources.
Many of the .gov pages we examined belonged to government Web sites of the United States, the United Kingdom, or Canada. These countries have released policies on the publication of government information and pay close attention to information quality. For example, in the United States, Section 515 of Public Law 106-554, known as the Information Quality Act, requires the Office of Management and Budget to promulgate guidance to agencies to ensure the quality, objectivity, utility, and integrity of information disseminated by federal agencies.33

2. Open access literature and NPO Web sites scored well above the average level of completeness. Open access literature scored significantly higher than other types of resources. This is probably because it can be browsed and downloaded free of charge; in addition, several open access journals are peer-reviewed, and effective quality control measures have been developed for institutional and subject repositories.

Blogs and wikis are foundations of Web 2.0 and provide a convenient platform for people to publish information and communicate with one another. Unfortunately, their completeness was found to be unsatisfactory; if they are to play a more active role in scientific communication, their completeness must be improved.

Scholarly information provided by portal Web sites also performed weakly. Consider Yahoo, which established Yahoo! Answers so that Internet users can ask and answer questions. Although many volunteers participate actively, most of the information they provide is incomplete, and its credibility is questionable.

3. Web pages on Earth Science, the Humanities, and Mathematics and Physics scored well above the average level of completeness and satisfied the basic criteria of the assessment framework.
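The significant differences reported here (P<0.05) are not accompanied in this section by the name of the test used. As an illustration only, the Python sketch below applies a Pearson chi-square test of independence to the subject-by-rating distribution in Table 4; this is not necessarily the authors' test. Two assumptions should be kept in mind: the cell counts are reconstructed from the rounded percentages, so they are approximate (a row can differ from its n by a page or two), and the p-value uses the Wilson-Hilferty normal approximation to the chi-square distribution rather than an exact computation. All function names are hypothetical.

```python
import math

# Table 4: subject -> (n, percentages for very weak, weak, fair, good, excellent)
TABLE_4 = {
    "Medicine and Health": (511, [17, 47, 31, 4, 2]),
    "Material and Engineering": (259, [11, 45, 35, 7, 2]),
    "Chemistry and Biotechnology": (408, [7, 58, 28, 6, 1]),
    "Social Science": (696, [14, 41, 33, 9, 3]),
    "Computer and Information Science": (269, [10, 56, 24, 7, 3]),
    "Mathematics and Physics": (268, [4, 38, 50, 4, 3]),
    "Humanities": (188, [12, 41, 24, 18, 5]),
    "Earth Science": (215, [3, 35, 48, 10, 4]),
}


def approximate_counts(table):
    """Turn rounded row percentages back into approximate cell counts."""
    return [[round(n * p / 100) for p in pcts] for n, pcts in table.values()]


def chi_square_independence(counts):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(counts) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail p-value via the Wilson-Hilferty normal approximation."""
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    stat, df = chi_square_independence(approximate_counts(TABLE_4))
    p = chi_square_p_value(stat, df)
    print(f"chi2 = {stat:.1f}, df = {df}, p ~ {p:.2g}")
```

With these reconstructed counts the statistic lies far above 41.34, the 5 percent critical value for 28 degrees of freedom, so the subject effect remains significant even allowing for rounding in the table. Where SciPy is available, `scipy.stats.chi2_contingency` gives an exact p-value instead of the approximation.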
We found that 53 percent of the sampled sites had .edu/.ac, .org/.int, or .gov domain names. Most of these sites (69%) were NPO Web sites or open access literature, and these domain names and resource types received high completeness scores.

We also found that the completeness of scholarly information on Medicine and Health was weak. A growing number of people search the Internet for health information: according to a 2005 survey by the Pew Internet and American Life Project, 79 percent of Internet users have looked for health information online.34 The current level of completeness of online health information does not serve the public’s search for credible health information.

Research Limitations

This research has some limitations. First, we selected the top 50 ranked results from three search engines. Because many search engines rank better sites first, this sampling method could bias the results;35 if so, the overall completeness of online scholarly information is likely to be even lower than that observed in this study. Second, although all the reviewers were trained before the evaluation, we cannot rule out the influence of subjective judgment. Third, like most studies of online information, ours is limited by the constantly changing nature of the Internet; if the study were repeated, the findings might differ. Further study is therefore needed.

Notes

1. Online Computer Library Center, “Perceptions of Libraries and Information Resources: A Report to the OCLC Membership” (Nov. 2005). Available online at http://www.oclc.org/reports/pdfs/Percept_all.pdf. [Accessed 6 May 2007].
2. Franz Barjak, “The Role of the Internet in Informal Scholarly Communication,” Journal of the American Society for Information Science and Technology 57, no. 10 (2006): 1350–67.
3. Steve Jones, “The Internet Goes to College: How Students Are Living in the Future with Today’s Technology,” Pew Internet & American Life Project (2002). Available online at www.pewinternet.org/PPF/r/71/report_display.asp. [Accessed 9 January 2008].
4. Jillian Griffiths and Peter Brophy, “Student Searching Behavior and the Web: Use of Academic Resources and Google,” Library Trends 53, no. 4 (2005): 539–54; Anna M. Van Scoyoc and Caroline Cason, “The Electronic Academic Library: Undergraduate Research Behavior in a Library without Books,” portal: Libraries and the Academy 6, no. 1 (2006): 47–58; Rajeev Kumar, “Internet Use by Teachers and Students in Engineering Colleges of Punjab, Haryana, and Himachal Pradesh States of India: An Analysis,” Electronic Journal of Academic and Special Librarianship 7, no. 1 (2006). Available online at http://southernlibrarianship.icaap.org/content/v07n01/kumar_r01.htm. [Accessed 10 January 2008].
5. Donna Scheeder, “Information Quality Standards: Navigating the Seas of Misinformation” (Sept. 28, 2005). Available online at www.ifla.org/IV/ifla71/papers/192e-Scheeder.pdf. [Accessed 2 December 2006].
6. Deborah J. Grimes and Carl H. Boening, “Worries with the Web: A Look at Student Use of Web Resources,” College & Research Libraries 62 (Jan. 2001): 11–22; Alison J. Head, “Beyond Google: How Do Students Conduct Academic Research,” First Monday 12, no. 8 (2007). Available online at http://www.firstmonday.org/issues/issue12_8/head/. [Accessed 9 January 2008].
7. Jim Kapoun, “Teaching Undergrads Web Evaluation: A Guide for Library Instruction,” College & Research Libraries News 59 (July/Aug. 1998): 522–23.
8. Gunther Eysenbach et al., “Empirical Studies Assessing the Quality of Health Information for Consumers on the World Wide Web: A Systematic Review,” The Journal of the American Medical Association 287, no. 20 (2002): 2691–700.
9. Shirlee Ann Knight and Janice Burn, “Developing a Framework for Assessing Information Quality on the World Wide Web,” Information Science Journal 8 (2005): 159–72.
10. Mohan Jyoti Dutta, “The Impact of Internet Information Completeness: The Moderating Role of Web Use Motivation” (PhD diss., University of Minnesota, 2001), 179.
11. Elizabeth Gadd, Charles Oppenheim, and Steve Probets, “ROMEO Studies 2: How Academics Wish to Protect Their Open Access Research Papers,” Journal of Information Science 29, no. 5 (2003): 333–56.
12. Merriam-Webster, Merriam-Webster’s Collegiate Dictionary, 11th ed. (New York: HarperCollins Publishers, 2003); Harry W. Bruce, “A Cognitive View of the Situational Dynamism of User-Centered Relevance Estimation,” Journal of the American Society for Information Science 45, no. 3 (1994): 142–48.
13. Mary Ann Fitzgerald, “Misinformation on the Internet: Applying Evaluation Skills to Online Information,” Emergency Librarian 24, no. 3 (1997): 9–14.
14. Sacramento State University Library, “Evaluating Internet Sources” (Jan. 2003). Available online at http://library.csus.edu/services/inst/ICCS/infocomp/tutorials/module5/general/coverage/index.htm. [Accessed 3 December 2006]; UC Berkeley-Teaching Library Internet Workshops, “Evaluating Web Pages: Techniques to Apply & Questions to Ask” (Jan. 2007). Available online at http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Evaluate.htm. [Accessed 3 February 2007].
15. Dutta, “The Impact of Internet Information Completeness,” 55.
16. Yang W. Lee et al., “AIMQ: A Methodology for Information Quality Assessment,” Information & Management 40 (2002): 133–46.
17. John Ambre et al., “Criteria for Assessing the Quality of Health Information on the Internet” (Oct. 1997). Available online at http://hitiweb.mitretek.org/docs/criteria.pd. [Accessed 1 February 2007]; Sacramento State University Library, “Evaluating Internet Sources.” Available online at http://library.csus.edu/services/inst/ICCS/infocomp/tutorials/module5/general/coverage/index.htm. [Accessed 1 February 2007].
18. Kapoun, “Teaching Undergrads Web Evaluation,” 522–23; Elmer V. Bernstam et al., “Usability of Quality Measures for Online Health Information: Can Commonly Used Technical Quality Criteria Be Reliably Assessed,” International Journal of Medical Informatics 74 (Feb. 2005): 675–83.
19. Kapoun, “Teaching Undergrads Web Evaluation,” 522–23.
20. Muhammad Walji et al., “Efficacy of Quality Criteria to Identify Potentially Harmful Information: A Cross-sectional Survey of Complementary and Alternative Medicine Web Sites,” Journal of Medical Internet Research 6, no. 2 (2004). Available online at www.jmir.org/2004/2/e21. [Accessed 8 March 2008].
21. Nicole T. Ansani et al., “Quality of Arthritis Information on the Internet,” American Journal of Health-System Pharmacy 62, no. 1 (2005): 1184–89.
22. Heather Yeo et al., “Filling a Void: Thyroid Cancer Surgery Information on the Internet,” World Journal of Surgery 31 (2007): 1185–91.
23. Dutta, “The Impact of Internet Information Completeness,” 179.
24. Eysenbach et al., “Empirical Studies Assessing the Quality of Health Information for Consumers on the World Wide Web,” 2691–700.
25. Connie A. Korpan et al., “Assessing Literacy in Science: Evaluation of Scientific News Briefs,” Science Education 81 (1997): 515–32.
26. Thomas L. Saaty, The Analytic Hierarchy Process (New York: McGraw-Hill Book Companies, 1980).
27. Online Computer Library Center, “Perceptions of Libraries and Information Resources.” Available online at www.oclc.org/reports/pdfs/Percept_all.pdf. [Accessed 9 January 2008].
28. Elmer V. Bernstam et al., “Instruments to Assess the Quality of Health Information on the World Wide Web: What Can Our Patients Actually Use,” International Journal of Medical Informatics 74 (2005): 13–19; Athina Tatsioni et al., “Important Drug Safety Information on the Internet: Assessing Its Accuracy and Reliability,” Drug Safety 26, no. 7 (2003): 519–27; Yeo et al., “Filling a Void,” 1185–91.
29. Peter Sacchetti, Peter Zvara, and Mark K. Plante, “The Internet and Patient Education: Resources and Their Reliability: Focus on a Select Urologic Topic,” Urology 53 (1999): 1117–20.
30. Bernstam et al., “Usability of Quality Measures for Online Health Information,” 675–83.
31. Dutta, “The Impact of Internet Information Completeness,” 31.
32. Eysenbach et al., “Empirical Studies Assessing the Quality of Health Information for Consumers on the World Wide Web,” 2691–700.
33. The White House, “Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies” (Oct. 2001). Available online at http://www.whitehouse.gov/omb/fedreg/reproducible.html. [Accessed 12 January 2008].
34. Susannah Fox, “Health Information Online” (May 2005). Available online at www.pewinternet.org/pdfs/PIP Healthtopics May05.pdf. [Accessed 8 March 2008].
35. Eysenbach et al., “Empirical Studies Assessing the Quality of Health Information for Consumers on the World Wide Web,” 2691–700.

Appendix 1
Questionnaire for the Measurement of Criteria Weights for Assessing the Completeness of Scholarly Information on the Internet

Part A. Demographics
1. Your affiliation is:
☐ professor ☐ associate professor ☐ assistant professor ☐ lecturer ☐ doctoral student ☐ other
2. Your discipline is:
☐ Chemistry & Biotechnology ☐ Earth Science ☐ Material & Engineering Science ☐ Computer & Info. Science ☐ Medicine & Health ☐ Social Science ☐ Humanities ☐ other

Part B.
Determining the Priority of Five Criteria of Completeness

We use the Analytic Hierarchy Process (AHP) to assign weights to the criteria. Please mark the matrix as follows.

Example: In the matrix below, when comparing A (row) with B (column), if you think A is slightly more important than B, put 3 in the intersecting cell; if you think B is slightly more important than A, put 1/3 in the cell. Likewise, when comparing A (row) with C (column), if you think C is very strongly more important than A, put 1/7 in the intersecting cell; if you think A is very strongly more important than C, put 7.

Scale for Pair-wise Comparison

Relative comparison                                           Value
Two criteria are of equal importance                          1
One criterion is slightly more important than the other       3
One criterion is strongly more important than the other       5
One criterion is very strongly more important than the other  7
One criterion is extremely more important than the other      9
Intermediate values to reflect compromise                     2, 4, 6, 8

Note: If one criterion is less important than the other, use the reciprocal of the value listed in the table above.

     A     B     C
A    1     3     1/7
B          1     1/5
C                1

Please compare each pair of the five criteria listed below.

                                                      A    B    C    D    E
A Information breadth                                 1
B Expatiation of topic                                     1
C Accurate reproduction of information                          1
D Description of method                                              1
E Provision of reasonable additional information                          1

Note: “Expatiation of topic” means that all critical elements are included when discussing the given subject.

Part C. Determining the Priority of Two Groups of Sub-criteria of Completeness

                                                                                Essential → Not at all
A Information breadth
A1 Information covers the topic extensively in terms of time                    5  4  3  2  1
A2 Information covers the topic extensively in terms of geography               5  4  3  2  1
A3 Information covers the topic extensively in terms of a range of viewpoints   5  4  3  2  1
E Provision of reasonable additional information
E1 Providing various types of references                                        5  4  3  2  1
E2 Giving enough information about the sources                                  5  4  3  2  1
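The pairwise judgments requested in Part B are turned into criterion weights by AHP. The Python sketch below shows one standard way to do this, offered as an illustration rather than as the study's exact computation: it approximates the principal eigenvector of the reciprocal matrix by power iteration and checks Saaty's consistency ratio, CR = CI/RI with CI = (λmax - n)/(n - 1) and RI = 1.12 for five criteria. The judgment values in `UPPER` are hypothetical, chosen to be perfectly consistent so the result is easy to verify; real questionnaire responses are rarely fully consistent, and a CR below 0.10 is the usual acceptance threshold. The final weighted-sum step (`completeness_score`, combining weights with 1-5 ratings into a score Z) is likewise an assumed aggregation, not the article's stated formula.

```python
from fractions import Fraction

CRITERIA = [
    "A Information breadth",
    "B Expatiation of topic",
    "C Accurate reproduction of information",
    "D Description of method",
    "E Provision of reasonable additional information",
]

# Hypothetical upper-triangle judgments: entry (i, j) says how much more important
# criterion i is than criterion j on Saaty's 1-9 scale. These are NOT survey data;
# they are chosen to be perfectly consistent so the result is easy to check.
UPPER = {
    (0, 1): Fraction(1, 3), (0, 2): Fraction(1, 3), (0, 3): 1, (0, 4): 1,
    (1, 2): 1, (1, 3): 3, (1, 4): 3,
    (2, 3): 3, (2, 4): 3,
    (3, 4): 1,
}

# Saaty's random consistency index for matrices of order n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def reciprocal_matrix(upper, n):
    """Build the full matrix: 1 on the diagonal, a_ji = 1 / a_ij below it."""
    m = [[1.0] * n for _ in range(n)]
    for (i, j), value in upper.items():
        judgment = Fraction(value)
        m[i][j] = float(judgment)
        m[j][i] = float(1 / judgment)  # exact reciprocal before converting to float
    return m


def ahp_weights(m, iterations=100):
    """Approximate the principal eigenvector by power iteration; return (weights, lambda_max)."""
    n = len(m)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]  # keep the weights summing to 1
    mw = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(mw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def completeness_score(weights, ratings):
    """Hypothetical aggregation: weighted sum of per-criterion ratings on a 1-5 scale."""
    return sum(w * r for w, r in zip(weights, ratings))


if __name__ == "__main__":
    matrix = reciprocal_matrix(UPPER, len(CRITERIA))
    weights, lam = ahp_weights(matrix)
    for name, w in zip(CRITERIA, weights):
        print(f"{name}: {w:.3f}")
    print(f"lambda_max = {lam:.3f}, CR = {consistency_ratio(lam, len(CRITERIA)):.3f}")
    print(f"Z = {completeness_score(weights, [3, 4, 2, 3, 1]):.2f}")
```

With these perfectly consistent judgments the weights come out as 1/9, 1/3, 1/3, 1/9, and 1/9, λmax equals 5, and CR is essentially zero. Power iteration is used here because it needs no linear-algebra library and converges reliably for positive reciprocal matrices.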