College & Research Libraries News (ACRL), March 1998

Guaranteed hits

How to make your library’s Web site stand out in Web search engines

by Jeffrey Beall

Creating and publishing a site on the World Wide Web is only the first step in making your library information available to the world at large: if you want to make sure it gets used, you have to publicize it. Publicizing in the context of the Web means ensuring that your site gets adequately indexed in the appropriate search engines. (Search engines are Web sites with search platforms that allow users to retrieve lists of links to Web sites matching their search requests.) Getting the search engines to index your site is a process that requires patience, but the indexing can pay high dividends if it results in your site becoming a highly relevant “hit” for information-seekers looking for materials on the subject of your Web site.

If you create a new Web site and make no attempt to publicize it on the Web, there is, admittedly, still a fair chance that it will eventually be indexed by at least a few of the search engines that index the Web. This happens because, one way or another, people are going to find out about your site and make links to it. Some search engines can also find and index every site on a given server.

Some search engines crawl the Web. This means that they start on a single Web site, index it, follow all the links on the page and index those pages as well, and so on. This process is also referred to as spidering. When a search engine indexes a site, it downloads all the textual material from it and excludes all the HTML coding. So what the search engine sees at each site is a body of language material, or simply text.

However, relying on Web crawling alone is not an efficient way to get your site indexed in the search engines.
It is possible that few other sites will link to yours, or the spidering process can be slow or inefficient. The best way to get your site indexed is to be proactive about it: in short, to do it yourself.

Include me in

The most successful way to get your Web site included in the search engines is to visit their home pages, look for the “Add URL” button (or equivalent), click on it, and follow the instructions. Usually the process is very simple; in fact, for some engines all you have to do is enter the URL or URLs you wish to be indexed. A few sites may ask for more information, such as your e-mail address and some keywords to describe the site. Some may even offer you a set of classification terms, which you can select to describe your site.

Smaller search engines can come and go, but there are seven major engines that are permanent and comprehensive, and when you begin to publicize your site you should begin with these engines. These major engines include AltaVista, Excite, HotBot, Infoseek, Lycos, Open Text, and WebCrawler. For a complete list of search engines on the Web, look at one of these two sites:

http://www.isleuth.com/webs.html
http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Searching_the_Web/Search_Engines/

About the author: Jeffrey Beall is senior cataloger at Widener Library at Harvard University; e-mail: jbeall@fas.harvard.edu

For a price …

A number of Web-based companies have sprung up that, for a fee, will take the information you provide, visit the “Add URL” pages of each search engine, and get your site indexed for you. Some even purport to do it, on a limited basis, for free.
One major disadvantage of using such services is that they are mainly set up to serve Web-based businesses, the primary source of their revenue, and therefore a library might spend money for something that it can do more efficiently and more accurately itself. It is probably much better for Web site creators to publicize themselves: what looks like a legitimate Web-promotion company (based on its Web appearance) can turn out literally to be two guys with a Mac and good graphic skills in a garage somewhere.

Indexing by design

There are a number of things you can do in terms of the design of your Web site that can increase the likelihood of its being indexed accurately. First, don’t include vital information exclusively in graphics, because they may not get “read” by the search engines. Put the most vital information (the name of your library or department, for example) in a larger-than-average font at the top of the Web site. Some search engines determine relevance by comparing the users’ search queries with the size and position of the information on a Web page.

Use the Dublin Core

Second, incorporate the elements of the Dublin Core metadata set into your site. This is a block of data in a standard form that provides information about your site in a way that is recognized by the search engines. Several sites exist on the Web that will create this vital HTML information for you: all you have to do is fill out a form on a Web page, submit it, and then copy/paste the HTML data the site returns to you directly into your Web site.

By having a conventional set of metadata describing your Web site, you will increase the likelihood of its being accurately indexed and retrieved by the search engines. Moreover, the Dublin Core allows you to control how the search engines describe your site in their search results pages. To create an abstract of Web sites, many search engines merely copy the first 25 words or so of the text of your Web document.
By filling out a brief description and including it within your Dublin Core metadata, you can control what the abstract of your site says.

Because the Dublin Core standard has been promoted by the library community, using it means you are setting a good example and helping promote a standard for metadata. For Dublin Core metadata generators (or templates) see the following sites:

http://www.ukoln.ac.uk/metadata/dcdot/
http://www.ub.lu.se/metadata/DC_creator.html

Catalog it!

Another way for a library to promote its Web site is to catalog it. This can be done exclusively in the library’s online catalog, but better yet, each library should catalog its homepage in at least one of the bibliographic utilities. A catalog record may increase awareness of the site, and other libraries may be more willing to include the bibliographic record in their local OPACs if someone has already cataloged it.

Some libraries may wish that certain Web documents not be included in Web search engines. There is a way to prevent your site from being included, and it is done by including some data in a meta tag of an HTML document. Simply include the following data, called the robots exclusion tag, in the head portion of your HTML document:

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

For more information on the robots exclusion tag, see http://info.webcrawler.com/mak/projects/robots/exclusion.html#meta.

The presence of the robots exclusion tag means the search engines will not include the document that contains the tag among the items returned by searches. Note that some search engines may be less sophisticated and may not be programmed to allow for this feature.
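As an illustration of the metadata and robots conventions described above, the following Python sketch shows how a hypothetical indexing program might treat a page: it skips the page when a robots META tag contains NOINDEX, uses a Dublin Core description as the abstract when one is present, and otherwise falls back to the first 25 or so words of text. The class and function names are invented for this example, the META name DC.Description is assumed from common practice for embedding Dublin Core in HTML (the article does not give exact tag names), and real search engines may well behave differently.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects visible words and META name/content pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}    # lowercased META name -> content string
        self.words = []   # words of text, with the HTML coding excluded
        self._skip = 0    # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.split())


def summarize(html, abstract_words=25):
    """Return None if the page opts out of indexing, else an abstract string.

    Assumed behavior of a hypothetical indexer, not of any real engine.
    """
    page = PageSummary()
    page.feed(html)
    page.close()

    # Robots exclusion tag: a NOINDEX directive means "leave this page out".
    robots = page.meta.get("robots", "")
    directives = {d.strip().lower() for d in robots.split(",")}
    if "noindex" in directives:
        return None

    # Prefer the description the site owner supplied (assumed META name).
    description = page.meta.get("dc.description")
    if description:
        return description

    # Otherwise copy the first few words of text, as many engines are said to do.
    return " ".join(page.words[:abstract_words])
```

Calling `summarize()` on a page whose head contains the robots exclusion tag returns `None`; on a page with a `DC.Description` META tag it returns that description; on any other page it returns the opening words of the text, which is why the wording at the top of a page matters.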
Accentuate what's special

Is there some special topical aspect of your site that may allow for its inclusion in a specialized search engine? Because the general search engines have gotten so big that they are often unreliable, many smaller, specialized search engines have sprung up. So, if your library deals with a special topic, say architecture, you may want to look for specialized search engines on that topic and get your site indexed in them.

Perhaps the largest collection of search engines, specialized and general, is located on the Web page called the “Internet Sleuth.” This site has a neatly classed collection of hundreds of Web search engines. If the site you wish to promote has some special subject emphasis, you can go to the Internet Sleuth and find all the engines that correspond to your site’s topic. Then you can visit each site, look for the “Add URL” button, and follow each site’s instructions for getting indexed. The Internet Sleuth is located at: http://www.isleuth.com/.

Be patient and time it right

It is important to note that not all Web search engines index instantly. Some may take one or two weeks or more to include a site. Often this time lag is stated on the confirmation screen you get after submitting your site’s URL. Also, it is better to submit your sites to the search engines in the morning, when the Web is less crowded. URL submission sites have a tendency to be crowded, so accessing them in the morning will save time.

Generally it is not possible to do multiple URL submissions via mega-search engines. Mega-search engines (also known as meta-search engines or parallel search engines) are Web sites that search two or more Web search engines simultaneously and then in one way or another collate and return the search results to the user. Although one can search several search engines at a time, one cannot use this method to submit URLs for multiple indexing.
Moreover, do not be confused by the expression “Link to us,” a statement often seen on various Web sites. This means that the owners of the site want you to include a link to their site from within your site. This is not a means of publicizing your site.

Take shortcuts when possible

Finally, there are several Web sites that bring together all of the “Add URL” pages of the major search engines. A good one is located at:

http://www.tiac.net/users/seeker/searchenginesub.html

and the corresponding Yahoo! page is located at:

http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Announcement_Services/Indices/

By using these sites, you can gain easy and generally quick and direct access to precisely the pages you need to submit your URL to the search engines.

Try it yourself

After you have submitted the URL of your Web page to the various search engines and have waited some time for it to be included in the engines’ indexes, you should search them to make sure your site is indeed included. A good way to do this is to search for some of the less common words from your site in the search engines’ main search area. Using less common words will create a smaller retrieval set and make the retrievals easier to analyze.

Alternatively, enter the title of your Web site in the search box, and see how highly ranked it is among the retrieval set. If it is not at the top, you may want to redesign your site to ensure that Web surfers, including your local library’s users, have access to it.

The whole point of putting together a Web page is to provide access to information: you “build it so they will come.” This effort can be lost if no one knows that the information is there for the taking. So take the next logical step and use the aforementioned techniques to ensure others know it is built.
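One rough way to pick search terms for the check described under “Try it yourself” is sketched below in Python. It treats the words that occur least often in your own page text as a stand-in for “less common words,” which is an assumption made purely for illustration: a word that is rare on your page may still be common on the Web at large, so the suggestions are only a starting point. The function name and its parameters are hypothetical.

```python
import re
from collections import Counter


def rarest_words(text, how_many=3, min_length=5):
    """Suggest candidate search terms: the least frequent longer words in text.

    Frequency is counted within this one text only (an assumption of this
    sketch), and ties are broken alphabetically so results are repeatable.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    ranked = sorted(counts, key=lambda w: (counts[w], w))
    return ranked[:how_many]
```

The words returned can be typed into each engine’s main search box; a short retrieval set that includes your page suggests the page has been indexed.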