Stocking Your GIS Data Library

Issues in Science and Technology Librarianship, Winter 1999
DOI:10.5062/F4NP22F9

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

Jennifer Stone
Geographic Information Systems Librarian
Map Collection and Cartographic Information Services
University of Washington
jnstone@u.washington.edu

Abstract

So, now that you've got the machines and the software, where does the data come from? Data discovery and acquisition can be the most time-consuming part of GIS projects, whether you are hunting down and purchasing existing data or creating your own. There are also the issues of documentation and metadata to consider before making an acquisition. A clear understanding of your user group is necessary to know what, exactly, to stock your library with. By using a combination of data from local, state, and federal government sources, plus data created locally and produced by vendors, your collection can be rounded out to serve a diverse user base. The University of Washington will be used as an example. The article will also look at some of the collection development literature concerning both traditional and digital formats.

Introduction

It is commonly stated in the library literature that use of geographic information systems in libraries is experiencing strong growth. A few libraries have been using GIS or collecting digital geospatial data for several years, and many of us aspire to emulate them. There is also an increasing number of librarians who have been thrown into the "GIS fire," so to speak: those who have purchased or been given the machinery and software to offer GIS services in the library, and must now begin offering them.
Acquiring hardware and software is the easier part of establishing GIS services in your library, however. Figuring out what data to collect for your users can be daunting. There is an overwhelming amount of GIS data available, with more made available almost daily. This article presents a series of issues to consider when determining how and what to collect for your library.

The best place to start is with traditional collection development policies for map libraries, or libraries collecting cartographic materials. Typically, digital materials should reflect the traditional collection. At the University of Washington, the Map Collection is global, with emphasis on the United States, the Pacific Northwest, Washington State, Western Washington, and the Puget Sound area. Our digital collection reflects the same geographic emphasis. (Highlights from the collection development policy for the University of Washington Map Collection are online at {http://www.lib.washington.edu/Maps/UsingCollection/about.html}.)

In the third edition of Map Librarianship: An Introduction, Mary Larsgaard discusses materials that can be easily obtained in both digital and traditional formats: reference and thematic maps of the world, maps of continents and nations, topographic maps of the world, world atlases, state atlases, aerial photos, monographs and serials, as well as outline maps and basemaps (Larsgaard 1998). When establishing GIS operations, this kind of basic information will provide a solid foundation.

The User Community

To figure out which of the basic materials mentioned above are a good starting point, it is helpful to be familiar with the user community. The librarian in charge of GIS needs to understand who in the institution is using GIS, what they are trying to accomplish, and what services they need help with. Larsgaard recommends paying attention to users' needs and use patterns to help determine what data to collect.
Being active in local user groups and talking to individual departments within the institution will provide an initial understanding of the library's patrons. Larsgaard recommends writing and updating an acquisition policy or a collection development policy, "based on the type of clientele (which is dictated by the kind of institution), the information needs of the users, and miscellaneous matters such as consortial agreements and the proximity or lack thereof to other collections" (Larsgaard 1998).

In some institutions, there is also going to be a wide range of users, from beginners (which may include yourself) to those who have helped write some of the existing programs on the market. Arc/Info, after all -- one of the industry's key software packages -- is 18 years old, and the digital data collection will have to support these expert users as well as the novices.

At the University of Washington, the librarians working with GIS data have for years been members of the University of Washington Consortium on Geographic Information and Analysis (UWCGIA). This group includes GIS users from all over the campus, and meets on a regular basis to discuss various research applications of GIS. Non-campus users also come to talk to the group, and there is a listserv available, which has members from campus and around the Seattle area. The listserv is an excellent way to disseminate information to a widespread group of people -- it is used for meeting announcements, to poll campus users, to post jobs both on campus and off, and serves as a technical support outlet for hardware- and software-related questions as well as for those seeking data.

The librarians have also been involved with individual classes and departments, helping develop class assignments and giving workshops on finding and using digital geospatial data. Input from these workshops has helped the library understand users' activities and needs.

Data Discovery

Once it is understood who is using GIS and what they are trying to accomplish, it will become clearer where to develop the collection. Melissa Lamont lists several sources for GIS data, including the United States government, state and local governments, researchers on campus, local GIS firms, utility companies, real estate firms, and the Internet (Lamont 1997). U.S. government agencies produce a wealth of information, much of which is freely available over the Internet, that is standard for use in GIS projects. Local and regional governments offer the benefit of larger-scale datasets for the immediate area. Local and regional agencies will be the ones most likely to produce data showing features such as bus stops or bike paths, as opposed to smaller-scale, statewide features such as national parks.

For those in an academic environment, it is possible that various campus departments have been using geographic information systems for years -- these departments and researchers are likely to have a wealth of information, much of which can be shared. These are also the people who are well connected in the local GIS community; talking with them frequently will help keep the GIS staff abreast of local GIS activity.

Partnerships with other agencies and groups are another good source of data and GIS information. Carolyn Argentati states that "Partnerships and grants linking libraries with governmental and commercial organizations have offered opportunities for collaboration on service models and the development of large data collections and new access tools. These extensive collections of digital spatial data are being organized and made available via the Internet frequently with a regional or local focus that is relevant to a library's primary constituency." She continues: "One strong partnership often leads to others and to additional contacts with people and organizations engaged or interested in GIS" (Argentati 1997).
Being involved in the local community is a good way to start building these partnerships.

Cost, Delivery, Format

Regardless of the source, cost is always an issue. Federal government sources tend to provide data less expensively than commercial sources, but this is not always the case. Different Internet sites offer data for free, for a small fee, or for a large fee -- following the same structure as non-electronic acquisition. If it applies, request the data as an educational institution and inquire about other possible discounts. Negotiate to receive the data for free -- the University of Washington has had success with local and regional governments providing the Libraries with data for free or at very low cost. The UW Libraries have also acquired data by providing blank recording media in return.

Another issue to consider is delivery: can the data be sent via post, or do you have to download it? If you must download it, be prepared to encounter different compression formats, space limitations, and perhaps a long download time. Having a directory already specified, knowing space limitations, and keeping drives uncluttered will make it easier to retrieve data from the web.

Larsgaard (1997) provides a checklist of issues to consider when acquiring data, regardless of format. She suggests considering whether the information:

Is from a database not already available through the library's consortial agreements;
Has acceptable licensing and use restrictions;
Has print counterparts;
Has reasonable customer support;
Comes with "clear and thorough" documentation;
Provides a trial version;
Includes a tutorial;
Is relatively easy to use, with menus, prompts, on-screen contextual help, error messages that actually give the user a clue as to how to get out of some mess, examples of operation;
Provides easy installation; and
Offers easy printing and downloading.

Format is another factor to consider.
ESRI and MapInfo formats have become something of a de facto industry standard for data, and the U.S. government has created the Spatial Data Transfer Standard (SDTS). Many software packages can read other formats or perform conversions, however, increasing the amount of data available. Make sure to understand the formats and their differences, and how the importing functions work, before choosing to acquire an unfamiliar format.

Hardware and Software Issues

A library's hardware and software issues will depend on the institution. The institution may have an infrastructure plan already in place, or the library may be involved in establishing such a plan. Site licenses, too, must be considered. For expensive software packages such as GIS, a large institution may find it much more economical to have a site-wide license for the software, rather than individual installations in many departments. What a library purchases will also depend on the staff's experience and education, as well as that of the user base.

Even if a library decides on one major GIS package, there will no doubt be multiple pieces of software that the library staff will need to be familiar with. The big GIS packages often work in conjunction with other software; ESRI has a free data viewer that many users are likely to want to use; and each electronic atlas, gazetteer, or mapping package will be structured differently. One or more of the library's staff will have to become familiar with these software packages, at least familiar enough to be able to walk a user through setup and the help files. For the frequently used or older (perhaps DOS-based) packages, additional user guides may have to be written.

The University of Washington Map Collection has hundreds of CD databases in-house, with many other CDs in other library branches, and an abundance of data offered on the web. There is no way one person can know everything in the collection. Luckily, standards are emerging, both from the U.S.
government, and from commercial, proprietary sources. These standards cover everything from file formats (SDTS, mentioned above) to metadata (Federal Geographic Data Committee) to similar software interfaces (ArcView and MapInfo look very similar, and the next version of Arc/Info will be highly based on graphical user interfaces, as opposed to the traditional command-line interface it is famous and infamous for). This is another reason that written user guides are important, so that the institution's GIS knowledge is easily shared.

The focus of a library, however, is not on hardware and software, but on data provision. "These infrastructure issues are secondary, however, to the even larger and more important responsibilities of collection, organization, and dissemination of geospatial data. ... Beyond hardware and software issues, any management discussion must address collection, describing, and accessing spatial data" (Lamont 1997).

License Agreements and Usage Restrictions

The issues that can stop an acquisition in its tracks are license agreements and usage restrictions. Many products, including stand-alone software and individual datasets, have very strict usage rules set up by the publisher. Some examples of the variety of agreements from the University of Washington's collection include:

The data can be freely distributed to the general public;
The data are restricted to UW patrons only (faculty, staff and students);
The data are available to the general public, but can only be loaded on one machine in the Libraries; and
The data can be used by anyone, but only UW patrons may take the data out of the Library; general public patrons can only print maps made with the data.

The University of Washington checks license agreements before purchasing to make sure the product will be usable by the widest number of users.
In some cases the university has been able to negotiate a separate license -- in some cases that negotiation involved months of legal wrangling over an official agreement, and in other cases the Libraries have promised to refer all non-UW patrons back to the data provider. To enforce the agreements, the Map Collection has click-through agreements on its web site, and hands out paper agreements to patrons coming into the library.

If the dataset is from a local government, for example, then updating and currency become factors in the acquisition decision. The library may be fine with a one-time purchase, or may want to receive quarterly updates. Local datasets may also involve privacy issues, especially when dealing with data from assessors' offices, but privacy is something that should be ironed out with the license or usage agreement.

The University of Washington has had good luck with organizations being willing to sign on to our existing agreements. For example, UW provides a standard data use agreement (available at {http://www.lib.washington.edu/maps/datause.html}) -- several providers have supplied their data with the understanding that this agreement will be given to users, rather than negotiating a separate agreement.

Documentation and Metadata

Good descriptions of data (as well as software and hardware) are essential to serving as an effective data provider. Patrons will want to know where the data came from, how it was created, its lineage, when it was last updated, and by whom -- all of this information and more is necessary to understand the bias and error inherent in a dataset. The federal government has recognized the need for this information, and a national standard for digital datasets has been created, information on which is available at http://www.fgdc.gov/.
This standard, although somewhat daunting at over 200 fields, is the most complete standard for data description established -- and it has begun to be adopted by other countries as well. The government has also created the National Spatial Data Infrastructure so that organizations can share this metadata and data with one another (http://www.fgdc.gov/nsdi/nsdi.html). Nodes have been set up all over the globe to host metadata (and in some cases, the data itself), and the nodes are searchable. The University of Washington worked with the Washington State Geographic Information Council to establish one of these nodes for the State of Washington -- for information, see http://wa-node.gis.washington.edu/.

If metadata and documentation aren't available for a dataset, the data has lost a great deal of value -- just as an unattributed quote in an article suggests laziness or falsehood, a lack of data description suggests error. If metadata or documentation does not exist for a dataset of interest, the library staff must decide whether to spend the time creating the metadata, to pursue metadata from the source, or not to purchase the data at all.

Conclusion

There are several important factors to consider when stocking your GIS data library. Library staff in charge of GIS need to know about basic geographic information, details about the user community, issues about data discovery, cost, delivery and format, hardware and software, license agreements and usage restrictions, and documentation and metadata. Considering each of these issues will help library staff make the best GIS data acquisition decisions for the library.

References

Argentati, Carolyn D. 1997. Expanding Horizons for GIS Services in Academic Libraries. Journal of Academic Librarianship V23, No6. (Special issue on GIS in libraries)

Lamont, Melissa. 1997. Managing Geospatial Data and Services. Journal of Academic Librarianship V23, No6.

Larsgaard, Mary Lynette. 1998. Map Librarianship: An Introduction. Libraries Unlimited Inc., Englewood, Colorado.