id: ital-11251
author: Bauder, Julia
title: HathiTrust as a Data Source for Researching Early Nineteenth-Century Library Collections
date: 2019-12-16
pages: 11
extension: .pdf
mime: application/pdf
words: 4798
sentence: 168
flesch: 52
doi: https://doi.org/10.6017/ital.v38i4.11251
cache: cache/ital-11251.pdf
txt: txt/ital-11251.txt
summary: ... popular and/or widely available until well after their official publication date, and that some prolific authors who contributed hundreds of thousands of words to the Google Books corpus were never as widely purchased and read as authors who wrote a single, short, best-selling work.[3] Although using books held by a set of libraries at a given time as the corpus has its own problems of unrepresentativeness (particularly, for long-established libraries, the fact that the books on the shelf at a given time represent not only works of interest to current users but also those of interest to users from decades past), triangulating these data with the Google Books Ngram data would at least give some sense of whether and where these different corpora disagree.[4] Creating corpora using books that are known to have been owned by a given library at a given point in time is potentially feasible because digitized records of the books in several hundred nineteenth-century library collections are available in the form of scanned book catalogs, each a book or pamphlet listing all of the books available in a particular library. By the end of the nineteenth century, book catalogs had largely been replaced by the card catalog system that remained in use through most of the twentieth century.