How Large Is the “Public Domain”? A Comparative Analysis of Ringer’s 1961 Copyright Renewal Study and HathiTrust CRMS Data

John P. Wilkin

John P. Wilkin is Juanita J. and Robert E. Simpson Dean of Libraries and University Librarian at the University of Illinois at Urbana-Champaign; e-mail: jpwilkin@illinois.edu. ©2017 John P. Wilkin, Attribution-NonCommercial (http://creativecommons.org/licenses/by-nc/4.0/) CC BY-NC.

The 1961 Copyright Office study on renewals, authored by Barbara Ringer, has exerted an outsized influence on discussions of the U.S. 1923–1963 public domain. As more concrete data emerge from initiatives such as the large-scale determination process in the Copyright Review Management System (CRMS) project, questions are raised about the reliability or meaning of the Ringer data. A closer examination of both the Ringer study and CRMS data demonstrates fundamental misunderstandings and misrepresentations of the Ringer data, as well as possible methodological issues. Estimates of the size of the corpus of public domain books published in the United States from 1923 through 1963 have been inflated by problematic assumptions, and we should be able to correct mistaken conclusions with reasonable effort.

The distinctive nature of U.S. copyright for the period covering 1923–1963 creates opportunities for making available recent creative works and invites speculation about the size of the public domain in that period. U.S.
law required compliance with certain rules for works published through most of the 20th century, and many books published in the United States during that period entered the public domain as a consequence of failure to comply with those rules.1 One of those rules, a requirement that the copyright holder for a work published between 1923 and 1963 renew the copyright of the work 28 years after it was published, was the subject of an important Copyright Office study discussed here.2 Many cite this study to suggest that 93 percent of books published in the United States during this period are in the public domain.3 Recent work by the IMLS-funded CRMS project, a HathiTrust project focused on books digitized from research libraries, found a significantly smaller percentage of public domain books for the same period, approximately 50 percent.4

The significant difference in the numbers established by these two efforts is puzzling. No published literature examines this discrepancy between the two important sources. However, as librarians, copyright experts, and interested observers have discussed this difference, they have offered a number of hypotheses to try to reconcile these two apparently accurate and yet very different pieces of information.

doi:10.5860/crl.78.2.201
College & Research Libraries, February 2017

One may wonder whether one or both of these efforts made errors in their analyses. Setting aside that possibility, in conversations about this issue, most commentators have suggested the discrepancy may be traced to content-type differences between the corpus of materials from research libraries under examination by the CRMS and the corpus reviewed by the Copyright Office report: for example, that the CRMS has reviewed primarily scholarly works while the Copyright Office registered copyright for more popular works.5

Why should we be concerned about the apparent discrepancy between these two U.S. copyright data points?
After all, each effort was conceived for a different purpose, and one could reasonably argue that each may be entirely accurate as articulated and that each serves its own purpose without conflict with the other. On the other hand, the two studies suggest something very different about the size of the public domain for this period, especially for books published in the United States.

The most urgent need for a better understanding of this problem is the need for facts that aid libraries in decision making. A better understanding of the size of the public domain, gaps in the portion of the public domain that has been digitized, the specific characteristics of the in-copyright corpus, and the problems and opportunities in the remainder can help drive digitization and rights clearance efforts. More important, better facts are needed to help shape a host of library programs such as preservation efforts and shared print initiatives. For example, shared print monograph initiatives typically give consideration to whether storage decisions can be shaped by online access to corresponding digital versions; in those discussions, the size of the public domain is typically considered.

The most important reason for having a more accurate sense of the extent of the public domain for this period is the way that accurate, evidence-based accounting of the available facts can shape deliberate action by libraries. Especially in digitization and “collective collection” activities, library efforts are hampered by imperfect “facts” or an absence of information. In the research library community, we struggle with questions such as the extent of the corpus of U.S. federal documents, the size of the collective North American research library monograph collection, or more general questions such as the number of books published in the United States in a given period.
Although our literature rarely highlights the problem of a lack of sufficient data for decision making, these gaps in information often color important strategic discussions. For example, in 2012 a group of four consortia met to chart a strategy for U.S. federal government documents digitization. The absence of good bibliographic data on the size and characteristics of the corpus left the group to speculate about the nature of the challenge based on estimates that varied widely: these estimates of the size of the corpus ranged from 1.8 million volumes to 2.2 million volumes, and from an average of 60 pages per volume to 300 pages per volume. The resulting range, from 108 million pages (1.8 million volumes at 60 pages each) to 660 million pages (2.2 million volumes at 300 pages each), further complicated efforts by introducing a dramatic difference in cost and difficulty.

The 2010 Lyrasis-IMLS meeting on “Developing a North American Strategy to Preserve & Manage Print Collections of Monographs” concluded through a vote of invited participants that the community’s highest priority for print storage efforts should be “public domain and published up to 1976” monographs in HathiTrust and “already in storage.”6 Discussions at the meeting struggled with determining a goal and strategy at least in part because there were no data on the scope of such an effort. As noted by Demas and Brogdon in their earlier work on copyright determination in support of preservation reformatting, “A curious lack of systematic investigation of [copyright status and permissions seeking] by the library community has effectively stymied large-scale library-based efforts to preserve” this portion of our collective collection.7

With regard to the question of the size of the 1923–1963 U.S.
public domain book corpus, in conversations with those who have some expertise in this area, there is some sentiment that the assumptions created by the Copyright Office study are, even if wrong, helpful in that they might stimulate a “gold rush” looking for the public domain, and perhaps even that a clearer picture of the facts would be a nuisance. In research library strategy meetings, book-related data from Ringer’s Copyright Office study are sometimes invoked to dismiss concerns about the difficulties of copyright determination for the period or to paint a rosy picture about the ease of providing access to this relatively recent corpus of material. Additional and more accurately interpreted facts will certainly result in our being better able to direct our efforts. As a community, we have been both careless about the challenges and unsystematic about the opportunities. It is hoped that the analysis provided here will shed more light on the question of the size of the U.S. 1923–1963 public domain, as well as the opportunities presented by clearing rights for the remaining in-copyright collection and grappling with challenges in that middle ground of ambiguous and problematic publications.

Background Efforts: Ringer’s Renewal of Copyright and the CRMS Project

To better understand why Ringer and the CRMS come to different conclusions and why assertions about the size of the public domain (based on Ringer’s analysis) are problematic, a detailed analysis of Ringer’s work and the CRMS follows. Each section of analysis provides background on the effort in question, either Ringer or the CRMS, followed by an examination of the sources and methodology, the findings, and problems of each.

Ringer and the Renewal of Copyright Study

In 1961, Barbara Ringer published a study on Renewal of Copyright for the U.S. Congress on behalf of the Copyright Office.
This outstanding work exploring the history, value, and complications related to renewals also contains very important facts intended to help understand patterns of renewals. To help decision makers understand the role and extent of copyright renewal, Ringer’s study sought to determine the frequency with which copyright holders renewed their copyrights for a variety of types of creative works. Ringer’s data and the study itself are frequently cited in support of a number of arguments, including the copyright status of books published in the United States from 1923 through 1963 in research libraries.

Ringer’s Sources and Methodology

Ringer’s study looks at renewals for materials whose copyright was registered in fiscal year 1932. The federal government’s 1932 fiscal year ran from July 1931 through June 1932. Ringer’s data are presented in detail on pages 220–224 of her study.8 However, Ringer does not clearly identify her sources or her methodology and we are left to speculate about both.

The Copyright Office is the source of all information related to registration and renewal of copyrights in the United States. A limited amount of this information is publicly available outside the Copyright Office itself. One publicly available source of information on registrations and renewals, the Catalog of Copyright Entries (CCE), is also a key copyright resource and the foundation for resources such as the Stanford Copyright Renewal Database. According to the Copyright Office’s Circular 22,

The Copyright Office published the Catalog of Copyright Entries (CCE) in printed format from 1891 through 1978. From 1979 through 1982, the CCE was issued in microfiche format. The CCE is divided into parts according to the classes of works registered. Each CCE segment covers all registrations made during a particular period of time. Renewal registrations made from 1979 through 1982 are found in Section 8 of the catalog.
Renewals prior to that time are generally listed at the end of the volume containing the class of work to which they pertained.9

While CCE entries include essential information, the Copyright Office’s Circular 22 notes that CCE entries are “not a verbatim transcript of the registration record”10 and exclude information such as the address of the copyright claimant. The card system at the Copyright Office includes this additional information, and the copyright registration certificates are the most complete and authoritative source of information about a copyright. In their 1997 study, “Determining Copyright Status for Preservation and Access: Defining Reasonable Effort,” Demas and Brogdon effectively demonstrated the reliability of using the CCEs in making copyright determinations.11

Ringer’s analysis probably relied on tallies of renewals (based on Copyright Office catalog card data) performed in 1959–1960 against total numbers of works registered and counted in 1931–1932 and documented in the 1932 Annual Report of the Copyright Office. In her 1961 study, Ringer notes that 57,065 items fell into the definition of books referred to as Class A works by the Copyright Office and reports that 3,942 of these works were renewed.12 The copyrights for these items were registered with the Copyright Office in fiscal year 1932, and their renewals were required 28 years later, in 1958 and 1959. Reproducing the results of the Copyright Office analysis is made challenging by the absence of a methodology in the 1961 report, but we are aided by the 1932 Annual Report of the Copyright Office and a footnote in the table included in the 1961 report.

The 1932 Annual Report of the Copyright Office spells out the details for Class A registrations that year.13 Often, the Copyright Office’s term “Class A” is used interchangeably with the word “books,” though the term includes many other formats.
The Copyright Office’s 1932 Annual Report notes that the copyrights of 57,065 Class A works were registered that year. This is the same number used in the 1961 Renewal of Copyright report. Many types of work qualify as Class A works, and the Copyright Office received registrations for non-U.S. works as well. As reported in the 1932 Annual Report, the 57,065 Class A registrations included types of works and numbers of each type found in table 1.

TABLE 1
1932 Copyright Office Annual Report Class A Registrations

Class  Subject Matter of Copyright                            1931–1932
A      Books
       (a) Printed in the United States
           Books proper                                          13,460
           Pamphlets, leaflets, etc.                             26,995
           Contributions to newspapers and periodicals           10,489
           Total                                                 50,944
       (b) Printed abroad in a foreign language                   4,784
       English books registered for ad interim copyright          1,337
       Total                                                     57,065

A footnote in Ringer’s analysis notes that the renewals number “Includes contributions to periodicals,” which is consistent with the Class A compilation reported in the 1932 Annual Report. We should assume that Ringer’s number would also include all of the other categories that comprise Class A works, and thus each of the components reported in the 1932 Annual Report. For a breakdown of types of publications in Class A works, see figure 1.

It would have been unlikely for Ringer and her staff to have reassembled all of the FY1932 registrations. It may have seemed unnecessary, as the initial reporting process should have been reliable, and recounting would have been labor-intensive. However, after nearly 30 years, it would have been extremely unlikely that a recount performed for the purposes of the 1961 study would have produced precisely the same number as the initial count. Thus, we should assume that Ringer only counted renewals and compared this number to the numbers (by category) recorded and reported in FY1932.
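The totals in table 1 and the shares plotted in figure 1 can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the counts are transcribed from the 1932 Annual Report figures quoted above, while the variable names and the `share` helper are assumptions introduced here and come from neither Ringer nor the Copyright Office.

```python
# Counts transcribed from table 1 (1932 Annual Report, Class A registrations).
# Names below are hypothetical labels chosen for this sketch.
US_PRINTED = {
    "Books proper": 13_460,
    "Pamphlets, leaflets, etc.": 26_995,
    "Contributions to newspapers and periodicals": 10_489,
}
OTHER_CLASS_A = {
    "Printed abroad in a foreign language": 4_784,
    "English books registered for ad interim copyright": 1_337,
}

class_a = {**US_PRINTED, **OTHER_CLASS_A}
us_total = sum(US_PRINTED.values())       # subtotal for (a) in table 1
class_a_total = sum(class_a.values())     # grand total for Class A


def share(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


print(f"Printed in the United States: {us_total:,}")
print(f"All Class A registrations:    {class_a_total:,}")
for label, count in class_a.items():
    print(f"{share(count, class_a_total):5.1f}%  {label}")
```

The two sums reproduce the subtotal of 50,944 and the total of 57,065 given in table 1. The computed shares are unrounded, so they may differ by a point from the whole-number labels in the figure.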
As we will see later, because Ringer compares the number of renewals to an undocumented and unpublished set of registrations, this sort of procedure is problematic and makes reproducing the conclusions of the original work difficult, if not impossible.

Ringer’s Findings

It is critical that we understand what Ringer’s study did and did not establish. One of the study’s most often noted conclusions is that copyright was renewed for only 15 percent of the works whose copyright was registered in FY1932. Because this 15 percent is an average across a variety of types of works, it conceals lower and higher rates of renewal for specific types of works. For example, Class E works (musical compositions) were renewed at a rate of 35 percent. Notably, Ringer reports that “only 7% of books … are being renewed.”14 The conclusion that the copyright for only 7 percent of the FY1932 books was renewed is the crux of much of the analysis that follows.

The renewal rates reported by Ringer are critical data points; from them we can certainly learn much about works where renewal was required, especially those published from 1923 through 1932, the year this study used for its analysis. If we infer from these data, as many do, that 93 percent of the books published from 1923 through 1963 are therefore in the public domain, we make several critical mistakes.

FIGURE 1. FY1932 Class A Registrations by Type: Books proper, 24%; Pamphlets, leaflets, etc., 47%; Contributions to newspapers & periodicals, 18%; Printed abroad in a foreign language, 9%; English books registered for ad interim copyright, 2%.

Consider Ringer’s statement that “At present about 15% of subsisting copyrights are being renewed; in fiscal 1959, for example, roughly 21,500 copyrights were renewed, as against 124,500 that went into the public domain at the end of their first 28-year term.”15 That is, Ringer suggests that a failure to renew necessarily moves one of these works into the public domain. If we make this assumption, we overlook some of the characteristics of the fairly large and heterogeneous corpus, especially in Class A works. Moreover, if we infer from these data that the rate of renewal for all books published between 1923 and 1963 remained a constant 7 percent, we fail to take into account data suggesting that the rate of renewals increased over time, a fact that Ringer herself reports in the 1961 study.

Problems with Ringer’s Conclusions

Where Nonrenewal Is Irrelevant

We need first to understand that some works did not enter the public domain as a result of a failure to renew. One of these classes of works is books “Printed abroad in a foreign language.” At the time of Ringer’s analysis, renewal was required for these works, but subsequent changes to copyright law removed the renewal requirement for them. Under current law, as a result of the post-Uruguay round of GATT agreements, renewal is not required for foreign-language books published abroad between 1923 and 1963 and for English-language works published abroad (and not published in the United States within a month) by non-U.S. authors for those same years. Peter Hirtle has explored the renewals question extensively, especially with regard to foreign works, and while he concludes that copyright for all 1923–1963 works is complicated, his analysis makes abundantly clear that most non-U.S.
works from this period are protected by copyright.16

A second class of works to consider here is ad interim copyright registrations. Even at the time of Ringer’s 1961 report, ad interim registrations, defined by Circular 22 as “a special short term of copyright available to certain pre-1978 books and periodicals,”17 would have been complicated by subsequent publication of the work in the United States. This fairly complicated situation is discussed at length by Peter Hirtle using the example of a Class A work published first in 1939; however, as Hirtle concludes, “Almost by definition, therefore, an ad interim copyright means a restored copyright.”18

So, even though both “Printed abroad in a foreign language” and ad interim registrations would have been counted among the candidates for renewal, neither could be included today to calculate the 7 percent number. Thus, assumptions about public domain status based on the 7 percent renewal rate are related to a corpus that should exclude 11 percent of the works registered in FY1932 (see figure 2). Moreover, in the case of a third class of works, “Contributions to newspapers and periodicals,” many would have been afforded copyright protection as a result of separate copyright conditions for the newspapers and periodicals in which they were published. “Contributions to newspapers and periodicals” represents 18.4 percent of the FY1932 Class A works.

The Importance of Terminology

Terminology, and specifically how we understand “books” and “Class A” works, proves to be one of the most important facets of this analysis. A fuller understanding of what comprises Class A works helps to better understand a major difference between the findings of the Ringer study and what we expect when we discuss the copyright status of “books.” Included in Class A works are many items that would not be widely understood to be “books,” either in a research library context or popularly.
As noted, Class A works include “Contributions to newspapers and periodicals” and “Pamphlets, leaflets, and the like,” as well as books. Research libraries do, of course, collect pamphlets, sometimes in their special collections and sometimes in the general collection as “bound withs.”19 Leaflets and posters are also collected. And although many of these items may be in research libraries, libraries would not characterize them as “books” for the purposes of cataloging or collection management. Similarly, a popular understanding of “books” would exclude materials that make up 66 percent of the reported FY1932 Class A works. So, importantly, while Ringer says “only 7% of books … are being renewed,” she means that only 7 percent of Class A works are being renewed and not that 7 percent of the 13,460 “books proper” are being renewed.20 As shown in figure 2, “Books proper” comprise 23 percent of Class A works registered in FY1932, while works not subject to renewal make up 11 percent of the total and nonbook formats comprise the vast majority of works in this classification (66%).

The Rate of Renewal for “Books Proper”

We can only speculate as to the rate of renewal for different portions of the Class A works. Rights holders are likely to show less interest in renewing leaflets, for example. Book authors and publishers have the greatest economic incentive to renew copyrights, and so it is entirely likely that the preponderance of the 3,942 Class A renewals received were made for the 13,460 “books proper” whose copyright was registered in FY1932. If it were the case that the 3,942 renewals were made primarily against the 13,460 “books proper,” this would be a renewal rate of 29 percent.
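Both rates discussed here follow from simple division. The Python sketch below is an illustration under a stated assumption: the higher rate treats every one of the 3,942 Class A renewals as a renewal of a “book proper,” which is a limiting case and not a finding. The constant and function names are introduced here for illustration only.

```python
# Figures quoted in the text; names are hypothetical labels for this sketch.
CLASS_A_REGISTRATIONS = 57_065  # all Class A works registered in FY1932
BOOKS_PROPER = 13_460           # the "books proper" subset of Class A
CLASS_A_RENEWALS = 3_942        # Class A renewals reported by Ringer


def renewal_rate(renewals, registrations):
    """Return renewals as a percentage of registrations."""
    return 100 * renewals / registrations


# Rate as reported: renewals measured against every Class A registration.
reported = renewal_rate(CLASS_A_RENEWALS, CLASS_A_REGISTRATIONS)

# Limiting case (an assumption): all renewals belong to "books proper".
upper_bound = renewal_rate(CLASS_A_RENEWALS, BOOKS_PROPER)

print(f"Against all Class A works: {reported:.1f}%")
print(f"Against 'books proper':    {upper_bound:.1f}%")
```

The first figure rounds to the 7 percent Ringer reports and the second to the 29 percent ceiling discussed above. The true rate for “books proper” lies somewhere between the two, depending on how many renewals were made for other Class A formats.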
While the rate of renewal for “books proper” may not be as high as 29 percent (for example, because some foreign works and some submissions to periodicals were renewed), the renewal rate for “books” is certain to be several times higher than the 7 percent reported. Based only on the information provided in the 1961 report, we are not able to determine the rate at which the copyright of “books proper” was renewed; however, we can be absolutely certain that Ringer’s statement that “only 7% of books … are being renewed” is not accurate. We should make some effort to confirm the rate of renewal for “books proper,” but this will prove challenging.

FIGURE 2. FY1932 Class A Registrations Grouped: Books proper, 23%; Nonbook formats, 66%; Not subject to renewal, 11%.

The Copyright Review Management System (CRMS)

In December 2008, a group of institutions led by the University of Michigan began a systematic and large-scale review of the copyright status of books in HathiTrust published in the United States between 1923 and 1963. That initiative received generous support from the Institute of Museum and Library Services (IMLS) for the creation of a strengthened copyright determination system and process.21 The challenge for these processes is, in a sense, the challenge of proving a negative: those who are interested in establishing the public domain status of a U.S. book published in this period must prove definitively that the book’s copyright was not renewed or that other required copyright formalities (such as including a copyright statement) were not followed. The IMLS-funded effort, referred to as the Copyright Review Management System (CRMS), introduced a unique methodology to help build reliability and thus confidence in making copyright determinations. Although the bulk of the reviews have now been completed (as of December 2015), as previously undigitized candidate U.S.
works are digitized and become available in the HathiTrust Digital Library, they are reviewed in the CRMS using the same process. By 2016, more than 300,000 books published in the U.S. between 1923 and 1963 had been reviewed.

The CRMS’s Sources

The corpus of materials reviewed by the CRMS is drawn entirely from the HathiTrust Digital Library. The HathiTrust corpus is notable for its size and comprehensiveness. At 13.8 million volumes (in December 2015), the HathiTrust collection is larger than all but the largest research library collections. Overlap analysis performed by OCLC using 2009 and 2010 HathiTrust data demonstrated significant overlap with ARL library print collections.22 As noted by Malpas in 2011, “In June 2009, an average of 20% of titles held in any given ARL library was duplicated in the HathiTrust Digital Library; by June 2010, the average duplication rate had increased to 30%.”23 The 2010 OCLC analysis was performed when the HathiTrust collection was fewer than 6 million volumes. Now that the HathiTrust Digital Library has more than doubled in size, the median overlap between all ARL collections and HathiTrust will have grown significantly. The immense size and diversity of materials available to the CRMS, as well as the consistency of its findings, suggest that its data for books published in the U.S. from 1923 to 1963 are characteristic of the copyright status for these works in ARL libraries.

The CRMS Methodology

The CRMS methodology was designed to minimize human error, and the project confirmed its reliability by contracting with the U.S. Copyright Office for three independent reviews of samples. The CRMS, through its online system, brings together reviewers from several different institutions (initially Indiana University and the Universities of Illinois, Michigan, and Wisconsin), and reviewers interact with materials through an interface that prioritizes candidate volumes for review.
Books published in the United States between 1923 and 1963 comprise the candidates for review work and are assigned by the system to reviewers. The CRMS system makes available to each reviewer a variety of tools, including the Stanford Copyright Renewal database and a digitized copy of the work itself. Once a determination is made by one reviewer, the work is prioritized for review by a second reviewer at another institution. If the system finds that the two reviewers agree on the determination, that conclusion (that is, public domain or in-copyright), along with a reason, is registered in the CRMS system with metadata for the book. If the two reviewers disagree, the disputed determination is sent to an expert reviewer for arbitration. The arbitrated conclusion is then stored in the CRMS database. Because the goal of the CRMS is increasing the reliability of determinations, the project regularly sampled determinations and paid the Copyright Office to perform research on these conclusions. This metareview of CRMS work found the process to be extremely reliable, ultimately exceeding a 99 percent accuracy rate.24

The CRMS’s Findings

The CRMS process has been extremely important for a number of reasons. Perhaps most important is its success in opening access to U.S. books published from 1923 to 1963. It also establishes some very important facts: HathiTrust contains some 14 million books digitized by partner libraries, typically in cooperation with Google, and within that corpus there are some 300,000 books published in the U.S. between 1923 and 1963. By early 2016, these 300,000 books in HathiTrust had been reviewed by staff at HathiTrust partner institutions to determine their copyright status, and, of these, approximately 150,000 had been determined to be in the public domain, either because their copyright had not been renewed or because the work did not include a copyright notice.
While roughly 50 percent of the books reviewed were definitively determined to be in the public domain, only about 16 percent were conclusively determined to be in-copyright as the result of a copyright renewal record being located. The remainder, more than 30 percent, remain closed for public access due to challenges in determining their copyright status: for example, the work may include significant amounts of material likely to be protected by copyright.

The findings by the CRMS have been remarkably consistent during the course of the project. That is, from month to month, the CRMS has found roughly 50 percent of the materials being reviewed to be in the public domain and approximately 16 percent to be in-copyright. The CRMS project stores and shares detailed data on copyright determinations from its database.25 To better understand the characteristics of U.S. works published in 1932, a period consistent with Ringer’s analysis, the CRMS project provided the author cumulative CRMS data for determinations made throughout the project until the end of 2015. By January 2016, the CRMS had reviewed 5,004 books characterized by catalogers as having been published in the United States in 1932. In its selection routines, the CRMS tries to exclude books published simultaneously both in and outside the United States. The CRMS report shows that 634 (12.7%) of these volumes were found to be in-copyright as a result of renewal. A work can be determined to be in the public domain for several reasons: for the same period, 67 (1.3%) of the volumes were determined to be in the public domain because they omitted a copyright statement, and 2,759 (55.1%) were determined to be in the public domain because of a failure to renew. The remainder of the volumes remain closed because of complications such as the inclusion of possibly in-copyright content within the book.
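The shares reported for the 1932 volumes, and the size of the remainder that stays closed, can be recomputed from the counts given above. The Python sketch below is a minimal illustration: the counts come from the text, while the category labels and variable names are informal paraphrases introduced here and are not CRMS terminology.

```python
# Counts for CRMS-reviewed volumes published in the U.S. in 1932 (from the text).
# Labels and names are hypothetical, chosen for this sketch.
REVIEWED_1932 = 5_004
determinations = {
    "in copyright (renewal found)": 634,
    "public domain (no copyright statement)": 67,
    "public domain (not renewed)": 2_759,
}

# Volumes with no conclusive determination are whatever is left over.
undetermined = REVIEWED_1932 - sum(determinations.values())


def percent(count):
    """Return count as a percentage of all reviewed 1932 volumes."""
    return 100 * count / REVIEWED_1932


for label, count in determinations.items():
    print(f"{percent(count):5.1f}%  {label}")
print(f"{percent(undetermined):5.1f}%  undetermined ({undetermined:,} volumes)")
```

The three computed shares match the 12.7, 1.3, and 55.1 percent reported above, and the leftover group comes to 1,544 volumes, a little under 31 percent of the total.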
Apparent Contradictions in Determining the Size of the Public Domain

The Ringer study and the CRMS process have been extremely helpful and, indeed, influential in shaping our understanding of the U.S. public domain. And yet the conclusions of the two efforts are at odds with each other. In our work as librarians and digital collection builders, our focus tends to be on the public domain, so the biggest clash is between the putative 93 percent public domain suggested by the Ringer study and the 50 percent public domain found by the CRMS.

Errors Based on Problematic Assumptions

The conclusion that those volumes that were not renewed are in the public domain, encouraged by Ringer herself and a host of sources, is a logical problem. Although Ringer found 7 percent of the books reviewed to have been renewed, we cannot then conclude that the other 93 percent are in the public domain. For example, as noted here, some of the items reviewed were foreign works with foreign copyrights and others were works with ad interim copyright registrations, registrations for works that would have been published and renewed later. Indeed, this kind of logic is often used to argue that the entire public domain for this period (that is, including all formats) is 85 percent or greater. One university copyright resource page states that:

Estimates are that 85% of copyrights were not renewed (93% in the case of books), most likely because the works were no longer commercially valuable. In addition, works were not protected unless authors included a basic copyright notice: the word “copyright” or © with one’s name and year next to it (this notice requirement was eliminated in 1989). By some estimates, 90% of works did not include this copyright notice and immediately entered the public domain.
So, before 1978, only 10% of works might have been subject to copyright at all, and of the works that were, up to 85% only used the first 28-year term, with 15% renewing for the full 56-year term. That’s 1.5% of all works with 56 years of protection, 8.5% with 28 years, and 90% completely free.26

Critically important copyright resources imply similar conclusions (that is to say that works not renewed were thus in the public domain) or are not careful to make clear that nonrenewal is only one consideration. For example, the Copyright Sherpa web page implies that nonrenewed works are in the public domain in its statement that “The copyrights for many, many works were not renewed. In fact, the U.S. Copyright Office has estimated that less than 15% of works eligible for renewal were, in fact, renewed. That means a lot of works are in the public domain … but it also means you have to find out whether copyright renewal happened, or didn’t.”27

Just as one cannot conclude from a failure to renew that something is in the public domain, we cannot conclude from the CRMS’s data (in other words, that 50 percent of the works are in the public domain) that 50 percent are in-copyright. Only approximately 16 percent have been found to be definitively in copyright, and only a portion of the more than 30 percent that are problematic will be protected by copyright. Thus, from one set of facts (for example, that a given percentage is in the public domain) we cannot logically infer its complement (for instance, that the remainder is in copyright).

Data and Terminology Errors

The reliability of data from both the CRMS process and the 1961 report should be interrogated. As reported earlier, the CRMS subjected itself to ongoing scrutiny and confirmed the reliability of its methodology.
It would be helpful to assess the reliability of the conclusions drawn by the Copyright Office in its 1961 analysis by reviewing the renewal data for the 13,460 “books proper.” Aside from card records at the Copyright Office, records from the Copyright Office for this period are available only in the Catalog of Copyright Entries (CCE), an annual publication of the Copyright Office. Although these annual publications are organized chronologically, their relationship to registrations is only approximate. For example, a volume (such as 1931) may contain works published in a previous year (such as 1930). Moreover, an annual CCE volume will also include renewals in addition to registrations for U.S. and foreign works, and ad interim registrations. Although these CCE volumes do not correspond directly to the registrations reported in the annual reports of the Copyright Office, they should have a rough correspondence. Despite this, the numbers reported in the CCE volumes for the years corresponding to the 1932 Annual Report are curiously inconsistent with the Annual Report. The 1932 Annual Report tells us that 13,460 “books proper” were registered. The 1931 CCE for books reports only 9,837 U.S. publications, and the 1932 CCE for books reports only 8,994 U.S. publications. A year later, in 1933, we see no significant variation and, instead, a slight decline: only 8,268 U.S. publications are recorded. Similarly, while the 1932 Annual Report notes 4,784 books “printed abroad in a foreign publication,” the CCE volumes for 1931, 1932, and 1933 report respectively 3,357, 2,471, and 3,170 “foreign books in foreign languages.” We are at a loss to explain why the CCE volumes for the corresponding years report roughly 30 percent fewer “books proper” than the Annual Report.
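A short calculation may help make the size of this gap concrete. The Python sketch below uses only the counts quoted above. The function names are hypothetical, and the renewal count is an assumption obtained by applying Ringer's rounded 7 percent figure to the 13,460 “books proper”; it is not a number reported in either source.

```python
# Counts of U.S. "books proper" as quoted in the text above.
ANNUAL_REPORT_1932 = 13_460
CCE_US_BOOKS = {1931: 9_837, 1932: 8_994, 1933: 8_268}

# Assumption for illustration only: Ringer's rounded 7 percent renewal rate,
# applied to the Annual Report count, approximates the number of renewals.
ASSUMED_RENEWAL_RATE = 0.07


def shortfall(cce_count: int, report_count: int = ANNUAL_REPORT_1932) -> float:
    """Fraction by which a CCE count falls short of the Annual Report count."""
    return (report_count - cce_count) / report_count


def implied_rate(cce_count: int) -> float:
    """Renewal rate if the assumed renewals came from a pool of cce_count books."""
    assumed_renewals = ASSUMED_RENEWAL_RATE * ANNUAL_REPORT_1932
    return assumed_renewals / cce_count


for year, count in CCE_US_BOOKS.items():
    print(
        f"CCE {year}: {count:,} books, "
        f"{shortfall(count):.1%} fewer than the Annual Report, "
        f"illustrative renewal rate {implied_rate(count):.1%}"
    )
```

Run as written, this reports shortfalls of about 27, 33, and 39 percent for the 1931, 1932, and 1933 CCE volumes, and illustrative renewal rates between roughly 9.6 and 11.4 percent. These are back-of-the-envelope figures under the stated assumption, not findings.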
Only by comparing card-based records at the Copyright Office to the relevant CCE volumes can we hope to explain the discrepancy between the numbers in the CCEs and the Annual Report, but it is likely that the renewals Ringer’s study counted for the FY1932 works correspond to a smaller candidate group than the 13,460 “books proper” reported by Ringer and the 1932 Annual Report (that is, approximately 9,000 books instead), which would raise the rate of renewal further.

Failure to Take into Account Changes in the Rate of Renewal over Time

Ringer and the CRMS effort both provide some very important chronological or date-based information about renewals. As Ringer notes, the trend was for renewals to increase with the passage of time. She reports a “dramatic rise in the total percentage of copyrights renewed.” Her report provides data showing annual rates of renewal beginning in 1883, and notes that the rate of renewals for works published between 1940 and 1959 had doubled.28 She includes a graph, “Appendix C: Graph to Accompany Table 2,” that shows growth in a compelling way.29

The CRMS also supports the notion of a general historical trend of renewals increasing after 1932, though not a trend of constant increase. The renewal rate for books published in 1923 was approximately 9 percent, and it had grown to 13 percent for books published in 1932. In several years following 1932, the CRMS found renewal rates near or above 20 percent. One of the more peculiar findings of the CRMS project is that rates declined toward the end of the 1923–1963 period after a relatively steady growth. Figure 3 shows this very effectively. A table of renewals and other related data found by the CRMS process is included in appendix A.

FIGURE 3
Rate of Renewal by Year (CRMS)

Popular resources commonly conclude or imply that Ringer’s analysis applies to the entire period before 1964, or even 1978 as in the case of the university copyright web page quoted earlier.
A commercially published book by Stephen Fishman guiding readers on exploiting the public domain makes both the problematic chronological and the problematic logical argument: “The Copyright Office estimates that only 15% of all works published between 1923–1963 were ever renewed. This means that all works first published in the United States from 1923 through 1963 for which no renewal was filed are in the public domain” (emphasis added).30 In fact, Ringer was careful to limit her conclusions to FY1932 works and pointed out that rates of renewal were dramatically increasing. The Stanford Fair Use page appears to imply the findings from renewal rates for FY1932 works are applicable to the entire 1923–1963 period when it writes that “If a work published after 1922 and before 1964 was not renewed, it fell into the public domain. According to Copyright Office surveys, the great majority of pre-1964 works were never renewed and, therefore, are in the public domain” (emphasis added).31 In fact, as we know, the Copyright Office report only focused on renewals for works published before 1933.

It is reasonable to assume that the rate of renewals would increase over time and could not be generalized from the 1932 data. Indeed, both Ringer’s data about increasing rates of renewal and data from the CRMS process support this likelihood. Ringer does not provide data on renewals for years subsequent to the FY1932 publications, but CRMS data show an increase in renewals after 1932. In their analysis of copyright renewal for core agricultural monographs, Demas and Brogdon also found increasing rates of renewal over time, from “11% for titles published in the 1920s versus 39% for titles published in the 1940s.”32 It would also be reasonable to assume that publication of Ringer’s study would stimulate an increase in renewals, for example, because publicity around renewal rates would motivate rights holders.
Regardless, it is important to keep in mind that the rates of renewal reported by Ringer are only for the FY1932 publications and not for all publishing in the 1923–1963 period.

Conclusions and Future Work

Estimates of the size of the corpus of public domain books published in the U.S. from 1923 through 1963 have been inflated by problematic assumptions. The apparent differences between numbers reported in Barbara Ringer’s 1961 Copyright Office study and data generated by the Copyright Review Management System are, on closer examination, not great. Indeed, the corpus under review by the CRMS process may represent a major portion of the U.S. book publishing output for the period; thus, the CRMS data help us better understand the copyright characteristics of books examined by the Ringer study. Problematic assumptions based on Ringer’s data are the source of misunderstandings about the size of the 1923–1963 U.S. book public domain. Those assumptions include the following mistakes:

• Problematic (logical) assumptions, where we assume that nonrenewal necessarily means a work is in the public domain.
• A problem of misinterpreting data, typically by conflating terminology, first, where we assume that Class A works are synonymous with “books” and, second, where Ringer assumes that the corpus of registered works is as large as was reported in the 1932 Annual Report.
• A failure to take into account changes in the rate of renewal over time, principally where we assume the conclusion drawn from the FY1932 publications applies to publications from the entire period from 1923 through 1963.

Nonetheless, the percentage of the U.S. 1923–1963 books in the public domain is extremely large, and those that are not definitively in the public domain remain an important corpus for collective attention. If we assume no significant differences from the data generated by the CRMS, we can conclude that approximately 50 percent of the U.S. books from this period are in the public domain (and that many are currently openly accessible in HathiTrust). A further 16 percent are protected by copyright as a result of renewals. This may in fact be a small enough body of materials for the community to undertake rights clearance and collective licensing work.

The CRMS has made a tremendous contribution by reviewing what may be the majority of the books published in the United States between 1923 and 1963. With more than 300,000 books reviewed, and many clear determinations about copyright status, future work should be much easier. Still, this analysis of Ringer’s 1961 study and comparison with CRMS data raise many questions and can help guide subsequent work. Suggested areas include:

• Closer examination of the 1932 Catalog of Copyright Entries: It would be helpful to more carefully compare 1932 copyright registrations to data from Ringer’s study. The Copyright Office has embarked on a process to digitize cards from its catalog, and this would be an ideal source to perform that examination. Alternatively, one could convert the entries in the 1932 CCE to a database so that entries can be categorized (for instance, “books proper” as one category) and conclusions further compared to data from Ringer’s study. It should be possible to determine the rate of copyright renewal for these U.S. books published in 1932. It should also be possible to determine overlap and difference with HathiTrust, helping to better understand the extent to which HathiTrust includes U.S. publishing output and perhaps guiding efforts to augment HathiTrust. Should this analysis find a rate of renewal similar to that found by the CRMS, we may be able to extrapolate CRMS findings to other years.
• Comparison of the 1932 Catalog of Copyright Entries and WorldCat: WorldCat is an interesting but problematic point of comparison, compiling, as it does, books cataloged rather than books published. Still, it is another data point, albeit a noisy one. The 1932 CCE reports 8,994 U.S. books, while OCLC’s WorldCat reports 55,657 U.S. 1932 books.33 To what can we attribute this surprising and significant difference? A closer examination of the WorldCat output should help us determine the extent to which that number is inflated by, for example, nonbook material cataloged as books34 and should also help us understand the extent to which U.S. publishing for the period is not represented in the CCE.
• Rights clearance/licensing of in-copyright works: The CRMS’s conclusions about the percentage of works whose copyright has been renewed inspire confidence that this body of relatively contemporary publishing can be opened more generally. In 1997, Demas and Brogdon urged establishing a “reasonable effort procedure for contacting copyright holders and seeking permissions” to advance our collective preservation efforts.35 A coordinated effort to secure permissions for works that are not on the market may be feasible. Roughly 55,000 such volumes are in HathiTrust now. What is the financial and practical feasibility of such an effort?
• Determining the ROI for further work on works with a complicated copyright status: Again, based on the work done by the CRMS, it appears that approximately 30 percent of U.S. 1923–1963 book publishing has complicated rights issues: these works are neither definitively public domain nor definitively in-copyright. This body of material should be analyzed (perhaps sampled) by experts and subjected to collective review. Many of these works will be in the public domain and can add to our collective public good.

We should, with reasonable effort, be able to correct the mistaken conclusions drawn from Ringer’s 1961 study with some precision. A better understanding of U.S. publishing between 1923 and 1963 can be extremely beneficial for managing and providing access to the total collection.
Acknowledgements

I owe a special debt of thanks to several people who aided me in this analysis. Peter Hirtle patiently worked through many of the issues and challenges with the methodology and sources. John Mark Ockerbloom was generous, both as a guide to nuances in the CCEs and as a reader. The CRMS staff at Michigan, Melissa Levine and Moses Hall, helped clarify CRMS issues and provided invaluable data from the project. And of course my best friends and readers, my wife Maria Bonn and colleague Aaron McCollough, helped untangle my prose.

APPENDIX A. CRMS Renewal Data

The table included below, table 2, drawn from the CRMS database on January 21, 2016, removes duplicate digitized books (that is, copies of the same book that were reviewed twice) and provides a count of total candidate volumes for each year, along with the number of works determined to be (1) in copyright as a result of renewal; (2) in the public domain as a result of no copyright notice; and (3) in the public domain as a result of nonrenewal. Column headings use documented CRMS terminology.36 The first code is an “attribute” code and the second is a “reason” code. Attributes in this table include IC (“in-copyright”) and PD (“public domain”). Reasons include REN (“Copyright renewal research was conducted”) and NCN (“no printed copyright notice”).
TABLE 2
CRMS Renewal Data (January 21, 2016)

Year    Total   IC/REN   PD/NCN   PD/REN   Pct Renews
1923    6,246      578       80    3,320        9.25%
1924    5,686      529       74    3,034        9.30%
1925    5,694      572       80    3,073       10.05%
1926    5,785      658       95    3,078       11.37%
1927    6,541      659       70    3,375       10.07%
1928    6,379      740       88    3,547       11.60%
1929    6,203      759       89    3,338       12.24%
1930    6,752      794      102    3,602       11.76%
1931    5,813      759       60    3,092       13.06%
1932    5,004      634       67    2,759       12.67%
1933    4,693      618       78    2,452       13.17%
1934    4,827      668       64    2,506       13.84%
1935    5,488      879       86    2,779       16.02%
1936    7,525    1,041      111    3,042       13.83%
1937    7,074      962      145    3,862       13.60%
1938    6,354    1,016       87    3,303       15.99%
1939    6,651    1,011      111    3,436       15.20%
1940    6,891    1,097      117    3,616       15.92%
1941    6,804    1,142      104    3,486       16.78%
1942    6,177    1,113       91    3,114       18.02%
1943    5,486    1,114       85    2,654       20.31%
1944    5,147      924       77    2,537       17.95%
1945    5,355      914       77    2,673       17.07%
1946    6,311    1,108       77    3,101       17.56%
1947    7,404    1,406      165    3,412       18.99%
1948    7,603    1,426       77    3,515       18.76%
1949    8,225    1,435       86    3,887       17.45%
1950    8,735    1,690      103    3,763       19.35%
1951    7,521    1,546       86    3,369       20.56%
1952    7,838    1,681       98    3,371       21.45%
1953    7,748    1,439      108    3,273       18.57%
1954    8,679    1,616       89    3,529       18.62%
1955    8,655    1,696      105    3,729       19.60%
1956    8,677    1,561       98    3,766       17.99%
1957    9,377    1,692      108    3,886       18.04%
1958    9,528    1,619      110    4,364       16.99%
1959   10,563    1,639      106    4,392       15.52%
1960   12,586    1,831      115    5,585       14.55%
1961   12,628    1,892      131    5,557       14.98%
1962   14,106    1,855      124    5,940       13.15%
1963   14,797    1,800      184    6,224       12.16%

Notes

1. In addition to the rules discussed here for works published in the United States between 1923 and 1963, U.S. law required a copyright notice for all works published between 1923 and 1977, and registration within five years of publication for works published without notice between 1978 and March 1, 1989.
2. Barbara Ringer, “Study No. 31: Renewal of Copyright” (1960), reprinted in Library of Congress Copyright Office, Copyright Law Revision: Studies Prepared for the Subcommittee on Patents, Trademarks, and Copyrights of the Committee on the Judiciary, United States Senate, Eighty-sixth Congress, first [-second] session (Washington, D.C.: U.S. Govt. Print. Office, 1961).
3. James Boyle and Jennifer Jenkins, Intellectual Property: Law & The Information Society: Cases & Materials (Durham, N.C.: Center for the Study of the Public Domain, 2014), 294. While other examples will be cited later, significant works like Boyle and Jenkins refer to the Ringer work to argue that “the majority of works [entered] the public domain after the first 28-year term (studies put the rate of nonrenewal for all works at 85 percent, and for books alone at 93 percent).”
4. The CRMS project is documented online at www.lib.umich.edu/copyright-review-management-system-imls-national-leadership-grant [accessed 12 December 2016]. A full description of the project is included there.
5. Samuel Demas and Jennie L. Brogdon, “Determining Copyright Status for Preservation and Access: Defining Reasonable Effort,” Library Resources and Technical Services 41, no. 4 (Oct. 1997): 323–34. A similar line of reasoning was advanced by Demas and Brogdon in this work on copyright determination. In this analysis comparing agriculture monographs to the figures promulgated by the Copyright Office, they note: “The renewal rate on a group of qualitatively selected, core scholarly monographs might reasonably be far higher than that of books and pamphlets as a whole” (329). They may have been correct, and were surely right if we include pamphlets.
6. From a November 2010 summary document circulated to participants.
7. Samuel Demas and Jennie L. Brogdon, “Determining Copyright Status for Preservation and Access,” 324.
8. Barbara Ringer, “Study No. 31: Renewal of Copyright.” See “Appendix C: A Statistical Survey of Renewal Registrations.”
9. How to Investigate the Copyright Status of a Work: Circular 22 (Library of Congress, Copyright Office, 2004), available online at www.copyright.gov/circs/circ22.pdf [accessed 12 December 2016].
10. Circular 22, 2.
11. Demas and Brogdon, “Determining Copyright Status for Preservation and Access: Defining Reasonable Effort,” 323–34.
12. Ringer, “Study No. 31: Renewal of Copyright,” 221.
13. Thirty-Fifth Annual Report of the Register of Copyrights for the Fiscal Year Ending June 30, 1932 (Library of Congress, Copyright Office, 1932), 17, available online at www.copyright.gov/reports/annual/archive/ar-1932.pdf [accessed 12 December 2016].
14. Ringer, “Study No. 31: Renewal of Copyright,” 220.
15. Ringer, “Study No. 31: Renewal of Copyright,” 187. Emphasis added to highlight the way that Ringer suggests a status of public domain from a failure to renew.
16. Peter Hirtle, “Copyright Renewal, Copyright Restoration, and the Difficulty of Determining Copyright Status,” D-Lib Magazine 14, no. 7/8 (July/Aug. 2008), available online at www.dlib.org/dlib/july08/hirtle/07hirtle.html [accessed 12 December 2016]. Hirtle’s primary argument is that the effect of “restoration” creates confusion for all copyright determinations for this period. He writes that “it is almost impossible to determine with certainty whether a work published between 1923 through 1963 in the US is in the public domain because of copyright restoration of foreign works.” Considerable successful work (such as by the CRMS) has been done to make determinations of the copyright status of works published in this period. We can agree, however, that the complications Hirtle documents with regard to foreign works are immense and that, in most cases, these works would be protected by copyright.
17. Circular 22, 6.
18. Hirtle, “Copyright Renewal, Copyright Restoration, and the Difficulty of Determining Copyright Status.”
19. As one library notes, “bound withs” are bound volumes “containing two or more works bound together after publication by someone other than the publisher,” frequently a library. These volumes present discovery problems because the works included are usually not cataloged separately. Note, for example, the University of Illinois Rare Book and Manuscript Library instructions for cataloging these individual works, available online at www.library.illinois.edu/rbx/qnc/procedures/bound_withs.html [accessed 12 December 2016]. The CRMS process excludes “bound withs” in its review process.
20. Ringer, “Study No. 31: Renewal of Copyright,” 220.
21. The author was the Principal Investigator of the first IMLS grant awarded for the CRMS.
22. Lorcan Dempsey, Brian Lavoie, and Constance Malpas, Understanding the Collective Collection: Towards a System-wide Perspective on Library Print Collections (2013), available online at www.oclc.org/content/dam/research/publications/library/2013/2013-09.pdf [accessed 12 December 2016].
23. http://www.oclc.org/content/dam/research/publications/library/2011/2011-01.pdf [accessed 12 December 2016].
24. The final report by the CRMS project to IMLS is available online at www.lib.umich.edu/files/services/copyright/2012-03-06_CRMS-US_Final_Report_to_IMLS-narrative_only.pdf [accessed 12 December 2016] and notes that “two audits we undertook with the US Copyright Office’s own records showed marked improvements in the accuracy and reliability of our reviews” (2). The results of those reviews are not available in published documents.
25. Detailed CRMS data is publicly available to us through HathiTrust’s monthly bibliographic data reports, the hathifiles. The compact monthly and cumulative reports include a single line record for each item in HathiTrust and its copyright status. The hathifiles are documented online at https://www.hathitrust.org/hathifiles [accessed 12 December 2016], and codes for CRMS categories are documented online at https://www.hathitrust.org/rights_database [accessed 12 December 2016].
26. Duke Law Center for the Study of the Public Domain, “Public Domain Day—Frequently Asked Questions,” available online at https://web.law.duke.edu/cspd/publicdomainday/2014/faqs [accessed 31 January 2016].
27. Public Domain Sherpa, “Copyright Renewal: When It Had to Happen, or Else,” available online at www.publicdomainsherpa.com/copyright-renewal.html [accessed 31 January 2016].
28. Ringer, “Study No. 31: Renewal of Copyright,” 221.
29. Ringer, “Study No. 31: Renewal of Copyright,” 223.
30. Stephen Fishman, The Public Domain: How to Find & Use Copyright-Free Writings, Music, Art & More (Berkeley, Calif.: Nolo, 2008), 336.
31. Stanford University Libraries, Copyright & Fair Use, “Searching the Copyright Office and Library of Congress Records,” available online at http://fairuse.stanford.edu/overview/copyright-research/searching-records/ [accessed 31 January 2016].
32. Demas and Brogdon, “Determining Copyright Status for Preservation and Access: Defining Reasonable Effort,” 328. Note, too, that they found an overall rate of renewal of 18 percent, a number very similar to the rate of renewal found by the CRMS effort.
33. Work performed in December 2015 by the University of Illinois Library’s Cataloging and Access Management unit, using GLIMIR-based deduplication algorithms.
34. The cataloging of nonbook material as monographs or “books” is a common problem in WorldCat.
35. Demas and Brogdon, “Determining Copyright Status for Preservation and Access: Defining Reasonable Effort,” 333.
36. For example, www.lib.umich.edu/files/services/copyright/CRMS-World_Cheat_Sheet-v2-0.pdf and https://www.hathitrust.org/rights_database [accessed 12 December 2016].