Where in the world is TCB?

A colleague (Christa Strickler) announced on a mailing list (ACQNET) the existence of a new issue of TCB (Technical Services in Religion and Theology). It was touted as an open access journal, and I wondered whether there was an application programming interface (API) for downloading the content. After a bit of rooting around, I discovered that TCB is published using a system called Open Journal Systems (OJS), and OJS rigorously supports a protocol called OAI-PMH. So, to answer my question, "Yes, TCB does support an API."
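For readers who have not seen OAI-PMH before, a request is nothing more than a URL with a "verb" and a few arguments. What follows is a minimal sketch of how such URLs are put together. The verbs (Identify, ListRecords) and the oai_dc metadata prefix come from the OAI-PMH specification, but the base URL is a made-up placeholder because the real address of TCB's endpoint is not given here, and the function name is my own:

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; substitute the journal's
# real OAI-PMH base URL
BASE_URL = "https://example.org/index.php/tcb/oai"


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    parameters = {"verb": verb, **arguments}
    return base_url + "?" + urlencode(parameters)


# Ask the repository to describe itself
identify_url = oai_request(BASE_URL, "Identify")

# Ask for every record, expressed as simple Dublin Core
records_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(records_url)
```

Pasting the first sort of URL into a Web browser is a quick way to check whether a given journal really does answer OAI-PMH requests.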

I then wondered how easy it would be to actually harvest, cache, and acquire the content of TCB and then do some analysis against the result. As it happens, I had played with this exact same idea a few years ago. More specifically, I wrote a system of software to harvest the whole of another journal, Information Technology And Libraries. Looking through my archives, I found the desired code, updated it to point to TCB, and in less than two minutes I had downloaded the whole of TCB. The code to do this work is called OJS Toolbox Redux, and the resulting content ought to be here in the cache. Want some of the metadata describing each content item? Take a gander at ./metadata.csv.
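To give a flavor of what harvesting entails, here is a minimal sketch that turns an OAI-PMH ListRecords response into rows of a CSV file. It is not the code from OJS Toolbox Redux. The XML namespaces are the standard OAI-PMH and Dublin Core ones, but the sample response is hand-made, its identifier is invented, and the column names are assumptions:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Standard namespaces used in OAI-PMH responses carrying simple Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-made response for illustration; the identifier is invented
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>RDA Toolkit &amp; Examples</dc:title>
          <dc:creator>Lynn Berg</dc:creator>
          <dc:date>2010-08-01</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken></resumptionToken>
  </ListRecords>
</OAI-PMH>"""

FIELDS = ["identifier", "title", "creator", "date"]


def parse_records(xml_text):
    """Return (rows, resumption token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iterfind(".//oai:record", NS):
        rows.append({
            "identifier": record.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "creator": record.findtext(".//dc:creator", default="", namespaces=NS),
            "date": record.findtext(".//dc:date", default="", namespaces=NS),
        })
    # An empty token means there are no more pages to request
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return rows, token


rows, token = parse_records(SAMPLE)

# Write the rows as CSV; an in-memory buffer stands in for a file on disk
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

A real harvester would loop, requesting the next page with each non-empty resumption token until the token comes back empty.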

Some analysis

I then wanted to get my mind around the whole of TCB. What sorts of things were discussed? How have those things ebbed & flowed over time? To answer these questions, I used a thing called the Distant Reader Toolbox to create a data set from the cache and then do some analysis ("reading") against it. Here are a few of the rudimentary things I discovered:
  1. The cache includes 287 items.
  2. Items are dated from 2010 to the present; see the rudimentary bibliography.
  3. The sum total of words in the cache is just over 295,000. By comparison, the sum total of words in Melville's Moby Dick is about 250,000, and the Bible is about 800,000 words long.

The following word clouds illustrate the frequency of unigrams, bigrams, and statistically significant keywords found in the corpus. The content of the corpus lives up to its name, obviously.

[word cloud: unigrams]

[word cloud: bigrams]

[word cloud: keywords]

(If you want to take a gander at some additional characteristics of this data set, then check out the rudimentary index page.)
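Counting unigrams and bigrams is simple enough to sketch in a few lines. The following is only an illustration and not how the Toolbox does it; the sample text, the tiny stop word list, and the function names are all made up. It is also deliberately naive in that bigrams are computed after stop words are removed and without regard for sentence boundaries:

```python
import re
from collections import Counter

# A tiny, made-up stop word list; real lists are much longer
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for"}


def tokenize(text):
    """Lower-case the text, keep runs of letters, and drop stop words."""
    return [token for token in re.findall(r"[a-z]+", text.lower())
            if token not in STOPWORDS]


def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample text
sample = ("Cataloging rules for the library. The library cataloging rules "
          "changed. Library cataloging is changing.")

tokens = tokenize(sample)
unigrams = Counter(tokens)
bigrams = Counter(ngrams(tokens, 2))

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

The resulting counts are exactly the sort of thing a word cloud visualizes; the bigger the count, the bigger the word.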

I then applied topic modeling to the corpus, and since the title has been in existence for twelve years, I topic modeled for twelve topics. This resulted in the following enumeration of themes, and the pie chart illustrates the dominance of the themes across the whole. Again, the result echoes the name of the journal.

        labels  weights                                           features
       library  0.07629  library course data cataloging digital metadat...
           rda  0.06975  rda atla cataloging library funnel naco conser...
    cataloging  0.06966  library cataloging quarterly classification se...
         terms  0.06435  rda terms library cataloging religion form atl...
       records  0.05667  records data record library cataloging use inf...
         class  0.03728  class topics individual theology cataloging li...
       heading  0.02412  heading music field headings add terms genre/f...
        cancel  0.02192  cancel church heading religious theology chris...
         india  0.02151  class india history literature information chu...
    collection  0.01955  collection openathens library oer resources ht...
           tcb  0.01453  library tcb maps information cataloging san ma...
         field  0.01361  heading field add literature former bible chan...
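The weights in the table above do not add up to one (they total roughly 0.49), so to compare the themes it helps to express each weight as a share of the total. The following sketch does just that with the labels and weights copied from the table. That a pie chart of the themes amounts to this sort of normalization is my assumption about the arithmetic, not a description of the Toolbox's internals:

```python
# Labels and weights copied from the table above
topics = {
    "library": 0.07629,
    "rda": 0.06975,
    "cataloging": 0.06966,
    "terms": 0.06435,
    "records": 0.05667,
    "class": 0.03728,
    "heading": 0.02412,
    "cancel": 0.02192,
    "india": 0.02151,
    "collection": 0.01955,
    "tcb": 0.01453,
    "field": 0.01361,
}

# Express each weight as a fraction of the sum of all weights
total = sum(topics.values())
shares = {label: weight / total for label, weight in topics.items()}

# List the themes from most to least dominant
for label, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:>12}  {share:6.1%}")
```

Seen this way, the first five themes account for well over half of the whole.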

To illustrate how these themes ebbed & flowed over time, I augmented the underlying topic model with a year column, pivoted the model, and created the following stacked area chart. From the result we can see that the topic of "rda" was predominant between 2010 and 2012. We can see that "terms" had its moment just after that, but upon closer inspection, "terms" was still a lot about "rda". We can also see that the themes of "cataloging" and "library" are pretty consistent throughout time.
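The "add a year column and pivot" step can be sketched with nothing but the standard library. The rows below are made-up numbers and not values from the actual model, and the function name is my own. Each row is a (year, topic, weight) triple, and pivoting means summing the weights for each topic within each year:

```python
from collections import defaultdict

# Made-up (year, topic label, weight) rows, for illustration only
rows = [
    (2010, "rda", 0.8), (2010, "library", 0.2),
    (2010, "rda", 0.5), (2010, "cataloging", 0.5),
    (2011, "rda", 0.6), (2011, "terms", 0.4),
    (2021, "library", 0.7), (2021, "cataloging", 0.3),
]


def pivot(rows):
    """Sum weights by year and topic; return (topic names, year -> sums)."""
    table = defaultdict(lambda: defaultdict(float))
    for year, topic, weight in rows:
        table[year][topic] += weight
    topics = sorted({topic for _, topic, _ in rows})
    # One list of sums per year, in the same order as the topic names
    pivoted = {year: [table[year][topic] for topic in topics]
               for year in sorted(table)}
    return topics, pivoted


topics, pivoted = pivot(rows)

print("year", *topics)
for year, sums in pivoted.items():
    print(year, *(f"{value:.1f}" for value in sums))
```

Each column of the result is one band in a stacked area chart, with the years running along the horizontal axis.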

Some recommendations

You might ask, "Given this analysis, can you recommend some salient articles elaborating on the themes?" And my answer is, "Sure!" For example, one theme seems to be "rda". A search of the underlying data set's full text database shows that the following three articles are specifically about RDA and have RDA in the title:

  1. RDA Toolkit & Examples by Lynn Berg (2010-08-01)
  2. My Experience with RDA, Part 1: Overview by Armin Siedlecki (2011-02-01)
  3. My Experience with RDA, Part 2: Examples by Armin Siedlecki (2011-05-01)
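A title search like this one can be sketched with SQLite, which comes with Python. The table name, the column names, and the fourth row are hypothetical; they are not the schema or content of the data set's actual database. The first three rows are the articles listed above:

```python
import sqlite3

# An in-memory database with a hypothetical, simplified schema
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE bibliographics (title TEXT, author TEXT, date TEXT)")
connection.executemany(
    "INSERT INTO bibliographics VALUES (?, ?, ?)",
    [
        ("RDA Toolkit & Examples", "Lynn Berg", "2010-08-01"),
        ("My Experience with RDA, Part 1: Overview", "Armin Siedlecki", "2011-02-01"),
        ("My Experience with RDA, Part 2: Examples", "Armin Siedlecki", "2011-05-01"),
        # Placeholder row, not a real article; it should not match
        ("Placeholder item on another subject", "Nobody", "2015-01-01"),
    ],
)


def search_titles(connection, term):
    """Return (title, author, date) rows whose title contains the term."""
    query = ("SELECT title, author, date FROM bibliographics "
             "WHERE title LIKE ? ORDER BY date")
    return connection.execute(query, (f"%{term}%",)).fetchall()


results = search_titles(connection, "RDA")
for title, author, date in results:
    print(f"{title} by {author} ({date})")
```

Be forewarned, LIKE matches substrings and ignores case for ASCII letters, so a search for "RDA" would also find a title containing a word such as "cardamom". A proper full text index is more discerning.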

One article specifically about TCB itself is an editorial by Cynthia Snell (2021-08-23). From the computed summary:

Therefore, in order to appeal to a broader audience—including persons acquiring and cataloging materials at museums and archives—as well as to provide opportunities for interdisciplinary engagement with other library technical services professionals, we will roll out an expanded TCB beginning in 2022. TCB will remain a publication that focuses on the needs of technical services professionals, transforming from a publication for catalogers of materials in religion and theology to one that addresses the interests of all technical services staff who may be working with materials in religion and theology. –the Editors

Epilogue

They say that if you have a hammer, then everything begins to look like a nail. Well, my current hammer is the Distant Reader Toolbox, and I enjoy using the Toolbox to practice librarianship. With the Toolbox I create and curate collections. I then provide services against them. This missive outlined one of my explorations.

If you want to play with this collection, then begin by downloading the data set, which is temporarily available at http://distantreader.org/tmp/tcb/etc/reader.zip. The compressed zip file is made up mostly of plain text files, a relational database, and a few images. You can then use the Toolbox or any number of other tools to do your own analysis. Other tools include Wordle, OpenRefine, AntConc, any spreadsheet or database program, or even your text editor. Enjoy!


Eric Lease Morgan
Navari Family Center for Digital Scholarship
University of Notre Dame

October 28, 2022