What are hermits?

For a good time, I collected and analyzed ("read") as many books and articles on the topic of hermits as I could, for no other purpose than to learn about hermits and hermitages. What did I learn? Simply put, my understanding of what hermits are was reinforced. Hermits are solitary people -- usually men -- and they are usually interested in delving into religious experiences. Along the way I also learned a bit about hermit crabs.


cover art from Hermit Of Holcombe

TL;DR: A hermit would be no hermit without a skull.

Bibliographics and extracted features

I collected about 170 bibliographic items -- books or journal articles -- and one can peruse them in a computed bibliography. The whole collection is about 9 million words long; relatively speaking, it is approximately the size of 11 Bibles. The items date from 1495 to the present, with the vast majority dating from 1810 to 1930.
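The Bible comparison is simple arithmetic. A minimal sketch, assuming a Bible of roughly 800,000 words (exact counts vary by translation) and a naive regular-expression tokenizer; the Distant Reader's own word counts may tokenize differently:

```python
import re

# Assumption: one Bible is roughly 800,000 words; exact counts vary by translation.
BIBLE_WORDS = 800_000


def count_words(text: str) -> int:
    """Count runs of word characters; a naive tokenizer, not the Distant Reader's."""
    return len(re.findall(r"\w+", text))


def bible_equivalents(n_words: int, bible_words: int = BIBLE_WORDS) -> float:
    """Express a word count as a multiple of one Bible."""
    return n_words / bible_words


# 9,000,000 / 800,000 = 11.25, which rounds to 11
print(round(bible_equivalents(9_000_000)))
```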

The most frequent extracted features (words, two-word phrases, parts-of-speech, etc.) include words such as I, man, time, old, hermit, day, I think, I know, young man, old man, God, lord, other, great, London, England, France, Japan, home, life, mind, and heart. A summary of these things, complete with simple visualizations, is a bit more informative.
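Counting the most frequent words and two-word phrases takes only a few lines. This is a minimal sketch, not the Distant Reader's feature extractor: the tiny stopword list is hypothetical, the tokenizer keeps ASCII letters only (so accented letters and the long s are dropped), and bigrams are allowed to span sentence boundaries:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep ASCII-letter tokens (with simple contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_features(text: str, n: int = 10, stopwords=STOPWORDS):
    """Return the n most frequent words and two-word phrases (bigrams)."""
    words = tokenize(text)
    unigrams = Counter(w for w in words if w not in stopwords)
    # Bigrams are adjacent pairs; drop any pair that contains a stopword.
    bigrams = Counter(
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in stopwords and b not in stopwords
    )
    return unigrams.most_common(n), bigrams.most_common(n)
```

Because "i" is not on the stopword list, phrases such as "i think" survive, much like "I think" and "I know" in the list above.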

The company of words: Distant reading

A linguist named John Firth once said, "You shall know a word by the company it keeps." If I take this statement to be true, then it behooves me to evaluate which words are associated with my word of interest. I am interested in the word "hermit", and I can extract all the bigrams (two-word phrases) that include the word "hermit". I can then visualize the result in the form of a word cloud or a network graph. I believe the network is more informative, and it illustrates which words are used in common with hermit-esque words. The hermit-esque words include "hermit", "hermits", "hermitages", etc., and they have a set of words in common: poor, wild, life, holy, brother, solitary, etc.
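The bigram step can be sketched as follows, assuming plain-text input and treating any token that starts with "hermit" as hermit-esque (so "hermitage" counts too); the stopword list is hypothetical. Each hermit-esque word maps to its neighbors, and each (word, neighbor) pair is one edge of the network. The second function returns the neighbors shared by every hermit-esque word:

```python
import re
from collections import defaultdict

# Hypothetical stopword list; words on it are not counted as neighbors.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "his", "was"})


def hermit_bigrams(text: str, stem: str = "hermit", stopwords=STOPWORDS) -> dict[str, set[str]]:
    """Map each hermit-esque word to the words it forms bigrams with."""
    words = re.findall(r"[a-z]+", text.lower())
    neighbors = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left.startswith(stem) and right not in stopwords:
            neighbors[left].add(right)
        if right.startswith(stem) and left not in stopwords:
            neighbors[right].add(left)
    return dict(neighbors)


def shared_neighbors(neighbors: dict[str, set[str]]) -> set[str]:
    """Words that appear next to every hermit-esque word."""
    sets = list(neighbors.values())
    return set.intersection(*sets) if sets else set()
```

Requiring a word to neighbor *every* hermit-esque form is a strict choice; counting how many forms each neighbor touches would give a softer notion of "in common".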

Topic modeling is another way to compute which words keep company with others. I applied topic modeling to the corpus and, for the sake of simplicity, set the number of clusters ("topics") to 12. Thus, I assert the whole corpus is about time, love, hermits, etc. Moreover, the topics of time, love, and hermit predominate:

       labels  weights  features
         time  0.44158  time man day country house life place people s...
         love  0.32503  love heart life old god light man day eyes sou...
       hermit  0.29516  hermit man old time back think came way young ...
     japanese  0.10698  japanese corea corean chinese north new japan ...
        nigel  0.10632  hermit nigel water time moses captain dat der ...
          god  0.09616  god man holy saint came father life church tim...
    cleveland  0.09515  cleveland ruth new company archibald hermits h...
        moira  0.05946  hermit moira foucauld birds charles nest found...
      species  0.04922  species specimens surface length specimen vol ...
          und  0.04211  hermit und der edith fred die life robert natu...
        rolle  0.03999  rolle richard english john parliament monks na...
          man  0.01990  man time hermit fome young quarll faid ſhe fuc...


computed topics
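The post does not say which topic-modeling tool produced the table. The sketch below swaps in a bare-bones collapsed Gibbs sampler for latent Dirichlet allocation (LDA), using only the standard library. Each topic is labeled with its most probable word and weighted by its average smoothed share of each document. That weighting is an assumption and need not match the table above, whose weights sum to more than 1:

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(n_topics)]    # times word v is assigned to topic k
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(n_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z)
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assignments[d][i]
                # Remove the token's current assignment from the counts...
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # ...then resample it in proportion to p(topic | doc) * p(word | topic).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic


def summarize(vocab, topic_word, doc_topic, alpha=0.1, top_n=8):
    """Return (label, weight, features) rows, heaviest topic first."""
    K = len(topic_word)
    rows = []
    for k in range(K):
        ranked = sorted(range(len(vocab)), key=lambda v: -topic_word[k][v])[:top_n]
        features = [vocab[v] for v in ranked]
        # Weight: this topic's smoothed share of each document, averaged over documents.
        shares = [(counts[k] + alpha) / (sum(counts) + K * alpha) for counts in doc_topic]
        rows.append((features[0], sum(shares) / len(shares), features))
    return sorted(rows, key=lambda row: -row[1])
```

For the corpus described here one would call `lda_gibbs(docs, n_topics=12)` on tokenized, stopword-filtered documents; a pure-Python sampler is slow on 9 million words, which is why dedicated tools exist.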

There are grammatical relationships between words, and the best-understood relationships are synonymy (similarity) and antonymy (opposition). A lexical database called WordNet asserts many more relationships, and one of them is hypernymy: a hypernym is a broader term in a hierarchy of terms. Given the words from the topic modeling process, I can discover the common hypernyms (if they exist) between each & every word in the topic modeling output. I can then illustrate the result as another network graph. If I use my imagination, I believe I can see sets of quantities, parts, persons, nature, organizations, and locations:


hypernyms rooted in the computed topics
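WordNet itself is available in Python through NLTK, whose `Synset.lowest_common_hypernyms()` method performs this lookup. To keep the idea visible without downloading WordNet data, the sketch below uses a tiny, hand-made, hypothetical hierarchy in which each word has exactly one parent; real WordNet words have several senses and sometimes several parents. It finds the lowest common hypernym of every pair of words and emits word-to-hypernym edges for a network graph:

```python
# Hypothetical, hand-made child -> parent links standing in for WordNet.
HYPERNYM = {
    "hermit": "loner", "loner": "person",
    "saint": "believer", "believer": "person",
    "man": "adult", "adult": "person",
    "person": "organism", "organism": "entity",
    "heart": "organ", "organ": "body_part", "body_part": "part", "part": "entity",
    "day": "time_unit", "time_unit": "measure", "measure": "abstraction",
    "abstraction": "entity",
}


def ancestors(word: str) -> list[str]:
    """The word followed by its hypernyms, from narrowest to broadest."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def lowest_common_hypernym(a: str, b: str):
    """The narrowest term both words roll up to, or None if they share nothing."""
    shared = set(ancestors(b))
    return next((node for node in ancestors(a) if node in shared), None)


def hypernym_edges(words: list[str]) -> set[tuple[str, str]]:
    """Link every pair of words to their lowest common hypernym."""
    edges = set()
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            common = lowest_common_hypernym(a, b)
            if common is not None:
                edges.update({(a, common), (b, common)})
    return edges
```

In this toy hierarchy, "hermit" and "saint" meet at "person", while "heart" and "day" only meet at the very general "entity"; clusters such as persons, parts, and quantities emerge in the same way.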

Semantic indexing is yet another way to calculate and visualize which words are used in conjunction with others. By plotting the position of words in an n-dimensional space, one can calculate the (cosine) distance between any two words, and words whose distances are smaller than a given threshold can be denoted as nearby. This process can then be applied recursively to create a network. For example, using this technique, the words closest to the word hermit are: martin, hermits, hollow, hunter, sage, godric, recluse, secluded, finchale, and mystery. I believe these words reveal that many of the items in the collection are fiction as opposed to non-fiction. Applying this process recursively results in the following network diagram:


semantic relationships
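The post does not name the embedding model, so the vectors below are made-up, three-dimensional toys; real ones would come from a model trained on the corpus (word2vec is a common choice). The sketch computes cosine distance (1 minus cosine similarity), keeps neighbors closer than a threshold, and then repeats the search from each new neighbor, breadth-first, to grow the network:

```python
import math

# Hypothetical toy vectors; real word embeddings have hundreds of dimensions.
VECTORS = {
    "hermit": (1.0, 0.1, 0.0),
    "recluse": (0.9, 0.2, 0.0),
    "sage": (0.8, 0.3, 0.1),
    "crab": (0.0, 0.1, 1.0),
}


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms if norms else 1.0


def neighbors(word, vectors, max_distance, k=10):
    """Up to k words closer to `word` than max_distance, nearest first."""
    scored = [
        (other, cosine_distance(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    close = [(w, d) for w, d in scored if d < max_distance]
    return sorted(close, key=lambda pair: pair[1])[:k]


def semantic_network(seed, vectors, max_distance, depth=2, k=10):
    """Apply the neighbor search recursively (breadth-first) and collect edges."""
    edges, seen, frontier = set(), {seed}, [seed]
    for _ in range(depth):
        next_frontier = []
        for word in frontier:
            for other, _ in neighbors(word, vectors, max_distance, k):
                edges.add(tuple(sorted((word, other))))
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return edges
```

With a threshold of 0.1, "recluse" and "sage" are linked to "hermit" and to each other, while "crab" stays out of the network; the threshold and depth are the knobs that control how large the diagram becomes.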

Closer reading

Metaphorically speaking, the previous exercises are techniques of distant reading, or "looking at the forest"; now I will try to apply close reading, or "looking at the trees". This is accomplished through full-text indexing, concordancing, and sentence extraction. For example, after indexing the collection and then searching for items whose titles, summaries, and computed keywords all contain the word "hermit", a list of 15 items is returned, far fewer than the 170 items in the entire collection. Items nearer the top of the list are considered more relevant:

  1. Hermit In Prison / Jouy, Étienne (1823) - HathiTrust version
  2. Modern Hermit / Bell, H. J. S. (1899) - HathiTrust version
  3. Hermit Of Turkey Hollow / Train, Arthur (1921) - HathiTrust version
  4. Hermit Of Holcombe... / Chellis, Mary Dwinell (1871) - HathiTrust version
  5. Wanderings Of The Hermit Of Westminster On The Continent / Spice, R. P. (1880) - HathiTrust version
  6. Life And Adventures Of Robert, The Hermit Of Massachusetts... / Trumbull, Henry (1829) - HathiTrust version
  7. Hermit Of Eskdaleside / Merryweather, I. A. (1833) - HathiTrust version
  8. Hermit's Dell / Wetmore, Henry Carmer (1854) - HathiTrust version
  9. In The Hermit Land / Williams, Henry F. (1912) - HathiTrust version
  10. Old Mountain Hermit / Raymond, James F. (1904) - HathiTrust version
  11. Hermit Of Moss Pond / Pitcher, James (1896) - HathiTrust version
  12. Devil Turn'd Hermit / Lambert de Saumery, Pierre (1741) - HathiTrust version
  13. Narrative Of The Extraordinary Life Of John Conrad Shafford... (1840) - HathiTrust version
  14. Wanderings Of The Hermit Of Westminster... / Spice, R. P. (1884) - HathiTrust version
  15. Three Girls And A Hermit / Conyers, Dorothea Smyth (1908) - HathiTrust version
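The query behind this list ran against a real full-text index. As a stand-in, here is a minimal sketch that keeps only items whose title, summary, and keywords all contain the term, and ranks them by how often the term appears in those fields. The item structure (`title`, `summary`, `keywords`) and the ranking rule are assumptions, not the Distant Reader's relevance scoring. Because it matches substrings, "hermit" also matches "hermits" and "hermitage":

```python
def search(items: list[dict], term: str) -> list[dict]:
    """Items whose title, summary, and keywords all contain term, most mentions first."""
    term = term.lower()

    def fields(item: dict) -> list[str]:
        return [item["title"].lower(), item["summary"].lower(), " ".join(item["keywords"]).lower()]

    hits = [item for item in items if all(term in field for field in fields(item))]
    # sorted() is stable, so items with equal counts keep their original order.
    return sorted(hits, key=lambda item: -sum(field.count(term) for field in fields(item)))
```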

Concordancing is a quick & easy way to see how words/phrases are used in the context of a text, and concordancing for the phrase "hermits are" returns the following result:

  ut prattling to himself as conventional hermits are supposed to do. and his shanty was no c
   the baroque novel, simplicissimus, the hermits are closely related to the characters upon 
  mm, trost der nacht, o nachtigal!" both hermits are heard singing the songs which character
  here there wasn't treasure? that's what hermits are for—to guard a treasure.” “well, maybe,
  ubjects from the lives of the two first hermits are the temptation of st. antony, his meeti
   the best made custom shirts a bunch of hermits are wearing them watson sixth city shirt ma
  service the cleveland telephone co. the hermits are in vienna, but after the play will be f
  land taxicab service they enjoy at home hermits are discriminating. they want more than a m
    the new auto fuel get aboard-all "the hermits are next" the energine co-cuyahoga bldg.cle
  s both phones maker 110 euclid ave. the hermits are secluded so are we'l they are being dis
  e -the new auto fuel get aboard-all the hermits are next the energine co-cuyahoga bldg.clev
  en subject of conversation. try it. the hermits are seldom seen. here's hoping we may see m
  . arter in the hermits in africa 72 the hermits are going some so are davis & farley! why d
   electric building, cleveland, ohio the hermits are all wearing stone's shoes how about you
  here there wasn't treasure? that's what hermits are for--to guard a treasure." "well, maybe
   in the hermits at happy hollow 138 the hermits are going some so are davis & farley! why d
  t. 5858 cleveland, o. longer's like the hermits are great on the parisian stuff models arri
  oy tonneau runabout for recreation, the hermits are automobilists. and they know good cars.
  ter all, that may well be the case, for hermits are noted for the frugality of their fare."
  done me any good yet and may not. these hermits are likely to live long. their habits are r
  ter all, that may well be the case, for hermits are noted for the frugality of their fare."
  nfra, p. 507). very many anchorites and hermits are remembered by name. a large legacy of b
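A keyword-in-context (KWIC) concordance like the one above can be sketched in a few lines of standard-library Python. The second function shows the truncate-and-sort trick described next: keep only the few words that follow the phrase, then sort (and de-duplicate) them. The window width and word count are arbitrary choices, and this is not necessarily how the Distant Reader builds its concordances:

```python
import re


def concordance(text: str, phrase: str, width: int = 40) -> list[str]:
    """Keyword-in-context lines: `width` characters on either side of each match."""
    text = re.sub(r"\s+", " ", text).lower()
    phrase = phrase.lower()
    lines = []
    for match in re.finditer(re.escape(phrase), text):
        # Right-justify the left context so every match starts at the same column.
        left = text[max(0, match.start() - width):match.start()].rjust(width)
        right = text[match.end():match.end() + width]
        lines.append(left + match.group(0) + right)
    return lines


def quick_definitions(lines: list[str], phrase: str, width: int = 40, n_words: int = 5) -> list[str]:
    """Truncate each line to the few words after the phrase, then sort and de-duplicate."""
    start = width + len(phrase.lower())
    tails = {" ".join(line[start:].split()[:n_words]) for line in lines}
    return sorted(t for t in tails if t)
```

Because matching is purely on characters, the same code happily returns advertisements for shirts and taxicabs alongside genuine statements about hermits.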

Unfortunately, concordancing is a brute-force technique relying solely on syntax; no semantics whatsoever are applied. Yet phrases such as "hermit is", "hermitages are", and "hermits were" can be very informative. Moreover, we can truncate the results of concordancing and sort them to get quick & dirty definitions. Some of the more interesting, to me, are listed below:

I can get around the syntactical limitations of concordancing by extracting sentences instead of snippets. Algorithmically, I can: 1) segment the corpus into sentences, 2) optionally define a desired grammar, 3) optionally identify sentences matching the grammar, and 4) filter the result by supplying nouns and verbs. For example, I'm still interested in the definition of hermits and hermitages. Thus, my nouns (lexicon) will be "hermit", "hermits", "hermitage", and "hermitages". My verbs are forms of "be": "is", "was", "were", etc. Finally, I will define desirable sentences to be in the form of <noun phrase><verb><noun phrase>. Thus I expect to get sentences akin to "Hermits are _____." Similarly, sentences containing modal verbs (words like "should", "could", "ought", "can", etc.) are very telling. By combining all the techniques outlined in this paragraph, I generated a long list of sentences about hermits. Some of the more interesting results include:
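The four steps above can be approximated with a naive sentence splitter and a regular expression standing in for the grammar. In this minimal sketch the pattern is an assumption: a lexicon noun, then a form of "be" (or a modal), then at least one more word as a rough proxy for the second noun phrase. Checking real noun phrases would require a part-of-speech tagger:

```python
import re

LEXICON = ["hermit", "hermits", "hermitage", "hermitages"]
BE = ["is", "are", "was", "were", "be", "been"]
MODALS = ["should", "could", "ought", "can", "may", "might", "must", "would", "will", "shall"]


def sentences(text: str) -> list[str]:
    """Naive segmentation: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pattern(nouns: list[str], verbs: list[str]) -> re.Pattern:
    """Noun, then verb, then at least one more word (a crude stand-in for a noun phrase)."""
    return re.compile(
        rf"\b(?:{'|'.join(nouns)})\s+(?:{'|'.join(verbs)})\s+\w+", re.IGNORECASE
    )


def extract(text: str, verbs: list[str], nouns: list[str] = LEXICON) -> list[str]:
    """Keep the sentences that match the noun-verb-word pattern."""
    rx = pattern(nouns, verbs)
    return [s for s in sentences(text) if rx.search(s)]
```

Passing `BE` yields definition-like sentences ("Hermits are ..."), while passing `MODALS` yields prescriptive ones ("Hermits should ...").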

"Yes, but what about crabs?"

Extracting some close reading sentences about hermit crabs is kinda fun:

Summary

Again, this reading reinforced things I believe I already knew. For example, hermits -- for the most part -- are loners, but they do not live alone 100% of the time. Like everybody else, they are forced to eke out a living, one way or another.

While this reading reinforced my beliefs, I am now able to articulate them more clearly, and I am able to back them up with more than anecdotal evidence. Some people might ask, "Why don't you just read an encyclopedia article?" And my response might be, "I desire to draw my own conclusions rooted in my own experience; encyclopedia articles are all well and good, and I will use them to supplement my understanding, not form it completely."

Epilogue

This entire data set -- a Distant Reader study carrel -- is designed to be available at https://distantreader.org/stacks/carrels/hermits/index.zip. Just as one might visit a museum and form one's own impression of a painting, download the data set, do your own analysis ("reading"), and share your interpretations.
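For readers who prefer to script the download, here is a minimal standard-library sketch. It assumes the URL above still serves a zip file and makes no assumptions about the carrel's internal layout; `download` and `unpack` are hypothetical helper names:

```python
import io
import urllib.request
import zipfile

CARREL_URL = "https://distantreader.org/stacks/carrels/hermits/index.zip"


def download(url: str = CARREL_URL, timeout: float = 60) -> bytes:
    """Fetch the zipped study carrel into memory."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def unpack(zip_bytes: bytes, destination: str) -> list[str]:
    """Extract every member into destination and return the member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(destination)
        return archive.namelist()


# Usage (requires network access):
# names = unpack(download(), "hermits-carrel")
# print(len(names), "files extracted")
```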


Eric Lease Morgan
Navari Family Center for Digital Scholarship
University of Notre Dame

December 10, 2023