Text analysis of the Diary of Jacob Engelbrecht

For the print project, I propose to perform text analysis on the diary of Jacob Engelbrecht, which is owned by the Historical Society of Frederick County.  I propose using MALLET, Voyant and WordSeer to discover which of these tools works best with the diary and to gather what information I can through analysis of the diary’s text.

Fun guy, huh?

Jacob Engelbrecht, born in 1797, was a local Renaissance man who earned his living as a tailor in Fredericktown.  He was the son of a Hessian soldier from Bayreuth who had been imprisoned in Frederick in 1782 and who later, when presented with the opportunity, elected to stay and marry a local girl named Margaret Haux.

Engelbrecht’s diary provides valuable insight into life in 19th-century Frederick.  While several other published diaries recorded by Frederick residents exist, this one is unique in that Engelbrecht seemed to be capturing events for posterity.  His entries rarely recount personal events and exploits; instead, Engelbrecht relates facts, lists of names, marriages, deaths and other events.  This diary also covers a much broader period of time than others in the Historical Society collection: it runs from 1818 through Jacob’s death in 1878 and continues, under the guidance of Engelbrecht’s son, until 1882.

The “diary” was originally 22 separate diaries of various sizes and conditions that were transcribed and compiled into a two-volume work published by the Historical Society of Frederick County.

Image of an image of Jacob Engelbrecht’s handwriting, found in the inside cover of the published diary.

We have already discussed MALLET and Voyant and their capabilities in class.  WordSeer is an NEH grant-funded project that helps users perform exploratory text analysis.  It lets users create visualizations, make side-by-side comparisons, explore the contexts of specific words, and create and compare categories or thesauri.
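
To give a sense of what exploring the contexts of a specific word means, here is a minimal keyword-in-context sketch in Python.  It is only an illustration of the general idea, not how WordSeer or Voyant work internally.  The function name `kwic`, the window size, and the sample sentence are my own assumptions; the sample text is invented and is not a quotation from the diary.

```python
import re

def kwic(text, keyword, window=4):
    """Return (left context, match, right context) for each occurrence of keyword.

    Matching is case-insensitive and on whole words only.
    """
    # Very simple tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented sample text (an assumption for illustration, not from the diary).
sample = ("Married this evening by the Reverend Mr. Smith. "
          "The weather was fair. Married yesterday in town.")

for left, word, right in kwic(sample, "married", window=3):
    print(f"{left:>30} [{word}] {right}")
```

Unlike a plain keyword search, which only reports that a word occurs, a listing like this shows the words surrounding each occurrence, which is the kind of view these tools build on.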

I was inspired by Cameron Blevins’ blog post about topic modeling Martha Ballard’s diary and how successful the technique was.  The Engelbrecht diary, like that of Martha Ballard, contains daily entries, but instead of covering 27 years, it covers more than 60.  No analysis has been done on the text of the Engelbrecht diary because the body of text is so large.  Although I do not know how many entries the diary contains, the transcribed edition that was subsequently digitized consists of 1,167 pages of text.

Currently, the only way to really search the Engelbrecht diary is with a simple keyword search.  In her article “Doing More with Digitization,” Sharon Block discusses the limitations of performing keyword searches on electronic sources, stating that “the results of keyword searches are quite often incomplete or full of ‘noise,’ irrelevant results that make it hard to find what you are looking for.”  She continues, maintaining that “for searching to be effective, access needs to be supplemented by analysis.”  My own attempts to search for information within the Engelbrecht diary have been largely unsuccessful or time-consuming.  I look forward to exploring the different ways these three tools interact with the diary and the information that will be revealed through text analysis.

2 Replies to “Text analysis of the Diary of Jacob Engelbrecht”

  1. Would it be helpful to try to predict what you might find by analyzing the text with said tools? How would that contribute to the bigger picture of Frederick County history? Comparing it to the Martha Ballard diary seems productive, but are there other diaries from Engelbrecht’s time or region for comparison? What might be a hook to get researchers or students interested in perusing Engelbrecht’s diaries? I also like how you propose to use a few tools to find what might produce the best results for your project. I look forward to hearing what you find.

  2. You have a solid subject and approach. It would be interesting to see what we learn about this individual’s diary by working with these tools.

    Along with that, I would hazard a guess that the topic models you might generate from the text could themselves be a useful way of describing the text. So, if you did want to try using that tool, I think the automatically generated topics could be a useful way of creating a kind of finding aid that you could put up to help show what themes appear in the text over time.

    If you did do this project, I would caution you to plan some time for setting up and using these tools. It often takes a bit of work to get set up to run these kinds of command-line tools, and it also often takes a bit of effort to get the text prepared in such a way that it will work properly with the tools.

    With that said, I have little doubt that you would learn some interesting things in the course of using these tools on this text. By setting up your approach as exploratory, comparing and contrasting how results from different tools help you understand the text, you have a straightforward outcome. Even if you don’t find anything particularly interesting, there is clear value in hearing how the tools converged or diverged in suggesting interpretations of the text. Along with that, I think you would likely find out some interesting things about your subject that could be useful both for work on automating parts of archival description and, potentially, for historical scholarship on the period and your subject.

    If you did go ahead and do this as a course paper, you would ultimately want to have a bit more up front contextualizing the work in terms of other literature on text analysis. So you could use Moretti, Underwood and some of the reviews of Moretti’s work as a place to lay out how different folks are thinking about this topic.
