For the print project, I propose to perform text analysis on the diary of Jacob Engelbrecht, which is owned by the Historical Society of Frederick County. I propose using MALLET, Voyant and WordSeer to discover which of these tools works best with the diary and to gather what information I can through analysis of the diary’s text.
Jacob Engelbrecht, born in 1797, was a local Renaissance man who earned his living as a tailor in Fredericktown. He was the son of a Hessian soldier from Bayreuth who had been imprisoned in Frederick in 1782 and later, when presented with the opportunity, elected to stay and marry a local girl named Margaret Haux.
Engelbrecht’s diary provides valuable insight into life in 19th-century Frederick. While several other published diaries exist that were recorded by Frederick residents, this one is unique in that Engelbrecht seemed to be capturing events for posterity. His entries rarely recount personal events and exploits; instead, Engelbrecht relates facts, lists of names, marriages, deaths and other events. This diary also covers a much broader period of time than others in the Historical Society collection, spanning from 1818 through Jacob’s death in 1878 and continuing, under the guidance of Engelbrecht’s son, until 1882.
The “diary” was originally 22 separate diaries of various sizes and conditions that were transcribed and compiled into a two-volume work published by the Historical Society of Frederick County.
We have already discussed MALLET and Voyant and their capabilities in class. WordSeer is an NEH grant-funded project that helps users perform exploratory text analysis. It supports visualizations and side-by-side comparisons, and it lets users explore the contexts of specific words and create and compare categories or thesauri.
I was inspired by Cameron Blevins’ blog post about topic modeling Martha Ballard’s diary and by how successful the technique was. The Engelbrecht diary, like that of Martha Ballard, contains daily entries, but instead of covering 27 years, it covers more than 60. No analysis has been done on the text of the Engelbrecht diary because the body of text is so large. Although I have no idea how many entries were written in the diary, the transcribed edition that was subsequently digitized consists of 1,167 pages of text.
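Topic modeling by entry would first require splitting the transcription into individual entries, which would also answer the question of how many there are. What follows is only a rough sketch in Python of how that might work, not a tested method. It assumes that each entry begins on its own line with a date written like "March 4, 1825", which may not match the layout of the digitized edition. The function names and the sample text are invented for illustration, and the sample is not a quotation from the diary. The output lines follow the name, label, text layout that, as I understand it, MALLET's import-file command reads by default.

```python
import re

# ASSUMPTION: every entry starts on its own line with a date such as
# "March 4, 1825". The real transcription may use a different layout.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_LINE = re.compile(
    rf"^(?P<date>(?:{MONTHS}) \d{{1,2}}, \d{{4}})\b", re.MULTILINE
)


def split_entries(text):
    """Return a list of (date, body) pairs, one per dated entry."""
    matches = list(DATE_LINE.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        # An entry runs from the end of its date to the start of the next date.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = " ".join(text[match.end():end].split())  # collapse whitespace
        entries.append((match.group("date"), body))
    return entries


def to_mallet_lines(entries):
    """Format entries as 'name<TAB>label<TAB>text', one document per line."""
    lines = []
    for number, (date, body) in enumerate(entries, start=1):
        year = date[-4:]  # the year serves as the label (a hypothetical choice)
        lines.append(f"entry{number:05d}\t{year}\t{body}")
    return lines


# Invented sample text, NOT taken from the Engelbrecht diary.
sample = (
    "March 4, 1825\nFirst invented entry about the weather.\n"
    "March 6, 1825\nSecond invented entry\nlisting a marriage.\n"
)

entries = split_entries(sample)
print(len(entries), "entries found")
for line in to_mallet_lines(entries):
    print(line)
```

Using the year as the label is only one option. It would make it possible to compare topics across the six decades the diary covers, much as Blevins traced topics over time in Ballard's diary.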
Currently, the only way to search the Engelbrecht diary is with a simple keyword search. In her article “Doing More with Digitization,” Sharon Block discusses the limitations of performing keyword searches on electronic sources, stating that “the results of keyword searches are quite often incomplete or full of ‘noise,’ irrelevant results that make it hard to find what you are looking for.” She continues, maintaining that “for searching to be effective, access needs to be supplemented by analysis.” My own attempts to search for information within the Engelbrecht diary have been largely unsuccessful or time-consuming. I look forward to exploring the different ways these three tools interact with the diary and the information that will be revealed through text analysis.