Print Project: Text Analysis of Earl Shaffer’s Appalachian Trail Thru-Hike Journals

Every spring between early March and mid-April, a couple thousand intrepid hikers laden with backpacking equipment and hiking poles venture to Springer Mountain in northern Georgia. They vary in age, nationality, motivation, and physical ability, but all share a common goal: to walk the entire distance of the 2000-plus mile Appalachian Trail. A “thru-hike” of the trail, which runs through the mountains of the eastern United States all the way to Katahdin Mountain in remote central Maine, typically involves putting one often sore and blistered foot in front of the other over sometimes steep or rugged terrain through all sorts of weather while also carrying 40 to 50 pounds of gear nearly every day for six months. For most who begin the journey with the intent of making it to a triumphant finish atop Katahdin, the goal remains an elusive dream. The physical and mental challenges of the endeavor sooner or later prove to be too much for all but about 25% of those who originally set out to complete the entire trail each season.

Earl Shaffer on his 1948 thru-hike. Photo from the collection of the National Museum of American History, Smithsonian Institution. Used under a Creative Commons license.

Earl Shaffer is the first person known to have completed a thru-hike of the AT, a feat he first accomplished in 1948. A veteran of World War II, Shaffer decided to attempt the hike as a means of dealing with war-related stress, which included the death of his good friend and hiking partner in the Battle of Iwo Jima. Shaffer’s successful AT thru-hike demonstrated that it was in fact possible to hike the entire trail in one trip, and as a result, interest in the trail grew. Shaffer completed a second thru-hike in 1965, traveling southbound this time. And in 1998 at the age of 79, Shaffer marked the 50th anniversary of his initial thru-hike by achieving the feat a third and final time. Shaffer kept trail journals during all three of his thru-hikes, and these journals are all now part of the archival collection of the Smithsonian Institution’s Museum of American History. For my digital history print project, I propose to perform a textual analysis of Shaffer’s thru-hike journals in order to examine the themes and patterns present in his on-trail writings to assess how he ascribed meaning to his trail experiences.

Specific subject matter aside, the project appeals to me for two reasons. First, while I was quite wary of the class readings concerning computational analysis, I found Cameron Blevins’ use of Mallet to analyze the diary of Martha Ballard (familiar to many students of history from Laurel Ulrich’s A Midwife’s Tale) quite intriguing (“Topic Modeling Martha Ballard’s Diary”). Robert K. Nelson has also used Mallet to study the political and social history of Civil War-era Richmond by topic modeling that city’s Daily Dispatch newspaper from 1860 through 1865 (“Mining the Dispatch”). I think that it would be useful to do a similar analysis of Shaffer’s diaries (collectively and perhaps individually as well) to see how his topics varied over time both within a specific hike and between the different hikes. It might also be interesting to examine how his themes varied from state to state, since each state offers hikers a different experience in terms of terrain, flora, fauna, and people encountered along the trail. The journals might be analyzed through Voyant as well to discern additional textual patterns that provide further indication of what Shaffer found to be important while on his journeys.
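To make the comparison between hikes a little more concrete, here is a minimal sketch in Python of one way to find the words that set one journal apart from the others. It is not Mallet's topic modeling or Voyant's output, just a simple difference in relative word frequencies that stands in for the general idea. The function names, the tiny stopword list, and the sample text are all hypothetical; the sample sentences are invented placeholders, not quotations from Shaffer's journals.

```python
import re
from collections import Counter

# Tiny stopword list for illustration only; a real run would need a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "at",
             "was", "is", "it", "with", "for"}


def tokenize(text):
    """Lowercase the text and return alphabetic word tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def relative_frequencies(text):
    """Map each word to its share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()} if total else {}


def distinctive_terms(segments, name, top_n=3):
    """Rank words by how much more often they appear in one segment
    than in all the other segments combined."""
    target = relative_frequencies(segments[name])
    rest_text = " ".join(t for key, t in segments.items() if key != name)
    rest = relative_frequencies(rest_text)
    scored = {w: f - rest.get(w, 0.0) for w, f in target.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda w: (-scored[w], w))[:top_n]


# Placeholder text invented for this sketch, NOT taken from the journals.
segments = {
    "1948": "rain rain cold trail shelter rain",
    "1998": "road crowd trail shelter road",
}

for name in segments:
    print(name, distinctive_terms(segments, name))
```

The same function could be pointed at segments keyed by state rather than by year, which would be one rough way to check whether the vocabulary shifts as the trail moves from one state to the next.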

All three of Shaffer’s thru-hike diaries (as well as two out of three of his other AT hike logs in the Smithsonian collection, which could also possibly be included in the analysis) have been digitized and transcribed, thus making it far easier to run the text through analysis tools like Mallet and Voyant. The second draw of this project for me is that these transcriptions are the product of voluntary crowdsourced labor. The Smithsonian is a relatively new yet heavyweight player in the trend of crowdsourcing the transcription of historical documents, having launched its initiative in July 2013. As someone who actually pays the rent right now by transcribing oral histories, I’m somewhat ambivalent about this kind of crowdsourcing. Yet the Smithsonian now has (by my count) roughly 360 completed transcription projects because of this effort, with another 30-something in progress, which as a history/archive-y person strikes me as a good thing. So my project in part could help demonstrate the value and benefits of the crowdsourced transcription process.

Shaffer was a pioneer in a long-distance hiking movement that has exploded in popularity since the 1970s. The thru-hike is in a sense a form of escapism (what would possess a person to leave the relative comforts of home and society to embark upon and complete a 2000-mile walk in the wilderness?), and the trail diaries bring us as close as we can get to understanding his raw experience of the trail as it unfolded. In the 50 years between Shaffer’s first thru-hike and his last, American society changed greatly, as did attitudes regarding nature and conservation. What do Earl Shaffer’s Appalachian Trail thru-hike journals have to say to us about the roles of nature and of the physical journey in helping him to make sense of his contemporary world, and how do his perceptions change through the individual journeys as well as over the longer course of time?



3 Replies to “Print Project: Text Analysis of Earl Shaffer’s Appalachian Trail Thru-Hike Journals”

  1. You’ve got an interesting subject and a set of data that is ripe for use in the tools you have identified. I found the backstory about Shaffer and the context of the hike interesting and I imagine many others would as well.

    So that is all great. Along with that, I imagine that SI would be thrilled to see someone making use of the data they have produced, so there is a good chance they would be excited to feature and draw attention to results of what you would do.

    At this point, doing this kind of analysis of something like a diary is so new that even just documenting how the different tools reveal or fail to reveal important parts of the narrative is useful. So, whatever approach you take, I think you would be likely to generate some interesting results.

    With that said, it would be great to know a little bit more about the subjects he talks about on the trip. That is, it would be useful to have a bit of a sense of the kinds of things that he talks about in the book to help figure out what the best methods would be to work with it.

    One other thought, if you did do this, it might also be interesting to think about using some of the Named Entity extraction tools to try and identify people, places and dates that appear in the text.

    So I think you are onto an interesting project concept. If you went ahead on this, my main suggestion would be to make sure to give yourself a good bit of time to prepare the text and figure out how to use the tools. There is a good bit of work that is necessary to get these things set up in such a way to get good results.

  2. Recognizing that what I am about to suggest would take this from a scholarly project about something digital to a digital project about something scholarly (i.e., a close reading of a historic resource), would there be at the conclusion of this project an opportunity to transfer it to something like Historypin or a Google My Map so that you could highlight locations on the AT where things of interest happened, to help the reader visualize the correlation between events and the spaces in which they happened? I think of My Maps in particular because it allows you to color-code your pins, so might there be an opportunity to define the three different journeys, and the interesting things that happened upon them, in this way? For example, you could do all of the first journey in blue and see where it was that he found something interesting to talk about (making rough estimates of location based on context clues and average miles hiked per hour), and then compare all three for different sightings of, say, bears or eagles.

    The data set would likely be of interest to some of the Citizen Scholar projects as well, and probably the National Park Service in addition to SI. Just some thoughts; I think you’ve an amazing data set there.

  3. Catherine- Yes, what you suggest would be really cool and is theoretically possible, with one big caveat. The path of the AT has altered quite a bit over the years. It’s pretty stable now because now all but 1% of the trail is on public property, but that was very much not the case in the past. Then again, I suppose any pin using the journal data is going to be an approximate location. Still… something else to ponder.
