Topic Modelling DocSouth’s North American Slave Narratives

I propose to topic model DocSouth’s archive of North American Slave Narratives, a digitized archive that’s already optimized for data analysis. In topic modelling, a computer sifts through large bodies of text and identifies “topics,” or groups of words that often occur near one another. The computer doesn’t understand the words or their meaning – it just notes their frequency and groups them based on that frequency.

There are some great examples of digital history projects online that use topic modeling…

  • Cameron Blevins’s article on Modelling Martha Ballard’s Diary
  • Matthew McClellan’s interesting work on Olaudah Equiano’s diary
  • Sarita Alami, Moya Bailey, Katie Rawson, and Sara Palmer’s project analyzing sermons given on the occasion of Lincoln’s assassination
  • Mining the Dispatch by Robert K. Nelson, which analyzes Civil War Richmond through a newspaper archive

… but very few scholars seem to have engaged with this particular archive. I only found three examples. Lauren Tilton’s analysis references DocSouth’s archive among other sources and focuses on racialized dialect in narratives recorded by the Federal Writers’ Project. Jed Dobson’s GitHub offers his code, but no output data or historical interpretation. It was done as part of his book project, which is definitely going on my reading list. The other example I found, a Gephi visualization by Jim Casey, doesn’t interpret the data either.

It’s super cool though.

It seems odd that so few people have worked with this data considering how rich and easily accessible it is. If anyone has come across a topic modelling project that uses DocSouth’s Slave Narratives, I’d love to hear about it.

To get started, I did a quick tutorial on MALLET from the Programming Historian. The tutorial was easy enough to follow, but I’ll definitely be keeping it and MALLET’s documentation handy for a while.

The Programming Historian is an amazing resource. Highly recommend.

I ran my data through MALLET a couple different times, trying out different parameters and seeing how they affected the output. The command I settled on for this proposal was:

bin\mallet train-topics --input slavenarr.mallet --num-topics 30 --optimize-interval 30 --output-state topic-state.gz --output-topic-keys enslaved_keys.txt --output-doc-topics enslaved_composition.txt

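So, I fed MALLET the slave narratives (bundled into the MALLET-friendly file slavenarr.mallet) and told it to give me 30 weighted topics (written in enslaved_keys.txt), the topic breakdown of each document (enslaved_composition.txt), and a compressed record of the full model (topic-state.gz).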

A screenshot of MALLET's output
A screenshot of my command terminal after the program finished running. As you can see at the bottom of the image, this command took MALLET nearly 6 minutes to run. If I want to run it paragraph-by-paragraph – which I think I do – it will take significantly longer. Hope my computer can handle it.

The chart below shows the 5 topics that MALLET found to be most prevalent in descending order. I added the final column as a (very tentative!) attempt to assign names to these topics.

Topic | Weight | Words | Tentative name
3 | 1.338 | told time man good day thought home made years asked back house place make found give money put night people | Memory?
15 | 1.276 | time made found part day received called state present means person large number hands case purpose long return great make | Doing? Society?
14 | 0.996 | heart poor long children eyes life mother felt friends master child mind thought friend hope face tears dear light kind | Emotion
1 | 0.827 | master night time work house man place men day road large slaves miles people plantation water woods great cotton corn | Escape
9 | 0.802 | life years great good time place work man character young high history knowledge strong influence early public true service church | Aspirations

In order, the columns give the number MALLET assigned to the topic (arbitrary, as far as I know), the topic’s weight (roughly, how prevalent it is across the archive), the words in the topic, and my sketchy attempt to name each topic.
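
Ranking topics by weight does not have to be done by hand. Below is a minimal Python sketch, assuming the topic-keys file is tab-separated with the topic number, the weight, and then the space-separated words on each line. The names Topic, parse_topic_keys, and top_topics are my own, not part of MALLET, and the sample rows are abbreviated from the chart above rather than read from enslaved_keys.txt.

```python
from typing import NamedTuple


class Topic(NamedTuple):
    topic_id: int
    weight: float
    words: list[str]


def parse_topic_keys(text: str) -> list[Topic]:
    """Parse topic-keys text: one topic per line, tab-separated id, weight, words."""
    topics = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        topic_id, weight, words = line.split("\t", 2)
        topics.append(Topic(int(topic_id), float(weight), words.split()))
    return topics


def top_topics(topics: list[Topic], n: int = 5) -> list[Topic]:
    """Return the n topics with the largest weights, heaviest first."""
    return sorted(topics, key=lambda t: t.weight, reverse=True)[:n]


# Abbreviated sample rows (assumption: same layout as the real keys file).
sample = (
    "1\t0.827\tmaster night time work\n"
    "3\t1.338\ttold time man good\n"
    "14\t0.996\theart poor long children\n"
)

for topic in top_topics(parse_topic_keys(sample), n=2):
    print(topic.topic_id, topic.weight, " ".join(topic.words[:3]))
```

For a real run, the sample string would be replaced by the contents of enslaved_keys.txt, for example via pathlib.Path("enslaved_keys.txt").read_text().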

Topic 14 hit me hard. While the two topics weighted most heavily are pretty vague, topic 14 is obviously about intense, embodied emotion. The memoirs of former slaves are of course deeply emotional – that shouldn’t surprise anyone. But the fact that the computer picked up on that emotion so quickly and clearly – and weighted it so heavily – gives me hope that this could be a worthwhile project.

My goal here isn’t necessarily to make a novel historical argument, although that would be a nice bonus. Instead, I just want to get familiar with this software and learn how to best tailor it to a specific data set. How does MALLET work, and what pitfalls do I need to be aware of? How many topics do I need to achieve a good model, and how do I find that number? What are the benefits of modelling paragraph-by-paragraph rather than memoir-by-memoir? How might I factor in temporal data – like date published – to these texts? What other parameters do I need to include in my commands? How do I best interpret the data MALLET gives back to me? How do I model and visualize that data?
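
The paragraph-by-paragraph question at least has a mechanical first step: cutting each memoir into paragraph-sized documents. Here is a minimal Python sketch, assuming paragraphs are separated by blank lines and that the result will be fed to MALLET's import-file command, which (as I understand it) reads one document per line as a name, a label, and the text. The function names, the min_words cutoff, and the sample text are all my own illustrative choices.

```python
import re


def split_paragraphs(text: str, min_words: int = 20) -> list[str]:
    """Split text on blank lines, dropping paragraphs shorter than min_words."""
    paragraphs = []
    for chunk in re.split(r"\n\s*\n", text):
        cleaned = " ".join(chunk.split())  # collapse line breaks and extra spaces
        if len(cleaned.split()) >= min_words:
            paragraphs.append(cleaned)
    return paragraphs


def to_mallet_lines(doc_name: str, paragraphs: list[str]) -> list[str]:
    """Format paragraphs as 'name<TAB>label<TAB>text' lines, one paragraph per line.

    doc_name should contain no whitespace; it doubles as the label so each
    paragraph can be traced back to its memoir.
    """
    return [
        f"{doc_name}_p{i:04d}\t{doc_name}\t{paragraph}"
        for i, paragraph in enumerate(paragraphs, start=1)
    ]


# Tiny made-up example; a real run would loop over the narrative text files.
example = "First paragraph of a memoir.\n\nSecond paragraph,\nwrapped over two lines.\n\nEnd."
for line in to_mallet_lines("memoir01", split_paragraphs(example, min_words=3)):
    print(line)
```

The min_words cutoff is there because very short paragraphs (chapter headings, one-line quotations) give a topic model almost nothing to work with; the right threshold is something I would have to find by experiment.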

I have no idea what the answers are to any of these questions, but I’d like to find out.

Folklore and the Fear Factor: The Evolution of Legends in the Era of Reddit

In the era of technology, modern medicine, and science, the idea that people still believe in, share, and adhere to folklore might sound absurd. Take, for instance, the story of the Pied Piper of Hamelin. A colorfully dressed rat catcher, hired by the town of Hamelin, played his flute, entrancing the pests and leading them out of the town. When the town refused to pay for his services, however, the Piper used his flute to lure a new set of victims: the town’s children. Lured by his tune, the children left town and vanished, never to be seen again. By today’s standards, this story sounds more than a little odd, the type of tale that would be unlikely to stand the test of time as it once did. However, if you dig more deeply into that story, a truth unfolds.

Pied Piper of Hamelin rendition, copied from the glass window of the Market Church in Hamelin.

While the rats were a later addition to the story, one common truth remained: a stranger came to town, and left with the children. In 1227, approximately 50 years prior to the story in Hamelin, the Holy Roman Empire and Denmark fought in a battle that pushed back Danish borders. Colorfully dressed Roman salesmen, often called “locators,” travelled the land to find skilled men and women to move north to protect the Empire’s new borders. For obvious reasons, this was a hard sell. For towns like Hamelin, losing skilled laborers could put the town at risk. As a result, it was common practice to sell or give away children to this cause when locators came into town. For Hamelin, the tracing of surnames to new towns proves the less savory version of this folktale: a town made the collective decision to sell their children to locators to ship off to new towns. From there a collective story was constructed as a way to cope with their actions for years to come, and the Pied Piper was born.

Much like those who came before us, we still tell stories to make sense of the world. Most especially, we continue to be drawn in by stories of tragedy, of what hides in the dark, or what steals our children. Our modern legends can be traced in figures such as the Slender Man. Slender Man, an unnaturally thin and tall humanoid creature, is said to stalk, abduct, and traumatize its victims, usually children or young adults. His story began on the Something Awful forum, with a couple of doctored photos, but those on the forum (and on other forums, such as Reddit and 4chan) began adding narrative and visual art, building a mythos of Slender Man.

The legend increased in popularity, showing up first in video games, then blending into traditional popular culture, and finally reaching movies. Unfortunately, much of this limelight was a result of a 2014 tragedy, when two 12-year-old girls lured their friend into the woods and stabbed her as an “offering” to Slender Man. Their actions, as awful as they may seem, continue to show the pervasive power of folklore in the modern era.

Film poster for Slender Man Movie, released 2018

While the original Slender Man story proliferated on a pre-Reddit site, there is little doubt that Reddit has become a breeding ground for modern-day folklore. Subreddits such as r/creepypasta, r/nosleep, r/letsnotmeet, and more have acted as spaces for entire communities built around creating, sharing, and commenting on scary stories.

For now, my primary question remains: when we compare these stories against more traditional folklore, what role does a medium such as Reddit or TikTok play in the creation and proliferation of folklore? And in the era of science and technology, are we somehow more beholden to these stories than ever before?

In my project, I am hoping to explore some of the most popular subreddits and examples of modern folklore, examining how the medium of social media plays a part in the creation and proliferation of folklore. Without our knowledge, have these stories become even more important to our societies than the folktales we believe we have left behind?

For now, I will look at examples such as Slender Man (and other creepypasta figures) and trends such as Randonautica to track how they show up in social media (most likely using tools such as Voyant, Google n-gram, and topic modeling programs where possible). From there, I will attempt to assess the role these platforms play in the potency of the stories told, as well as assessing the lasting power of the legends in the context of “virality” and the fleeting nature of trends online.


Blank, Trevor J., and Lynne S. McNeill. “Introduction: Fear Has No Face: Creepypasta as Digital Legendry.” In Slender Man Is Coming: Creepypasta and Contemporary Legends on the Internet, edited by Trevor J. Blank and Lynne S. McNeill, 3-24. Logan: University Press of Colorado, 2018. Accessed February 24, 2021.

Mahnke, Aaron, host. “A Stranger Among Us.” Lore (podcast). December 28, 2015. Accessed February 24, 2021.


Print Project Proposal: How do people rate history?

When I first began working at historic sites, nobody talked about Yelp.

The rise of the online review aggregator has been felt across virtually every sphere of commercial activity. Sites like TripAdvisor and Yelp provide a platform for people to write and publish reviews of everything from dentists (true story: I receive regular emails from my dentist’s office asking me to help “get the word out” by posting a good review online) to fancy restaurants, often semi-anonymously. Some sites offer the ability to leave a bare rating, zero to five stars, without any additional explanation. Paul Ford described the Internet as a customer service medium, not a publishing medium, and nowhere is this more evident than in the places on the Internet explicitly designed to solicit the input and feedback of customers.

Review sites have become powerful, I suspect, because of their perceived influence over the decisions of potential consumers. An online review is publicly accessible from anywhere with an internet connection, and the sites often have mobile-compatible versions or dedicated apps that allow perusing and posting reviews from smartphones. Additionally, most review sites loudly proclaim that they do not allow paid reviewers to post – the implicit assumption being that the reviews found on TripAdvisor might be more honest and accurate because they are voluntary acts performed by “regular” people, instead of curated reviews written by paid professionals who might be bought or influenced by the place under review. For someone in an unfamiliar place, checking TripAdvisor might be the only way to feel that you’re getting a real sense of the area.

Many historic sites rely on the income generated by admission fees and store revenue to fund their operations. A drop in overall visitation can have a serious impact on a site’s ability to hire staff, plan and present programming, and perform necessary preservation and maintenance. The two sites where I most recently worked were also in relatively isolated areas, not near major cities. They couldn’t rely on the kinds of visitors who might see a sign on the road and decide to check out the site on impulse; they needed people to make deliberate plans to visit (and spend money) in order to stay financially healthy. Word-of-mouth was seen as paramount in motivating those visits. If people who visited had positive experiences, they would tell other people, and then those people would visit and have a positive experience, and so on. At both sites, high-level staff obsessively checked Yelp and TripAdvisor, along with the reviews written through Google Maps, to find out if we were successfully generating that positive word-of-mouth.

For my project, I propose to study the content of the reviews posted about two sites: Colonial Michilimackinac and Fort Mackinac, both part of Mackinac State Historic Parks in northern Michigan. The two sites have some key differences that will make comparing their reviews interesting: Colonial Michilimackinac is a reconstructed 18th-century fortified trading post on the mainland just off a major interstate highway, while Fort Mackinac is a partially preserved 19th-century fort on Mackinac Island, accessible only by ferry. I would like to see what things are common to both positive and negative reviews of the two sites, and where the feedback from visitors differs. The project promises to provide some very useful knowledge about visitor experience: knowing what sorts of experiences stick in the minds of visitors long enough to make it into a TripAdvisor review can help a historic site present programming and interpretation that teaches visitors about the history of the site in memorable ways.

Print Project Proposal: Mapping Queer D.C.

In June 2017, Kate Rabinowitz partnered with the Rainbow History Project (RHP) to launch a virtual map called “Places and Spaces.” This interactive map charts locations significant to the LGBTQ community in Washington, D.C. since the 1960s. The map is hosted by RHP’s online archives, where people from around the world can scroll through decades of D.C.’s LGBTQ history, click on individual pins placed on Google Maps, and search through the RHP archives of oral histories and digitized material for more information about a particular location.

This software is reminiscent of Philadelphia’s PhilaPlace application, although instead of embedding photographs and ephemera into the map, Places and Spaces offers metadata that describes the nature of the establishment (e.g., bar, health center, book store) with the dates of operation and the gender/ethnicity of the core clientele. Anyone is welcome to submit suggestions for additional points on the map, allowing amateur historians and community members to contribute.

Touching on four of Roy Rosenzweig and Dan Cohen’s seven qualities of digital media, this map of queer spaces draws on the accessibility, diversity, interactivity, and manipulability of digital data to study the memory of a marginalized community. Through Places and Spaces, users are able to manipulate the map to view the fluctuation of queer gathering places from decade to decade, highlighting disparities in the community’s public spaces as it responded to changes in the D.C. environment.

As D.C. gentrifies, housing for low- and middle-income residents has become scarce, causing overwhelming rates of displacement (Curbed DC reports a 10% decrease in families living in the District with incomes under $35,000/year). LGBTQ establishments are not impervious to new development and rising rent, and the map reveals a sharp decline in public queer spaces beginning in the late 1990s and early 2000s. There are many theories about what has contributed to the decline of “gayborhoods”: the AIDS crisis, the digital culture of dating apps, and increasing assimilation must also be held accountable. However, I plan to focus on the effects of gentrification on physical LGBTQ spaces by comparing the statistics and maps tracing change in D.C., as collected by Governing Magazine, DataLensDC, and an app created by the Urban Institute called “Washington, D.C.: Our Changing City.”

The purpose of this study is to 1) further document a history that has been ignored or intentionally erased, and 2) identify core causes of disappearing spaces and their impact on the present community. Although a far cry from the longue durée research touted by Jo Guldi and David Armitage in The History Manifesto, the availability of these interactive maps allows for a comparison between queer spaces and gentrification over the past two decades.

Memory and Materiality: An Examination of Dear Photograph

On January 13, 2014, the Tumblr-based blog Dear Photograph reached 150,000 followers. Although the site has not been updated since last fall, its first three years of use provide a wealth of material I will use to examine how people interact with the past, form memories, and view materiality on the web. The blog features digital photos taken by people of physical photos lined up with their original setting, each with a caption beginning with “Dear photograph.” Meta, right?

Here’s an example:


Dear Photograph,
Trafalgar Square 50 years ago and my Granny never looked happier! If my house was burning down, this would be the one possession I would be desperate to save. I miss so many things about my Granny but most of all I miss her beautiful smile.

This example combines a personal photograph and message and places it in a setting of historical significance.

Some of the other photos are inherently more personal, both in place and in subject:


Dear Photograph,
This is when I still had hair and my brother pooped himself.
We were happy, but we didn’t know it.

If you do a quick Google search for “dear photograph” you will find, beyond the actual site (and its manifestations on other social media platforms) a number of articles profiling the site and its owner/curator, Taylor Jones. None of these articles are very long or in depth. The articles focus on “New-age nostalgia” or “digital nostalgia” but few delve into the ideas of memory.

One of the few scholarly pieces that deals with both memory and Dear Photograph, and the one that sets the frame for my study, is “Remembering with Rephotography: A Social Practice for the Inventions of Memories” by Jason Kalin. This article briefly mentions Dear Photograph as part of a larger set of websites involved in “rephotography,” or retaking the same photograph in the same place at a different time to show change. Kalin argues that the way we share digital photos on the web and use rephotography changes the way we remember things. Its application in a digital social environment allows users to “follow in the footsteps of previous walkers while simultaneously making that walk their own, thus producing a collective text, a collective, public memory of place that responds to past, present, and future.” In essence, these images are not only a way of remembering the past but also a means to create new memories, in a dialogue more public than ever before. This study will build on Kalin’s ideas as well as the general literature about memory to examine how Dear Photograph in particular reveals the changing nature of memory in the digital environment.

A piece in the New Yorker demonstrates another side to Dear Photograph, saying that “the project is a powerful reminder that digital photos can’t ever quite duplicate how it feels to hold a timeworn, sun-bleached, wrinkled old family photo in your hand.” This sentence gets to the heart of ideas espoused by Matt Kirschenbaum in Mechanisms when he discusses how the digital is often regarded as something inherently not physical. Dear Photograph juxtaposes nostalgia for the materiality of analog photographs with the structure of the new media that replaced them. Looking at these ideas and those of memory outlined above, I ask: do memory and materiality relate to one another? Is Dear Photograph an attempt to adapt the memories associated with tactile feel to the digital environment? Through the examination of the content of images and text in the posts of Dear Photograph, I hope to answer these questions and reveal how this platform relates to the way we form memories in the digital age.