Digital Searching by Michaela Fehn

(Michaela’s had some trouble accessing the blog. Here is her post for this week!)

So we’re talking data and mining this week. We’ve got a great lineup of scholarly pieces, so let’s jump in.

“The History of Walking and the Digital Turn: Stride and Lounge in London, 1808–1851” by Joanna Guldi

Searching is selection. Scholars have to be creative when they search, and Guldi argues that true scholarly work lies in the nuance. Google Books, like other search engines, offers a curated list of results. The question scholars must ask is: How can I craft nuanced search terms that surface the diamonds in the rough?

For historians, it becomes vital to find common links. As databases become more interconnected and search engines evolve, the results they return will vary. For now, a good historian will follow a trail and find hidden gems. Like Guldi, we must ask: How do certain words or jargon lead to different results? How can understanding the language within sources or research topics make a difference?
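To see Guldi’s point in miniature, here is a minimal Python sketch of keyword search. The four snippets are invented for illustration (they are not from Guldi’s sources), and `search` is a hypothetical helper, not any database’s real search function. Swapping one word for its period synonyms changes which sources turn up at all.

```python
import string

# Invented example snippets; NOT quotations from Guldi's sources.
SNIPPETS = {
    "snippet_a": "He would stride along the Strand before breakfast.",
    "snippet_b": "Idlers lounge about the bridge all afternoon.",
    "snippet_c": "A pleasant walk through the park.",
    "snippet_d": "They walk, stroll, and saunter down Regent Street.",
}


def search(corpus, terms):
    """Return the ids of snippets containing any search term as a whole word (case-insensitive)."""
    wanted = {term.lower() for term in terms}
    hits = []
    for doc_id, text in corpus.items():
        # Lowercase each word and strip surrounding punctuation before comparing.
        words = {w.strip(string.punctuation).lower() for w in text.split()}
        if words & wanted:
            hits.append(doc_id)
    return sorted(hits)


print(search(SNIPPETS, ["walk"]))                      # ['snippet_c', 'snippet_d']
print(search(SNIPPETS, ["walk", "stride", "lounge"]))  # all four snippets
```

A search for “walk” alone never surfaces the striders and loungers, which is exactly the kind of gap a nuanced vocabulary closes.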

“Space, Nation, and the Triumph of Region: A View of the World from Houston” by Cameron Blevins

Space and place are difficult notions, even for a seasoned historian. So what are these notions made of? According to Blevins, space is ever-changing: it’s dynamic and usually associated with processes of power. Place is built on locations, and there’s often an emotional response to it. Space and place can be in tension. But what happens when the two coexist in print? That’s what Blevins is exploring.

Blevins discusses in depth an approach called “distant reading.” For Blevins, this meant tracking how often place names were mentioned in a newspaper over a prolonged period. By tracking this, Blevins blends digital reading with traditional reading to better understand the political implications of space. This leads the reader to ask: How can distant reading enhance other projects? Are there other types of projects where distant reading can be helpful?
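Here is a minimal Python sketch of the kind of counting distant reading involves, assuming each newspaper issue is already available as plain text tagged with its year. The sample issues and place list are invented, and `place_mentions_by_year` is a hypothetical helper; this is not Blevins’s actual workflow.

```python
import re
from collections import Counter, defaultdict


def place_mentions_by_year(issues, places):
    """Count mentions of each place name per year.

    issues: list of (year, text) pairs, one per newspaper issue.
    places: place names to track. Matching is case-sensitive and whole-word,
    a simplification that ignores spelling variants and OCR errors.
    """
    counts = defaultdict(Counter)
    for year, text in issues:
        for place in places:
            pattern = r"\b" + re.escape(place) + r"\b"
            counts[year][place] += len(re.findall(pattern, text))
    return {year: dict(c) for year, c in sorted(counts.items())}


# Invented sample text, not from Blevins's newspaper data.
ISSUES = [
    (1895, "Cotton arrives from Galveston and Dallas. Galveston wharves busy."),
    (1896, "Telegrams from Dallas, New York, and Dallas again."),
]
print(place_mentions_by_year(ISSUES, ["Galveston", "Dallas", "New York"]))
```

Raw counts favor years with more (or longer) issues, so a real project would probably normalize them, for example as mentions per thousand words, before comparing years.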

“Topic Modeling Martha Ballard’s Diary” by Cameron Blevins

Blevins again, with some fantastic lessons to learn. Twenty-seven years’ worth of diary entries leaves the reader wondering: How is Blevins going to comb through such a vast number of entries? Most important to note is Blevins’ use of topic modeling. And again, a question emerges: What is topic modeling? According to Blevins, topic modeling is “a method of computational linguistics that attempts to find words that frequently appear together within a text and then group them into clusters.” The software he uses, MALLET, produces a list of topics, with the words in each topic grouped by how they are used rather than by what they mean. This gets the audience to ask: Does the software relate topics to each other? Can the software work in other languages? How might software like MALLET handle sources other than diaries? Would it be helpful for other types of archives?
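Topic modeling builds on a simple signal: which words show up together. The minimal Python sketch below only counts how many entries each pair of words shares. It is not what MALLET does (MALLET fits a probabilistic topic model, latent Dirichlet allocation, whose topics emerge from co-occurrence patterns like these). The entries and stopword list are invented for illustration.

```python
import string
from collections import Counter
from itertools import combinations


def cooccurrences(entries, stopwords=frozenset()):
    """Count, for each pair of words, how many entries contain both."""
    pairs = Counter()
    for entry in entries:
        words = {w.strip(string.punctuation).lower() for w in entry.split()}
        words -= set(stopwords) | {""}
        # Sorting first means each unordered pair is counted once, as (a, b) with a < b.
        pairs.update(combinations(sorted(words), 2))
    return pairs


# Invented diary-style entries, not Martha Ballard's text.
ENTRIES = [
    "Warm day. Spun yarn and wove cloth.",
    "Cloudy. Wove cloth all day.",
    "Called to a birth. Safe delivery of a son.",
    "Rain. Called to a birth at night.",
]
STOP = {"a", "and", "to", "of", "at", "all", "day"}

for pair, n in cooccurrences(ENTRIES, STOP).most_common(2):
    print(pair, n)
```

Even on four lines, “wove/cloth” and “called/birth” stand out as pairs, which hints at how household work and midwifery could separate into distinct clusters in a real model.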

“Digital Visualization as a Scholarly Activity” by Martyn Jessop

Graphic aids aren’t new. But digital technology enhances them. Jessop raises these three questions: What role have visualizations played in humanities scholarship in the past? If the majority of images in print are to be regarded as ‘illustrations’ what is the distinction between ‘visualization’ and ‘illustration’? How has the emergence of digital media affected the development of visualization?

By looking at visualization within the digital humanities, Jessop begins to work through these questions. He identifies several categories of visualization: space, quantitative data, text, time, and 3D visualization. This invites the audience to ask: How can different types of visualization affect different projects? When is the best time to bring visualization into a scholarly subject? And which category is most compelling, based on Jessop’s article?

We’re going to look at some of these questions in class as we tackle talking about digital data, searching, and mining. Until then, happy reading.

Introduction: McKenna Crews

Hello! Sorry I am so late to the party but I am here now so buckle up!

My name is McKenna Crews, and I am a first-year MA Public History student. I was born and raised in the Buckeye State and moved to Indiana to triple major in Social Studies Education, History, and Public History at Ball State University, with minors in African American Studies and Political Science. I am also currently an 8th grade Civics teacher for Fairfax County Public Schools at Lake Braddock Secondary School. My historical interests focus on the Gold Coast of Africa and the slave trade that took place there. Last year, I received an award at the Ball State University History Conference for best undergraduate paper for my thesis on this topic. I am also interested in museum education and in how to make it more accessible to students in low-socioeconomic-status school districts.

My entire family is made up of teachers, and I grew up around education and its importance to future generations. My dad teaches 8th grade US history and is my biggest influence in becoming a historian and educator. I gained my passion for working with low-income students while working at a summer camp back home in Ohio for kids with severe medical disorders who received treatment from Cincinnati Children’s Hospital, as well as while working with children in the foster care system and in homeless shelters throughout the greater Cincinnati region. I went to this camp as a kid because I had cancer, and it changed my life, making me want to become a counselor. I worked there for five years, working my way up to Leadership and Development Coordinator, where I worked with young adults who wanted to gain leadership experience. At camp, I realized how limited access to hands-on education in schools can be because of funding. I made it my mission to learn as much as I could to help combat this and to help children and families gain access to museums that charge entry fees. Here I am at summer camp, where my campers decided to stick a plethora of glow sticks into my hair, including a unicorn horn on top, obvi!

My goal in this class is to bring my knowledge of classroom education, public history, and digital networks to help make museums more accessible to students and teachers, especially during the pandemic. How can I upload a virtual tour? How can I still create hands-on learning in the history classroom? Can I make interactive lesson plans for teachers to follow based on museum exhibits and curriculum? Combining these passions to make history more accessible is especially significant when working with low-SES communities, because it can build partnerships between institutions and the local school districts in their area.

Week 3 Readings: Data Feminism

Data Feminism by Catherine D’Ignazio and Lauren F. Klein is a readable and thorough entry into how data science needs feminism and how feminists and scholars can use data science to further their goals. Each chapter focuses on one of D’Ignazio and Klein’s seven principles of data feminism.

1. Examine Power

Data science is deeply influenced by unequal power structures, or matrices of domination. Readers are encouraged to ask, “Who?” when thinking about data collection and analysis: Who is doing data science? Who benefits from data science? And whose interests and goals are being served by data science?

By asking “who” questions, we can spot gaps in data collection and analysis and begin to fill those gaps.

2. Challenge Power

D’Ignazio and Klein offer four methods of challenging unjust data science:

  1. Collect: Compile counterdata.
  2. Analyze: Audit algorithms.
  3. Imagine: Imagine a future of co-liberation.
  4. Teach: Engage and empower people to use data science as a tool.

As part of their “imagine” method, the authors also advocate for a shift from data ethics, which tends to frame problems as the result of a few “bad apples” and technological glitches, to data justice, which acknowledges that injustice is structural.

Table 2.1 From data ethics to data justice

Concepts that secure power (because they locate the source of the problem in individuals or technical systems), paired with concepts that challenge power (because they acknowledge structural power differentials and work toward dismantling them):

  Ethics → Justice
  Bias → Oppression
  Fairness → Equity
  Accountability → Co-liberation
  Transparency → Reflexivity
  Understanding algorithms → Understanding history, culture, and context

Table 2.1 presents principles of data ethics alongside alternative, parallel concepts of data justice (60).

Why is the shift from data ethics to data justice so radical?

3. Elevate Emotion and Embodiment

Data science is weighed down by the false binary of reason vs. emotion. As historians, though, we know that there is no such thing as a neutral perspective. Instead, the feminist approach to data science is to embrace emotion and affect as a valid type of data.

4. Rethink Binaries and Hierarchies

False binaries and unjust hierarchies lead to flawed classification systems that overlook or discriminate against certain groups. Problems with classification must be evaluated on a case-by-case basis. Ethical solutions might include adding categories to a classification system, making certain data categories optional, or avoiding gathering some types of data in the first place.

How data is presented is just as important as how it is categorized. Feminist approaches to data visualization, like Amanda Montañez’s infographic on gender and sex in Scientific American, can challenge false binaries.

5. Embrace Pluralism

Traditional data science focuses on clarity and control, sometimes to the detriment of minoritized voices. Data cleaning is sometimes necessary to prepare data for computational analysis, but it can also enact epistemic violence, perpetuating unjust hierarchies by separating data from their context.

Feminist data scientists, on the other hand, embrace multiple perspectives. Focusing on team projects and community-driven work can give us better, more complete information than the work of a single individual.

What does embracing pluralism look like in digital/public history? What are the benefits? The challenges? Are there any situations in which we should reject pluralism?

6. Consider Context

Data is meaningless without context. In this chapter, D’Ignazio and Klein coin the term Big Dick Data to refer to “big data projects that are characterized by masculinist, totalizing fantasies of world domination as enacted through data capture and analysis” (151). Big Dick Data projects overstate their scope and importance and ignore essential context. These inaccuracies can in turn lead to massively erroneous reporting, like the FiveThirtyEight article on kidnappings in Nigeria that D’Ignazio and Klein examine.

Data are never raw. They are inherently cooked by their sociopolitical and historical context, and that context is essential to accurate data collection, interpretation, and visualization. Institutions need to invest significant funding in documenting, restoring, and communicating context, especially in instances involving discrimination and inequity.

What might “big dick history” look like? Can you think of any examples?

7. Make Labor Visible

Much of the effort that goes into data science is invisible labor: paid, underpaid, and unpaid. Data feminism requires that we make labor visible and always give credit where credit is due.

What are some ways labor can be hidden in academia and public history? How do we rectify this?

Conclusions

D’Ignazio and Klein’s data textbook is built on a foundation of Black feminism, an intersectional ideology that prioritizes humanity and process over profit. This is a great and easy intro into data science for humanities scholars and into feminist thought for data scientists. It’s a long read, but well worth the journey. In class, we’ll think about how we can apply these principles to digital history projects. Happy reading!

Practicum: Voyant Tools

What is Voyant?

Voyant is a text analysis tool that allows a user to interact with a text at a more molecular level. Its easy-to-use format invites audiences of all types (from professional scholars to students) to study and interpret a text in a different way.

Getting Started

There are three different ways to interact with Voyant. Upon first entering the site, users are greeted with a simple text box where they can copy and paste their text. Scholars may also choose either to open an already analyzed source (many classics, such as Shakespeare or Jane Austen, are available) or to upload their own documents from their computer. For my example analysis, I copied and pasted the link to each chapter of D’Ignazio and Klein’s Data Feminism into the text box.

The base dashboard displays every time a new corpus is created, and users can change and interact with the data from there.

The dashboard is highly customizable and offers a wide array of analytical tools, from a word cloud to a scatter plot. In the bottom right-hand corner, it takes the most frequent word (data) and displays the context of every appearance of that word. Users can choose from other options such as a word tree, TermsBerry, Mandala, or Correlations, to name a few. Clicking on a specific word in the word cloud, for example, highlights every use of that word in the text in the middle panel for easy finding.
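Voyant does this work for you, but a rough Python sketch helps show what the word cloud and context panel are counting. It finds the most frequent non-stopword and then lists each appearance with a couple of words on either side (a keyword-in-context view). The sample sentence, stopword set, and function names are my own assumptions; Voyant’s tokenizer and stopword lists differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def most_frequent(tokens, stopwords=frozenset()):
    """Return the most common token that isn't a stopword."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(1)[0][0]


def keyword_in_context(tokens, keyword, window=2):
    """List every occurrence of keyword with `window` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines


TEXT = "We collected the data ourselves. The data were messy, so we cleaned the data twice."
toks = tokenize(TEXT)
top = most_frequent(toks, stopwords={"the", "we", "so", "were"})
for line in keyword_in_context(toks, top):
    print(line)
# collected the [data] ourselves the
# ourselves the [data] were messy
# cleaned the [data] twice
```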

Interacting with the data

Voyant allows users to group words into defined categories. The default categories are “positive,” which includes words with positive connotations (enjoy, confidence, superior), and “negative,” which includes words with negative connotations (depression, suffer, sad). By clicking the “define options for this tool” switch in the top right-hand corner of the Cirrus word cloud panel, users can edit the words they deem most important and define new categories and parameters for their text.

There is also an option in the top right-hand corner to click “Features,” where you can choose a color for each category. Once everything is saved, the word cloud should change from its original, random colors to colors based on the defined categories. For reference, I made “positive” words red, “negative” words blue, “uncategorized” words tan, “people” words purple, and “city” words green.
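The category idea is easy to sketch outside Voyant. In this minimal Python version, the word lists are hypothetical stand-ins (Voyant’s built-in positive and negative lists are much longer), the colors are the ones I chose above, and anything not in a list falls into “uncategorized.”

```python
from collections import Counter

# Hypothetical word lists for illustration; Voyant's own lists are far longer.
CATEGORIES = {
    "positive": {"enjoy", "confidence", "superior"},
    "negative": {"depression", "suffer", "sad"},
    "people": {"woman", "women", "man", "men"},
    "city": {"boston", "chicago", "detroit"},
}
# The color scheme from this post; "uncategorized" is the fallback.
COLORS = {
    "positive": "red",
    "negative": "blue",
    "uncategorized": "tan",
    "people": "purple",
    "city": "green",
}


def categorize(word):
    """Return the first category whose word list contains the word, else 'uncategorized'."""
    for name, words in CATEGORIES.items():
        if word.lower() in words:
            return name
    return "uncategorized"


def color_words(words):
    """Map each word to the color of its category, as a colored word cloud would."""
    return {w: COLORS[categorize(w)] for w in words}


def category_counts(words):
    """Count how many words fall into each category."""
    return Counter(categorize(w) for w in words)


print(color_words(["sad", "Detroit", "data"]))
print(category_counts(["enjoy", "sad", "suffer", "data"]))
```

If a word appears in more than one list, this sketch simply uses the first category that matches.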

Tools

There is no shortage of possibilities with Voyant, as the browser-based tool offers multiple analytical options that range from simple word association to intricate spider webs of interlinked, repeated themes. To change the display to one of the 27 other options, click the symbol that looks like the Microsoft logo in the upper right-hand corner, next to the “Options” switch I discussed earlier.

“Summary” gives an overview of the corpus in its entirety, featuring average words per sentence, readability, vocabulary density, and length by document, as well as distinctive words that were prominent in a given chapter but not in the corpus as a whole.
“TermsBerry” associates a word with the frequency of contextual phrases.
“Dreamscape” maps locations in the text: if a user hovers over a dot, they can see where that location or associated word appears in the text and why it was placed at that specific spot on the map.
“Links” takes highly repeated words and displays how they are attached to other highly repeated words.
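Two of the Summary numbers are easy to approximate. The sketch below assumes vocabulary density means unique words divided by total words, and it scores “distinctive words” with TF-IDF (frequent in one chapter, rare across the others). TF-IDF is a common choice, but treat it as my assumption rather than a description of Voyant’s internals; the sample chapters are invented.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_density(text):
    """Unique words divided by total words (closer to 1 means a more varied vocabulary)."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def distinctive_words(chapters, n=3):
    """Top-n words per chapter by TF-IDF: common in that chapter, rare in the others."""
    docs = [Counter(tokens(c)) for c in chapters]
    # Document frequency: how many chapters contain each word at least once.
    df = Counter(word for doc in docs for word in doc)
    results = []
    for doc in docs:
        total = sum(doc.values())
        scores = {w: (c / total) * math.log(len(docs) / df[w]) for w, c in doc.items()}
        # Highest score first; ties broken alphabetically so output is stable.
        results.append(sorted(scores, key=lambda w: (-scores[w], w))[:n])
    return results


CHAPTERS = ["power power data", "emotion data", "labor data labor"]
print(vocabulary_density("the cat saw the dog"))  # 0.8
print(distinctive_words(CHAPTERS, n=1))            # [['power'], ['emotion'], ['labor']]
```

Because “data” appears in every chapter, its IDF is zero, so it never counts as distinctive even though it is the most frequent word overall.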

Help and Support

It’s important to note the level of user assistance that Voyant offers. The site is easy to get started with, but it has a myriad of options and features that might go overlooked without a user’s due diligence or assistance from the source. Voyant’s help page is an extremely in-depth, step-by-step guide to every feature the tool offers. There are pictures, examples, and separate categories for every feature, which make the help easy to navigate while keeping a streamlined appearance.

Voyant Support

Practicum: Google N-Gram

An Easy and Unique Research Tool

Google Ngram Viewer home page: https://books.google.com/ngrams

Google N-Gram (the Google Books Ngram Viewer) is a search engine that allows users to explore how often words or phrases appear in books published between 1800 and 2019. Users can change the language and the range of dates they are searching within. Results are shown on a graph, giving users a visualization of how frequently a word or phrase has been used over time. Searches can be made case-insensitive, and frequency is measured as a percentage: the share of all words or phrases of the same length in that year’s books.
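The percentage on the y-axis is a share: matches for the query in a given year divided by all n-grams of the same length in that year’s books. Here is a minimal Python sketch of that calculation on invented counts (not real Ngram data), including a case-insensitive option that adds up every capitalization. `relative_frequency` is a hypothetical helper; it doesn’t query Google.

```python
def relative_frequency(query, counts, totals, case_insensitive=False):
    """Percentage of all n-grams in each year that match `query`.

    counts: {year: {ngram: count}} for n-grams of the query's length.
    totals: {year: total number of n-grams of that length that year}.
    With case_insensitive=True, all capitalizations of the query are summed.
    """
    series = {}
    for year in sorted(counts):
        year_counts = counts[year]
        if case_insensitive:
            target = query.lower()
            hits = sum(c for gram, c in year_counts.items() if gram.lower() == target)
        else:
            hits = year_counts.get(query, 0)
        series[year] = 100 * hits / totals[year]
    return series


# Invented bigram counts and totals, NOT real Google Books data.
COUNTS = {
    1965: {"Black Power": 30, "black power": 10},
    1966: {"Black Power": 90, "black power": 30},
}
TOTALS = {1965: 1_000_000, 1966: 2_000_000}

print(relative_frequency("Black Power", COUNTS, TOTALS))
print(relative_frequency("Black Power", COUNTS, TOTALS, case_insensitive=True))
```

Dividing by each year’s total matters because far more books were published in some years than others; raw counts alone would make nearly every term look like it is rising.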

Searching “Malcolm X” from 1950 to 2019 in the American English corpus shows that Malcolm X was at peak popularity in American English books in 1970, 1995, and 2012.
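“Peak popularity” can be made a little more precise. This minimal Python sketch treats a year as a peak if its value is higher than the data points on either side by more than `min_rise`, a threshold I added so small wiggles don’t count. The series is made up, not the Malcolm X results.

```python
def peak_years(series, min_rise=0.0):
    """Return years whose value beats both neighboring data points by more than min_rise."""
    years = sorted(series)
    peaks = []
    # Walk through each (previous, current, next) triple of neighboring years.
    for prev, year, nxt in zip(years, years[1:], years[2:]):
        value = series[year]
        if value - series[prev] > min_rise and value - series[nxt] > min_rise:
            peaks.append(year)
    return peaks


# A made-up series (percent of bigrams per year), not real Ngram results.
SERIES = {1968: 1.0, 1969: 2.0, 1970: 3.0, 1971: 2.5, 1972: 2.6, 1973: 1.0}
print(peak_years(SERIES))                # [1970, 1972]
print(peak_years(SERIES, min_rise=0.2))  # [1970]
```

The first and last years are never counted as peaks here, since they only have one neighbor.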

Users can also combine words and phrases to get more advanced results. For example, searching “Malcolm X, Black Power” (with a comma between the terms) plots the trends of both subjects in comparison to one another.

Both subjects follow a similar trend initially; however, “Malcolm X” spikes in the early 1990s and goes back down in the late 2010s, while “Black Power” steadily increases. After viewing the graph, users can scroll down and choose from a selection of books organized by groups of dates.

Clicking on “1971-2006” for Malcolm X brings up an abundance of books on Malcolm X published during that time period. Google N-Gram is a great resource for anyone researching one or more topics who wants to know when those topics were most popular. Whether you are a seasoned researcher or just getting started, Google N-Gram provides an easy and unique option.