Digital Searching by Michaela Fehn

(Michaela’s had some trouble accessing the blog. Here is her post for this week!)

So we’re talking data and mining this week. We’ve got a great lineup of scholarly pieces, so we’re jumping in.

“The History of Walking and the Digital Turn: Stride and Lounge in London, 1808–1851” by Joanna Guldi

Searching is selection. Scholars have to be creative when searching. Guldi argues that true scholarly work is in the nuance. In fact, Google Books, like other search engines, offers a curated list. The question scholars must ask is: How can I create nuanced search terms that curate a list of diamonds in the rough?

For historians, it becomes vital to find common links. As databases become more interconnected and search engines evolve, results will become more varied. For now, a good historian will follow a trail and find hidden gems. Like Guldi, we must ask: How do certain words or jargon lead to different results? How can understanding the language within sources or research topics make a difference?

“Space, Nation, and the Triumph of Region: A View of the World from Houston” by Cameron Blevins

Space and place are difficult notions, even for a seasoned historian. So what are these notions made of? Well, according to Blevins, space is ever-changing. It’s dynamic and usually associated with processes of power. Place is built on locations. There’s often an emotional response to place. Space and place can be in tension. But what happens when the two coexist within literature? Well, that’s what Blevins is exploring.

Blevins discusses in depth a method called “distant reading.” For Blevins, this meant noting the frequency of place names mentioned in a newspaper over a prolonged period. By doing so, Blevins is able to blend a digital reading with a traditional reading and better understand the political implications of space. This leads the reader to ask: How can distant reading enhance other projects? Are there other types of projects where distant reading can be helpful?
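To make the idea of counting place names a little more concrete, here is a minimal Python sketch. It is not Blevins’ actual method or code. The newspaper snippets, the list of place names, and the function name count_places are all invented for illustration, and a real project would need a much larger gazetteer and far messier text.

```python
from collections import Counter
import re

# Invented mini-corpus: (year, text) pairs standing in for newspaper issues.
ISSUES = [
    (1894, "Cotton shipped from Galveston to New York. Visitors from Dallas arrived in Houston."),
    (1894, "News from Galveston and Austin reached Houston this week."),
    (1901, "Oil men from Beaumont met in Houston. Freight left for New York and Chicago."),
]

# Invented gazetteer: the place names we choose to look for.
PLACES = ["Galveston", "New York", "Dallas", "Houston", "Austin", "Beaumont", "Chicago"]


def count_places(issues, places):
    """Return {year: Counter of place-name mentions} for the given issues."""
    # One pattern that matches any listed place name as a whole word or phrase.
    pattern = re.compile(r"\b(" + "|".join(re.escape(p) for p in places) + r")\b")
    counts = {}
    for year, text in issues:
        counts.setdefault(year, Counter()).update(pattern.findall(text))
    return counts


counts = count_places(ISSUES, PLACES)
for year in sorted(counts):
    print(year, counts[year].most_common())
```

Even in this toy version, the choice of which names go into the list shapes what the counts can show, which echoes Guldi’s point that searching is selection.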

“Topic Modeling Martha Ballard’s Diary” by Cameron Blevins

Blevins again, with some fantastic lessons to learn. Twenty-seven years’ worth of diaries leaves the reader to wonder: How is Blevins going to comb through that vast number of entries? Most important to note is Blevins’ use of topic modeling. And again, a question emerges: What is topic modeling? Topic modeling, according to Blevins, is “a method of computational linguistics that attempts to find words that frequently appear together within a text and then group them into clusters.” The software, specifically MALLET, produces a list of topics, with the words in each topic grouped by how a word is used rather than what it means. This gets the audience to ask: Does the software relate topics to each other? Can the software work in other languages? And how does software like MALLET handle sources other than diaries? Would it be helpful for archives of other types?
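For a rough sense of what “words that frequently appear together” means, here is a small Python sketch. To be clear, this is not topic modeling itself and not what MALLET does internally. It only counts which word pairs show up in the same entry, which is the raw intuition that a topic model builds on statistically. The diary-style entries, the stopword list, and the function name cooccurrence are invented for illustration and are not taken from Ballard’s diary.

```python
from collections import Counter
from itertools import combinations

# Invented diary-style entries (not Martha Ballard's actual text).
ENTRIES = [
    "delivered mrs smith of a daughter birth safe",
    "snow cold wind clear",
    "called to mrs jones birth of a son delivered",
    "cold snow at night wind",
]

# Invented stopword list: common words we ignore when pairing.
STOPWORDS = {"of", "a", "to", "at", "mrs"}


def cooccurrence(entries, stopwords):
    """Count how often each pair of words appears in the same entry."""
    pairs = Counter()
    for entry in entries:
        # Unique, sorted words so each pair is counted once per entry
        # and always stored in the same (alphabetical) order.
        words = sorted(set(entry.split()) - stopwords)
        pairs.update(combinations(words, 2))
    return pairs


pairs = cooccurrence(ENTRIES, STOPWORDS)
# Keep only pairs that appear together in more than one entry.
repeated = [(pair, n) for pair, n in pairs.items() if n > 1]
print(repeated)
```

In this toy data, the pairs that repeat fall into a “birth” group and a “weather” group, based purely on which words appear together and not on what they mean.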

“Digital Visualization as a Scholarly Activity” by Martyn Jessop

Graphic aids aren’t new. But digital technology enhances them. Jessop raises these three questions: What role have visualizations played in humanities scholarship in the past? If the majority of images in print are to be regarded as ‘illustrations’ what is the distinction between ‘visualization’ and ‘illustration’? How has the emergence of digital media affected the development of visualization?

By looking at visualization within the digital humanities, Jessop begins thinking about these questions. He notes several areas that visualization affects: space, quantitative data, text, time, and 3D visualization. This prompts the audience to think: How can different types of visualization affect different projects? When are the best times to bring visualization into a scholarly subject? And which category is most compelling based on Jessop’s article?

We’re going to look at some of these questions in class as we tackle talking about digital data, searching, and mining. Until then, happy reading.

Week 3 Readings: Data Feminism

Data Feminism by Catherine D’Ignazio and Lauren F. Klein is a readable and thorough entry into how data science needs feminism and how feminists and scholars can use data science to further their goals. Each chapter focuses on one of D’Ignazio and Klein’s seven principles of data feminism.

1. Examine Power

Data science is deeply influenced by unequal power structures, or matrices of domination. Readers are encouraged to ask, “Who?” when thinking about data collection and analysis: Who is doing data science? Who benefits from data science? And whose interests and goals are being served by data science?

By asking “who” questions, we can spot gaps in data collection and analysis and begin to fill those gaps.

2. Challenge Power

D’Ignazio and Klein offer four methods of challenging unjust data science:

  1. Collect: Compile counterdata.
  2. Analyze: Audit algorithms.
  3. Imagine: Imagine a future of co-liberation.
  4. Teach: Engage and empower people to use data science as a tool.

As part of their “imagine” method, the authors also advocate for a shift from data ethics, which tends to frame problems as the result of a few “bad apples” and technological glitches, to data justice, which acknowledges that injustice is structural.

Table 2.1 From data ethics to data justice

Concepts That Secure Power: because they locate the source of the problem in individuals or technical systems
Concepts That Challenge Power: because they acknowledge structural power differentials and work toward dismantling them

Concepts That Secure Power      Concepts That Challenge Power
Ethics                          Justice
Bias                            Oppression
Fairness                        Equity
Accountability                  Co-liberation
Transparency                    Reflexivity
Understanding algorithms        Understanding history, culture, and context

Table 2.1 presents principles of data ethics alongside alternative, parallel concepts of data justice (60).

Why is the shift from data ethics to data justice so radical?

3. Elevate Emotion and Embodiment

Data science is weighed down by the false binary of reason vs. emotion. As historians, though, we know that there is no such thing as a neutral perspective. Instead, the feminist approach to data science is to embrace emotion and affect as a valid type of data.

4. Rethink Binaries and Hierarchies

False binaries and unjust hierarchies lead to flawed classification systems that overlook or discriminate against certain groups. Problems with classification must be evaluated on a case-by-case basis. Ethical solutions might include adding categories to a classification system, making certain data categories optional, or avoiding gathering some types of data in the first place.

How data is presented is just as important as how it is categorized. Feminist approaches to data visualization, like Amanda Montañez’s infographic on gender and sex in Scientific American, can challenge false binaries.

5. Embrace Pluralism

Traditional data science focuses on clarity and control, sometimes to the detriment of minoritized voices. Data cleaning is sometimes necessary to prepare data for computational analysis, but it can also enact epistemic violence, perpetuating unjust hierarchies by separating data from their context.

Feminist data scientists, on the other hand, embrace multiple perspectives. Focusing on team projects and community-driven work can give us better, more complete information than the work of a single individual.

What does embracing pluralism look like in digital/public history? What are the benefits? The challenges? Are there any situations in which we should reject pluralism?

6. Consider Context

Data is meaningless without context. In this chapter, D’Ignazio and Klein coin the term Big Dick Data to refer to “big data projects that are characterized by masculinist, totalizing fantasies of world domination as enacted through data capture and analysis” (151). Big Dick Data projects overstate their scope and importance and ignore essential context. These inaccuracies can in turn lead to massively erroneous reporting, like in this FiveThirtyEight article on kidnappings in Nigeria.

Data are never raw. They are inherently cooked by their sociopolitical and historical context, and that context is essential to accurate data collection, interpretation, and visualization. Institutions need to invest significant funding into documenting, restoring, and communicating context, especially in instances involving discrimination and inequity.

What might “big dick history” look like? Can you think of any examples?

7. Make Labor Visible

Much of the effort that goes into data science is invisible labor: paid, underpaid, and unpaid. Data feminism requires that we make labor visible and always give credit where credit is due.

What are some ways labor can be hidden in academia and public history? How do we rectify this?

Conclusions

D’Ignazio and Klein’s data textbook is built on a foundation of Black feminism, an intersectional ideology that prioritizes humanity and process over profit. This is a great and easy intro to data science for humanities scholars and to feminist thought for data scientists. It’s a long read, but well worth the journey. In class, we’ll think about how we can apply these principles to digital history projects. Happy reading!