Google Ngram Viewer

Google Ngram Viewer is a digital tool, like Voyant and the Time Magazine Corpus, that draws on Google's scanned collection of books published between 1500 and 2019 in several languages. A word, phrase, or date can be entered into this search engine, and a graph is generated tracking the frequency of the searched term in literature. Partial phrases can be completed by entering “*” in place of a missing word, which produces separate lines on the graph for the top substitutions. The X-axis represents time and the Y-axis represents the percentage of the corpus that matches the searched item. The filters can be adjusted to search American English, British English, or other languages, and the results update instantly.

This way researchers can track the popularity of the searched item. The results immediately clue the researcher in on the time periods most relevant to their research and show when a term rose and faded in popularity. The year range can also be edited to narrow the search to a custom span. The ability to graph multiple items against each other is an easy way to compare and contrast the popularity of each item over time.

At the bottom of the page is a set of “interesting” year ranges that connect the user with results from Google Books, so the user can track the texts relevant to their search in the period they choose. This makes the tool an important resource for researchers looking for texts related to their field. The popularity of the searched item also informs the user of the societal shifts surrounding it, and this data alone can be valuable in supporting a researcher’s argument. It enables researchers to ask questions about differences in results across the filters, for instance the difference between searching “Nursery Schools” in British English and American English, which hints at the different levels of attention societies give the same subject. The researcher can thus begin work on a subject by looking at it across time and locations. Another way to narrow a search is by appending a part-of-speech tag such as “_VERB” or “_NOUN” to a word like “Tackle”: searching the verb yields different results than the noun, and the full list of tags appears on the “About Ngram Viewer” page. There researchers can learn what the search can do, how it can be used, and how to read what they’re looking at.
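The quantity the Viewer plots on its Y-axis, the share of all words in a given year's books that match the search, is easy to mirror on a toy scale. Below is a minimal Python sketch; the two-year "corpus" is entirely made up and only stands in for Google's millions of scanned books:

```python
from collections import Counter

# Hypothetical toy corpus: year -> texts "published" that year.
corpus = {
    1900: ["the motor car is a novelty", "the horse remains king"],
    1950: ["the motor car is everywhere", "the car the car the car"],
}

def ngram_frequency(corpus, term):
    """Percentage of all unigrams in each year matching `term`,
    the same quantity the Ngram Viewer plots over time."""
    results = {}
    for year, texts in sorted(corpus.items()):
        words = [w for text in texts for w in text.split()]
        results[year] = 100.0 * Counter(words)[term] / len(words)
    return results

print(ngram_frequency(corpus, "car"))
# "car" is 1 of 10 words in 1900 but 4 of 11 in 1950, so its line rises.
```

The real tool also smooths these yearly percentages over a sliding window, which this sketch omits.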

Data Analysis: Distant Reading, Text Analysis, Visualization

This week’s readings are on the computational analysis of texts and the interpretation of abstractions of those texts. They span a broad range of subjects and formats, pushing the reader to consider how each author visualizes their subject through different methods of presentation, be it a book, article, or blog. The readings discuss both the capabilities that data science and digital history provide and the potential weaknesses and injustices they may carry.

D’Ignazio and Klein, Data Feminism

The first reading is the book Data Feminism by Catherine D’Ignazio and Lauren Klein. To quote the authors, “Data Feminism offers strategies for data scientists seeking to learn how feminism can help them work toward justice, and for feminists who want to focus their efforts on the growing field of data science. But Data Feminism is about much more than gender. It is about power, about who has it and who doesn’t, and about how those differentials of power can be challenged and changed.” It is a work based in intersectional feminism focused on both how data science needs feminism and how data science can be used by feminists. The book is divided into seven principles of Data Feminism.

Principle 1: Examine Power.
This first principle exhorts readers to examine who holds power in data science at its very foundation, and how structures of power continue to affect the discipline in the present. The authors draw on the concept of the matrix of domination, which reveals the ways power is entrenched in data science and in the wider world. One example is the story of an MIT graduate student who, after discovering that a facial recognition service could not detect her Black face even though it could detect her white colleagues’ faces, investigated the foundations of the service. She found that the data underlying the service was drawn overwhelmingly from white subjects, which naturally left it far less able to recognize nonwhite faces. This is examining power and its effects.

To restate the question of the authors: What other power structures in data science do we take for granted, and who established them?

Principle 2: Challenge Power.
This second principle logically follows the first once it is accepted as a premise: if there are power structures that favor particular groups of people over others, they ought to be challenged by anyone who can. The authors demand that readers shift their entire conception of power and responsibility, a shift best exemplified by their table contrasting data ethics with data justice. Data ethics is founded on the idea that the system is generally good and that injustice occurs only when individuals act unethically. The authors charge readers to denounce this worldview as complicit with the injustice itself and to move beyond ethical concerns toward justice, which understands the system to be corrupted by the very structures of power that yield inequality.

Is this radical change necessary? Are there any downsides to such absolutism?

Principle 3: Elevate Emotion and Embodiment.
This principle, put succinctly, asks data scientists to reject the idea that emotion and embodiment do not belong in data science. According to the authors, emotion is an essential feature of a feminist approach to data science.

Principle 4: Rethink Binaries and Hierarchies.
In this principle the authors ask data scientists to examine how cultural preferences and conceptions extend even into the realm of data science. For a truly intersectional feminist approach to data science, the authors say one must be willing to attack these cultural issues at every level. To them, one must see every structure in data science through the lens of postmodern gender theory and strive to remake the field in accordance with that philosophy.

Principle 5: Embrace Pluralism.
This is a fairly simple and uncontroversial principle: the most holistic and complete knowledge comes from synthesizing multiple perspectives and multiple methods of data compilation. The specific point the authors make is that data scientists analyzing a data set foreign to them, perhaps because they have no personal experience with the people or place being studied, are most at risk of missing elements of the data essential to good interpretation.

Principle 6: Consider Context.
Again, this is a very benign-sounding principle. The authors say that data is never totally neutral and always carries context that can provide more insight. As intersectional feminists, they go further and argue that the context almost always indicates some prior unequal social relationship, and that it is obligatory to uncover these relationships. They use the example of a mistake FiveThirtyEight made about kidnappings in Nigeria, where a reporter misread a data source and published a story indicating that there were hundreds of kidnappings in Nigeria per day, which was just flat out wrong.

Principle 7: Make Labor Visible.
This principle states that much of the labor done in data science is invisible, meaning it is uncredited, unpaid, or underpaid. To bring about a feminist revolution in data science it is compulsory that all labor is visible, credited, and properly compensated.

Is this problem of invisible labor only limited to fields like data science?

Guldi, The History of Walking and the Digital Turn: Stride and Lounge in London, 1808–1851
The next reading is by Jo Guldi about the plight of digital historians in the era of databases. She first explains how the advent of digital historical databases has created new opportunities for research, but also great opportunities for overconfidence in the new technology. She shows how limited and biased most search engines are by demonstrating that their sources usually prioritize the voices of the privileged and are often constrained by the limits of imperfect underlying software and by the human curation of results. Her point is that nuance, patience, and a healthy skepticism of results are all necessary to safely navigate the brave new world of digital history. Although she begins by showing how flawed many contemporary datasets are, she also argues that if one calibrates a search to the specific needs and idiosyncrasies of the data, useful information can be obtained. Thus historians must be more vigilant, not less, when using the extraordinarily powerful digital tools at their disposal, and must always remember that databases and word searches are not infallible oracles.
What is the safest way to use historical databases? How does this relate to data feminism?

Blevins, Space, Nation, and the Triumph of Region: A View of the World from Houston
Continuing the theme of extraordinarily powerful digital tools, Cameron Blevins discusses how the difficulties of studying large-scale concepts like space, nation, and place in relation to specific topics can be addressed with a technique known as “distant reading.” For example, if one were studying utopianism, instead of scanning a particular nineteenth-century book for passages on the subject one could use digital tools to examine hundreds of nineteenth-century texts on utopianism and get a “distant read” on the subject. This is something historians could never practically do before, and combined with traditional reading it can produce remarkable new scholarship. Of course, the problems articulated by Guldi still apply. So,
What are the dangers of distant reading? Can it be done in a diligent and nuanced way?
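One way to picture distant reading at its crudest is a script that asks not what one book says about a theme but which of many texts touch it at all. The "library" below is invented for illustration, and real distant reading uses far richer statistics than substring matching:

```python
# Hypothetical miniature library: title -> full text.
library = {
    "novel_a": "a utopian community formed on the frontier",
    "novel_b": "the railroad reached the town in spring",
    "tract_c": "utopia, they said, was within reach",
}

def texts_mentioning(library, stem):
    """Titles whose text contains the stem
    (e.g. 'utopia' also matches 'utopian')."""
    return sorted(t for t, text in library.items() if stem in text.lower())

print(texts_mentioning(library, "utopia"))
# Two of the three texts mention the theme.
```

Scaled to hundreds of texts, a tally like this points the historian toward where close reading should begin, which is exactly the division of labor Blevins describes.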

Blevins, Topic Modeling Martha Ballard’s Diary

This final text looks at the problem of analyzing a large volume of text, specifically a woman’s diary. In the past, it would have been necessary for someone to qualitatively analyze this text with huge generalizations, considering the sheer vastness of the source material. Now, however, Blevins shows how one can use the power of topic mining to analyze the text in a quantitative way. With a little bit of work to establish topics of a diary entry, the computer program can nearly instantly analyze the diary and present its contents in useful graphs and statistics. For example, once you explain to the program everything that would fall under the topic of “church” like worship, sermons, the pastor, etc, the program can find and enumerate every time she makes a diary entry on the topic of church. This can be anecdotally tested by creating the topic “cold weather” and having the program graph the frequency of the topic in her diary by the months of the year. Sure enough, she wrote about the topic of cold weather far more in the winter months than in the summer. Amazing.
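Blevins used statistical topic modeling (topics discovered by the software itself), but the counting step he describes can be sketched with a hand-made keyword list. The diary entries below are invented stand-ins for Ballard's thousands of real ones:

```python
from collections import defaultdict

# Hypothetical (month, entry) pairs standing in for the diary.
entries = [
    (1, "bitter cold and snow all day"),
    (1, "attended worship, a fine sermon"),
    (2, "ice on the river, very cold"),
    (7, "warm day, tended the garden"),
    (12, "snow again, cold night"),
]

# A "topic" here is just a keyword set; real topic modeling infers these.
cold_topic = {"cold", "snow", "ice", "frost"}

def topic_by_month(entries, topic_words):
    """Count entries per month that touch the topic at least once."""
    counts = defaultdict(int)
    for month, text in entries:
        if topic_words & set(text.lower().replace(",", "").split()):
            counts[month] += 1
    return dict(counts)

print(topic_by_month(entries, cold_topic))
# Cold-weather entries cluster in months 1, 2, and 12, never in July.
```

Graph those monthly counts and you get the winter spike Blevins uses to sanity-check the model.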

What are the broader implications of topic modeling in historical research? What are the potential weaknesses of such an approach?

Time Magazine Corpus

The Time Magazine Corpus is a digital tool that compiles Time magazine articles from 1923 onward and lets users analyze how the English language has been used and has changed over time. The tool reveals how society and culture influence trends in language, with examples like “flapper”, “global warming”, and “hippy”. It is easy to see how each of these words is associated with a certain period and how language changes with time. Additionally, researchers can trace how parts of words have been used through time, such as “-gate” as in “Watergate”, “-aholic” as in “shopaholic”, and common suffixes like “-dom” as in “kingdom”.

Time Magazine Home Page

By simply typing a word into the section of the search bar titled “Chart”, you get an analysis of how often that word has been used in Time, broken down by decade, with frequency indicated by varied shades of color. Clicking on the decade you want to analyze shows the instances and issues in which the searched word appears. A search can return over 100,000 results, so the more specific and distinctive the word, the better the results you’ll get.

Text analysis broken down by decades and then years

For example, typing the word “Mustang” in the Chart section quickly shows when the word was used most frequently. Not surprisingly, we see rises in frequency during the 1940s, when the P-51 Mustang was one of the most celebrated fighters of World War II, and in the 1960s, when the classic Ford Mustang was in its heyday. The research doesn’t stop there: an additional click on a decade breaks the results down by year. Here we can see that “Mustang” was used most frequently in 1944, during the heart of World War II, before dropping off dramatically in the postwar years. One more click brings you to the lines of the issues in which “Mustang” appears, showing how useful this tool would be for researching the subject. By clicking the “List” part of the search bar instead, the results take you directly to all the lines in which the word was used, with no analysis of frequency over time. A third option is the “Collocates” section of the search bar, which lets you search two words and see the times they were used near each other, further narrowing results. Searching “Mustang” and “Germany”, you may be surprised to learn that the two words appear near each other only once, in a 1942 issue.
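A collocate search of this kind boils down to counting how often one word falls within a window of words around another. Here is a minimal sketch with an invented sentence; the corpus's actual window sizes and ranking statistics are more sophisticated:

```python
import re

def collocates(text, word_a, word_b, window=5):
    """Count occurrences of word_b within `window` words of word_a,
    a rough version of what a Collocates search reports."""
    tokens = re.findall(r"[a-z'-]+", text.lower())
    hits = 0
    for i, tok in enumerate(tokens):
        if tok == word_a:
            nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            hits += nearby.count(word_b)
    return hits

sample = ("The Mustang fighters escorted bombers over Germany "
          "while the Mustang pilots rested.")
print(collocates(sample, "mustang", "germany"))
# "germany" falls inside the 5-word window of both "mustang" tokens.
```

Run over eighty years of Time issues instead of one sentence, a count like this is what surfaces that lone 1942 co-occurrence.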

Thankfully, this tool acts like a Control+F search across the entirety of the Time magazine collection. I acclimated myself to it and found its usefulness far quicker than with the Voyant tool because it is easier to navigate. The “Help” page is shorter and more concise, providing examples of how the tool can be used and links to sample search results that show how the directions apply. One drawback is the need to register for an account and link it with your university, whereas Voyant did not require that.

Tour “Help” Page

Like the Voyant tool, the Time Magazine Corpus takes some effort to learn, though far less. The site is very simple and slow to load, but the tool offers accurate results that help researchers know where and when to look. Researchers gain the ability to contextualize language and pinpoint the areas they should look into. A very useful tool for analyzing cultural shifts through language, it can clue us in to how fickle language is.

Voyant Tools

Voyant Tools is a web-based text reading and analysis environment. It is a scholarly project that is designed to facilitate reading and interpretive practices for digital humanities students and scholars as well as for the general public.

At first glance, it is unclear what this research tool is, how to use it, and what to take from it.

Voyant Site Home Page

After typing a word into the text box and clicking Reveal, I found myself even more at sea amid all of the analysis available if the site is used properly. The “Help” button is represented by a question mark that redirects you to another overwhelming page of information, seen below (just look at all of those files to sift through). Any new user will undoubtedly need to spend some time navigating the site and learning all the potential text analysis available. The site is not particularly pleasing to the eye (though I am not tech savvy, so take what I say with a grain of salt).

Once the user inputs a URL into the initial text box (the way I should have started), a flurry of text analysis and data is immediately available. I copied and pasted the URL of the Wikipedia page on the P-51 Mustang as an example. Here you can view the frequency of key words in the text as a word cloud, in graphs showing which parts of the text a word appears in most, and in a range of other views of the same data.

Again, to really understand how to properly use this tool to its full potential, you’ll have to spend a bit of time reading through the descriptions of each option available to understand how to apply this site towards your research. Thankfully, the “Help” page is thorough and any information you may need can be easily accessed.

Researchers may be interested in creating a “corpus”, which is a set of documents or URLs analyzed together. The tool is meant to let humanities scholars quickly analyze several texts by revealing trends, similarities, and distinctions. Ideally it will prompt the scholar to ask questions about what the analysis can show, but its primary function is as a tool for exploration and an aid to interpretive practices. The digitization of history will save scholars time combing through countless texts and sitting in archives or libraries with a stack of books. The tool also works in languages other than English. According to the tutorial and workshop page, a full workshop on the tool can take an entire day, a sign of how complex using it accurately really is.
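The word cloud at the heart of Voyant's display is, underneath, just combined term counts across the corpus with common function words ("stopwords") filtered out. A minimal sketch, using two invented documents and a tiny hand-picked stopword list rather than Voyant's built-in ones:

```python
import re
from collections import Counter

# Two hypothetical short documents standing in for a Voyant corpus.
docs = {
    "doc1": "The Mustang was a fighter. The Mustang flew escort missions.",
    "doc2": "The Mustang is also a car. The car defined an era.",
}

STOPWORDS = {"the", "a", "an", "is", "was", "also"}

def top_terms(docs, n=3):
    """Combined term counts across the corpus, minus stopwords -
    roughly the data behind a word-cloud display."""
    counts = Counter()
    for text in docs.values():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)

print(top_terms(docs))
# "mustang" dominates the corpus, so it would render largest in the cloud.
```

Each term's count sets its font size in the cloud, which is why stopword filtering matters: without it, "the" would swamp everything.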

I find a bit of irony in the digitization of history: I chose to study the humanities because STEM was always my weakest subject, and yet here I am again trading hardback books for digital tools. Despite my best efforts, technology is working its way back into my studies!

Introducing Patrick Sullivan

Hello there. My name is Patrick Sullivan and I am from the North Shore of Massachusetts. I attended Gettysburg College and graduated last year. I am in the first year of the general history master’s program and chose AU because I wanted to be closer to politics and history. During my time in D.C., I hope to take advantage of being close to so many archives, museums, and other important historical sites.

My research interests are diverse but mainly incorporate modern history, specifically international relations, foreign policy, and imperialism. For my undergraduate thesis, I wrote about the covert imperialism of the Eisenhower administration with regard to Cuba after its revolution in 1959.

Other non-academic interests of mine include following and playing sports such as soccer and golf. For soccer (or football), I support Manchester United and was lucky enough to meet some of the players (Marcus Rashford and Jesse Lingard) at a friendly match against Real Madrid in Miami. I also studied abroad in England at Lancaster University during the fall of 2020 and traveled to Liverpool and the Lake District.

Lake District in Cumbria in Northwestern England

For this digital history course, I hope to learn about the ways digital technologies are influencing the work of history and its education. Despite being relatively well versed in technology and having a passion for history, I have not explored the tools and other aspects of digital history studied in this course. Besides expanding my knowledge of digital history, I want to understand how artificial intelligence such as ChatGPT can present advantages and disadvantages for the study of history. In using ChatGPT these past couple of weeks, I have been pleasantly surprised and even astounded by its capabilities in exploring and explaining historical subjects. But I have also seen potential problems, such as its training data extending only to 2021 and the fact that it remains limited to what it was trained on and cannot draw from the great breadth of information on the internet. This has consequences for people using it as a tool to study and learn about history, especially younger students who may uncritically believe a biased or inaccurate answer.

As for the graduate program, I hope not only to explore my research interests but also to grow as a historian. I believe the skills used to study history (research, reading, critical thinking, writing, etc.) are relevant to a wide variety of professions, and that they also help one improve as a member of society. The influx of social media, combined with an ever more complex modern world, has made the skills mentioned above important to understanding “hot-button” and more nuanced issues alike. This graduate program, as I have already seen in the first semester, will help me toward that growth. Beyond this, I hope to go into the Foreign Service or a similar profession that works to understand and improve relations between nations and to address transnational problems. My mind is open to other paths, though, as I have many interests and am really just here for the ride.