Without looking, who wrote this post? What’s my gender? What decade was this written in?
OK, you probably know the answer to the last question, and as an aspiring historian you are more than likely used to looking for the author’s name before you begin reading. And if you know my name, you can probably guess the answer to the second question.
But what if you hadn’t read my name before clicking on the post? Would you be able to guess my gender just by reading it? WordPress is 14 years old; for all you know, this post predates WordPress itself and was simply transferred to this blog from elsewhere.
What if a computer could answer each of those questions, and more, simply through a comparative analysis of this text? Well, if you have the right data set, it can.
Matthew Jockers’ Macroanalysis takes us through the trials and tribulations of literary analysis using digital tools. He outlines his process of analysis, discusses statistics, and reviews the applications, both practical and potential, of computer-assisted mining of massive text databases. His case study of 19th-century literature shows how, given the proper comparative material, a computer can identify an author’s style, nationality, and even gender, as well as the themes of a text.
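To make “given the proper comparative material” a little more concrete, here is a minimal Python sketch of one simple way such a guess can work: describe each text by the relative frequencies of a few common function words, average those profiles for each labeled group of texts, and assign an unknown text to the group whose average is closest. This is a toy nearest-centroid classifier chosen for illustration, not Jockers’ actual procedure; the word list, the function names, the group labels, and the training sentences are all invented.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of very common function words.
FEATURES = ["the", "of", "and", "a", "to", "in", "i", "my"]

def profile(text):
    """Relative frequency of each feature word in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FEATURES]

def centroid(profiles):
    """Element-wise mean of several frequency profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def train(labeled_texts):
    """Map each label to the centroid (average profile) of its texts."""
    by_label = {}
    for label, text in labeled_texts:
        by_label.setdefault(label, []).append(profile(text))
    return {label: centroid(ps) for label, ps in by_label.items()}

def classify(centroids, text):
    """Return the label whose centroid is nearest by Euclidean distance."""
    p = profile(text)
    return min(centroids, key=lambda label: math.dist(p, centroids[label]))

# Invented toy training data; real work would use many full-length texts.
training = [
    ("group_a", "the house of the king and the land of the river"),
    ("group_a", "the road to the town and the hills of the north"),
    ("group_b", "i said to my friend that i would go and i did"),
    ("group_b", "my mother and i walked to my old school"),
]
centroids = train(training)
print(classify(centroids, "the castle of the lord and the fields of the south"))
print(classify(centroids, "i told my sister that i was tired"))
```

With real data, each group would contain many complete works and the feature list would usually be much longer; four one-line samples only show the mechanics.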
So what is macroanalysis anyway?
For decades, scholars have relied on interpretations of representative works to build their understanding of periods, genres, and historical patterns, but the digitization of large quantities of written works now lets them draw from entire libraries instead of small samples. Whereas “close reading” limits scholars to consuming one book at a time, “distant reading” entails analyzing hundreds, thousands, or even hundreds of thousands of books simultaneously. This “macro”-analysis takes the big picture, the one with the rolling hills and endless sea in the background, instead of the small picture: the close-up, the selfie, if you will.
Macroanalysis is not intended to replace its micro counterpart but to complement it. Whereas microanalysis is what scholars need to develop interpretations of individual works and get into the nitty-gritty, macroanalysis provides broader, if less specific, information.
What is it good for?
Macroanalysis has a variety of uses in the ever-evolving academic world of the 21st century. Maybe you are interested in the Irish-American experience of the 19th century and want to know more about patterns of self-identification and societal perception of the Irish in America. Thanks to macroanalysis, Jockers is able to tell us quite a bit about how Irish authors as a whole approached their ethnic and cultural heritage, instead of extrapolating from one or two authors deemed representative.
Or perhaps you would like to conduct a comparative cultural analysis through language. Jockers uses macroanalysis for this purpose as well. For example, he compares the language of Irish and English authors of the period and concludes that the collective English consciousness was more confident, as evidenced by more frequent use of “absolute” and “determinant” words, whereas Irish authors used more words that indicated imprecision and indeterminacy.
Finally, computers are quite good at paying attention to the things that humans subconsciously, and incorrectly, deem inconsequential. How many times in this post have I written “the”? Have I used it more or less frequently than the average writer? You would probably have trouble telling, but for a computer this kind of attention to detail is trivial. And yes, “the” can matter a great deal when analyzing language and culture: Jockers uses it to differentiate American from British writers.
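The comparison described above comes down to counting how often words from a given category appear in each group of texts, normalized by the total number of words so that corpora of different sizes can be compared fairly. Here is a minimal Python sketch; the two word lists are hypothetical stand-ins, not the lists Jockers used, and the sample sentences are invented.

```python
import re

# Hypothetical category lists, for illustration only.
ABSOLUTE = {"always", "never", "certainly", "must", "all", "every"}
INDETERMINATE = {"perhaps", "maybe", "seemed", "somewhat", "almost", "might"}

def tokenize(text):
    """Lowercase a text and split it into whole words."""
    return re.findall(r"[a-z']+", text.lower())

def category_rate(texts, category, per=1000):
    """Occurrences of category words per `per` words across a whole corpus."""
    tokens = [t for text in texts for t in tokenize(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in category)
    return hits / len(tokens) * per

# Two invented mini-corpora standing in for two groups of authors.
corpus_a = ["He must always obey, and every man knew it.",
            "All of them certainly agreed."]
corpus_b = ["Perhaps it might rain; the sky seemed somewhat grey.",
            "She was almost sure, maybe."]

for name, corpus in [("A", corpus_a), ("B", corpus_b)]:
    print(name,
          round(category_rate(corpus, ABSOLUTE), 1),
          round(category_rate(corpus, INDETERMINATE), 1))
```

Real comparisons would run over whole novels, and deciding which words count as “absolute” or “indeterminate” is itself an interpretive choice made by a human.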
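Counting “the” is about as simple as text analysis gets. The Python sketch below computes how often a word appears per 1,000 words of a text; the function name and sample sentence are made up for illustration. “Then” is not counted as “the”, because the text is split into whole words before counting.

```python
import re
from collections import Counter

def rate_per_thousand(text, word):
    """Return how many times `word` (lowercase) occurs per 1,000 words of `text`."""
    # Lowercase and split into whole words, so "The" matches but "Then" does not.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens) * 1000

# Invented sample text for illustration.
post = "The cat sat on the mat. Then the dog chased the cat."
print(round(rate_per_thousand(post, "the"), 1))
```

Answering “more or less than the average writer?” would mean computing the same rate over a large reference corpus and comparing the two numbers; no such corpus is included here.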
What can’t we do with it?
Jockers is careful throughout his book to maintain that macro- and microanalysis must work in tandem and that macro should not replace micro. And he provides good reason for it.
Computers lack human interpretive abilities. They cannot comprehend underlying themes or piece together the context in which the extracted words exist. In this sense, macroanalysis is little better than a glorified word cloud generator. The information it does provide allows us to uncover new evidence for theories and to approach old material in new ways, but computers relay that information rather than understand it.
Additionally, we are not going to blow anyone’s mind with the evidence gathered through macroanalysis. This tool excels at providing evidence for what most people already believe to be true. Until the technology available to us advances significantly, macroanalysis is not going to be responsible for the next major shift in how we do History.
Why isn’t everyone doing it yet?
Now that our eyes have been opened to this great new tool for conducting analysis, we’re left wondering, “Why isn’t everyone doing this yet?” Well, there are several reasons.
First, there are inaccuracies. Although this method is more often right than wrong (assuming it is given a proper database), it can still be wrong. Macroanalysis struggles to make sense of transitional literature and of literature that is otherwise indistinct, and computers often lump these two categories together because neither belongs to any strict category.
Second, it is as subject to bias as traditional analysis and interpretation. Just because the data is presented as numbers does not mean that humans were not responsible for selecting the texts it draws from. Throughout his book, Jockers conducts in-depth analysis of Irish-American writers, but he clearly states in his introduction that he has excluded popular writers because their work does not directly address their Irishness.
Finally, and perhaps most importantly, using macroanalysis for any topic within the past century is difficult and, in many cases, impossible. You can thank Disney and the Copyright Term Extension Act of 1998, nicknamed the Mickey Mouse Protection Act, for this. Copyright terms in the United States have been extended so far that they are essentially endless, which means that the vast majority of recent literature is locked behind physical or digital paywalls.
Classified material also presents complications for many historians. The information contained in classified government documents would undoubtedly provide an entire generation of graduate students with dissertation topics, but for better or worse, all we can do is wait and hope that information valuable to historians eventually becomes worthless in the eyes of the government.
So now what?
If you are lucky enough to pursue a research topic with a significant amount of digitized material, then get to it. Don’t be afraid to use computers for more than searching texts for keywords and phrases; use them to compile data sets drawn from an entire library’s worth of books.
The rest of us have to take up the fight to make as much information accessible and digital as possible. In the meantime, we have to make do with the data we have, constructing data sets without the missing source material.
Macroanalysis and you.
If you had access to the kind of database that Jockers had, how would you use it?
Jockers is a literary historian and as such focuses primarily on English-language literature. What unique benefits does this method of analysis offer historians of other persuasions?
In a world of ever-increasing material to work with, legal obstacles notwithstanding, how do you determine what should be included and what should not (social media, email, blog posts, etc.)?