Week 4: Macroanalysis, an Analysis

Without looking, who wrote this post? What’s my gender? What decade was this written in?

Ok, you probably know the answer to the last question and as an aspiring historian you are more than likely used to looking for author names before you begin reading. And if you know my name, you should probably know the answer to the second question.

But what if you hadn’t read my name before clicking on the post? Would you be able to guess my gender just by reading this post? WordPress is 14 years old; for all you know, this post predates even WordPress and was simply transferred to this blog from elsewhere.

What if a computer could answer each of those questions, and more, simply through a comparative analysis of this text? Well, if you have the right data set, it can.


Matthew Jockers’ Macroanalysis takes us through the trials and tribulations of literary analysis using digital tools. He outlines his process of analysis, discusses statistics, and reviews the applications, both practical and potential, of computer-assisted data-mining of massive databases of text. His case study of 19th-century literature provides insights into how a computer can determine styles, nationalities, themes, and even genders of authors given the proper comparative material.

So what is macroanalysis anyways?

For decades scholars have relied upon interpretations of representative works to build understandings of periods, genres, and historical patterns, but the digitization of large quantities of written works gives them the opportunity to draw from entire libraries instead of just small samples. Rather than “close reading,” which limits scholars to consuming only one book at a time, “distant reading” entails the analysis of hundreds, thousands, or even hundreds of thousands of books simultaneously. This “macro”-analysis takes the big picture, the one with the rolling hills and endless sea in the background, instead of the small picture, the close-up, the selfie if you will.

Macroanalysis is not intended to replace its micro counterpart but to complement it. Where microanalysis is necessary for scholars to develop interpretations of works and to get into the nitty gritty, macroanalysis provides less specific but broader information.

What is it good for?

Macroanalysis has a variety of uses in the ever-evolving academia of the 21st century. Maybe you are interested in the Irish-American experience of the 19th century and want to know more about patterns of self-identification and societal perception of the Irish in America. Thanks to macroanalysis, Jockers is able to tell us quite a bit about how Irish authors as a whole approached their ethnic and cultural heritage instead of trying to extrapolate from one or two authors deemed representative.

Or perhaps you would like to conduct a comparative cultural analysis through the use of language. Jockers is able to use macroanalysis for this purpose as well. For example, he compares the language usage of Irish authors and English authors during the time period and concludes that the collective English consciousness was more confident, as evidenced by more frequent usage of “absolute” and “determinant” words, whereas the Irish used more words that indicated imprecision and indeterminacy.
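
For the curious, here is a minimal Python sketch of what that kind of comparison could look like. It simply measures what share of each text's words falls into a "confident" list versus a "tentative" list. The two word lists, the function name `category_shares`, and the sample snippets are all made up for illustration; they are not the lists or data Jockers actually used.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; not the lists Jockers used.
CONFIDENT = {"always", "never", "certainly", "must"}
TENTATIVE = {"perhaps", "maybe", "almost", "seems"}

def category_shares(text):
    """Return the share of word tokens that fall in each word list."""
    # Lowercase the text and treat runs of letters as rough word tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {
        "confident": sum(counts[w] for w in CONFIDENT) / total,
        "tentative": sum(counts[w] for w in TENTATIVE) / total,
    }

# Made-up snippets standing in for two much larger corpora.
corpus_a = "It must always be so. It will never change."
corpus_b = "Perhaps it seems so. Maybe it is almost true."

shares_a = category_shares(corpus_a)
shares_b = category_shares(corpus_b)
print(shares_a)
print(shares_b)
```

With a real corpus you would run the same function over thousands of books and compare the averages, which is where the choice of word lists (a human decision) starts to matter a great deal.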

Finally, computers are quite good at paying attention to the things that humans subconsciously, and incorrectly, deem inconsequential. How many times in this post have I written “the”? Have I used it any more or any less frequently than the average writer? You would probably have trouble telling, but for a computer this kind of attention to detail is trivial. And yes, “the” can matter a great deal when analyzing language and culture. Jockers uses “the” to differentiate American from British writers.
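
To show just how trivial, here is a small Python sketch that counts how often a word appears per 1,000 words of text. The function name `rate_per_thousand` and the two sample sentences are my own assumptions for illustration, not anything taken from Jockers' book.

```python
import re

def rate_per_thousand(text, word="the"):
    """Return occurrences of `word` per 1,000 word tokens in `text`."""
    # Lowercase the text and treat runs of letters/apostrophes as rough tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word) / len(tokens)

# Hypothetical sample sentences, not real corpus data.
sample_a = "The cat sat on the mat by the door."
sample_b = "Cats sit on mats near doors."

print(rate_per_thousand(sample_a))
print(rate_per_thousand(sample_b))
```

Normalizing to a rate rather than a raw count is what makes a short blog post comparable to a long novel.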

What can’t we do with it?

Jockers is careful throughout his book to maintain that macro and micro analysis must work in tandem, not that macro should replace micro. And he provides good reason for it.

Computers lack human interpretative abilities. Computers are incapable of comprehending underlying themes or piecing together the context in which the words extracted exist. In this sense macroanalysis is little better than a glorified word cloud generator. The information that it does provide allows us to uncover new evidence for theories and to approach old material in new ways but computers do not understand the information so much as relay it.

Additionally, we are not going to blow anyone’s mind with the evidence gathered through macroanalysis. This tool excels in providing evidence for what most people already believe to be true. Until there is significant advancement in the technology available to us, macroanalysis is not going to be responsible for the next major shift in how we do History.

Why isn’t everyone doing it yet?

Now that our eyes have been opened to this great new tool for conducting analysis we’re left wondering, “why isn’t everyone doing this yet?” Well, there are several reasons.

First, there are inaccuracies. Although this method is right more often than it is wrong (assuming a proper database is provided), it can still be wrong. Macroanalysis struggles to make sense of transitional literature and literature that is indistinct. These two categories are often lumped together by computers because they do not belong to any strict category.

Second, it is as subject to bias as traditional analysis and interpretation. Just because the data is presented as numbers does not mean that humans were not responsible for selecting the texts it was drawn from. Throughout his book Jockers conducts in-depth analysis of Irish-American writers but clearly states in his introduction that he has excluded popular writers because their work does not directly address their Irishness.

Finally, and perhaps most importantly, using macroanalysis for any topic within the past century is difficult and in many cases impossible. You can thank Disney and what’s called the Mickey Mouse Protection Act for this. Copyright terms in the United States are essentially endless, which means that the vast majority of literature from that period is locked behind physical or digital paywalls.

Also worth noting is that classified material presents complications for many historians. The information contained within classified government documents would undoubtedly provide an entire generation of graduate students with dissertations, but for better or worse, all we can do is wait and hope that information valuable to historians eventually becomes worthless in the eyes of the government.

So now what?

If you are lucky enough to pursue a research topic with a significant amount of digitized material, then get to it. Don’t be afraid to use computers for more than searching texts for key words and phrases. Use them to compile sets of data drawn from an entire library’s worth of books.

The rest of us have to pick up the fight to make as much information accessible and digital as possible. In the meantime we have to make do with the data that we have, constructing data sets without the missing source material.

Macroanalysis and you.

If you had access to the kind of database that Jockers had, how would you use it?

Jockers is a literary historian and as such is focused primarily on English literature. What unique benefits does this method of analysis provide historians of other persuasions?

In a world of ever-increasing material to work with despite legal obstacles, how do you determine what should be included and what should not (social media, email communications, blog posts, etc.)?

18 Replies to “Week 4: Macroanalysis, an Analysis”

  1. Getting Jockers’ perspective as a literary scholar, rather than a historian, was interesting. If anything, literary analysis is even more resistant to being quantified and made systematic than history is, but Jockers makes some strong points about how macroanalysis of texts can function alongside more traditional close reading.

    To your question about defining the scope of the content in an analysis, I wonder if that process is easier in some ways for literary scholars than historians. After all, the entire scope of their study is by definition always the printed, published word, which might not be true in history.

    1. I think it is interesting that you state literary analysis is more resistant to quantification. Why is that? In my mind it is actually more difficult for historians, as the source base varies so much more. Historians have to determine how different types of sources relate to one another, adding an additional level of complexity. For example, if I have a stack of diaries from the Civil War for a research project and another stack of newspaper articles from the period, I need to decide how to quantify the information from each source without over-emphasizing or diminishing either.

      One of the main problems with quantification is the removal of context and I think that poses just as many if not more issues for historians than it does for literary scholars.

  2. Interestingly, metadata and these kinds of databases have proven incredibly helpful to literary scholars, and a case can even be made that they are helpful to historians. However, the connection between this computational analysis program and digital humanities was a bit of a stretch. How can we link the analysis that Jockers provides with what we are learning in this class about becoming digital historians? How can we take what Jockers has given us and see where it takes us in the future?

    1. I think that quite a bit of what Jockers discussed is relevant to us as potential digital historians. One aspect in particular that I would emphasize is the necessity of being skeptical of data. Our world is increasingly understood in terms of big data and statistics, and we need to be careful to include context if we are presenting work and to seek out context if we are the audience. For example, we previously talked in class about the usefulness of projects like Digital Harlem, which provides a great model of the marriage between macroanalysis and careful interpretation of context.

  3. I think the great thing about the use of big data for analysis is the fact that you don’t have to know what it is you’re looking for or how to look for it; if you just play with the data a little some startling things just pop out at you.

    For example, when Jockers demonstrated that usage of the word “the” waxed and waned both in British and American literature simultaneously, you can’t help but wonder how the heck something like this happens. Like you mentioned in your post, computational analysis can’t provide much in the way of interpretation (it won’t be useful in deciphering meaning and doesn’t help you connect with specific historical events or actors), but it does reveal trends, and trends have their own meaning. Analysis of this type can then be a wonderful starting point for new and unique research.

    For posterity, what this means is that we’re currently amidst the wave of a new research methodology for the discipline, one that alters the course of current methodologies.

    1. I think you bring up a great point about not having to know exactly what you’re looking for before you start. Messing with data can be a fun way to find oddities and those oddities can be the basis of very interesting research.

      This new wave of research methodology is rather interesting, but I think it may pose some problems as well. As a discipline, even as a species, I think we need to be careful with how we approach data. As the numerous petitions to ban dihydrogen monoxide prove, data is easily manipulated and can be taken out of context. So while the numbers may be true, interpretations of those numbers aren’t necessarily any more accurate than a “close reading” not backed by statistics.

  4. I thought Jockers provided an interesting perspective as a literary scholar and showed the usefulness of using a database to sort through thousands of digitized sources to see trends, which would not be practical by hand. However, a historian would need to set parameters in order to avoid information overload. To your question about deciding what sort of material to include and exclude in a search, I think it entirely depends on what the scholar is looking for. For example, Blevins limited himself to specific newspapers and a specific time period. Scholars could easily look at social media posts of certain people, maybe limiting themselves to certain websites or a time frame, and look for speech or reference patterns. What may be useful for one scholar may not be useful for another. Depending on the work and what the scholar is interested in finding, it is up to the individual scholar to set his or her own parameters and decide what is relevant and what is not.

  5. I think Jockers’ macroanalysis will be utilized by historians in the future to sort through social media; there will definitely be historians who specialize in Donald Trump’s tweets. The internet offers an abundance of opinions. Databases such as the one Jockers used can help take the “temperature” of a population’s standings on issues and events in both quantitative and qualitative ways. Just as computer programs can now differentiate spam in emails, I think databases will improve to sort information in similar ways. New technologies will allow us to store a seemingly endless amount of information, so the question is not about inclusion but rather exclusion from the story we are trying to tell. This could have larger implications if sources are twisted to prove a historical point, but historians sift through sources in similar ways to support their own arguments. Therefore, it is our duty to approach the conclusions drawn with a healthy amount of skepticism.

    Additionally, I couldn’t help but see parallels of macro/microanalysis with arguments surrounding longue durée/specific histories (for lack of a better term). These methods of analyzing the past all must work in tandem as Jockers and Guldi & Armitage respectively suggest.

  6. This is a really great summary of Jockers’s book. Your opening made me think a lot about how and why computers can analyze and categorize texts based on time, gender, etc. That this is possible really shows that language is a product of its time (which most of us probably would say is obvious) but also that part of the construct of gender is “gendered” use of language, which is an interesting and, I think at least, under-discussed phenomenon.

    An area where macroanalysis could be especially interesting in the future is in the analysis of web-based writing, especially from a cultural studies perspective. Fan fiction especially would make an interesting topic of study, and since most pieces are written under screennames or pseudonyms it would be interesting to see if macroanalysis can properly draw conclusions about the genders and other information of the authors based on the types of words they use.

  7. It is particularly interesting to me that Jockers is a literary scholar because of the practicum I will be doing tonight on Voyant Tools. This is a website devoted to the comparative analysis of texts, and its main examples were drawn from literature rather than history. It has a fully functioning example available for the works of Jane Austen and the works of Shakespeare. While it is interesting to look at this analysis between texts, it really doesn’t provide any valuable connections, in my opinion, because it lacks context. Until methods of macro-analysis become more advanced, I do not see how we can glean anything truly valuable.

    1. In what ways do you think macro-analysis needs to be more advanced before anything of value can be gleaned?

      I was convinced by Jockers’ argument that current technology can be useful in at least limited ways, such as an analysis of the use of certain language. Particularly in cultural histories, knowing how events are referred to can provide valuable insight into cultural perceptions. For example, in Kelman’s “A Misplaced Massacre,” macroanalysis could have played a helpful role in his discussion of how the perception evolved from “battle” into “massacre.”

  8. While Jockers was a literary scholar, macroanalysis has just as much to offer historians as it does scholars looking at literary trends. When attending the ASEEES Conference in Chicago a few months ago, I was amazed at how many presentations relied on exactly this type of macroanalysis. One presenter in particular caught my eye when she presented a map of the former USSR with population data from every GULAG. Her intent was to map population data through a ten-year period, to track migration/fluctuation in those registered within the GULAG system. It was clear the amount of work she had put in to come to an illustrative conclusion. Aside from her actual analysis, her presentation also illustrated a generational divide in the audience that highlighted the evolution of the field of history. Digital components and macroanalysis are very much present within the humanities, and while there are caveats as to the pros and cons of such analysis (as Jockers points out throughout the book), it remains crucial to the forward progression of history as a field of academic and scholarly study.

    1. Did the presentation rely entirely on macroanalysis or did she include significant context and “close reading” interpretation to accompany it? Did you find the presentation lacking if not?

  9. Jockers makes an interesting point about macroanalysis and how useful it can be for discovering larger historical trends quickly and efficiently. I can’t help but be reminded of the Digital Harlem site we spoke about last class, where the author was able to discover new insights into race relations based on the sheer quantity of data provided. However, I agree with many of the others who have commented that we need to be wary of the context. Data without context can be misconstrued, and it will be a historian’s job to put that context back into place while presenting large themes to the general public. While macroanalysis will be a great tool for historians who work with digital content, we need to be aware of the different factors that can change the way a source is used, such as gender, race, age, and location. I predict that digital programming tools such as Python will be used increasingly by historians to extract this more specific data while working with large macro data sets like social media.

  10. In response to Blake’s question of how you would use macroanalysis, I see Blevins’s article “Space, Nation, and the Triumph of Region” as an excellent model. In the article Blevins uses Houston newspapers to track the growth of the city and how the city conceived of itself and the rest of the country. Using a database like the one Jockers presents in “Macroanalysis” creates an opportunity to quantify not just language but popular topics in newspapers, such as events or people, to understand the cultural landscape of the timeframe at both the nationwide and local level. With proper contextualization (and maybe crowdsourcing the digitization of newspaper archives), this new(ish) method of data collection can be a valuable source for cultural historians as well as other historians who want a more in-depth understanding of popular sentiment.

    1. I love that you mentioned crowdsourcing, as I think that is a wonderful and necessary tool if macroanalysis is going to be used for a wide range of topics. Projects like Beyond Words, which was covered last week, will make available data that would otherwise consume too much time for the average research project.

      I agree with your closing statement completely. Knowing the words that people use to describe events and people is an excellent way to understand cultures of the past.

  11. Reading Jockers, I could not help but feel envious, because I could quite clearly see how macroanalysis of this sort would have greatly complemented a project I worked on last semester, if only the sources I was working with had been digitized. The project in question evaluated how the US Army and Marine Corps came to evaluate the purpose of armored warfare differently, and drew upon both government documents, as well as Armored Magazine and the Marine Corps Gazette. I would have loved to have been able to go through my sources and evaluate which words were being used to describe the important capabilities of tanks, and how successful tank actions were described.

  12. Jockers is a literary historian and as such is focused primarily on English literature. What unique benefits does this method of analysis provide historians of other persuasions?

    Plenty of others have said something similar, but I could imagine this tool being used by legal historians and historians of social movements. In the first case, the language used to describe crime and criminality evolves over time. As other authors this week point out, we create with our language the world in which we live. Changes in the language used to talk about criminal activity may bring some insight into the way people thought about certain crimes, perhaps even before the laws were changed. Similarly, the language we use to describe different groups of people changes (like “negro” vs “Afro-American” vs “black” and “homosexual” vs “queer” vs LGBT+). These words (and their changing meanings over time) would be an interesting thing to analyze in order to see how attitudes toward target groups change over time and how those groups see themselves.
