Google Ngram

Google Books has digitized 15 million books so far (and counting), roughly 12 percent of all books ever published. No one could read this vast amount of literature, but Google Ngram lets any user search linguistic trends spanning centuries with the click of a button.

Created by a team of researchers based at Harvard, Google Ngram draws on 5,195,769 books (roughly 4 percent of all books ever published) to conduct quantitative analysis of their contents, an approach the team calls “culturomics.” The concept is easy to grasp, and the interface is easy to use.

Below is a search for the word “history” in American English. Ngram lets you switch between languages such as Russian, Spanish, and Chinese. It also allows comparisons between American and British English, and between the full English corpus and English fiction, which can produce fascinating results. The “English One Million” option narrows your search to no more than about 6,000 books per year from 1500 to 2008, for a more focused search. You can also choose which years to search. Finally, Ngram offers the option of “smoothing” the yearly results, which evens out the graph by averaging each year’s value with the values for the years immediately before and after it. I found the default smoothing of 3, which averages each year with the three years on either side, to be effective. Perhaps someone with a stronger grasp of statistics could explain this further to the class?
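As a rough illustration of what the smoothing setting does, here is a minimal Python sketch of a centered moving average. It assumes, following Google's own description of the viewer, that a smoothing of 3 averages each year with the three years on either side. How the real viewer treats the first and last years of a range is an assumption here: this sketch simply averages whatever years exist. The function name `smooth` and the sample numbers are hypothetical.

```python
def smooth(values, window=3):
    """Centered moving average over a list of yearly values.

    Each output point is the mean of that year's value plus up to
    `window` years on either side. Near the start or end of the list,
    only the years that exist are averaged (an assumption; the real
    viewer may treat the edges differently).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - window)        # first year inside the window
        hi = min(n, i + window + 1)    # one past the last year inside the window
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly frequencies with a single one-year spike.
raw = [0, 0, 0, 7, 0, 0, 0]
print(smooth(raw, window=3))
```

With smoothing of 3, the spike year drops from 7 to 1.0 while every neighboring year rises above zero. That is why smoothed graphs look gentler, and also why they can blur exactly when a change happened.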

It seems there was more interest in “history” in the late 1990s than at any point in the previous 200 years!

Ngram also lets users download the datasets on which it is built. For each language, the datasets for n-grams (sequences of one to five words) can be downloaded and manipulated for further experiments. Although I had some difficulty working with them, it is admirable that the creators of Ngram make them available to the public. Currently, the available datasets are the ones dated July 15, 2009, the version first released with Ngram.
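For anyone who wants to try the downloadable data, here is a minimal Python sketch that totals the yearly counts for one word from a decompressed 1-gram file. The column layout is an assumption: it supposes each tab-separated line begins with the n-gram, the year, and the match count, and it ignores any later columns, so check the description published alongside the version you download. The function `counts_by_year` and the sample lines are hypothetical, with made-up numbers.

```python
import io
from collections import defaultdict


def counts_by_year(lines, term):
    """Total the match counts per year for one n-gram.

    ASSUMPTION: each line looks like
        ngram <TAB> year <TAB> match_count [<TAB> further columns...]
    Any columns after match_count are ignored.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
        if ngram == term:
            totals[year] += match_count
    # Plain dict ordered by year, convenient for reading or plotting.
    return dict(sorted(totals.items()))


# Hypothetical sample lines (made-up numbers, not real Ngram data).
sample = io.StringIO(
    "history\t1999\t120\t100\t40\n"
    "history\t2000\t150\t130\t55\n"
    "historic\t2000\t30\t28\t12\n"
)
print(counts_by_year(sample, "history"))
```

On a real file, the same function accepts an open file object, e.g. `with open(path, encoding="utf-8") as f: counts_by_year(f, "history")`, where `path` is wherever you saved the file. Keep in mind these are raw counts: Ngram's graphs show each term as a share of all n-grams published that year, so raw totals are not directly comparable across years.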

Ngram opens fascinating possibilities of research to historians. As Michel et al. note in their article, Ngram makes it possible to track the usage and evolution of words through printed history, and even to detect government censorship, as in Nazi Germany. Below is another Ngram I ran comparing the use of the terms “USSR,” “Cold War,” and “Nuclear” in American English. It is interesting to see that by 2000, after the fall of the Soviet Union, “Cold War” had surpassed the other two.

After experimenting with this tool, what are your impressions? Are there shortcomings that limit how useful Ngram can be to historians? Ngram does acknowledge that data from before the 1700s can be skewed because few books were published during that time. How do you envision using a tool like Ngram in your future projects?

3 Replies to “Google Ngram”

  1. Scott,

    After trying a few searches with Ngram, I think it can be very useful for searching single words or terms. I searched “Republican, Democrat” and the results for “Republican” were significantly higher, but when I added “party” to each term (“democratic party”), the results were reversed. As with any major search engine, one shortcoming is that a search term can be too broad or too specific, but I think Ngram is still a very cool and interesting tool for historians doing research on places, events, people, etc.

    Another shortcoming is the inability to choose a specific country for search results. Various types of English are listed, but French and Spanish are each a single category. Different versions of French and Spanish are spoken around the world, and with so many books and words in each language it could be quite difficult to add all of them, but I wonder whether the creators viewed this as an issue. If a historian were researching terms in Haitian French, this program might not be as effective as it would be for terms in books from France.

    I like the option to click on a time period to see which titles and texts contain the search term. The one drawback is that they are listed like Google search results, so it seemed difficult to narrow them down further to a specific year or years.

    Overall, I think this program can and will be very useful to historians in the future as we face data overload. Continuing to add books from the past and present will be difficult, but it already has a wide variety that can be a great starting point for research.


  2. Scott,

    I see the biggest drawback of Google Ngrams as the fact that the researcher doesn’t have full control over the corpus he or she is searching. At an academic level, most historians examine a very focused set of texts. Data mining those texts could prove interesting, but the likelihood that this specific set of texts is already loaded into the Google database isn’t very high.

    On the other hand, if a researcher takes the time to comb through the texts in the corpus he or she would be using, Ngram can show trends on a macro level. I feel like macro-level trends are the Holy Grails and Fountains of Youth of the history field. They are extremely attractive and everyone wants to find one, but they are rarely predicted accurately and tend, after being debated to death, to fall out of fashion. (Personal rant, my apologies.) A tool like Google Ngrams allows researchers to trace word use and popularity through a huge body of material. Although assigning meaning to the results of these searches will always involve some degree of interpretation, it’s nice to have another tool in the historian’s toolbox.

    For example, last semester’s “Enlightenment in the 18th-century Caribbean” class, taught by Professor Shelford, had a strong focus on the concepts of commerce, trade, and economy as they were discussed by French intellectuals in and around the Enlightenment. If you search Google Ngram’s French corpus for the Enlightenment period, the term “Commerce” skyrockets from the second half of the 18th century into the beginning of the 19th century. The graph showed a trend we had discussed all semester long in one easy-to-read image. It also showed that, beyond what we were able to read in class, these concepts of commerce and trade were popular during the period. And if something was popular, it’s probably worth researching.

    Maybe Ngrams would be a quick and dirty way of brainstorming future research topics?


  3. Kelly & Laura,

    I think you both made great points about the potential shortcomings of Ngram for historians. Like you, Kelly, I encountered difficulties in searching for terms: some worked and others did not. Ngram certainly excels at showing macro cultural trends, but at some point all historians focus on the specifics of what they research. Overall, I feel Ngram can be of some service to historians.
