Dude, Where’s My History?: A Look at Historical Mapping Interfaces

The advent of digital technology brought an astonishing new level of knowledge and ideas into people’s homes. Information and services that once required someone to leave home to seek them out now came straight to users. Mobile computing furthered the trend: information still comes directly to people, but access is no longer tied to one physical place. Many cultural heritage institutions have noticed these changes and adapted to become not only places that house information, but resources that increasingly push it directly to their patrons wherever they may be. The affordances of this new media also allow institutions to bring their materials into geographic space, adding another layer of interpretation and context while bringing to the public’s attention that history is all around us.

Histories of the National Mall

One site that takes advantage of mobile technology and a spatial understanding of history is Histories of the National Mall, created by the Roy Rosenzweig Center for History and New Media and run on our old pal, Omeka. Taking their own advice from their report Mobile for Museums, the creators made the site device independent: it runs in a web browser, so it works across desktop, laptop, and mobile, rather than as a native downloadable app that needs tailoring for each device. As the title indicates, the site is an interface for learning about the histories of the National Mall through maps, explorations (short examinations based on questions people might have about the Mall), people, and past events. Most of these sections can be filtered by historical period. Some of my favorite sections, much to my chagrin, are the great explorations of unmade designs of the national monuments. There are also a number of scavenger hunts that send you to a specific part of the Mall with images of places for you to find. Once you find a place, you tap or click its image and can read or listen to more about it.

Histories of the National Mall Map

The key feature of this site is the map, which has over 300 points containing historical information, audio, video, images, and documents. The user can filter by each of those categories as well as by place and event. As stated above, the site is web browser based and looks largely the same on a desktop, a laptop, or a mobile device. Using GPS, Histories of the National Mall centers the map on the user’s coordinates and locates them within historical context. What is good about the map is that there is no set way to explore the points; you can wander around and discover new facts and events that shaped the environment all around you. This lets users set their own narrative through a serendipitous combination of explorations.


Aris Games

While Histories of the National Mall is a ready-made site, Aris Games is both an open source application for creating geographically based games and a mobile app for playing them. The back end is not the scary coding or programming that some in the cultural heritage sector may fear, but a simple interface that keeps the infrastructure invisible, so even those without technical skills can make games. One downside of Aris-created games, not encountered with the Mall histories site, is that the mobile app is only available on Apple products and so reaches a much more limited audience.


The Aris editor interface is simple, but it is by no means easy to understand without first reading the manual or viewing the helpful video tutorials on certain topics. It is important to understand the different elements (especially non-obvious ones such as scenes, plaques, and locks) and how they function so you can create a working game. The games are largely tours or explorations of certain areas. Building a game is based on creating “scenes,” or different scenarios that the user can encounter as they travel around. You can make conversations for the user to have at each location that can lead them further into the game. All of the features you create can be mapped to a certain location to create an exploratory geographic environment. This feature is unfortunately cumbersome to use: the only ways to place a point are to enter precise GPS coordinates or to drag the point where you want it, and there is no way to search for a general location to get there quicker. There is also no way to see how your game will look in the app without having and opening the app. Since I have an Android device, I needed to borrow an iPhone to do this. Despite these drawbacks, the Aris editor is a good way to make games without requiring programming experience.

Aris Editor

Playing the games is fairly simple but, as mentioned above, does require downloading their Apple-based app. Inside the app you can play any number of games created with the editor. You can find games based on your geographic location, sort by popularity, or search for a specific title. Aris provides a demo that will give you a good overview of what it is like to play these games (avert your eyes if you dislike semi-obsolete media):

Overall, Histories of the National Mall and Aris Games are good examples of the creative ways spatial history and mobile technology can work together to engage the public. By embracing this new trend and the ubiquity of mobile phones, institutions can add layers of meaning, attract a wider audience than before, and bring content out from behind closed doors.


Google Ngram

Google Books has so far digitized 15 million books (and counting), or 12 percent of all books that have been published in history. While it is impossible for anyone to read this vast amount of literature, Google Ngram allows any user to search linguistic trends spanning centuries with the click of a button.

Created by a team of researchers centered at Harvard, Google Ngram uses 5,195,769 books (roughly 4 percent of all that have been published) to conduct quantitative analysis of their contents, an approach the team refers to as “culturomics.” Both the concept and the interface of Google Ngram are easy to grasp.

Below is a search for the word “history” in American English. Ngram allows you to switch between different languages like Russian, Spanish, or Chinese. Ngram also allows for comparisons between American and British English, and regular English and English fiction, which can make for fascinating results. Ngram contains the option “English One Million” which narrows your search to 6000 books per year from 1500 to 2008 for a more focused search. The user has the option of selecting from which years to search. Ngram gives you the option of “smoothing” the yearly results – which streamlines your results by averaging the occurrences of your search in the years immediately before and after each date on the results graph. I found the default smoothing of three to be effective. Perhaps someone with a better grasp of statistics could better explain this to the class?
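To take a stab at the smoothing question, here is a minimal Python sketch of a centered moving average, which is what the description above suggests. It assumes that a smoothing of 3 means each year is averaged with up to three years on either side, and that years near the edges simply use whatever neighbors exist. The function name and the sample numbers are made up for illustration and are not taken from Ngram itself.

```python
def smooth(values, window=3):
    """Centered moving average over a list of yearly frequencies.

    Assumption: a smoothing of `window` averages each year with up to
    `window` years before it and `window` years after it.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window)                # do not run off the left edge
        stop = min(len(values), i + window + 1)   # do not run off the right edge
        neighborhood = values[start:stop]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed


# Hypothetical yearly frequencies with a single spike in the middle year.
yearly = [0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0]
print(smooth(yearly, window=0))  # no smoothing: the raw spike
print(smooth(yearly, window=3))  # the spike is spread across its neighbors
```

The effect is that a one-year spike is flattened into a gentler bump, which makes long-term trends easier to see but can hide short, sharp changes.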

It seems there’s more interest in “history” in the late 1990s than in the past 200 years!

Ngram also gives the user the opportunity to download the datasets on which it is built. For each language, the datasets for each “gram,” from one to five terms long, may be manipulated for further experiments. Although I had some difficulty doing this, it seems admirable that the creators of Ngram provide this to the public. Currently, the datasets available are from July 15, 2009, when Ngram was first created.

Ngram opens fascinating possibilities of research to historians. As mentioned in the Michel, et al. article, Ngram gives the opportunity to track the usage and evolution of words through printed history, and even government censorship, as in Nazi Germany. Below is another Ngram I ran comparing the use of the terms “USSR,” “Cold War,” and “Nuclear” in American English. It is interesting to see that by 2000, after the fall of the Soviet Union, the term Cold War has surpassed the other two.

After having experimented with this tool, what are your impressions? Are there shortcomings to how Ngram can be of use to historians? Ngram does acknowledge that information before the 1700s can be skewed because few books were published during this time. How do you envision using a tool like Ngram in your projects in the future?

The “True” Corpus of American English

In an analysis of the Corpus of Historical American English (COHA) and Google Books, Professor Mark Davies of Brigham Young University studies the effectiveness of both engines and their ability to properly read the English language. Both are corpuses of the American English language, but Google Books has 500 billion words compared to COHA’s 400 million. Davies argues that although COHA has a significantly smaller database, the trending patterns still mirror those of Google Books. Because of this, Davies argues that COHA is actually the more effective corpus—a smaller database means far fewer data to sift through and that means quicker search results and faster information.

COHA’s “toys” are what make it a more useful database in Davies’s eyes. While Google provides the same basic function (showing the frequency of word usage throughout the decades), COHA is able to track concepts, related words, and changes in meaning. Whereas Google tends to have a one-track mind, just like its general search engine, COHA manages to “think” about relations for the words being placed into the search. Because COHA looks for things such as relation, form, root words, or even cultural shifts, its searches are much more comprehensive.

Design is a huge issue for some researchers, and with this in mind, Google definitely has the upper hand. True, as Davies puts it, COHA is able to effectively portray the same statistics, but I had a genuinely hard time navigating the site. Bar graphs are nice for portraying how many dark-haired, blue-eyed people are in a class, but for a corpus of American English, I found them rather ineffective. Google Books had a much more pleasing site, nicer to the eye, and easier to follow the pattern of the language. Yes, COHA’s tables are nice for their alternative searches, but as I said, traversing the site is actually rather difficult. Google provides a much more streamlined site, and provides actions that are easy to follow.

Unfortunately for Google, I think the number of words in the database has made it impossible to create proper analyses of the grammar, word meanings, and word foundations that COHA is successfully able to analyze. If Google Books created an efficient way to sift through all of that information quickly enough, then it would immediately become the preferred site. However, its inability to process such large amounts of data (largely its own fault) has rendered it ineffective as a “true” corpus of the American English language.


Millions of Digitized Books, Hundreds of Fascinating Conclusions

Jean-Baptiste Michel et al.’s short and sweet article Quantitative Analysis of Culture Using Millions of Digitized Books raises a number of bold points that show just how valuable Google’s ambitious (and originally considered foolhardy) Google Books project has been to historians. The project uses nearly 5.2 million books (over 4% of those written, a very significant sample) containing over 500 billion words and searches them. Let’s pause for a minute and think about what that means. 25 years ago, or even 10 years ago, if you said you wanted to search through a sizable sample of every book ever written for certain words, you would have had your head examined. The paper points out that it would take a human 80 years just to read all the books written in one year, 2000. Here is a tool that can go through the entire corpus literally quicker than the blink of an eye. Roughly two-thirds of the 500 billion words are in English, and there is only a significant sample size for books from 1800 on (though there are a fair number from 1600-1800), but even with these limitations, the work allowed the researchers to come to some bold conclusions.
“What conclusions?” you ask? Try this one on for size: they estimate that most dictionaries might contain as little as 52% of the living lexicon at any given moment. They estimate the total lexicon of 1-grams (single words, excluding symbols, numbers, typos, etc.) at 544,000 in 1900, 597,000 in 1950, and 1,022,000 in 2000 (counting n-grams that make up more than 1/1,000,000,000 of all English words). Some of these are not in dictionaries due to dictionaries’ traditional dislike of compound words, but others are inexcusable (they point to “deletable” as a particularly ironic example). This lexical “dark matter,” in their charming expression, consists of words that are fresh for research. No OED biography has ever examined every facet of these words, and no amount of looking up will find them. The n-gram has saved these potentially valuable expressions from invisibility.
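The one-in-a-billion cutoff is easy to picture as a counting rule. Below is a minimal Python sketch of it, under the assumption that a word “counts” as part of the lexicon when its occurrences exceed one per billion words of the corpus. The function name, the toy word counts, and the corpus size are all hypothetical; the paper’s actual procedure also filters out symbols, numbers, and typos, which this sketch does not attempt.

```python
ONE_BILLION = 1_000_000_000


def lexicon_size(word_counts, total_words):
    """Count the 1-grams used more often than once per billion words.

    `word_counts` maps each word to its number of occurrences in the corpus;
    `total_words` is the size of the whole corpus. Integer arithmetic is used
    so that the comparison is exact.
    """
    return sum(1 for count in word_counts.values()
               if count * ONE_BILLION > total_words)


# Hypothetical counts in an imaginary corpus of two billion words.
toy_counts = {"the": 50_000_000, "deletable": 3, "xqzt": 1}
print(lexicon_size(toy_counts, total_words=2_000_000_000))
```

In this toy corpus “deletable” clears the bar (3 in 2 billion is more than 1 in a billion) while the one-off “xqzt” does not, which is the kind of distinction that separates real but undocumented words from noise.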


Another bold feature of the n-gram is that it lets you trace the rise and fall of terms over time. Much has been made of the n-gram for “World War I” vs “Great War,” where Great War holds strong until 1939, then falls off, while World War I rises to pick up the slack, but it’s hardly the only example. You can do the n-gram test yourself and see the decline of a good many words and phrases, and the introduction of others. Ever been curious to see if anyone said “Yadda-yadda-yadda” before Jerry Seinfeld? Want to map “Reality Television” vs. “Situational Comedy” and see if you can identify the year Survivor was released? Want to compare Jean-Baptiste Lamarck with Charles Darwin or Karl Marx with Sigmund Freud? The world is your oyster.


The n-gram can also detect the death of older, archaic forms of words. “Spilled” is becoming the past tense of “to spill,” but there is no use in crying over spilt milk about it; spilt had a long run. Contemporary spouters of aphorisms think that all that glitters is not gold, but their fathers sagely opined that all that glisters is not gold. Indeed, past tense verbs that end in “t” are fighting a slow, steady, losing battle against “ed.” Can they survive? I feel I’ve spoilt the ending of this struggle, but I’ve been burnt on these predictions before.


The final section of the article struck (or will it become “striked?”) a more somber note: repression.  Examining the use of the word “Trotsky” in Russian language sources through the 1920s tells a harrowing tale, but everyone expected as much.  (I wanted to run a similar test on “New Economic Policy” vs “Five Year Plan,” but, alas, I speak no Russian, and the English results are pretty meaningless).  What is more interesting is the revelation of people never before suspected of repression.  The Nazi regime’s list of degenerate artists was apparently far more extensive than generally known, as people never included in the traditional narrative saw their mentions in German press fall off the face of the earth in the late 1930s.  Again, this was just a cursory exercise: this n-gram search opens up the possibility of a new way of looking both at the more blatant Nazi/Soviet repression, and the more subtle blacklisting preferred in the West.  There are millions of possibilities that n-grams open up for these millions of books.

Database as a Genre of New Media

Lev Manovich is an accomplished thinker in the field of new media.  In his short piece, “Database as a Genre of New Media,” he makes the case that databases represent a fundamental paradigm shift in the way that people think about the organization and presentation of information.  Databases as a non-narrative, not necessarily linear way of organizing data did not originate with the digital age – they were found previously in, say, encyclopedias or photo archives – but they have experienced a renaissance in that time.  Video games, your hard drive, and the Internet are all databases, and they all represent a way to present data free of the constraints of logic and coherence imposed by the narrative form.

As Manovich puts it, “As a cultural form, database represents the world as a list of items and it refuses to order this list. In contrast, a narrative creates a cause-and-effect trajectory of seemingly unordered items (events). Therefore, database and narrative are natural enemies.”  He argues that the very term “narrative” is abused in the interactive databases of the Internet and video games, where users may respond to preprogrammed variables, whether they are hyperlinks or Koopa Troopas.  A narrative is something carefully constructed by its author constituting “a series of connected events caused or experienced by actors.”  It is careless to assume that a user will automatically derive this experience from a database without considered input from its author – narrative is  “used as all-inclusive term, to cover up the fact that we have not yet developed a language to describe these new strange objects.”

Manovich argues that since databases are free of the “cause-and-effect trajectory” of the narrative form, they can, through ever more complex organizational forms, come to represent a more complete simulacrum of reality. The implication of his vision seems to be that databases will mimic real-life systems in incredible detail – a city, a historical figure, or even a whole historical society – and users will be able to interact with these simulacrums in apparently natural, non-narrative ways.

Imagine – if, instead of writing an exhaustive three volume biography of Theodore Roosevelt, Edmund Morris had programmed the entirety of his research into an algorithm which imitated Teddy himself.  Students of history wouldn’t need to read about Teddy – they could go bear hunting with a database that simulates his appearance, his behavior, his patterns of speech in virtual reality.  In this way, they could experience the man as he was – Teddy 2.0 would not shoot that simulated bear cub either.  Am I getting this right?

Each method – narrative and database – has its own merits to recommend it, but as the genre of database evolves into ever more sophisticated forms, narrative as a construct is likely to fall more and more by the wayside in favor of organizational techniques better suited to their unique matter.

A little help – am I overstating his argument?  Missing it completely?


TIME Magazine Corpus Practical Practicum

Mark Davies’s TIME Magazine Corpus of American English is a search tool for the online archives of TIME Magazine from the 1920s through the 2000s. The tool is free and can be found here. Once you have played around on the site, it will ask you to create a free username so that BYU can keep track of how the site is being used.

On the front page of the website, Davies claims, “You can see how words, phrases and grammatical constructions have increased or decreased in frequency and see how words have changed meaning over time.” The website certainly lives up to that mission statement; however, the site can be a little complicated to navigate. The examples on the first page are good for beginners to play around with. One of the examples given is –gate, and how its use changed in the 1990s (e.g. Monicagate). Click on –gate and the top box will show words that use –gate. Scroll down to Monicagate (number 5 on the right); this will pop up the year and magazine articles, which you can click for further context.

Another useful feature is the option to compare multiple features in the search. For example, you can compare two words like ‘husband’ and ‘wife’ and then further limit the search by adding the collocate ‘divorce.’ This can be restricted even further by choosing a time range in which to search. Once you pick an actual article, TIME Magazine Corpus directs you to the TIME Magazine website, where you can email the document to yourself, print it, or share it via blog, Twitter, Facebook, etc.

You have to be familiar with the specific ways to search the site in order to really be able to use it. There are plenty of ways to find help on the site; take a look at the information that pops up when you click the question marks by the search boxes.


Even with this help, the site takes some getting used to and can be rather time consuming to use. Still, it is certainly easier than going through the texts yourself to see how words have changed over time.

In terms of complexity, TIME Magazine Corpus is similar to Voyeur. It is also reminiscent of the Library of Congress’ Chronicling America website, though I find Chronicling America much easier to use. The example page is great, but a short instructional video or at least a tutorial to go along with the examples would be helpful.

Though the site is limited to TIME Magazine, the amount of material is huge, ‘100 million words,’ and still growing as TIME keeps publishing. A researcher could use this site to study almost anything. I conducted random searches in gender studies, film media, parts of speech, phrases, etc., and very rarely did a search conclude with fewer than three examples to pick from. In fact, the amount of information that normally pops up can be overwhelming.

Please play around on the site and let me know if you think that it is a useful site.  Do you find it a bit difficult to navigate?

The Google Custom Engine: Refining Searching in a Few Steps

Sometimes it is a frustrating experience to search for a topic through the internet, only to have the search engine turn up results that are not related to what you are looking for. This problem is similar to what the Bing commercials looked to address with “search overload” during internet searches.

The Google Custom Search Engine provides its users with a search engine to put on their website; the main feature is that it is customizable to refine its search results based upon parameters set by the user.

This makes it easy to find information because the search engine will only look through the user-set websites and pages, and not through other places that are not topic-related.
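The idea of only looking through user-set websites can be pictured as a simple filter on where each result lives. The Python sketch below is only a conceptual illustration of that idea and is not how Google implements the Custom Search Engine or any part of its API. The function names and the site list are hypothetical, chosen to echo a museum-themed engine.

```python
from urllib.parse import urlparse


def is_on_allowed_site(url, allowed_sites):
    """Return True if the URL's host is one of the listed sites or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == site or host.endswith("." + site)
               for site in allowed_sites)


def restrict_results(urls, allowed_sites):
    """Keep only the results that live on the user's chosen sites."""
    return [url for url in urls if is_on_allowed_site(url, allowed_sites)]


# Hypothetical site list and search results.
allowed = ["si.edu", "nga.gov"]
results = [
    "https://www.si.edu/museums",
    "https://example.com/si.edu",   # mentions a listed site but is hosted elsewhere
    "https://nga.gov/collection",
]
print(restrict_results(results, allowed))
```

Matching on the host name rather than on the whole address is a deliberate choice here, so that a page on an unrelated site does not slip through just because a listed site’s name appears somewhere in its URL.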

Setting up a Google Custom Search Engine is an easy three-step process. In the first step, the user sets the parameters of the search engine by listing the websites it will use. The second step sets how the engine will appear on the website, and the third step provides the code to paste into the user’s website.

There are tons of smaller options that allow the search engine to be customized even further, from choosing sites to emphasize during the search, to making money from Google’s AdSense program.

One problem I could see with the search engine is that its usefulness is only as good as the sites that the user lists for the engine to use; if they do not know enough sites to put on the list, the search results may not be as complete.

One solution is that the search engine allows collaboration with invited users with limited access, letting them add sites and labels to the list as needed. The engine can also be set to search all pages instead, while emphasizing the list of websites provided by the user.

The Google Custom Search Engine is basic in what it is used for, but can be further customized for advanced use in user interaction and how results are shown. Easy to set up, this search engine is one way for websites to ensure that their users are finding search results that are topic-related.

External Link to Example Search Engine
Smithsonian and DC Museums