Project Draft: Omeka and the Historical Society of the DC Circuit Court

For the past several weeks I have been working with my public history class to create an online catalog and exhibit for the Historical Society of the District of Columbia Circuit Court. While we still have a long way to go before completing the project, we have made substantial progress on the site and finally have a draft up and running at http://dcchs.omeka.net/!

So far the most substantial progress made on the project has been the completion of the online catalog of the historical society’s collection of judicial portraits. If you click on the archives tab you can see an entry for each portrait that includes information about the judge and the painting. While two of my other group members also helped with the catalog, I entered the information for the judges between Joyce Hens Green and Joseph C. McGarraghy (these judges are located on pages 6-8 in the archives section).

Now that we have completed the majority of the work on the catalog portion of the site, my group has shifted focus to creating an exhibit and lesson plans. While the majority of the work for these projects has not yet been included in Omeka, we have made substantial progress on writing the exhibit script and gathering information to create K-12 lesson plans. We plan to finish both of these projects by March 30th so that we will have time to make final edits to the site before the final drafts of our project are due.

In addition to the exhibit and lesson plans, we are also considering including a site survey to receive feedback from visitors. This would allow the historical society to track the progress on the site and learn of any problems that visitors experience while using the page. Can you think of anything else that we should include on the site? Are we missing anything that you think would really add to the web page? Please leave feedback here!

The “True” Corpus of American English

In an analysis of the Corpus of Historical American English (COHA) and Google Books, Professor Mark Davies of Brigham Young University studies the effectiveness of both engines and their ability to properly read the English language. Both are corpora of American English, but Google Books has 500 billion words compared to COHA’s 400 million. Davies argues that although COHA is a significantly smaller database, its trend patterns still mirror those of Google Books. Because of this, Davies argues that COHA is actually the more effective corpus: a smaller database means far less data to sift through, and that means quicker search results.

COHA’s “toys” are what make it a more useful database in Davies’s eyes. While Google provides the same basic function (showing the frequency of word usage throughout the decades), COHA is able to track concepts, related words, and changes in meaning. Whereas Google tends to have a one-track mind, just like its general search engine, COHA manages to “think” about relations for the words being placed into the search. Because it looks for things such as relation, form, root words, or even cultural shifts, its searches are much more comprehensive.

Design is a huge issue for some researchers, and with this in mind, Google definitely has the upper hand. True, as Davies puts it, COHA is able to effectively portray the same statistics, but I had a genuinely hard time navigating the site. Bar graphs are nice for portraying how many dark-haired, blue-eyed people are in a class, but for a corpus of American English, I found them rather ineffective. Google Books had a much more pleasing site that was nicer to the eye and made the pattern of the language easier to follow. Yes, COHA’s tables are nice for their alternative searches, but as I said, traversing the site is actually rather difficult. Google provides a much more streamlined site, and provides actions that are easy to follow.

Unfortunately for Google, I think the number of words in the database has made it impossible to create proper analyses of the grammar, word meanings, and word foundations that COHA is successfully able to analyze. If Google Books created an efficient way to sift through all of that information quickly enough, then it would immediately become the preferred site. However, its inability to process such large amounts of data (largely its own fault) has rendered it ineffective as a “true” corpus of the American English language.

Millions of Digitized Books, Hundreds of Fascinating Conclusions

Jean-Baptiste Michel et al.’s short and sweet article Quantitative Analysis of Culture Using Millions of Digitized Books raises a number of striking points that show just how valuable Google’s bold (and originally considered foolhardy) Google Books project has been to historians.  The project searches nearly 5.2 million books (over 4% of those ever written, a very significant sample) containing over 500 billion words.  Let’s pause for a minute and think about what that means.  25 years ago, or even 10 years ago, if you said you wanted to search through a sizable sample of every book ever written for certain words, you would have had your head examined.  The paper points out that it would take a human 80 years just to read all the books written in one year, 2000.  Here’s this device that can go through the entire corpus literally quicker than the blink of an eye.  Roughly two-thirds of the 500 billion words are in English and there’s only a significant sample size for books from 1800 on (though there are a fair number from 1600-1800), but even with these limitations, the work allowed the researchers to come to some bold conclusions.
“What conclusions?” you ask?  Try this one on for size: they estimate that most dictionaries might only contain as little as 52% of the living lexicon at any given moment.  They estimate the total lexicon of 1-grams (single words, excluding symbols, numbers, typos, etc.) at 544k in 1900, 597k in 1950, and 1,022,000 in 2000 (counting n-grams that are used over 1/1,000,000,000 of all English words).  Some of these are not in dictionaries due to dictionaries’ traditional dislike of compound words, but others are inexcusable (they point to “deletable” as a particularly ironic example).  This lexical “dark matter,” in their charming expression, is made up of words that are fresh for research.  No OED biography has ever examined every facet of these words, and no amount of looking up will find them.  The n-gram has saved these potentially valuable expressions from the invisibility of their hidden nature.
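
For the curious, the counting rule described above (keep only the 1-grams whose share of all words is above one in a billion) can be sketched in a few lines of Python.  This is only a toy illustration and not the researchers' actual method: the function name lexicon_size, the sample counts, and the loose threshold in the example are all made up here, and the real study also filters out numbers, typos, and the like before counting.

```python
from collections import Counter

def lexicon_size(counts, threshold=1e-9):
    """Count the distinct 1-grams whose relative frequency exceeds threshold.

    counts maps each 1-gram to how many times it appears in the corpus.
    threshold is the minimum share of all tokens (1e-9 = one per billion).
    """
    total = sum(counts.values())
    if total == 0:
        return 0
    return sum(1 for n in counts.values() if n / total > threshold)

# Hypothetical toy corpus, invented purely for illustration.
toy_counts = Counter({"the": 900, "judge": 90, "deletable": 9, "glisters": 1})

# With a tiny corpus we need a much looser threshold than one per billion.
common_words = lexicon_size(toy_counts, threshold=0.005)
```

In this toy corpus of 1,000 tokens, “glisters” falls below the 0.005 cutoff and drops out of the count, which is the same kind of filtering that keeps one-off oddities out of the lexicon estimates.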

Another bold feature of the n-gram is that it lets you trace the rise and fall of terms over time.  Much has been made of the example of the n-gram for “World War I” vs “Great War,” where Great War holds strong until 1939, then falls off, while World War I rises to pick up the slack, but it’s hardly the only example.  You can do the n-gram test yourself and see the decline of a good many words and phrases, and the introduction of others.  Ever been curious to see if anyone said “Yadda-yadda-yadda” before Jerry Seinfeld?  Want to map “Reality Television” vs. “Situational Comedy” and see if you can identify the year Survivor was released?  Want to compare Jean-Baptiste Lamarck with Charles Darwin or Karl Marx with Sigmund Freud?  The world is your oyster.
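
If you would rather see the idea than click through a viewer, here is a rough Python sketch of what such a comparison boils down to: divide each phrase's yearly count by the total words printed that year, then look for the year one phrase overtakes the other.  Everything in it is an assumption for illustration, including the function names relative_frequency and first_year_overtaken and the counts themselves, which are invented and are not real n-gram data.

```python
def relative_frequency(phrase_counts, total_counts):
    """Turn raw yearly counts into shares of all words printed that year."""
    return {
        year: phrase_counts.get(year, 0) / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year] > 0
    }

def first_year_overtaken(old_term, new_term):
    """Return the first year new_term's share exceeds old_term's, or None."""
    for year in sorted(old_term):
        if new_term.get(year, 0.0) > old_term[year]:
            return year
    return None

# Hypothetical counts, invented for illustration only (not real n-gram data).
totals = {1930: 1000, 1940: 1000, 1950: 1000}
great_war = {1930: 50, 1940: 30, 1950: 10}
world_war_i = {1930: 1, 1940: 20, 1950: 40}

crossover = first_year_overtaken(
    relative_frequency(great_war, totals),
    relative_frequency(world_war_i, totals),
)
```

Dividing by the yearly total matters because far more books were printed in later decades, so raw counts alone would make almost every term look like it was on the rise.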

The n-gram can also detect the death of older, archaic forms of words.  “Spilled” is becoming the past tense of “to spill,” but there is no use in crying over spilt milk about it; spilt had a long run.  Contemporary spouters of aphorisms think that all that glitters is not gold, but their fathers sagely opined that all that “glisters” is not gold.  Indeed, past tense verbs that end in “t” are fighting a slow, steady, losing battle against “ed.”  Can they survive?  I feel I’ve spoilt the ending of this struggle, but I’ve been burnt on these predictions before.

The final section of the article struck (or will it become “striked?”) a more somber note: repression.  Examining the use of the word “Trotsky” in Russian-language sources through the 1920s tells a harrowing tale, but everyone expected as much.  (I wanted to run a similar test on “New Economic Policy” vs “Five Year Plan,” but, alas, I speak no Russian, and the English results are pretty meaningless.)  What is more interesting is the revelation of people never before suspected of having been repressed.  The Nazi regime’s list of degenerate artists was apparently far more extensive than generally known, as people never included in the traditional narrative saw their mentions in the German press fall off the face of the earth in the late 1930s.  Again, this was just a cursory exercise: this n-gram search opens up the possibility of a new way of looking both at the more blatant Nazi/Soviet repression, and the more subtle blacklisting preferred in the West.  There are millions of possibilities that n-grams open up for these millions of books.

The Persistence of the Wasteland

I thought I’d give an update on my project.  You will recall that I’ve been using the Fallout series as a benchmark for examining changes in American nuclear culture from 1945-2011.  It is striking how prevalent the image of the high desert is in current American concepts of a post-nuclear world.  Below are just a few of the photographs from games and movies.  It is simply impossible to display a post-nuclear world without reference to what Jeffrey Womack calls “The Landscape of Death.”  Yet, significantly, there is little to no use of this landscape trope prior to the early 1970s.  My argument is that this is the result of the environmental movement, a reinvigorated anti-nuclear/disarmament movement, and most importantly the release and wide dissemination of films and images from the New Mexico and Nevada above-ground nuclear tests, which permanently associated the high desert with the mushroom cloud in American minds.  There is, in fact, an almost total switch of thematic focus.  Most of the books of the mid to late 50s feature a decimated or extinct humanity in a pristine world, a world wiped clean by bombs.  The later movies, books and games feature a resilient, surviving and tenacious humanity in a world utterly devoid of nature.  This changing focus speaks to larger fears about the effect of technology on our environment, fears which simply were not a part of the zeitgeist prior to 1970.

Also, notice the theme of the barren road and the lone traveler.  I’m not sure how to interpret why that image is so striking and so repeatedly used.  Anyone have any ideas?

Project Started!

Hey guys!

Here is the link to my video game blog: http://pixellatedculture.blogspot.com/

The first post is about Fallout 3 for those interested in the Fallout-verse, but knowledge of the game is not entirely necessary.

I chose Blogger as it is part of Google and links up to a bunch of other services. Blogger also gave me a bunch of tools to put on my blog, such as links to Twitter and Facebook, my own Twitter feed on the site, a page view tracker, an RSS button for subscribers and more! There’s even an option for AdSense, which I opted out of. Blogger also has a variety of templates and great customization. I wasn’t feeling too creative, so I just tweaked one of the basic ones. It was super easy to set up and get started. Blogger also has a button at the top for “Next Blog,” so if you’re feeling random it takes you to another blog, thus potentially increasing my traffic. The only trouble I had was with hyperlinks. Oh, they work and it’s fairly intuitive to get them in there; it’s just that the field won’t let you paste the link in, so you have to type it yourself. There is a “test the link” feature so you know if you typed it in right, but it’s still mildly annoying.

It’s a little early to tell how successful this will be, but I have hope. I didn’t realize Blogger had so many features when I signed up, so that was a pleasant surprise. Anyway, check it out and leave comments telling me what you think, either here or at the blog itself.

Edit:

So now I have two posts up. The second is on Bioshock. So far, I haven’t gotten any comments, but I have gotten plenty of views. For the rest of the project period, I have a list of games I want to write about, and I plan to keep alternating between history weeks and lit weeks.

However, I do want comments, as I want this blog to be an interactive experience. I want people to leave suggestions and give me ideas. I’m hoping that I can encourage people to leave a comment and that this is just new blog jitters or something like that. I also recently redesigned my template to look a little more classy, so the background isn’t so distracting.

I’m also having trouble deciding on a format. I’m thinking my blog looks a little bland because it doesn’t have pictures. However, I’m afraid that putting in too many pictures would disrupt the flow. Right now I have hyperlinks for aspects of the game which are important. What’s the best way to go about the picture issue?