Voyeur/Voyant

Have you ever wished for a web-based text analysis program that was created to theorize text analysis tools and text analysis rhetoric?  If such a specific desire has ever burdened you, fret no more!  Your wish has been answered by the collaborators of hermeneuti.ca with their creation of Voyeur!

How does Voyeur work?  Users paste one or more URLs, or a block of text, into the “add text” box and click on “reveal” for the program to calculate the frequency of each word in the text.  The results are shown two ways: one is visual (like Wordle), a word cloud in which the most frequent words appear the largest; the other is a list in the “summary” or “words in the entire corpus” box, each of which ranks the most common words in descending order.
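For readers curious about what this kind of counting involves, here is a minimal Python sketch of a word-frequency tally. It is only an illustration of the general idea, not Voyeur's actual code; the function name `word_frequencies` and the sample sentence are made up for this example, and the rule for what counts as a "word" is a deliberate simplification.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears, ignoring case."""
    # Simplifying assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, standing in for a pasted document.
sample = "The cat saw the dog. The dog saw the bird."
freqs = word_frequencies(sample)

# List words from most to least frequent, as a frequency table would.
for word, count in freqs.most_common():
    print(word, count)
```

A word cloud is then just another way of displaying the same counts, with font size scaled to frequency.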

Once the data has been analyzed, users have several options for what to do with it.  One of them is exporting it, and there are several choices of export format and destination.  For a historian doing research on multiple documents, this tool is very valuable.  If a user is looking for the frequency of a particular word, they can type it into the “search” box under “words in the entire corpus.”  Double-clicking on a word brings up three more boxes of information: “word trends,” “keywords in context” and “words in documents.”  If there is a favorite word users want to store, they can click on the heart with a plus sign in the “words in the entire corpus” box to save it.  These features work for foreign languages as well, though the input must be text; symbols are not recognized.
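The “keywords in context” idea is easy to picture with a small example. The Python sketch below shows one simple way to pull out each occurrence of a word along with a few neighboring words. It is an assumption about how such a display could be built, not a description of how Voyeur does it; `keyword_in_context`, the `window` parameter, and the sample text are all invented for illustration.

```python
import re


def keyword_in_context(text, keyword, window=3):
    """Return (left, match, right) for each occurrence of keyword.

    `window` is the maximum number of words kept on each side.
    """
    # Simplifying assumption: words are runs of letters, digits or apostrophes.
    words = re.findall(r"[\w']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits


# Hypothetical sample text.
sample = "Race shaped the speech. He spoke about race in America at length."
for left, word, right in keyword_in_context(sample, "race", window=2):
    print(f"{left:>15} [{word}] {right}")
```

Seeing the words on either side is what lets a researcher move from "how often does this word appear?" to "how is it being used?"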

While Voyeur has many positive attributes, it also has its negatives.  The most frustrating is the limited range of sources it can analyze.  Hermeneuti.ca acknowledges the flaws of this website-in-progress, but it claims the ability to break down a variety of web-based texts.  When I entered the URL for a JSTOR article, an error message appeared.  I also tried entering the URLs for blogs and it would not analyze those either.  I was not able to test an e-book with Voyeur, but I would be interested to see if it would break it down.  Another downside to this program is that it counts common words like “the,” “and,” “of” and “in.”  Wordle does not show these common words in the word clouds it creates.  This is not a terrible feature, but if they could be filtered out so the results focus on key words, that would improve it.
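Filtering out these common words (often called "stop words") is a small change to a basic word count. The Python sketch below shows the idea under some loud assumptions: the stop list is a tiny hand-picked set for illustration, `content_word_frequencies` is a made-up name, and none of this reflects how Voyeur or Wordle actually work internally.

```python
import re
from collections import Counter

# Assumption: a tiny, hand-picked stop list; real lists are far longer.
STOP_WORDS = {"the", "and", "of", "in", "a", "to"}


def content_word_frequencies(text, stop_words=STOP_WORDS):
    """Count words, skipping anything in the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Hypothetical sample text.
sample = "The history of the war and the memory of the war in the South."
counts = content_word_frequencies(sample)
print(counts.most_common())
```

Because the stop list is just a set passed in as a parameter, a researcher could add or remove words to suit a particular project.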

How useful can this program be for historians when it lacks the ability to analyze a variety of documents?  It would not be my first choice for text analysis if there are more versatile programs available.  However, for the documents it can break down, it is useful in comparing multiple texts at one time, finding the most frequent words from the documents combined.  The ability to export the data and store favorite words makes it convenient for some types of historical research.

What do fellow historians think of this?  Can programs like Voyeur be useful even if they have a limited capability for analyzing documents?  What should we be looking for in text analysis programs?

(posted at 10:26 pm on 5/5)

7 comments:

  1. Kelly,

    I agree with you that the issue Voyeur seems to have with a variety of sources is an inconvenience. I definitely share your concern with Voyeur’s inability to edit out words like “and” or “the.” Maybe the solution would be a program that allows for easy manipulation of the data? The ability to click on a word like “and” and then click an x box to remove the results from the data would be awesome. Clearly, a website that makes it easy to manipulate data like this seems to be the golden egg many developers are trying for.

    It’s interesting to me that “easy” and “accessible” text-based visualization programs seem to be proliferating at an alarming rate. Clearly interest, or the perception of interest, is there, or people wouldn’t be making so many programs that are variations on the same basic idea. And yet, it seems that so many of these options have flaws that make them frustrating, confusing, and difficult to use.

    As a spoiled member of the generation that grew up with computers, I get frustrated with online applications that I can’t figure out quickly. It should just work, dang it! Unless it was very important that I produce data or visualizations like this, I wouldn’t go to the trouble of incorporating them into a paper. Essentially, I’m not sure that the benefits of a nifty visualization or count-up of commonly-used words in a text are worth the effort. What do you think?

  2. Caitlin,

    I agree with you about being able to manipulate the data and about the limited value of visuals or word counts. I think the ability to manipulate the data to choose what words you want to store or export would be a great addition to Voyeur. I doubt that historians using this program would want to know about every word and how many times it appears; usually we are looking for key words.

    While it is fun to have a word cloud appear with the data, it is unnecessary. If the point of the program is to analyze a document and provide frequency counts that can be viewed in tables or charts, a word cloud does not mesh well with the intention or look of the program. It might be useful for certain types of projects but, as I said before, it includes common words like “the” and “of,” which do not stand out or make the visual representation interesting. Those words will be found in every text that is entered into the program.

    As we well know, historians can be skeptical of using statistics in historical research. This being the case, how can we find programs that are easy to use and flexible to meet our research needs?

    Kelly

  3. Caitlin and Kelly,

    I feel like Voyeur can be used (and most likely is used) when better programs are not affordable. Let’s face it – developing great technology is expensive, and the old adage “You Get What You Pay For” may still have some truth left in it. If one is pursuing undergraduate research, or is facing some funding issues in the graduate sphere, Voyeur could be a decent solution should you have need of its text analysis. As for how we find better programs? Personally, I’d do my research and consult an expert. I do not claim to have any expertise in the field of digital humanities, but (thankfully!) there are those who do. Isn’t that what we’re supposed to be advocating in the history discipline now – E. O. Wilson’s tried and true cry for consilience?

    Laura

  4. Laura,

    I think Voyeur is fine if users do not have access to better programs, but at this point in time there are several improvements that need to be made. The most important is the range of documents it can analyze. What good is the program if it can only break down a limited number of document types?

    Voyeur is helpful in that it provides the frequency count for words and users can analyze more than one document at one time. For a researcher doing simple research I think Voyeur would be sufficient. However, for users who are conducting in-depth research, from my experience, Voyeur would not be my first choice. Also, if a researcher is working with documents that need to be analyzed that closely, they are more likely to be affiliated with an organization that could provide them access to necessary tools.

    As I mentioned before, Voyeur is still a work in progress so maybe these issues will be addressed and it will become the coolest kid in the text analysis world, but as of right now, this would not be a tool I would go to first if I had better options available.

    Kelly

  5. I agree with all the comments so far about the technical issues with this program or other word cloud & textual visualization tools. I also found the Voyeur program a little difficult to use. Hopefully technical issues with Voyeur or other programs like it can be resolved to make them more user friendly. However, I do think tools like this are valuable to historians.

    Especially with the “linguistic turn” in history over the past few decades, textual analysis can reveal important trends. The example on the Voyeur website analyzing the speeches of Barack Obama and Jeremiah Wright with regard to race was very interesting. Textual analysis tools can not only count words, but also show historians how some words appear in conjunction with others, like how Obama and Wright discuss “race” in both their speeches. It is a tool that can have important applications for historians.

  6. Blog comment theme music: High Speed by Coldplay

    I’m going to speak a little more about the technology underneath this and why it’s important. Voyant Tools is built on a lot of interesting digital exploration tools. Voyant finds its roots in the early days of UNIX command line tools like grep and sort, simple but powerful tools used to analyze text in documents. The Voyeur Tools background page shows the progression of tools used to analyze documents, up to the creation of Voyeur Tools itself.

    Voyant Tools builds upon the toolkit provided by TAPoRware, a project maintained by McMaster University. At face value, Voyant seems to add little to TAPoRware beyond an easy-to-use interface. TAPoRware provides only command line tools. The command line is not necessarily an intuitive way to analyze text, especially for novice users.

    In the future, what will text analysis look like? Carnegie Mellon is working on an AI project called Read the Web that crawls websites trying to discern meaning in documents. Currently on its 504th iteration, Read the Web’s AI program NELL is learning while it reads, becoming better at reading and comprehension as it consumes documents.

  7. Hi Folks,

    Some of my HIST2809 students have been having great success with Voyeur/Voyant tools. We use it as an initial exploration, a way to generate hypotheses about what we’re reading. One of my students posted his paper using Voyant on his blog, http://historyasthepast.wordpress.com/ .

    If you’re looking for something more powerful, take a look at what Rob Nelson did with the Richmond Dispatch http://dsl.richmond.edu/dispatch/ . He uses the Mallet toolkit, which is a bit intense for the uninitiated. However, I’ve put together a step-by-step guide to getting started with it; I’d appreciate feedback on any place where my steps falter:
    http://electricarchaeologist.wordpress.com/2011/08/30/getting-started-with-mallet-and-topic-modeling/

    and once you’ve got Mallet installed (which runs from the command line), a Java GUI to make life just that much easier:

    http://electricarchaeologist.wordpress.com/2011/11/11/topic-modeling-with-the-java-gui-gephi/

    Good luck! Hope this is useful for you,
    Shawn
