Digital Preservation Policy: Web Archiving for the Washingtoniana Collection

Introduction:

In my previous posts on this blog I surveyed the digital preservation state of the District of Columbia Public Library’s Washingtoniana collection. The survey was performed via an interview with Digital Curation Librarian Lauren Algee, using the NDSA Levels of Digital Preservation as a reference point.

In that survey we found that the Washingtoniana collection practices highly effective digital preservation: through a combination of knowledgeable staff practices and the Preservica service (an OAIS-compliant digital preservation service), it nearly reaches Level 4 in every category of the NDSA Levels of Digital Preservation. With this in mind, my next-step plan for the archive looks at a number of areas the archive has been interested in expanding into, and presents some thoughts on where it could begin taking steps toward preserving those materials.

Of particular interest in this regard is the collecting of website materials. Because websites are dynamic objects in a relatively new medium, collecting them can be fairly complex: it is hard to pin down precisely when a website has been sufficiently collected. Websites may render differently in different browsers, they may contain many links to other websites, they change rapidly, and they often contain multimedia elements. Outlined below, therefore, is a policy which discusses these issues and offers a digital preservation plan specifically for websites.

Website Digital Preservation Policy for the Washingtoniana collection

The Washingtoniana collection was founded in 1905 when library director Dr. George F. Bowerman began collecting materials on the local community. The collection stands as one of the foremost archives on the Washington, D.C. area and its community, history, and culture. With DC social life and culture increasingly moving to online or born-digital platforms, it makes sense that the Washingtoniana collection would consider collecting websites.

Selection

The same criteria used to select other Washingtoniana materials should apply here. Websites should be considered if they pertain to Washington, DC or its surrounding areas; to events that take place in or discuss that area; to prominent Washington, D.C.-related persons or DC-related institutions; or otherwise to Washington, D.C. community, arts, culture, or history.

Like any physical preservation decision, triage is an essential process. Websites likely to be at risk should be high priority, and in a sense all web content is at risk. Websites built for a specific purpose or a specific event may have a limited operational window. Websites for defunct businesses, political election sites, and even an existing website as it appears on a particular day may be vulnerable, and thus candidates for capture. In addition, the materials in question should not already be collected elsewhere, and should be considered in relation to the rest of the collection.

Although automated tools may be used for identification, selection remains at the librarians’ discretion. In addition, suggestions from patrons relevant to the collection should be considered, and a system for managing and encouraging such suggestions may be put in place.

Metadata

A metadata standard such as MODS (Metadata Object Description Schema) should be used to describe each website. MODS is a flexible schema expressed in XML, is fairly compatible with library records, and allows more complex description than Dublin Core, and thus may work well. Metadata should include, but not be limited to, website name, content producers, URL, access dates, and fixity information, as well as technical information that may be generated automatically by web crawlers, such as timestamps, URI, MIME type, and size in bytes. Extraction information, file format, and migration information should also be maintained.
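To make the description above concrete, the sketch below builds a minimal MODS record for a captured website using only the Python standard library. The element names (titleInfo, location/url, originInfo/dateCaptured) come from the MODS schema; the sample title, URL, and date are hypothetical, and a production record would carry many more fields than this.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def mods_record(title, url, capture_date):
    """Return a minimal MODS description of a captured website as an XML string."""
    root = ET.Element(f"{{{MODS_NS}}}mods")

    # Descriptive title of the site
    title_info = ET.SubElement(root, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # The live URL the capture was made from
    location = ET.SubElement(root, f"{{{MODS_NS}}}location")
    ET.SubElement(location, f"{{{MODS_NS}}}url").text = url

    # When the crawler captured the site
    origin = ET.SubElement(root, f"{{{MODS_NS}}}originInfo")
    ET.SubElement(origin, f"{{{MODS_NS}}}dateCaptured").text = capture_date

    return ET.tostring(root, encoding="unicode")


record = mods_record("Example DC Event Site", "http://example.org", "2016-11-17")
```

In practice much of this technical metadata (capture date, URI, MIME type) would be populated automatically from the crawler’s output rather than entered by hand.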

Collection

A variety of collection tools exist for web archiving. The tool selected should be capable of the tasks below, as outlined by the Library of Congress web archiving page:

  • Retrieve all code, images, documents, media, and other files essential to reproducing the website as completely as possible.
  • Capture and preserve technical metadata from both web servers (e.g., HTTP headers) and the crawler (e.g., context of capture, date and time stamp, and crawl conditions). Date/time information is especially important for distinguishing among successive captures of the same resources.
  • Store the content in exactly the same form as it was delivered. HTML and other code are always left intact; dynamic modifications are made on-the-fly during web archive replay.
  • Maintain platform and file system independence. Technical metadata is not recorded via file system-specific mechanisms.

A variety of tools are capable of these tasks; either the Heritrix open-source archival web crawler or the subscription-based Archive-It service should be used. Both come from the Internet Archive: Heritrix is an open-source crawler run locally, while Archive-It is a subscription service that includes storage on Internet Archive servers.

Upon initial collection, fixity information should be generated using a checksum. This can be automated either with a staff-written script or with a tool like BagIt, which generates fixity information automatically. This information should be maintained with the rest of the metadata for the digital object.
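A staff-written script of the kind mentioned above can be quite short. The sketch below computes a SHA-256 checksum for every file in a capture directory; the resulting manifest would be stored alongside the object’s metadata. The directory layout is illustrative, and a tool like BagIt would produce an equivalent manifest in its own bag format.

```python
import hashlib
from pathlib import Path


def checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large capture files need not load into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(capture_dir):
    """Map each file in a capture directory to its checksum."""
    return {
        str(p): checksum(p)
        for p in Path(capture_dir).rglob("*")
        if p.is_file()
    }
```

Running `manifest()` again at a later date and comparing the results against the stored values is all a basic fixity check requires.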

Websites should be kept in the most stable web archival format available. At the time of this post’s writing, that format is WARC (Web ARChive). WARC allows multiple digital resources to be combined into a single file, which is useful because many web resources are complex and contain many items. Other file formats may be accepted if archived webpages are received from donors.
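The structure that makes WARC suitable for this is simple: each captured resource becomes a record of named header fields followed by the payload, and records concatenate into one file. The sketch below is a simplified illustration of that structure only; real captures omit required fields like WARC-Record-ID here and should always come from an established crawler such as Heritrix rather than hand-rolled code.

```python
def warc_record(record_type, target_uri, date, payload: bytes):
    """Serialize one simplified WARC 1.0 record: headers, blank line, payload."""
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {date}",
        f"Content-Length: {len(payload)}",
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode() + payload + b"\r\n\r\n"


# Several captured resources (a page and one of its images, say)
# concatenate into a single .warc file:
records = [
    warc_record("response", "http://example.org/",
                "2016-11-17T00:00:00Z", b"<html>...</html>"),
    warc_record("response", "http://example.org/logo.png",
                "2016-11-17T00:00:00Z", b"\x89PNG..."),
]
archive = b"".join(records)
```

Because every record carries its own URI and timestamp, a single WARC file can hold a complete, replayable snapshot of a complex site.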

Preservation

Upon initial ingestion, items may be kept on internal drives and copied to at least one other location. Before an item is moved into any further storage system, its files should be scanned for viruses, malware, and any other undesirable or damaging content, using safety standards agreed upon with the division of IT services. At this point fixity information should be taken as described above and entered into the metadata record.

Metadata should be created as soon as possible, at which point the object, with its attached metadata, should be uploaded into the Washingtoniana’s instance of Preservica.

Although Preservica automates much of the preservation process, a copy of the web archive should also be kept on external hard drives. At a yearly interval, a selection of the items on the hard drives should be checked against the items in Preservica to ensure that Preservica’s fixity checks and obsolescence monitoring are working as desired.
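The yearly spot-check could be scripted along the lines below: sample some items from the local manifest and compare their checksums against the fixity values the preservation system reports. The `preservica_fixity` dictionary is a stand-in for whatever fixity report or export the service actually provides, which would need to be confirmed with the vendor.

```python
import random


def audit_sample(local_manifest, preservica_fixity, sample_size=10, seed=None):
    """Spot-check a random sample of items; return those whose checksums disagree.

    local_manifest and preservica_fixity both map item name -> checksum.
    A missing item on the Preservica side counts as a mismatch (value None).
    """
    rng = random.Random(seed)
    items = list(local_manifest)
    sample = rng.sample(items, min(sample_size, len(items)))
    return {
        name: (local_manifest[name], preservica_fixity.get(name))
        for name in sample
        if local_manifest[name] != preservica_fixity.get(name)
    }
```

An empty result means the sampled items agree; any returned entries would trigger a closer look at both the local copy and the Preservica holdings.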

References

Jack, P. (2014, February 27). Heritrix-Introduction. Retrieved November 14, 2016, from https://webarchive.jira.com/wiki/display/Heritrix/Heritrix#Heritrix-Introduction
Web Archiving-Collection development. (n.d.). Retrieved November 16, 2016, from https://library.stanford.edu/projects/web-archiving/collection-development
The Washingtoniana Collection. (n.d.). Retrieved November 16, 2016, from http://www.dclibrary.org/node/35928
Web Archiving at the Library of Congress. (n.d.). Retrieved November 16, 2016, from https://www.loc.gov/webarchiving/technical.html
Niu, J. (2012). An Overview of Web Archiving. Retrieved November 16, 2016, from http://www.dlib.org/dlib/march12/niu/03niu1.html
AVPreserve » Tools. (n.d.). Retrieved November 17, 2016, from https://www.avpreserve.com/avpsresources/tools/
Kunze, J., Boyko, A., Vargas, B., Littman, J., & Madden, L. (2012, April 2). Draft-kunze-bagit-07 – The BagIt File Packaging Format (V0.97). Retrieved November 17, 2016, from http://www.digitalpreservation.gov/documents/bagitspec.pdf
MODS: Uses and Features. (2016, February 1). Retrieved November 17, 2016, from http://loc.gov/standards/mods/mods-overview.html
About Us. (2014). Retrieved November 17, 2016, from https://archive-it.org/blog/learn-more/

 

A Glass Case of Emotion: User Motivation in Crowdsourcing

The web is inherently made up of networks and interactions among its users. But what is the nature of these interactions – participatory? collaborative? exploitative? These questions play out when cultural heritage institutions take to the web and attempt to engage the vast public audience that is now accessible to them. Crowdsourcing is a means to allow everyday citizens to participate and become more involved with historic materials than ever before. Similarly, these volunteer projects can overcome institutional monetary and time constraints to create products not possible otherwise. What most interested me in the readings were the motivations of those involved in these projects. Why do citizens choose to participate? Why are institutions putting these projects out there? How do they play on the motivations of their users? These questions link back to the overarching general ideas about the nature of interactions on the web.

Why Wasn’t I Consulted?

Paul Ford describes the fundamental nature of the web with the phrase “Why wasn’t I consulted” or WWIC for short. Ford claims that feedback and voice on content is what the web is run on. By giving people a voice, even through the basest form of expression in likes, favorites, +1’s, or “the digital equivalent of a grunt,” users are satisfied that they were consulted and that they can give their approval or disapproval.

User experience, in Ford’s mind, is centered on the user’s emotional need to be consulted. Additionally, the expression of approval is what feeds other users to create content, as they receive a positive emotional response from those who consume their work. Organizations create spaces that shrink the vast web down into communities where the WWIC problem can be solved. Essentially, these structures create a glass case of emotion.

Ron Burgundy in a Phone Booth

Libraries, archives, and museums have to deal with users’ emotions when creating their crowdsourcing ventures. How do we create places where users will feel consulted and desire to participate? Like Ford, both Causer & Wallace, describing the Transcribe Bentham project at University College London, and Frankle, writing on the Children of the Lodz Ghetto project at the United States Holocaust Memorial Museum, emphasize that understanding users and volunteers, as well as finding the appropriate medium, is important in these undertakings.

Causer & Wallace identify a much more detailed set of motivations among their user groups than Ford’s WWIC idea. Many of their participants claimed interests in the project such as history, philosophy, Bentham, or crowdsourcing in general. Beyond these categories, the next most common reason for joining was a desire to be a part of something collaborative. The creators of Transcribe Bentham failed to create an atmosphere where users felt comfortable collaborating, which may be why the project declined in popularity over time. The Children of the Lodz Ghetto project, on the other hand, is much more collaborative, with administrators guiding researchers through each step of the process. Eventually they hope to have advanced users take over the role of teaching newcomers. The Holocaust Museum’s project is a much more sustainable model that could lead to lasting success.

Crowdsourcing (For Members Only)

While collaboration and an interesting topic are key factors in motivating participation, how do online history sites get the attention of the public to join in the first place? The push for the openness of both the internet and cultural institutions is something I greatly support, but I think motivating the populace to get involved in these projects needs a return to exclusivity. There is still a prevailing notion that archives and other cultural organizations are closed spaces that only certain people can access. In many European institutions this is still the case. Why don’t we use the popular notions of exclusivity to our own benefit?

Hear me out. What these articles lacked was the idea that many people desire what they cannot get or what only few can. I’m not advocating putting collections behind a paywall or keeping collections from being freely available online. Instead, I think participation in crowdsourcing projects should be competitive or exclusive in order to gain the initial excitement needed to gain a following and spur desire for inclusion.

Other social media platforms, such as early Facebook and more recently Ello, and new devices such as Google’s Google Glass, have made membership or ownership limited, creating enormous desire for each. In these examples, the majority of the populace is asking “why wasn’t I consulted?” and therefore wants to be included. Thus, having the initial rounds of participation be limited to a first-come, first-served, invite-only platform would spark desire for the prestige of being among the few to have access to the project.

In Edson’s article, he wrote about the vast stretches of the internet that cultural institutions do not engage, what he called “dark matter.” While there are huge numbers of people out there who are “starving for authenticity, ideas, and meaning,” I think the first step should be creating a desire to participate and then growing the project. Without something to catch the public’s attention, create a community, and grow an emotional desire to participate, another crowdsourcing website would simply be white noise to the large number of internet users in the world. Users who visit the website looking for a way into the project but are turned away could discover the free and open collections that are already there. After this first limited period, once the attention is there, I think scaling up would be easier. Of course, these ideas will only work if the institution has created a place that understands the emotional needs of its users and provides a collaborative and social environment where users are comfortable participating.

 

Bridget Sullivan Print Project Proposal

In recent years, museums and archives have made a concerted effort to take advantage of digital media in connecting with public audiences. These institutions have undertaken a multitude of projects to make their collections available to a greater audience through digital access. For my print project, I would like to take a closer look at some of these approaches to presenting historic material culture to a public audience, and at how digitization efforts have affected the way the public engages with historical narratives through material culture.

 

Specifically, I would like to focus on the digital offerings of the National Archives and the Library of Congress. Historically, these are two of the most widely used research facilities for American history. As such, they have fallen into the category of most archives, which tend to discourage visitation from anyone outside of serious historical researchers. There is little opportunity to explore the holdings of these types of institutions and they can even be intimidating for newer researchers.

 

However, digitization has broken down the barrier between the public and these repositories of American public knowledge. Both have taken great strides to make portions of their collections available to all types of researchers through the Internet. Further, these efforts have been targeted at different audiences. The National Archives and the Library of Congress have both made documents and finding aids available through general search features of their websites. However, they have also gone beyond the basics of digitization. Each has created online offerings that are more suited to general exploration of their collections, as opposed to research with a specific focus and mission.

 

The National Archives offers the Digital Vaults, a way to digitally wander through their collections. Documents are linked by categorical tagging. It also allows explorers the ability to create their own collections of documents and artifacts that are interesting to them. Similarly, the Library of Congress has created MyLOC. Explorers can register for their own account and create collections of interest to them. These collections can incorporate all aspects of the website, including general information about visiting the Library of Congress as well as online exhibits.  

 

I will compare and contrast these two sites, focusing on the audiences they target and the various pathways these audiences have to interact with the collections of these institutions. Additionally, I will address how the ability to interact with collections online has affected the demographics of those who take an interest in these collections.

On the Potential Benefits of “Many Eyes”

In 2007 IBM launched the site Many Eyes, which allows users to upload data sets, try out various ways of visualizing them, and, most importantly, discuss those visualizations with anyone who sets up a (free) account on Many Eyes. As Professor Ben Shneiderman says, paraphrased in the New York Times review of Many Eyes, “sites like Many Eyes are helping to democratize the tools of visualization.” Instead of leaving visualizations to highly trained academics, anyone can make and discuss them on Many Eyes, which is a pretty neat idea.

Many Eyes allows users to upload data sets and then create visualizations of them. It offers 17 different visualization types, ranging from Wordle-style word clouds to maps, pie charts, bubble graphs, and network diagrams, to name a few. Other sites or programs, Microsoft Excel for example, allow users to create some of these chart types, but Many Eyes offers the advantage of multiple types of visualization in one place.

Additionally,  people in disparate locations can talk about the data sets and visualizations through comments.  The comment feature even allows for the “highlighting” of the specific portion of a visualization you might be referencing. The coolest feature of Many Eyes is that anyone can access and play with data uploaded by anyone else, in the hopes that “new eyes” will lead to surprising and unexpected conclusions regarding that data.

If you create an account on Many Eyes, you can access its list of “Topic Centers,” where people interested in data sets and visualizations relating to specific topics can interact and comment with one another, as well as link related data sets and visualizations. However, a quick perusal of the topic centers shows that the vast majority of topics are followed by only one user. The few topics with more than one user seem to be pre-established groups with specific projects in mind.

Unfortunately, it appears that a crowdsourcing mentality, where people who don’t know each other collaborate to understand and interpret data, hasn’t really materialized.  In this IBM research article, the authors even hint at how Many Eyes “is not so much an online community as a ‘community component’ which users insert into pre-existing online social systems.”  Part of the difficulty in realizing the democratizing aspect of Many Eyes might be a simple design problem in that the data sets, visualizations, and topic centers display based on what was most recently created, rather than by what is most frequently tagged or talked about.  This clutters the results with posts in other languages or tests that aren’t interesting to a broader audience.  Many Eyes developers might adopt a more curatorial method where they link to their top picks for the day on the front page in order to sponsor interest in certain universal topics.  But maybe the problem might be more profound; what do you think?

Ultimately, I’m not sure how relevant Many Eyes is to historians.  It seems that asking for a democratized collection of strangers to collaborate on visualizing your data seems unlikely based on the usage history of the site.  However, groups of researchers who already have a data set to visualize and discuss might be able to make use of this site for cliometrics-style research.  Classrooms and course projects in particular can benefit from this site, since it’s relatively easy for people with a low-skill level to use.  What do you think?  What other applications do you see Many Eyes having?  How relevant will it be for your work in the digital humanities?

Flickr

Flickr is a free photosharing site. It allows you to create a profile and upload photos to a format that makes them easy to share with friends, family and the general public. Flickr makes it easy to get started. In addition to step by step instructions when creating a profile, it also provides a tour of the site that explains all of its features. Aside from uploading photos, you can comment on other users’ uploads or mark images that are especially interesting to you as favorites, allowing you to easily return. Flickr also lets you add people to photos to easily alert other users who may like that image. One feature that I found interesting was the guest list. This feature allows access to images that you choose for people who do not have a Flickr account. On that note, it also contains privacy settings that limit who can see photos on an individual basis.

Two features that I thought were especially useful were the map and linking. Flickr allows you to upload collections of photos from your account to a separate website. This feature is helpful for institutional accounts because they can connect the photos on Flickr to their main webpage. It also could be used by bloggers to share Flickr collections through that medium. The map feature allows you to attach photos to a specific location. Again, this type of technology could be utilized by historical institutions to teach about events or themes through photos.

The search feature is a great way to explore the Flickr world. A search brings up photographs tagged with that term, as well as associated groups, individual photographers, and places. Flickr also allows you to comment on photos; one interesting piece of this feature is that you can comment directly on a photo itself.

The Flickr Commons is the most obvious historical aspect of this site. The Commons provides users the opportunity to help describe photo collections from various institutions across the globe, such as NASA, The National Archives, the New York Public Library, and Smithsonian. Users can add tags and comments to any of the photos available in The Commons.

Flickr also allows you to organize photos into sets and collections, as well as create groups to aggregate photos with a common theme. One example of a historically minded set is:

http://www.flickr.com/photos/nersess/sets/72157603339444029/with/2066890192/

African American laborers at Alexandria, near coal wharf

The Google Custom Engine: Refining Searching in a Few Steps

Sometimes it is a frustrating experience to search for a topic through the internet, only to have the search engine turn up results that are not related to what you are looking for. This problem is similar to what the Bing commercials looked to address with “search overload” during internet searches.

The Google Custom Search Engine provides its users with a search engine to put on their website; the main feature is that it is customizable to refine its search results based upon parameters set by the user.

This makes it easy to find information because the search engine will only look through the user-set websites and pages, and not through other places that are not topic-related.

Setting up a Google Custom Search Engine is an easy three-step process. In the first step, the user sets the parameters of the search engine, listing the websites it will search. The second step sets up how the engine will appear on the website, and the third step provides the code to paste into the user’s website.

There are tons of smaller options that allow the search engine to be customized even further, from choosing sites to emphasize during the search, to making money from Google’s AdSense program.

One problem I can see with the search engine is that its usefulness is only as good as the sites the user lists for it; if the user does not know enough relevant sites to put on the list, the search results may be incomplete.

One mitigation is that the search engine allows collaboration: invited users with limited access can add sites and labels to the list as needed. Alternatively, the engine can be set to search all pages while emphasizing the list of websites provided by the user.

The Google Custom Search Engine is basic in what it is used for, but can be further customized for advanced use in user interaction and how results are shown. Easy to set up, this search engine is one way for websites to ensure that their users are finding search results that are topic-related.

External Link to Example Search Engine
Smithsonian and DC Museums

Victorian Researcher Finds Google Makes His Life A Lot Easier

If you thought “Googling the Victorians” was about something else, you’ll be disappointed. In this article, Patrick Leary discusses how Google has made his life as a researcher of the Victorian era so much easier.

That’s to be expected with anything in digital history — wouldn’t our lives as historians be so much harder without Google?

But what is so surprising and unique about Leary’s article is how he views Google’s usefulness as something of an accident.

Leary writes about his search for a phrase that appeared in the Sunday Review; the search turned up the phrase in a number of other sources as well.

Leary writes: “Such experiences reinforce the conviction that the very randomness with which much online material has been placed there, and the undiscriminating quality of the search procedure itself, gives it an advantage denied to more focused research.”

While Google has helped his work, Leary also writes that it is no silver bullet and that one should always verify the authenticity of a source that is returned in a Google search.

“A great many legitimate scholarly purposes can nevertheless be served by an array of online texts that are, to one degree or another, corrupt,” he writes.

Later in the paper, Leary writes with excitement about the prospects of expanded digitization projects, as well as improvements in optical character recognition (OCR), the technology that makes digitized nineteenth-century documents searchable. He is also excited about the growing number of non-profit digitization initiatives, like the Internet Archive.

He then discusses how new generations will take this kind of research for granted.

“What we are seeing is arguably not merely an electronic supplement to traditional library and archival research, but a more fundamental shift in our relationship to the textual universe on which our research depends,” he writes.

In all, this paper is not at all surprising. It could be extrapolated and made applicable to other topics within history, or even other fields. But what makes it important is Leary’s anecdotes about how this has changed his life — and his field.

How Democratic Do We Really Want the Internet Anyway?

I posted my print project proposal last week, which I’m sure everyone read and enjoyed thoroughly. Anyway, as I think about the questions regarding whether the Internet has fostered elitist and institutional groups rather than egalitarian and democratic groups, I have to wonder: How democratic do we really want the Internet anyway?

In a democracy, the majority rules. Well, that means if people aren’t using a site anymore, like GeoCities, it’s just going to disappear. What if the people aren’t interested in historian-generated websites? Should we just give up and leave the Internet to those capitalist successes? We talked this week about planning a website and the need for developing an audience and trying to determine what people we want to gear our work toward. The problem is, we don’t want to filter important historical information just to make our websites more user-friendly for the general public, do we?

If historians are going to gear their digital offerings in the traditional way – directing information toward an academic audience – are we not dooming ourselves to creating a solely elitist network of our own?

Of course, this all relates to another important question: Can a structure be both elitist and democratic at the same time? Perhaps this all relates to the type of republican-democracy we have in the United States today?

What do you think?

[Insert Clever Flickr Title Here]

An interactive tool for the amateur photographer, Flickr creates a whole new playground for both beginners and experts in digital photo storage.

Flickr, owned by Yahoo, presents a home for photographers of all skill levels to post their photos in a community forum. Flickr is based on the idea of sharing and allowing others to access photos. While privacy settings allow posters to restrict access to their photos, Flickr “recommends” allowing anyone to access them.

The photo site runs off of a series of “tags,” which run on the same concept of “tagging” for any other site and allow users to quickly sift through several thousands of photos in a matter of seconds. By searching for tags on the site, only relevant or “tagged” photos show up on your searches, including people and places.

A global map allows users to put tags on places within feet of their photos, allowing users to search photos by city and region as well. For archiving purposes, this allows a unique way of storing and filing photos, separating them into various sorts of categories. Sure, it’s convenient for some users, but it also raises the question: what if things are tagged wrong? It might not be a national crisis, but still, users make mistakes, right?

Besides the basic download and search functions, Flickr offers the option of editing photos in Picnik, a free alternative to Adobe’s Photoshop, and also allows users to “group” the profiles they view most often. In this way, users can easily keep track of friends or other photographers with similar styles. With this comes a contact list, letting users direct message one another about their photos and related topics.

I spent some quality time on Flickr over the course of the week and explored all of its different functions. While setting up an account is a little confusing, the general idea of the site is genius. True, I would like a little more space for my photos (you’re restricted to just 200 MB for your individual photos), but creating multiple accounts can circumvent that. Although Flickr previously restricted access solely to Yahoo users, Google and Facebook users have now been invited in, putting Flickr in competition with Google’s Picasa.
After playing around with the site, I’ve really got to compliment Flickr for making the site as easy to use as possible. As a newspaper photographer, I have an opportunity to label my photos with titles, add captions, and tag them as many times as I want. Using photos from my archives, I grabbed a bunch of photos from my old high school’s sports and plotted the events at the different locations as well. Now, when users are exploring Manchester, my photos will be included.

This presents interesting opportunities for historians and digital archivists. Because the photos sit on a free platform and can be tagged many times over, historians can load photos of important historical happenings to the site: it’s an easy way to keep track of as many important public-domain photos as possible, and a site I would definitely recommend checking out, whether you’re an amateur or expert photographer.

Wikipedia: The Good, the Bad, and the Ugly

As Bonnie’s post below adroitly demonstrates, Wikipedia is a site with a deeply ingrained ethos and traditions that might not be familiar to the casual user: a tribal society that debates the content of pages behind talk pages that regular users rarely see, and that ends up producing articles more dependent on consensus than on expertise.  Sometimes the pages that result are admirable, written in clear English and with a large number of citations at the end for further scholarly pursuit.  Often these are prominent subjects with quality articles in many of Wikipedia’s innumerable language versions (including the admirable but somewhat bizarre “Simple English” Wikipedia, which tries to present topics like quantum mechanics at a sixth-grade reading level).  Since this practicum called on me to analyze three pages on Wikipedia, I decided to present them in a classic format: the good, the bad, and the ugly.  I found one article I thought especially praiseworthy, one that was stunningly poor, and a talk page that was, to put it mildly, ugly.  Without further hesitation or preamble, let us examine Wikipedia.

How can you tell what the best articles on Wikipedia are?  Wikipedia itself has a handy answer: a “Featured Articles” category that lists what the site considers its best work.  There are currently over 3,100 featured articles, and roughly one is added each day.  These articles are Wikipedia’s self-proclaimed crème de la crème, the roughly 0.1% of its more than three and a half million articles that it is willing to stand by.  Indeed, the Featured Article I have chosen is an admirable encyclopedia article.  Slavery in Ancient Greece has sections analyzing every aspect of slavery, from a detailed examination of the various terms the Greeks used for slavery and their different connotations to an examination of the origins of Greek slavery from the Mycenaean age through the Homeric period, tracing references to slaves in pre-Classical Greece all the way down through Draco and Solon.  The article struggles to quantify the number of slaves in Classical Greece, arguing that though the wide-scale slavery of the Romans in terms of number of slaves per master was unknown, slaves were in widespread use among most classes, and a rich man could have up to fifty slaves.  It argues that intentional slave breeding was a rare, if not unknown, phenomenon, and that the “slave/citizen” line was far blurrier than the strict separation of the antebellum American South.  It goes on to detail classical views of slavery and then, amazingly, gives a short modern historiography of the subject and even poses discussion questions.  This admirable article is followed by a lengthy list of twenty-nine sources, 170 endnotes, and fifteen books for further reading.  The ending list of sources would be ideal for an undergraduate writing a paper on Ancient Greek slavery and needing academic sources: an amazing amount of historiography is present in the works listed (though, admittedly, a third of the books mentioned are in French).
All in all, this article is a great example of what a Wikipedia page can offer scholars.

At the other end of the spectrum, we find Wikipedia’s article on the 19th-century Taiping general Loyal Prince Lee, or, as he’s known on Wikipedia, Li Xiucheng.  I will admit that this is the third occasion on which I have cited this page as an example of Wikipedia’s defects, and it has changed every time, yet no matter how much it changes, it remains unacceptable, year after year.  Wikipedia has put up a disclaimer, almost apologetically, saying that “This article is a rough translation from Chinese. It may have been generated by a computer or by a translator without dual proficiency. Please help to enhance the translation.”  Now, read that quote back over.  The translation was generated by a computer or a translator without dual proficiency.  It’s no wonder the article is a shambles, an incomprehensible mishmash.  The level of incomprehensibility is best demonstrated by the section labeled “Write”: “In Zhong Prince Li Xiucheng Describes Himself (《忠王李秀成自述》), the autobiographical account of a prince of the Heavenly Kingdom written shortly before his execution(Pseudohistory saying Li was suicide admitted by Zeng Guofan gave Li a sword because Zeng respected Li, even Li Hongzhang had been read this describes and praised Li Xiucheng was a hero on a letter to Zeng).”  Is this at all comprehensible to anyone?  The faults are further demonstrated in the final section, which gives the name of a professor at the University of London as “柯文南.”  One must be skeptical that that is, in fact, how he prefers his name to be rendered in English.  In a final confusing move, under children it lists a son, “Li Ronfar Battle of Shanghai (1861).”  Did he die in the Battle of Shanghai?  Was he born in the Battle of Shanghai?  What does this mean?
The Loyal Prince Lee article demonstrates a major shortcoming of Wikipedia: articles on figures who are mainly of interest to speakers of non-English tongues can be extraordinarily poor, even when the article on the Wikipedia of the relevant native language is fine or even exceptional.

Most of Wikipedia’s deliberations happen behind the scenes, on its talk pages.  Talk pages are attached to every article, yet are rarely seen by most casual users (many do not even notice them), so talk page conversations are usually dominated by hard-core Wikipedians or cranks (and the two categories often overlap).  Many articles are subject to perennial flame wars: whether Wikipedia’s trickster sister Encyclopedia Dramatica deserves an article (warning: the author of this post strongly encourages you not to visit Encyclopedia Dramatica), whether a formerly German, now Polish city on the Baltic should have its name rendered “Danzig” or “Gdansk,” and whether its most famous inhabitant, Nicolaus Copernicus, should be a “Pole” or a “German” (a distinction Copernicus would not have understood).  Yet many of the most contentious flame wars are on subjects one would not expect, such as race in antiquity.  See the talk page of the Ancient Egyptian Race Controversy article.  For over a century there has been vigorous academic debate on the subject, and the popular debate on Wikipedia makes that academic debate look positively civilized by comparison.  The page comes with an astounding twenty-three archives of discussion and warnings telling you that the Arbitration Committee has placed the article under probation, that the subject is controversial and in dispute, that the article has been Wikipedia peer reviewed (such a thing does, in fact, exist), that the page survived a vote on deletion, and, amusingly enough, a little dove image telling the user to remember etiquette.  The article’s first archive alone is enough to give one a major headache, and the implication that there are twenty-two more spanning half a decade of running argument boggles the mind.  That this much discussion hides in the shadow of a relatively modest article shows both how much work goes into Wikipedia and how much controversy the past can create even after a gap of two and a half millennia.

Wikipedia shows that history is alive and well on the Internet, still arousing passions and still leading to ferocious debates.  It does, however, demonstrate that not all articles are created equal, that one should not presume the average Wikipedia article is of equal caliber to the ones with that tell-tale star, and that maybe, just maybe, one should look at the talk page before accepting an article’s contents as truth.