Out of the Archives and Onto Your Screens: Crowdsourcing at the Library of Congress


On October 24, 2018, the Library of Congress (LC) launched a new crowdsourcing program at Crowd.loc.gov. The unveiling of this new program aligns with Librarian of Congress Dr. Carla Hayden’s five-year strategic plan, which “puts users first” and grants increased access to LC’s holdings. This application builds on previous Library of Congress crowdsourcing applications such as Beyond Words, Roll the Credits, and the 2008 Flickr initiative, The Commons.

So how does it work?

Users are encouraged to transcribe, review, and tag documents grouped together in thematic “campaigns,” posted to the Crowd website. Presently, there are five listed campaigns where users can choose to interact with a variety of sources—from official correspondence to speeches and diaries.

The website tracks the progress of each campaign with a blue progress bar, which shows that in just three months, over 15,000 images in the “Letters to Lincoln” campaign have been transcribed and marked ready for review.

Crowd does not require participants to create a profile to transcribe files; however, an account is necessary to review or tag documents.

Here is an example of the transcription portion, using a letter from the “Mary Church Terrell: Advocate for African Americans and Women” campaign.

The website allows multiple volunteers to work on the same page, save partially finished pages, and edit other participants’ transcriptions before submitting for review. Options for full-screen viewing and zoom help users focus on cursive letters and minuscule punctuation marks. Two buttons at the bottom of the page provide volunteers with “Quick Tips” for transcribing or redirect to the History Hub forum (moderated by LC staff members) for more complicated questions.

So, maybe you’re done transcribing and you’ve decided to review another user’s work. Here’s a 1908 letter from Terrell’s collection that’s ready for review:

On this page, the transcription box is locked until you press buttons to edit or accept the text. Once you accept the reviewed document, a box prompts you to submit tags for identification and organization. Varying perspectives among users are expected to provide diverse subject terms and expand the current Library index for an increasingly accessible database.

Now what happens once you submit the document for final review?

This speech from “Letters to Lincoln” has been completely reviewed and finalized. It will now appear in the campaign with an orange hyperlinked description. Users are still able to view the document; however, no further changes can be made. Upon selecting a finished page, a table lists the percentage of progress and the number of contributors for that page.

Once the entire campaign is reviewed and finalized by LC staff, users can select a link to view the finished product in the official online collection. Maintaining the finished documents within each section of the campaigns permits volunteers to return to their previous work and personally connect to the official Library of Congress collections.

Although a multitude of documents from the “Letters to Lincoln” campaign have already been transcribed and finalized, four more campaigns are still waiting to be explored. Will you shuffle through William Oland Bourne’s collection of writings by disabled Civil War veterans, or perhaps get lost in the papers of Clara Barton?

5 Replies to “Out of the Archives and Onto Your Screens: Crowdsourcing at the Library of Congress”

  1. Thank you for a very helpful post; I found the images particularly great. I think it is so exciting that this project is this new.

    As a potential user and a person who hates signing up, I love that you don’t even need to create an account; you miss some features, but you can still do the most important tasks. Also, their very first transcribe-a-thon sounds like such a good idea, as it encourages people to compete and improve their work.

    As a historian, I must say I would be a little bit nervous about using this transcribed material. As Elissa Frankle, an education consultant at the United States Holocaust Memorial Museum, points out in one of our readings, in order for this initiative to be good for scholarship, there has to be enough incentive for volunteers to do a fair job and for experts to monitor the outcome. A lot of it does rely on trust. But even with some reservations in mind, the benefits outweigh the concerns, because I would rather rely (very gratefully so) on the transcribed text in addition to looking at the original handwritten text myself. It would make the process a bit faster.

  2. Haley, I found your post extremely helpful and easy to comprehend, especially with your images.

    I couldn’t help but compare this to the Transcribe Bentham article. After checking out the site for myself, I found they make it so easy to jump right in to transcribing. Unlike the Bentham project, it seems more accessible to people who are not academics or amateur historians. But as you pointed out, Laura, how does that affect the quality of the transcribed work?

    I agree that crowdsourcing makes the researcher’s job just a bit easier, even if you need to compare the transcription to your own reading of the source. There have been so many times when I’ve read a handwritten source and wished I had another set of eyes to confirm my own assessment.

  3. Thank you Laura and Erica for your comments! I’m glad you enjoyed the images and I hope if/when you ever have free time that you’ll consider doing some transcribing. It was an interesting experience that I particularly enjoyed.

    Laura–I also really liked the low barrier to entry for someone who wants to transcribe but doesn’t want to make an account. Although I feel like it raises questions about recognizing users’ contributions–and if you wanted to return to a document you had previously transcribed, it might be harder to locate. I think the concern about quality is valid, and hopefully requiring accounts for the preliminary review process helps weed out anyone without serious intent.

    Erica–From what I can tell, LC staff members monitor all of the work before it’s officially published, like what Elissa Frankle does with the Lodz Ghetto Citizen’s History Project. But yes–there’s always a certain amount of trust placed in users to take the program seriously.

    I also really like that multiple people can transcribe one document and check each other’s work even before submitting for review. Some of the cursive letters are very difficult to discern and it’s comforting to know that maybe 15 more people will read it and catch mistakes.

  4. Thanks Haley for a great post!

    I really like this project and I think it finds a good middle ground between putting the public’s interest in history to good use and taking advantage of (free) volunteer labor. Although I think work should be compensated, I understand that many cultural institutions need to balance their limited budgets with accessibility concerns. What I like about this project is that it increases the accessibility of historical documents and brings people into the research process without removing the need for historians and archivists. As many of the commenters pointed out on Alison Miner’s blog post, history and archive professionals are not in the field to transcribe or sell photos, but instead to provide context, interpretation, and access to the materials. This project allows librarians at the Library of Congress to do what they do best without sacrificing the time it takes to transcribe every document.

    It also made me think of TED founder Chris Anderson’s quote in Michael Peter Edson’s Dark Matter about “radical openness.” He said, “By opening up our translation program, thousands of heroic volunteers — some of them watching online right now, and thank you! — have translated our talks into more than 70 languages, thereby tripling our viewership in non-English-speaking countries.” I wonder if the next step of this process is allowing for translation as well as transcription, furthering the accessibility of the historical materials to people who do not speak English or speak English as a second language. I know translation is trickier than transcription, but I’m curious to explore how that would work and how that could impact historical research.

  5. Thank you Haley for this great how-to post and your thoughtful assessment of the By the People project (crowd.loc.gov). And thanks to those who have taken the time to write comments! I’m Victoria Van Hyning, one of the BTP community managers at the Library of Congress, formerly of Zooniverse.org at the University of Oxford. My colleagues on this project include Meghan Ferriter, formerly of the Smithsonian Transcription Center. I mention our former affiliations to help place BTP in context. Meghan and I have each worked on crowdsourcing for over 5 years, and are familiar and friendly with many of the people who have created and run other crowdsourcing transcription projects. We are also deeply familiar with a lot of literature on this subject, as well as many of the volunteers themselves.

    I just want to respond to a few of the points raised in the questions/comments, and to say that Meghan and I are overdue to write a blog post about the whys and wherefores of BTP! We hope to do this in the next two months.

    1) The degree to which we check incoming transcriptions: We are trying the radical trust approach of not doing page-by-page checks of the transcriptions. A few spot checks, but nothing more. This is for four reasons. First, volunteers generally come to these projects with altruistic intent and a commitment to civic engagement. The quality of the transcriptions and discussion forum conversations we’ve seen in our former jobs and across many other projects is generally very good, and the commitment of volunteers to doing their best is clear. Second, the open reviewing system and the fact that the final person to make a change to a document cannot also give it final approval mean that mistakes and vandalism are minimal. I can’t stress enough how rare vandalism is. Third, fully checking each document is time consuming–if we had time to check in depth, we’d almost have the time to do the transcriptions in the first place, which we don’t! Fourth, transcription or palaeography is rarely clear cut. Every editor or transcriber will develop different ways of doing things, and I have frequently had the experience of transcribing or reading something one way, and then going back and reading it differently. We ask everyone to do their best, but to bear in mind the flexibility of interpretation. Having multiple eyes on the transcriptions really does help, though, and clears up many potential errors.

    2) Concerns about quality: Our view is that word-searchable text covering 90% or more of a previously untranscribed document is worth any errors or lacunae. 90% coverage and accuracy is at the lower end of what we tend to see in volunteer transcription projects, and is comparable to or better than existing OCR or handwritten text recognition (HTR) software for print and manuscript sources. Even as HTR improves, crowdsourcing on the web still appears to be faster and more efficient.

    One important difference between BTP and Transcribe Bentham is that we’re not aiming to create documentary editions, but rather a searchable text that will improve accessibility. For this reason, we’ve kept tagging to a minimum.

    3) Valuing volunteer labor: This is so important, and something we do not take lightly. We conceive of this project as an opportunity for people to donate their time to the Library, of course, but also an opportunity for them to learn, explore, and make new connections with this institution. To that end we have three full-time community managers who support the project and the volunteers. Part of valuing people’s time is publishing the results of their work. We have already started to import transcriptions back into loc.gov, where they carry an attribution label crediting the project. We decided against placing attribution to individual volunteers in this label for a few reasons, but the big one is that studies of other ‘crowds’ reveal that not all volunteers are comfortable receiving attention in this way, and attribution can sometimes lead to infighting. Registered volunteers are able to keep track of their own contributions, however, so those who are motivated by keeping count of their work have a mechanism to do so. We hope that some of the programming we will offer, such as in-person and virtual transcribathons and exhibition tours, will serve as a way of thanking volunteers for their effort. Finally, senior stakeholders consulted with the Library’s union about the ethics of the project, and it was agreed that because this is a form of labor that Library staff do not typically undertake and were never slated to undertake for the collections we’re putting in BTP, there was no threat to current or future workers.

    4) Translations: some volunteers are very keen to offer translations of materials they are encountering in BTP and are busily discussing conventions on History Hub. We hope to develop and expand our translation functionality in the coming year.

    Thanks again everyone for your great feedback and questions. Much appreciated!
