Now that you’ve learned all about the theory behind conducting, using, and preserving oral history interviews from Alex’s post on sound studies, let’s dig into some of the other innovative ways digital historians have been making audio files more accessible.
High Performance Sound Technologies for Access and Scholarship, or HiPSTAS, is a project created by the School of Information at the University of Texas at Austin to “develop a virtual research environment in which users can better access and analyze spoken word collections.”
This initiative grew out of a 2010 report by the Council on Library and Information Resources (CLIR) and the Library of Congress (LoC) that identified the risk of audio deterioration resulting from unprocessed and inaccessible audio acquisitions in archives. The report echoes the concerns of Doug Boyd and Michael Frisch about the life of audio files after an oral history project has been completed.
Titled “The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age,” the report identifies the paradox of unprocessed audio files: if researchers don’t use them, archives are less inclined to spend time and money processing them. But if the files remain unprocessed, researchers won’t be able to access them. While most of these issues stem from insufficient indexing of audio files from the time of donation, the report also places blame on the lack of developed software for analyzing and generating metadata.
Since 2013, HiPSTAS has sponsored three conferences (called the HiPSTAS Institute) to discuss issues facing archivists, librarians, and technology scholars when dealing with digital sound files. Hosted both physically and online, these workshops aimed to create a network of scholars, build up published studies in the field, and develop new software tools and techniques to help label unknown recordings.
The HiPSTAS creators set two goals:
- To “produce new scholarship using audio collections with advanced technologies such as classification, clustering, and visualizations”
- To contribute “to recommendations for the implementation of a suite of tools for collecting institutions interested in supporting advanced digital scholarship in sound.”
So, how do they plan on doing this? I’ll tell you how: beta-testing, collaboration, and hosting several meetings of the minds (i.e., academics, graduate students, archivists, and other digital humanists).
Outcomes
The major component of the HiPSTAS Institutes was developing a program known as ARLO (Adaptive Recognition with Layered Optimization). ARLO is an open-source machine learning application originally created to study and classify bird calls by extracting audio features and displaying the data as spectral graphs.
HiPSTAS pushes ARLO’s disciplinary bounds from science to the humanities by sponsoring a project where 20 participants experimented with the application to analyze spoken word recordings. The intent was to develop a program that would be applicable to humanities scholars by supporting longer files, implementing play-stop-fast-forward keys, and allowing multiple users to create and share tags. The participants used ARLO to record time and frequency information into a spectrogram, like so:
This graph is brought to you by the HiPSTAS Final White Paper, courtesy of Gertrude Stein saying “some such thing” from a reading of her novel, The Making of Americans.
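A spectrogram like the one above can be sketched in a few lines. This is a hypothetical illustration, not ARLO's actual code: it uses SciPy's short-time Fourier analysis on a synthetic 440 Hz tone in place of a real spoken-word recording.

```python
import numpy as np
from scipy import signal

# Synthesize one second of a 440 Hz tone as a stand-in for a real recording.
fs = 16000                      # sample rate in Hz (assumed)
t = np.arange(fs) / fs          # one second of sample times
audio = np.sin(2 * np.pi * 440 * t)

# Short-time Fourier analysis: a grid of energy values,
# frequency bins down one axis and time frames along the other.
freqs, times, energy = signal.spectrogram(audio, fs=fs, nperseg=512)

# The strongest frequency bin should sit near the 440 Hz tone.
peak_bin = energy.mean(axis=1).argmax()
print(f"peak frequency near {freqs[peak_bin]:.0f} Hz")
```

Plotting `energy` on a time–frequency grid gives the kind of spectrogram participants worked with, where a sustained vowel shows up as horizontal bands of energy.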
In what is described as “instance-based learning,” participants trained ARLO with 27,000 sample clips from PennSound and 150 hours of folklore from the Dolph Briscoe Center for American History. ARLO then matches patterns in sound clips based on pitch, rhythm, and timbre. Colors are assigned to numerical energy values: white represents the highest energy and black the lowest.
Results of unsupervised learning and clustering of the Radio Venceremos Collection from the Guatemala Police Archives by ARLO, as explained by Abhinav Malhotra.
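The two ideas above can be sketched together: instance-based learning as nearest-neighbor matching against already-tagged clips, and the white-to-black display as a simple energy-to-grayscale mapping. The feature vectors and tags here are invented stand-ins for ARLO's pitch/rhythm/timbre features, purely for illustration.

```python
import numpy as np

# Hypothetical exemplars: feature vectors a user has already tagged.
tagged = {
    "applause": np.array([0.9, 0.1, 0.4]),
    "speech":   np.array([0.2, 0.8, 0.6]),
}

def nearest_tag(features):
    """Instance-based learning in miniature: return the tag whose
    exemplar is closest to the new clip in feature space."""
    return min(tagged, key=lambda tag: np.linalg.norm(tagged[tag] - features))

def to_grayscale(energy):
    """Map energy values to 0-255: white (255) for the highest
    energy, black (0) for the lowest, as in ARLO's display."""
    lo, hi = energy.min(), energy.max()
    return ((energy - lo) / (hi - lo) * 255).astype(np.uint8)

print(nearest_tag(np.array([0.25, 0.7, 0.5])))  # closest to the "speech" exemplar
```

Real systems compare many exemplars per tag and far richer features, but the core move is the same: a new clip inherits the label of the training instances it most resembles.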
Through collaboration with the WGBH Educational Foundation and the Pop Up Archive (a speech-to-text tool), HiPSTAS has made strides in facilitating the use of ARLO to identify raw footage in collections such as the American Philosophical Society’s Native American Projects and the Lyndon B. Johnson Library. The HiPSTAS website currently hosts a series of blog posts offering Audio Labeling Toolkits and highlighting projects using ARLO to tag previously unidentified files.
The Ongoing Process
Currently, HiPSTAS is funded by the National Endowment for the Humanities’ Division of Preservation and Access and the Institute of Museum and Library Services (IMLS), with the long-term goal of inspiring digital innovations that will one day instantly convert speech to text. While this goal is still out of reach, such technology would make archives searchable and accessible for researchers, with particular benefit to people with hearing or reading disabilities.
So, in what ways have you seen people and repositories responding to the issue of unlabeled audio files and deterioration? What kinds of problems do you think will accompany unsupervised computer batch classification?
I’m really intrigued by this because it is aiming to be more than a transcription. After taking the oral history class last semester, I’ve realized how much a transcription leaves out—cadences, volume, and emotion. It is up to the transcriber to retain the speech of the narrator with little interpretation, while still making the transcript as readable as possible—a harder task than it seems. This project seems to be trying to find a scientific way to document a narrator’s voice in addition to the actual words.
I think this would be extremely helpful, not only because it would make oral histories more accessible, both in terms of the number of oral histories transcribed and available digitally, but also because it would allow people with hearing or reading disabilities to explore the nuances of an oral history without having to listen to the entire interview. I also think it would be interesting to have all oral histories transcribed by the same source. No two transcribers will transcribe an oral history the same way, so no transcript is without interpretation of some sort. Although software transcriptions wouldn’t be completely objective either, software would apply the same interpretation across all oral histories, which would help make clear what that influence is on the interview.
So, overall, I am very excited about the potential for this project!