Now that you’ve learned all about the theories behind conducting, using, and preserving oral history interviews from Alex’s post on sound studies, let’s dig into some other innovative things digital historians have been doing in terms of making audio files more accessible.
High Performance Sound Technologies for Access and Scholarship, or HiPSTAS, is a project created by the School of Information at the University of Texas at Austin to “develop a virtual research environment in which users can better access and analyze spoken word collections.”
This initiative grew out of a 2010 report by the Council on Library and Information Resources (CLIR) and the Library of Congress (LoC) that identified the risk of audio deterioration resulting from unprocessed and inaccessible audio acquisitions in archives. The report echoes the concerns of Doug Boyd and Michael Frisch about the life of audio files after an oral history project has been completed.
Titled “The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age,” the report identifies the paradox of unprocessed audio files: if researchers don’t use them, archives are less inclined to spend time and money processing them. But if the files remain unprocessed, researchers won’t be able to access them. While most of these issues stem from insufficient indexing of audio files from the time of donation, the report also places blame on the lack of developed software for analyzing and generating metadata.
Since 2013, HiPSTAS has sponsored three conferences (called the HiPSTAS Institute) to discuss issues facing archivists, librarians, and technology scholars when dealing with digital sound files. Hosted both physically and online, these workshops aimed to create a network of scholars, build up published studies in the field, and develop new software tools and techniques to help label unknown recordings.
The HiPSTAS creators set two goals:
- To “produce new scholarship using audio collections with advanced technologies such as classification, clustering, and visualizations”
- To contribute “to recommendations for the implementation of a suite of tools for collecting institutions interested in supporting advanced digital scholarship in sound.”
So, how do they plan on doing this? I’ll tell you how: beta-testing, collaboration, and hosting several meetings of the minds (i.e., academics, graduate students, archivists, and other digital humanists).
The major component of the HiPSTAS Institutes was the development of a program known as ARLO (Adaptive Recognition with Layered Optimization). ARLO is an open-source machine learning application originally created to study and classify bird calls by extracting audio features and displaying the data as spectral graphs.
HiPSTAS pushes ARLO’s disciplinary bounds from science into the humanities by sponsoring a project in which 20 participants experimented with the application to analyze spoken word recordings. The intent was to develop a program useful to humanities scholars by supporting longer files, implementing play-stop-fast-forward keys, and allowing multiple users to create and share tags. The participants used ARLO to plot time and frequency information as a spectrogram.
In what is described as “instance-based learning,” participants trained ARLO with 27,000 sample clips from PennSound and 150 hours of folklore from the Dolph Briscoe Center for American History. ARLO then matches patterns across sound clips based on pitch, rhythm, and timbre. Colors are assigned to numerical energy values: white represents the highest energy and black the lowest.
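To make the spectrogram idea concrete, here is a minimal sketch of the kind of time/frequency visualization described above. This is not ARLO’s actual code (which is not reproduced in this post); it is an illustration using NumPy and SciPy on a synthetic signal, mapping energy to a grayscale range the way the description suggests, with white as the highest energy and black as the lowest.

```python
import numpy as np
from scipy import signal

# Synthetic "recording": a 440 Hz tone plus noise, 2 seconds at 8 kHz.
# (A real workflow would load a digitized archival file instead.)
sr = 8000
t = np.linspace(0, 2, 2 * sr, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(t.size)

# Short-time Fourier transform: energy per (frequency, time) cell
freqs, times, sxx = signal.spectrogram(audio, fs=sr, nperseg=256)

# Convert to decibels, then normalize so 1.0 = white (highest energy)
# and 0.0 = black (lowest energy), as in the post's color scheme
db = 10 * np.log10(sxx + 1e-12)
gray = (db - db.min()) / (db.max() - db.min())

# The loudest frequency bin should sit near the 440 Hz tone
peak_freq = freqs[np.argmax(sxx.mean(axis=1))]
print(f"peak energy near {peak_freq:.1f} Hz")
```

An image of this `gray` array (one row per frequency bin, one column per time slice) is exactly the picture a tool like ARLO presents for tagging: sustained tones appear as bright horizontal bands against a dark background.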
Through collaboration with the WGBH Educational Foundation and the Pop Up Archive (a speech-to-text tool), HiPSTAS has made strides in facilitating the use of ARLO to identify raw footage in collections such as the American Philosophical Society’s Native American Projects and the Lyndon B. Johnson Library. The HiPSTAS website currently hosts a series of blog posts sharing audio labeling toolkits and highlighting projects that use ARLO to tag previously unidentified files.
The Ongoing Process
Currently, HiPSTAS is funded by the National Endowment for the Humanities’ Division of Preservation and Access and the Institute of Museum and Library Services (IMLS), with the long-term goal of inspiring digital innovations that will one day instantly convert speech to text. While that goal is still out of reach, such technology would make audio archives searchable and accessible for researchers, with particular benefit to people with hearing or reading disabilities.
So, in what ways have you seen people and repositories responding to the issue of unlabeled audio files and deterioration? What kinds of problems do you think will accompany unsupervised computer batch classification?