HiPSTAS Tagged Audio Before It Was Cool

Now that you’ve learned all about the theories behind conducting, using, and preserving oral history interviews from Alex’s post on sound studies, let’s dig into some of the other innovative things digital historians have been doing to make audio files more accessible.

High Performance Sound Technologies for Access and Scholarship, or HiPSTAS, is a project created by the School of Information at the University of Texas at Austin to “develop a virtual research environment in which users can better access and analyze spoken word collections.”  

This initiative grew out of a 2010 report by the Council on Library and Information Resources (CLIR) and the Library of Congress (LoC) that identified the risk of audio deterioration resulting from unprocessed and inaccessible audio acquisitions in archives. The report echoes the concerns, laid out by Doug Boyd and Michael Frisch, about the life of audio files after an oral history project has been completed.

Titled “The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age,” the report identifies the paradox of unprocessed audio files: if researchers don’t use them, archives are less inclined to spend time and money processing them. But if the files remain unprocessed, researchers won’t be able to access them. While most of these issues stem from insufficient indexing of audio files from the time of donation, the report also places blame on the lack of developed software for analyzing and generating metadata.

Since 2013, HiPSTAS has sponsored three conferences (called the HiPSTAS Institute) to discuss issues facing archivists, librarians, and technology scholars when dealing with digital sound files. Hosted both in person and online, these workshops aimed to create a network of scholars, build up published studies in the field, and develop new software tools and techniques to help label unknown recordings.

The HiPSTAS creators set two goals:

  1. To “produce new scholarship using audio collections with advanced technologies such as classification, clustering, and visualizations”
  2. To contribute “to recommendations for the implementation of a suite of tools for collecting institutions interested in supporting advanced digital scholarship in sound.”

So, how do they plan on doing this? I’ll tell you how: Beta-testing, collaboration, and hosting several meetings of the minds (i.e. academics, graduate students, archivists, and other digital humanists).


The major component of the HiPSTAS Institutes was to develop a program known as ARLO (Adaptive Recognition with Layered Optimization). ARLO is an open-source machine learning application that was originally created to study and classify bird calls by extracting audio features and displaying the data as spectral graphs.

HiPSTAS pushes ARLO’s disciplinary bounds from science to the humanities by sponsoring a project in which 20 participants experimented with the application to analyze spoken word recordings. The intent was to develop a program that would be useful to humanities scholars by supporting longer files, implementing play-stop-fast-forward keys, and allowing multiple users to create and share tags. The participants used ARLO to plot time and frequency information as a spectrogram, like so:

This graph is brought to you by the HiPSTAS Final White Paper, courtesy of Gertrude Stein saying “some such thing” from a reading of her novel, The Making of Americans.
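To give a concrete (if toy-sized) sense of what’s behind a graph like that, here’s a minimal Python sketch of a short-time spectrogram. This is the general technique, not ARLO’s actual code: slice the signal into overlapping frames and measure the energy in each frequency bin of each frame.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Compute a magnitude spectrogram of a mono signal with a
    short-time DFT (a toy stand-in for what tools like ARLO do at
    scale). Returns one list of bin magnitudes per frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces spectral leakage at the frame edges.
        windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
                    for n, s in enumerate(frame)]
        # Magnitude for each frequency bin 0 .. frame_size // 2.
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames

# A pure 1000 Hz test tone sampled at 8000 Hz: its energy should
# concentrate in one bin. Bin k covers k * 8000 / 64 = k * 125 Hz,
# so the tone should peak at bin 8.
rate, freq = 8000, 1000.0
tone = [math.sin(2 * math.pi * freq * n / rate) for n in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Plotting those frames with time on one axis, frequency bin on the other, and magnitude as brightness gives exactly the kind of picture shown above.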

In what is described as “instance-based learning,” participants trained ARLO with 27,000 sample clips from PennSound and 150 hours of folklore recordings from the Dolph Briscoe Center for American History. ARLO then matches patterns in sound clips based on pitch, rhythm, and timbre. Colors are assigned to numerical energy values: white marks the highest energy and black the lowest.
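To make “instance-based learning” and the energy-to-color mapping a bit more concrete, here is a hypothetical Python sketch. The feature names and tags below are invented for illustration and are not ARLO’s actual schema: a query clip simply takes the tag of its nearest stored example, and energies are normalized onto a black-to-white grayscale.

```python
import math

def nearest_tag(query, tagged_examples):
    """Instance-based matching in the spirit described above: label a
    query feature vector (e.g. pitch, rhythm, and timbre measures)
    with the tag of its nearest tagged example."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    best = min(tagged_examples, key=lambda ex: dist(query, ex["features"]))
    return best["tag"]

def energy_to_gray(value, lo, hi):
    """Map an energy value to a 0-255 grayscale level:
    255 (white) for the highest energy, 0 (black) for the lowest."""
    if hi == lo:
        return 0
    return round(255 * (value - lo) / (hi - lo))

# Invented example library: two tagged clips as 3-D feature vectors.
examples = [
    {"tag": "applause", "features": [0.9, 0.2, 0.7]},
    {"tag": "speech",   "features": [0.3, 0.6, 0.4]},
]
tag = nearest_tag([0.35, 0.55, 0.45], examples)  # nearest example is "speech"
```

The real system works over spectrogram features at a much larger scale, but the core idea — compare a new clip against labeled instances rather than learning an abstract model — is the same.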

Results of unsupervised learning and clustering of the Radio Venceremos Collection from the Guatemala Police Archives by ARLO, as explained by Abhinav Malhotra.
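Unsupervised clustering like that described in the caption can be illustrated with a bare-bones k-means over feature vectors. This is a generic sketch of the technique, not ARLO’s implementation, and it uses a naive initialization for simplicity:

```python
def kmeans(points, k, iterations=20):
    """Minimal k-means: repeatedly assign each feature vector to its
    nearest center, then move each center to its cluster's mean."""
    # Naive deterministic init: use the first k points as starting centers.
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[idx].append(p)
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old center if a cluster empties out
                centers[i] = [sum(vals) / len(cluster) for vals in zip(*cluster)]
    return centers, clusters

# Two obvious groups of 2-D "audio feature" vectors (made up for the demo).
points = [[0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, clusters = kmeans(points, k=2)
sizes = sorted(len(c) for c in clusters)
```

With no labels at all, the algorithm recovers the two groups — which is exactly the appeal of unsupervised methods for large piles of unprocessed, unannotated audio.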

Through collaboration with the WGBH Educational Foundation and the Pop Up Archive (a speech-to-text tool), HiPSTAS has made strides in using ARLO to identify raw footage in collections such as the American Philosophical Society’s Native American Projects and the Lyndon B. Johnson Library. The HiPSTAS website currently hosts a series of blog posts featuring audio labeling toolkits and highlighting projects that use ARLO to tag previously unidentified files.

The Ongoing Process

Currently, HiPSTAS is funded by the National Endowment for the Humanities’ Division of Preservation and Access and the Institute of Museum and Library Services (IMLS), with the long-term goal of inspiring digital innovations that will one day instantly convert speech to text. While this goal is still out of reach, such technology would make archives searchable and accessible for researchers, with particular benefit to people with hearing or reading disabilities.

So, in what ways have you seen people and repositories responding to the issue of unlabeled audio files and deterioration? What kinds of problems do you think will accompany unsupervised computer batch classification?

Perhaps I Need to Rethink My Day Job Transcribing Oral History…

In the HiPSTAS (High Performance Sound Technologies for Access and Scholarship) grant proposal, the authors express the hope that participants in their program “will understand better how to ‘imagine what they don’t know.’” The readings for this week make clear that the practice of oral history could be, and probably should be, so much more than it has heretofore been envisioned and practiced. At least in my conception of the subject, a historian interviews a bunch of people about a particular topic, has the tapes transcribed, produces a book or a documentary using some of the material in the recordings, and then files the tapes away in a box (possibly in an archive and maybe even with some cataloging) that is likely never to see the light of day again.

In terms of “doing” oral history, the two most conventional readings in this regard are Doug Boyd’s “Designing an Oral History Project” and Kara Van Malssen’s “Digital Video Preservation and Oral History.” Boyd points out that there’s a lot for the historian to think about beyond just the questions that will be asked of a subject when designing an oral history project, and both authors urge the practitioner to think holistically about the project ahead of time, considering not only pre-production and the point of capture but also the entire lifecycle of the project, including editing, archival storage, and future access. “Early choices you make in a project will affect later opportunities,” notes Boyd. “Decisions have consequences.”

While Van Malssen’s discussion of video formats looks forward and considers issues of preserving recordings, Jonathan Sterne instead looks backward at the history of the now-ubiquitous MP3 audio format to examine how decisions going back at least 100 years have shaped this particular format, which references, sometimes for no better reason than “this is how it’s done now, so let’s stick with it,” specifications from earlier formats. Sterne argues that “encoded in every MP3 are whole worlds of possible and impossible sound and whole histories of sonic practices.” (2)

Particularly important in Sterne’s work is the notion of “format theory,” which I think boils down to this: the choice of format is not benign, because “Format denotes a whole range of decisions that affect the look, feel, experience, and workings of a medium. It also names a set of rules according to which a technology can operate.” (7) The assumptions and specifications embedded in each format affect the user’s/listener’s experience of and relationship with the media, and thus, in Sterne’s view, it is important to understand how the format mediates the material.

In “Oral History and the Digital Revolution,” Michael Frisch offers an example that I think illustrates the idea of format theory and provides a basis for redefining what we even think oral history is. Frisch’s work illustrates that the audio- and videocassette formats of oral history recordings have had a profound effect both on accessing these resources and on understanding the content of these tapes. An assumption of oral history practice is that linear analog tapes are a pain to work with, and that therefore transcoding, if you will, the content of the recordings from audio or video to text by means of transcription is the best and fastest way for a researcher to access and engage with the content of a recording, to the point that transcription is viewed as an essential procedure. Frisch argues, however, that a great deal of meaning is lost in the translation of sound into text. “Meaning inheres in context and setting, in gesture, in tone, in body language, in expression, in pauses, in performed skills and movements. To the extent we are restricted to text and transcription, we will never locate such moments and meaning, much less have the chance to study, reflect on, learn from, and share them.” (2)

Digital formats, however, offer new possibilities for oral historians. Using timecodes, annotation, and other metadata linked to content, it is easy to dive into digitized materials directly at any point of particular interest in the recording. Thus the recording itself, rather than the inherently different experience of a transcript, becomes the object of study and, in Frisch’s words, “put[s] the oral back in oral history.” By studying the recording directly, the researcher can engage in what Nancy Davenport, cited in the HiPSTAS proposal, refers to as “deep listening,” or “listening for content, in note, performance, mood, texture, and technology.” This additional information, beyond the content of the recording in its strict, text-based sense, may allow the researcher to gain new insight into the meaning of what has been recorded.
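As a small illustration of how such timecode metadata can work in practice, here is a hypothetical Python sketch (the record layout and tags are invented for the example): annotations pair tags with timecodes, so a researcher can jump straight to the relevant moment in a recording instead of reading around in a transcript.

```python
# Hypothetical annotation records: each links tags to a timecode
# (in seconds) within a digitized oral history recording.
annotations = [
    {"time": 12.5,    "tags": ["introduction"]},
    {"time": 341.0,   "tags": ["childhood", "school"]},
    {"time": 1022.75, "tags": ["migration"]},
]

def seek_points(annotations, tag):
    """Return the timecodes at which a tag occurs, so a media player
    can seek directly to those moments."""
    return [a["time"] for a in annotations if tag in a["tags"]]

def to_timecode(seconds):
    """Render seconds as an HH:MM:SS string for display."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

hits = [to_timecode(t) for t in seek_points(annotations, "migration")]
```

The point is not the code but the workflow it enables: the annotated recording, not a transcript, is the thing the researcher navigates and studies.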

Ethnographer Wendy Hsu, however, seeks to move away from the digital-text-as-object-of-study paradigm and “shift the focus of the digital from a subject to a method of research” by combining various quantitative, data-oriented computational analysis techniques with traditional qualitative ethnographic methods, including direct observation and interviews, to identify, document, and consider the meaning of patterns and processes related to her subject matter, which is musicians of the Asian diaspora. The data-generated patterns uncovered by quantitative means inform questions that can be further explored qualitatively. Some of the methods she has employed in her work include mapping the geographic locations of bands’ fans by scraping location information from the bands’ MySpace friends’ pages, analyzing non-song sounds in song recordings to learn about the context of the recordings’ creation, and using spectrograms to visually analyze stylistic qualities of music.

So how might historians apply similar “doing digital” techniques to their own work with audio and video artifacts? That is very much an open question, and one that I’m not sure the readings answered very well. However, one of the stated aims of the HiPSTAS project is to bring together archivists/librarians, scholars, and computer scientists in an effort to create new tools that facilitate the study of sound recordings by means such as clustering, classification, and visualization. Archives are already storing a great many oral history recordings that go unlistened to or unwatched, a valuable resource that Frisch notes goes “largely untapped.” And the HiPSTAS team makes a pretty good point: if researchers don’t start using existing audio collections, then repositories won’t have much incentive to keep storing the old recordings, let alone augment their collections with new materials. So it really is imperative for history scholars to find means to unlock the potential of these audio resources.

There was really a lot going on in these readings, and I feel like I barely scratched the surface here of the many issues that the various authors raised. Returning to where I began, though, the readings really did challenge my perception of what exactly oral history is. It isn’t just about interviews or even necessarily the spoken word. A wide variety of preserved audio, such as musical performances, ambient sound, speeches, poetry readings, and the telling of stories passed through generations by way of oral tradition, can reveal valuable information about past (or present-day) life and culture. All sorts of sound-based documents could serve as potential primary source material, given useful means of incorporating the information they provide or could reveal into one’s historical analyses. This may well be a bit of a “duh” to everyone else, but I guess that’s something I just had never really considered before. Now I’m trying to imagine what else I don’t know.

What other issues did this week’s readings raise for you regarding the possibilities and potentials brought about by digital means and methods as applied to oral or audio history?