HiPSTAS Tagged Audio Before It Was Cool

Now that you’ve learned all about the theories behind conducting, using, and preserving oral history interviews from Alex’s post on sound studies, let’s dig into some other innovative things digital historians have been doing to make audio files more accessible.

High Performance Sound Technologies for Access and Scholarship, or HiPSTAS, is a project created by the School of Information at the University of Texas at Austin to “develop a virtual research environment in which users can better access and analyze spoken word collections.”  

This initiative grew out of a 2010 report by the Council on Library and Information Resources (CLIR) and the Library of Congress (LoC) that identifies the risk of audio deterioration as a result of unprocessed and inaccessible audio acquisitions in archives. The report echoes the concerns about the life of audio files after the oral history project has been completed, as laid out by Doug Boyd and Michael Frisch.

Titled “The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age,” the report identifies the paradox of unprocessed audio files: if researchers don’t use them, archives are less inclined to spend time and money processing them. But if the files remain unprocessed, researchers won’t be able to access them. While most of these issues stem from insufficient indexing of audio files from the time of donation, the report also places blame on the lack of developed software for analyzing and generating metadata.

Since 2013, HiPSTAS has sponsored three conferences (called the HiPSTAS Institute) to discuss issues facing archivists, librarians, and technology scholars when dealing with digital sound files. Hosted both physically and online, these workshops aimed to create a network of scholars, build up published studies in the field, and develop new software tools and techniques to help label unknown recordings.

The HiPSTAS creators set two goals:

  1. To “produce new scholarship using audio collections with advanced technologies such as classification, clustering, and visualizations”
  2. To contribute “to recommendations for the implementation of a suite of tools for collecting institutions interested in supporting advanced digital scholarship in sound.”

So, how do they plan on doing this? I’ll tell you how: Beta-testing, collaboration, and hosting several meetings of the minds (i.e. academics, graduate students, archivists, and other digital humanists).


A major component of the HiPSTAS Institutes was the development of a program known as ARLO (Adaptive Recognition with Layered Optimization). ARLO is an open-source machine learning application that was originally created to study and classify bird calls by extracting audio features and displaying the data as spectral graphs.

HiPSTAS pushes ARLO’s disciplinary bounds from science to the humanities by sponsoring a project where 20 participants experimented with the application to analyze spoken word recordings. The intent was to develop a program that would be applicable to humanities scholars by supporting longer files, implementing play-stop-fast-forward keys, and allowing multiple users to create and share tags. The participants used ARLO to record time and frequency information into a spectrogram, like so:

This graph is brought to you by the HiPSTAS Final White Paper, courtesy of Gertrude Stein saying “some such thing” from a reading of her novel, The Making of Americans.
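For readers curious about what sits behind a picture like this, here is a minimal sketch of how time and frequency information can be turned into a spectrogram. It is not ARLO's code: it runs a plain discrete Fourier transform over short frames of audio, and the function names (`spectrogram`, `to_grayscale`), the frame size, and the synthetic test tone are all assumptions made for illustration. Brightness is scaled so the strongest energy comes out white (255) and the weakest black (0).

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of per-frequency energies for each short frame of audio."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        energies = []
        # Only the first half of the bins carries distinct frequencies.
        for k in range(frame_size // 2):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            energies.append(abs(coeff) ** 2)
        frames.append(energies)
    return frames


def to_grayscale(frames):
    """Scale energies to 0 (black, lowest energy) .. 255 (white, highest)."""
    peak = max(max(frame) for frame in frames)
    if peak == 0:
        return [[0] * len(frame) for frame in frames]
    return [[round(255 * e / peak) for e in frame] for frame in frames]


# Hypothetical input: a pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)      # rows = time, columns = frequency
pixels = to_grayscale(spec)   # same grid, expressed as brightness values
```

Each row of `pixels` is one slice of time and each column one frequency band, so a steady tone like this one shows up as a single bright column. Real spoken-word recordings would need a faster transform and windowing, which this sketch leaves out for brevity.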

In what is described as “instance-based learning,” participants trained ARLO with 27,000 sample clips from PennSound and 150 hours of folklore from the Dolph Briscoe Center for American History. ARLO then matched patterns in sound clips based on pitch, rhythm, and timbre. In the spectrogram, each color corresponds to a numerical energy value: white is the highest energy and black is the lowest.
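To give a feel for what “instance-based learning” means in general, here is a rough sketch using k-nearest neighbors, the textbook version of the idea: a new clip gets the tag that is most common among the already-tagged clips it most resembles. This is a stand-in technique, not ARLO's actual matching procedure, and the three-number feature vectors (imagined as pitch, rhythm, and timbre scores), the tags, and the `classify` function are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(clip, tagged_examples, k=3):
    """Tag a clip by majority vote among its k closest tagged examples."""
    nearest = sorted(tagged_examples, key=lambda ex: distance(clip, ex[0]))[:k]
    votes = {}
    for _, tag in nearest:
        votes[tag] = votes.get(tag, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical (pitch, rhythm, timbre) scores for clips a person already tagged.
tagged_examples = [
    ((0.90, 0.20, 0.40), "applause"),
    ((0.80, 0.30, 0.50), "applause"),
    ((0.85, 0.25, 0.45), "applause"),
    ((0.20, 0.70, 0.60), "speech"),
    ((0.30, 0.80, 0.50), "speech"),
    ((0.25, 0.75, 0.55), "speech"),
]

label = classify((0.28, 0.72, 0.58), tagged_examples)
```

The takeaway is that nothing is "learned" ahead of time in the usual sense: the tagged examples themselves are the model, which is why the size and quality of the training collection matter so much.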

Results of unsupervised learning and clustering of the Radio Venceremos Collection from the Guatemala Police Archives by ARLO, as explained by Abhinav Malhotra.

Through collaboration with WGBH Educational Foundation and the Pop Up Archive (a speech-to-text tool), HiPSTAS has made strides in facilitating the use of ARLO to identify raw footage in collections such as the American Philosophical Society of Native American Projects and the Lyndon B. Johnson Library. The HiPSTAS website currently hosts a series of blog posts that share Audio Labeling Toolkits and highlight projects using ARLO to tag previously unidentified files.

The Ongoing Process

Currently, HiPSTAS is funded by the National Endowment for the Humanities Preservation and Access and the Institute of Museum and Library Services (IMLS), with the long-term goal of inspiring digital innovations that will one day instantly convert speech to text. While this goal is still out of reach, the technology would make archives searchable and accessible for researchers, with a particular benefit to people with hearing or reading disabilities.

So, in what ways have you seen people and repositories responding to the issue of unlabelled audio files and deterioration? What kinds of problems do you think will accompany unsupervised computer batch classification?

Sound Studies in the Digital Age

This week’s readings help us dive deep into the world of digital audio. First, Doug Boyd gives us a rundown on everything we should consider when designing an oral history project. Then, Michael Frisch pushes us to reconsider how we use and organize the audio and video we collect. Finally, Wendy F. Hsu challenges us to think about sound differently and to conceptualize a new methodological framework of augmented empiricism through digital ethnography.

As Boyd points out, conducting oral history is a great privilege, and the work required to prepare, conduct, transcribe, disseminate, and relate the oral histories we collect to larger historical narratives is no small feat. But as they say, with great privilege comes great responsibility. The choices made in the design of a project influence its overall success and development. Boyd thus encourages us to consider the following:

Why are you doing this project and what’s the desired outcome?

Think of a project mission statement and write it down (consulting and communicating with project partners where applicable). This helps oral history project designers stay focused and on task. A mission statement works as a reminder of why the project was initiated in the first place. Along with that, will these interviews be used for broadcast or production? Will they be hosted online or adapted into another format? The answers to these questions will influence the choices made about the equipment purchased and used for the project.

What recording equipment works best for you and what are your budgetary needs?

What microphone, audio, or video equipment is best for your project? How familiar are you with these technologies and software? Do you have access to the necessary training? Another way of thinking about this question is to ask who the intended audience of this project is. Thinking about these questions will point oral historians toward the technology best suited to them.

If the project will later be used for production, professional-quality equipment will be necessary, and its cost can add up quickly. Consumer equipment can work just as well, but this hinges on the project’s needs. You also have to consider how and where audio or video files will be stored. External drives and servers are also costly, and if a transcription service is hired to work on these files, that’s another cost to consider. You also have to think about project dissemination, web space, and software.

Next, you’ll need to consider your level of expertise.

If you’re unfamiliar with current audio, video, or computer technologies, you’re going to want to learn, attend workshops, read manuals, and practice. You’ll want to know how to use your equipment properly before the interview takes place. There’s nothing worse than completing an interview and realizing the recorder was off the whole time.

Your digital storage and archival strategy should also be thought out.

Digital records create massive files, and you’ll want to be prepared with a pre-planned strategy for storage. Consulting your archive partner can help with this. They should have the means necessary to undertake the expensive and complex digital preservation and curation of audio and video materials. You’ll also want to ask them about the workflows, protocols, and release policies they follow.

One of the biggest questions you’ll want to consider as part of your project design is what legal and ethical issues you might encounter over the course of your project and how you plan to confront them. As Boyd points out, oral history can be incredibly intimate, and the life stories that interviewees share can have wider implications after project dissemination. This is something the project staff should thoroughly consider before interviews are made publicly accessible, with informed consent.

Boyd’s point about the end product of an oral history project is something Michael Frisch has many thoughts on. Frisch encourages users to think about the life of an oral history interview once the recorder is turned off. In many cases, interviews are transcribed, and their meaning and interpretation are derived from the transcript. The layers of meaning found within the context and setting of an interview, and the interviewee’s gestures, tone, body language, expression, pauses, and movements that enrich and fully contextualize an interview, are lost. Text has become the go-to mode for the life of an oral history due to ease of use and sharing, but Frisch shows that with the digital revolution, we can put the oral back in oral history.

In the digital age, all data is relatively the same and can be expressed as digital information that can be organized, searched, extracted, and integrated equally, and that is instantly and non-linearly accessible. Because of this, Frisch calls for a “post-documentary sensibility” for oral history, where digital audio and video are annotated, cross-referenced, and organized by other types of descriptive or analytic metadata linked to specific passages of interview content. In doing so, the audio or video itself is the source that is searched, studied, and referenced by researchers and users, returning the actual voice and embodied meanings to oral history.
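One way to picture metadata “linked to specific passages” is as a small data structure in which each tagged passage points back to a time range in the recording rather than to a page of transcript. The sketch below is only an illustration of that idea, not a description of any tool Frisch uses; the `Passage` and `Interview` classes, the file name, the summaries, and the tags are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    """A stretch of the recording, described and tagged by an annotator."""
    start_sec: float
    end_sec: float
    summary: str
    tags: set = field(default_factory=set)


@dataclass
class Interview:
    audio_file: str
    passages: list = field(default_factory=list)

    def find(self, tag):
        """Return every passage carrying the given tag, in recording order."""
        matches = [p for p in self.passages if tag in p.tags]
        return sorted(matches, key=lambda p: p.start_sec)


# Hypothetical annotations for a single interview recording.
interview = Interview("interview_01.wav", [
    Passage(0, 95, "Childhood memories", {"family", "migration"}),
    Passage(95, 240, "First job at the mill", {"work"}),
    Passage(240, 410, "Moving north for work", {"work", "migration"}),
])

hits = interview.find("migration")
```

A search for a theme returns time ranges, so a researcher can jump straight to the moment in the audio and hear the voice itself instead of reading a transcript of it.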

While these projects have been approached differently due to scale, Boyd provides several examples of what this can look like. You can look at the Survivors of the Shoah Visual History Foundation and work done by the Randforce Associates to compare. These projects, and those Frisch hopes to see in the future, can provide accessible, meaningful, fluid, and non-privileged access to the content of oral history.

Wendy F. Hsu steps out of the oral history framework and thinks about sound a little differently than our friends Boyd and Frisch. Hsu is an ethnographer who wants to further develop the methodologies of digital ethnography and expand the term’s definition. She looks at digital technology as a platform for collecting, exploring, and expressing ethnographic materials. Her project on Asian American musicians and independent rock music shows how digital technology can provide new empirical perspectives on space and place to develop new methods of inquiry and visualization.

Hsu found that the Asian American musicians she studied spent much of their time on Myspace networking and promoting their music, rather than performing. She then built a web scraper bot to extract information on the location of users engaged with these bands. The bot crawled through the information of more than 2,500 “friends” of the band, which enabled her to map their locations and thus the digital environment that these users exist in. This visualization allowed Hsu to uncover patterns of social behavior and cultural meaning that would otherwise be inaccessible. Her quantitative findings answered the how and what questions, while her ethnographic training helped her figure out the why.
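Once a scraper has gathered profile information, the step that turns it into something mappable is usually a simple tally of where people say they are. The sketch below shows only that counting step, on made-up data; it is not Hsu's bot, it does not contact any website, and the `tally_locations` function, the profile fields, and the names are hypothetical.

```python
from collections import Counter


def tally_locations(profiles):
    """Count how many profiles list each location, skipping blank entries."""
    counts = Counter()
    for profile in profiles:
        location = (profile.get("location") or "").strip()
        if location:
            # Normalize capitalization so "austin" and "Austin" count together.
            counts[location.title()] += 1
    return counts


# Hypothetical profile records, standing in for scraped "friends" data.
friends = [
    {"name": "fan_a", "location": "Taipei"},
    {"name": "fan_b", "location": "austin "},
    {"name": "fan_c", "location": "Austin"},
    {"name": "fan_d", "location": ""},
    {"name": "fan_e", "location": None},
]

counts = tally_locations(friends)
```

The resulting counts could then be handed to a mapping tool. Self-reported locations are messy, so real data would need far more cleanup than the capitalization fix shown here.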

Hsu also looked directly to the music of these bands to uncover new insights. She used digital audio workstation (DAW) software and Audacity to learn more about a group of nakashi musicians called The Wandering Blind Singers.[1] Hsu identified that the recordings she had access to were recorded in mono, a format that was typical of the Taiwan music scene in the 1970s. She realized that these recordings likely took place in a TV studio, which shed light on the fringe position of these musicians in the music industry and their lower-class status in society in general.
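Checking whether a recording is mono is something a few lines of code can do as well as an audio editor. The sketch below is one possible way to do it with Python's built-in `wave` module, and it says nothing about how Hsu herself worked. It treats a file as mono if it has one channel, or if it has two channels that carry identical samples. The helper names (`make_wav`, `is_effectively_mono`) and the tiny in-memory test files are assumptions for illustration, and only 16-bit WAV audio is handled.

```python
import io
import struct
import wave


def make_wav(frames, channels):
    """Build a tiny 16-bit WAV in memory from per-frame tuples of samples."""
    flat = [sample for frame in frames for sample in frame]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)      # 2 bytes = 16-bit samples
        out.setframerate(8000)
        out.writeframes(struct.pack("<%dh" % len(flat), *flat))
    buffer.seek(0)
    return buffer


def is_effectively_mono(wav_file):
    """True for one channel, or for two channels with identical content."""
    with wave.open(wav_file, "rb") as src:
        channels = src.getnchannels()
        if channels == 1:
            return True
        if channels != 2 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono or stereo")
        raw = src.readframes(src.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Stereo samples alternate left, right, left, right, ...
    return samples[0::2] == samples[1::2]


mono_file = make_wav([(100,), (-200,), (300,)], channels=1)
dual_mono_file = make_wav([(100, 100), (-200, -200)], channels=2)
stereo_file = make_wav([(100, 90), (-200, -150)], channels=2)
```

The second check matters because older mono recordings are often digitized into two-channel files, so the channel count alone can be misleading.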

These are but some of the conclusions and observations that Hsu was able to make using close and distant listening, digital visualization, and other digital technologies. Her work shows that new methodological frameworks are necessary to expand the breadth of digital ethnography and its changing landscapes.

Boyd, Frisch, and Hsu gave us a lot to unpack. There are so many ways to think about and work with sound in the digital age. Share your thoughts about these readings on sound and the digital world below.

[1] Nakashi is a postcolonial itinerant music-culture in Taiwan.