Time for a deep dive into file formats

If your oral history project includes the creation of any sort of digital audio or video files—and it almost definitely will—you are going to need to make some informed decisions about what file formats you are going to use to store your data.

Kara Van Malssen’s article, “Digital Video Preservation and Oral History,” offers a highly practical introduction to how you might begin to make those decisions. If Van Malssen leaves you wondering what the big deal is about formats anyway, the “Format Theory” chapter of Jonathan Sterne’s MP3: The Meaning of a Format offers an interesting historical look at what formats mean and how they develop.

Digital video preservation

When you’re creating digital video files, it’s not great form to just pick up a camera and jet off to the races. The decisions that you make in the earliest stages of a video’s creation have lasting implications for its preservation later on.

Van Malssen provides a very helpful in-depth look at these decisions, but for the purposes of this blog post, we’ll content ourselves with understanding some basic information about file formats and reviewing some of Van Malssen’s overall recommendations.

Anatomy of a video file

The important components of a digital video file are the file wrapper and the encoded video and audio tracks.

The file wrapper dictates what we’d think of as the format, which gets represented as an extension. The file wrapper binds the video and the audio tracks together and stores metadata. Some common extensions for video files include:

  • .mov
  • .avi
  • .mpg
  • .wmv

When we talk about encoded tracks, we’re acknowledging that within the file wrapper, the audio and video tracks are created using different codecs. These codecs encode the tracks for storage and then decode them at the moment of playback.

Van Malssen offers several examples of popular codecs:

  • H.263
  • DV (Digital Video)
  • Apple ProRes
  • MPEG-2
  • MPEG-4

Understanding the makeup of your digital files is key to preserving them.  Now, let’s review some of Van Malssen’s best practices for preserving your files.
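
To make the wrapper-versus-codec distinction concrete, here is a minimal sketch in Python. It only models the idea described above and does not inspect real files. The file names are hypothetical, and the "PCM" audio codec is an assumption added for illustration (it does not come from Van Malssen's list).

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class VideoFile:
    """A video file modeled as a wrapper plus separately encoded tracks."""

    filename: str     # hypothetical file name
    video_codec: str  # codec used to encode the video track
    audio_codec: str  # codec used to encode the audio track (assumed values)

    @property
    def wrapper(self) -> str:
        # The extension is how the wrapper shows up in a file name.
        return Path(self.filename).suffix.lower()

    def describe(self) -> str:
        return (
            f"{self.filename}: wrapper {self.wrapper}, "
            f"video {self.video_codec}, audio {self.audio_codec}"
        )


def same_wrapper_different_codec(a: VideoFile, b: VideoFile) -> bool:
    """True when two files share a wrapper but not a video codec."""
    return a.wrapper == b.wrapper and a.video_codec != b.video_codec


# Two hypothetical interviews with the same extension but different codecs.
interview_a = VideoFile("interview_a.mov", "Apple ProRes", "PCM")
interview_b = VideoFile("interview_b.mov", "DV", "PCM")

print(interview_a.describe())
print(interview_b.describe())
print(same_wrapper_different_codec(interview_a, interview_b))
```

The takeaway is that the extension alone does not tell you which codecs are inside, which is why it is worth recording both when you document a file.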

Recommendations for digital video preservation

  • Choosing a recording device: Get one that uses one of the codecs listed above (others might be hard to support, and may not even be playable one day) and that produces video at the highest bit rate you can possibly support.  You can always compress your video to reduce its file size, but you can never restore bits that weren’t recorded in the first place.  It’s like the opposite of seasoning while cooking.
  • Transcoding: Transcoding means converting a file from one encoding format to another.  It almost always results in some loss of quality (the exception is converting to a lossless codec), so transcode judiciously!  What does that look like?  Basically, make sure you keep different versions of your file for your different purposes:
    • Creating a preservation master file: The point of a preservation master is to keep your original footage intact at the highest possible resolution.  You can use it to create new versions of your file, but you want to preserve the original file’s integrity as much as possible.  Store this safely, and don’t replace it with any of your edits!
    • Creating a mezzanine file: This will be your working copy, which you use to create new edits and proxies as you need them.  If you don’t need to make any dramatic changes to your file size, you may not need a mezzanine.
    • Creating a proxy file: This is your low-resolution file that you use for distribution, especially online.
  • File naming: Use a clear, consistent file name convention to make managing your collection easier.
  • Metadata: Similarly, use consistent, descriptive metadata.  Van Malssen recommends using a tool like MediaInfo to collect the technical metadata attached to your files, and using standards such as the Library of Congress’ VideoMD or the Corporation for Public Broadcasting’s PBCore to keep it consistent.
  • Storage: Store at least two copies of your preservation master in two different storage locations, ideally in two separate geographic locations.  Your files can degrade over time as your storage media decay, and having more than one copy of your master on more than one storage medium is a good way to safeguard against that!  Storing smaller mezzanine or proxy files in the cloud can be a good idea, but your preservation masters should be stored on hard drives, data tapes, or both.
  • Preservation planning: Use open-source, standard file formats and codecs, like those listed above, to keep your files accessible long-term.  Keep up with the technological landscape so that you know if the file formats you’re using are at risk of becoming obsolete, and keep your original files in as high a quality as possible to ensure the best possible outcome if you do need to transcode them.
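
The file naming and storage recommendations above can be sketched in a few lines of Python. Everything here is illustrative: the naming pattern, the role abbreviations, and the function names are assumptions, not a convention taken from Van Malssen. Comparing SHA-256 checksums is likewise a swapped-in technique: it is a common way to confirm that two stored copies are still identical, but this post does not say which method Van Malssen prefers.

```python
import hashlib

# Hypothetical abbreviations for the three kinds of file described above.
ROLES = {"preservation": "pm", "mezzanine": "mezz", "proxy": "proxy"}


def build_name(project: str, interviewee: str, date: str,
               role: str, extension: str) -> str:
    """Build a consistent file name from the same parts every time."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    parts = [project, interviewee, date, ROLES[role]]
    # Lowercase each part and replace spaces so names sort predictably.
    stem = "_".join(p.strip().lower().replace(" ", "-") for p in parts)
    return f"{stem}.{extension.lstrip('.').lower()}"


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large video files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(first_copy: str, second_copy: str) -> bool:
    """True when two stored copies have identical contents."""
    return sha256_of(first_copy) == sha256_of(second_copy)


# Hypothetical example values.
print(build_name("OHP", "Jane Doe", "2019-03-14", "preservation", ".MOV"))
# prints "ohp_jane-doe_2019-03-14_pm.mov"
```

In practice you might run something like copies_match on the two copies of each preservation master on a regular schedule, so that you find out about a degraded copy while the other one is still good.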

Format theory

After all that discussion of the practical implications of formats, Jonathan Sterne’s “Format Theory” chapter interrogates the idea of a format.  The MP3 is the most common audio storage format used today, but, as anyone who’s ever spoken to a person who really cares about headphones knows, it’s certainly not the audio storage format that allows for the highest quality.  So what gives?

Put simply, the MP3 is small.  It compresses recorded audio and uses significantly less bandwidth than other formats, which is ideal for transferring files and communicating.  In contrast with Van Malssen’s advice to keep your files in the largest format you can, the proliferation of the MP3—a super-compressed, lossy format—puts a premium on distribution over preservation of quality.
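
To get a rough sense of how much smaller, here is a back-of-the-envelope calculation in Python. The numbers are assumptions chosen for illustration, not figures from Sterne: uncompressed CD-quality audio (44,100 samples per second, 16 bits per sample, two channels) compared against an MP3 encoded at 128 kilobits per second, a commonly used bit rate.

```python
# Assumed parameters for uncompressed CD-quality audio.
SAMPLE_RATE_HZ = 44_100
BIT_DEPTH = 16
CHANNELS = 2

# Assumed MP3 bit rate (128 kbps is common, but encoders offer many others).
MP3_BITRATE_BPS = 128_000

SECONDS_PER_HOUR = 3_600


def uncompressed_bitrate(sample_rate: int = SAMPLE_RATE_HZ,
                         bit_depth: int = BIT_DEPTH,
                         channels: int = CHANNELS) -> int:
    """Bits per second needed to store raw, uncompressed audio."""
    return sample_rate * bit_depth * channels


def megabytes(bitrate_bps: int, seconds: int) -> float:
    """Approximate size in megabytes (1 MB = 1,000,000 bytes)."""
    return bitrate_bps * seconds / 8 / 1_000_000


raw_bps = uncompressed_bitrate()
raw_mb = megabytes(raw_bps, SECONDS_PER_HOUR)
mp3_mb = megabytes(MP3_BITRATE_BPS, SECONDS_PER_HOUR)

print(f"Uncompressed: {raw_bps} bits/s, about {raw_mb:.1f} MB per hour")
print(f"MP3:          {MP3_BITRATE_BPS} bits/s, about {mp3_mb:.1f} MB per hour")
print(f"The uncompressed hour is about {raw_bps / MP3_BITRATE_BPS:.1f}x larger")
```

Under these assumptions, an hour-long interview comes to roughly 635 MB uncompressed versus roughly 58 MB as an MP3, which is the distribution-versus-quality tradeoff in miniature.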

Sterne explains this by contextualizing the MP3 within a history of compression.  “As people and institutions have developed new media and new forms of representation, they have also sought out ways to build additional efficiencies into channels and to economize communication in the service of facilitating greater mobility,” he writes.  Over the course of time, so many of our attempts to make media more widespread and easier to share have resulted in compressing the media.

When he argues for the importance of format theory, Sterne encourages us to view formats as a part of history, entrenched in a context that reflects the cultural moment in which they become popular, as well as the operational and industrial needs of that moment. 

Both of these pieces of context inform which formats become popular.  Culturally, in the case of the MP3, many people prefer distorted audio over verisimilitude, and many prize easily sharable audio over very high-quality audio.  As for operational needs, I remember a friend sharing an album with me years ago in AAC format, which allows for higher-quality sound at about the same bandwidth as an MP3 file.  I was frustrated: I wanted to burn a CD to listen to in my car, but the software I had available would not let me burn AAC files to a disc.

Considering this outside context complicates the idea of formats progressing in a linear fashion to higher and higher quality and explains why some formats succeed and others don’t.

How does understanding format theory enhance our understanding of digital file preservation?  What are the implications of the proliferation of the MP3 on the prospect of preserving modern files?

8 Replies to “Time for a deep dive into file formats”

  1. Great post, Katherine! Preserving digital data is a topic that my undergraduate professor discussed at length. She was very concerned with how we are to preserve digital data when technology keeps changing. I feel that if she read this collection of articles and your post she would have a better understanding of the steps being taken to preserve data in this ever-changing digital era. By understanding these different formats we can better prepare to preserve this data. Understanding that some formats have better quality, while others are better for distribution, allows preservationists to make more informed decisions when conducting their projects.
    As with the case of MP3, it is a format that makes distribution easier, which is good for sharing these oral histories and other digital data. I have found that the more something is shared, such as a popular cat video, the more likely it is to survive the technical changes. But this format does not have the best quality, putting its preservation at risk. I think Van Malssen’s file preservation practices save both the quality of the data and allow for changes when necessary. Having a master file that does not change ensures the original is preserved, and building other forms from that, such as ones that can be better distributed, provides sharing possibilities.

    1. Thanks! I think your undergraduate professor definitely raised a valid concern—anyone who’s ever worked in an archive and has had to work with digitizing VHS tapes knows firsthand how transient some of these formats can be. That’s one of the things Van Malssen offers a lot of helpful insight into: she points you in the direction of formats that are less likely to become obsolete any time soon, or at least formats that have a robust enough userbase that you can trust to offer support if obsolescence does start looming. As you point out, she also offers really useful practical advice for how best to preserve your digital files in consideration of format changes.

  2. Thanks for breaking this down in a straightforward fashion! I think a lot of people in the industry of audio production, whether that’s music, podcasts, interviews, etc., also don’t have great knowledge of what files work best for preserving audio beyond immediate use. I edit audio probably every two weeks, and just export what works, not necessarily what works best.

    1. I’m sure you’re not alone in that, either! I usually choose an image file type purely at random, depending on sunspot activity at the moment of decision, whether I’ve seen a bird that day, how whimsical I’m feeling, etc., and then just change it if I have to, like if I want to upload it to a website, but that website doesn’t accept PNGs. These readings have indicated to me that this is not necessarily the best practice!

      I liked Van Malssen’s point that you can always make your file smaller by compressing it, but you can never restore a file to high quality if it wasn’t saved in high quality in the first place. With the example of audio, I’d think that her suggestion would be to figure out, based on your recording equipment, your editing software, and your storage capabilities, the highest possible quality at which you can save your work, and to use that. From there, you can always transcode your juicy, high-quality, lossless audio into an MP3 file for easy distribution (while keeping your preservation master, of course).

  3. I thought Van Malssen’s article was a really helpful tool for a small organization starting up an oral history project. It made me wonder, though (mostly because Frisch in his article mentioned family videos, and Sean talked about it last class), how an individual working on a family history could use her guide. Obviously they would want to have the best quality audio and video they could have, but they may not have access to tools like a high-tech camera or microphone. There are still steps they can follow, for example creating the different files Van Malssen suggests to preserve and edit, but there are many practices that wouldn’t be feasible.

    1. I think you do point out a slightly weaker part of Van Malssen’s article—although she does make the point several times that you need to evaluate your own resources to decide what the highest possible quality you can manage is, the article on the whole seems geared toward audiences that have enough resources to get really high-quality recordings in the first place but that don’t know how best to preserve those recordings. (Her suggestion that you keep two copies of your preservation master in two separate geographic locations also seems to indicate that she expects you to have a decent pool of resources!)

      From my understanding, I think her principles would still stand for an individual without access to great equipment, and that her advice would still be to make the original the best it can be, even if that’s not going to be screamingly high quality. I also think her advice about choosing formats is perhaps even more important for such an individual, who might be poorly equipped to transcode their files well in the event of their original codec becoming obsolete.

  4. A very helpful post, Katherine! Van Malssen’s post was particularly useful and Sterne’s was especially insightful. Yet after having read them, I still feel at a loss. For all of the conversation about the eventual degradation of video files, I have little sense of how long it takes for certain files and the hardware that stores them to degrade. Does anyone happen to have some introductory information on this question?

    Also, your question about the implications of the proliferation of MP3 is pressing to say the least. I cannot begin to answer it. But your question raises yet another question: How will digital preservation experts and historians decide which files to preserve and which to let degrade? Are the methods to go about this different from how historians and document preservationists decide on which archival material to preserve and which to keep out of the archives? My guess is that the amount of recorded data produced by everyone on the planet must have, at some point, surpassed the sum of data preserved from years before, say, 1900. How have historians sorted through it?

    Finally, I’d be interested to learn about the preservation of digital image files, which are meant to extend the life of information that is made known in written documents and images that have degraded since their creation. Are the same questions and considerations made about video and MP3 files as they are about digitized images of archival and library sources?

  5. Thanks for this breakdown! As someone who is often overwhelmed by technology, I found this really helpful. I remember taking the history of curatorial practice last summer and learning that digital files degrade over time, and I was shook! At that moment I had a mini existential crisis about the preservation of objects: if most physical objects break down over time, and even an electronic version of that object will eventually be distorted too, what is the point!?

    Nonetheless, I have since recovered from that, and I think information about file types and whatnot is key to being an effective historian in the digital age. I myself, though just starting out in the field, am an example of how a more traditional historical undergraduate education can lead to ignorance about digital preservation. Perhaps this will be less of an issue moving forward as we become a more and more tech-savvy society and this kind of knowledge becomes as ubiquitous as how to make a cup of coffee. Until then, texts like the above are key.
