Curating texts and contexts

I was originally interested in this course for its focus on both arts materials and digital curation, both of which I’m hoping to work with in the future. Many of the concepts and issues we’ve been discussing over the last few months have stayed with me, coming to mind both in the course of my other archival coursework and beyond. As I’ve mentioned before, platform theory and format theory kept reemerging for me throughout our discussions. They provide useful frameworks for thinking about how a work’s context affects its creation and reception in powerful but not-always-obvious ways. These effects mean the various formats or platforms involved need to be considered not just for technical preservation purposes, but also for understanding the meaning and significance of the work.

ThruYou – Archival Information Package

In creating the archival information package for ThruYou, I had several types of material to collect: the final ThruYou videos, the source videos used to create each video, the YouTube webpage for each ThruYou video, the ThruYou webpage, screenshots of each YouTube and ThruYou page, YouTube comments for each video, the Researching ThruYou webpage, and two commentaries from source video creators (a Reddit thread and a YouTube video).

For the eight ThruYou project videos, I used Complete YouTube Saver, a Firefox extension, to download the videos in mp4 format, along with each YouTube page, its description, and its annotations. This was a slight modification of my original plan, which was to use the youtube-dl tool for the video downloads. Because the extension offered similarly extensive download quality options, as well as the ability to download the pages, I used it for the ThruYou videos instead. I had also intended to use the same extension to collect all YouTube comments for each of the ThruYou videos. Unfortunately, recent changes in the way YouTube loads its pages broke this functionality in the extension. A smaller selection of the most popular comments does appear in the screenshots for each YouTube page, so this aspect is preserved to some degree; however, I would ultimately prefer to find a way to pull the full comments list via the YouTube API.
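As a rough idea of what pulling the full comments list might involve, here is a minimal Python sketch that pages through the commentThreads endpoint of the YouTube Data API v3. It is an illustration only, not something used for this AIP: it assumes a valid API key, collects top-level comments only (not replies), and the function names collect_comments and fetch_json are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# commentThreads.list endpoint of the YouTube Data API v3
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def fetch_json(url):
    """Fetch a URL and decode the JSON response (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def collect_comments(video_id, api_key, fetch=fetch_json):
    """Return every top-level comment for one video, following page tokens.

    `fetch` can be swapped out so the paging logic can be tried offline.
    """
    comments = []
    page_token = None
    while True:
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,  # largest page size the endpoint accepts
            "textFormat": "plainText",
            "key": api_key,  # assumption: caller supplies a valid key
        }
        if page_token:
            params["pageToken"] = page_token
        page = fetch(API_URL + "?" + urlencode(params))
        for item in page.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append({
                "author": snippet.get("authorDisplayName"),
                "published": snippet.get("publishedAt"),
                "text": snippet.get("textDisplay"),
            })
        page_token = page.get("nextPageToken")
        if not page_token:
            return comments
```

The returned list could then be written to a JSON or CSV file and stored alongside the saved page for each video.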

For the source videos, I did use the youtube-dl program. This is a command line tool which offers a wide range of options. In order to streamline the process of downloading over 100 videos, I created a batch file containing the URLs for all source videos, which allowed me to download the full collection without having to restart the process for each video. Arranging and organizing these files was a challenge. The source videos are linked from the description section for each ThruYou video, with a brief descriptive note (“drums 1”, “toy piano”, etc.). However, these descriptive notes do not match the video titles, potentially making it tricky to line up which video is which once the YouTube links no longer function. In addition, some of the source videos have been removed from YouTube and are missing from the collection. I wanted a researcher to be able to connect the ThruYou videos to the source videos listed in the YouTube description. I ended up numbering the source videos 1-109 based on the order they are listed in the YouTube descriptions (which is roughly the order of their appearance in each video).
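The numbering and batch-file steps described above could be sketched roughly as follows. This is a reconstruction for illustration, not the exact procedure used: the function names, the dictionary layout, and the "003 - Title.mp4" naming pattern are all assumptions. The one-URL-per-line file is the format that youtube-dl's --batch-file option reads.

```python
from pathlib import Path


def number_sources(descriptions):
    """Assign running numbers to source videos in description order.

    `descriptions` is a list of (thruyou_title, [(label, url), ...]) pairs,
    one per ThruYou video, in the order the videos are arranged.
    """
    numbered = []
    counter = 1
    for thruyou_title, sources in descriptions:
        for label, url in sources:
            numbered.append({
                "number": counter,
                "thruyou_video": thruyou_title,
                "label": label,  # Kutiman's note, e.g. "drums 1"
                "url": url,
            })
            counter += 1
    return numbered


def write_batch_file(numbered, path):
    """Write one URL per line so the whole list can be downloaded in one run."""
    lines = "".join(entry["url"] + "\n" for entry in numbered)
    Path(path).write_text(lines, encoding="utf-8")


def numbered_filename(entry, title, extension="mp4"):
    """Build a file name such as '003 - Some title.mp4' (hypothetical pattern)."""
    # Drop characters that are not allowed in Windows file names
    safe_title = "".join(c for c in title if c not in '\\/:*?"<>|').strip()
    return f"{entry['number']:03d} - {safe_title}.{extension}"
```

With the batch file written, a single call along the lines of "youtube-dl --batch-file sources.txt" works through the whole list, and the zero-padded numbers keep the files sorted in description order within each folder.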

View of the source video folder for the first ThruYou video.

I then separated the source videos into folders for each ThruYou video, so each folder contains the relevant source videos in the order they appear in the YouTube description. To keep all this straight, and to note missing videos, I created a “key” to these source videos listing their original YouTube link, the video title, and the label Kutiman used for each in his description.

source video key
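A key like this can also be kept as a plain CSV file, which is easy to open and likely to stay readable over time. The sketch below is one possible way to generate it, not the method actually used here; the column names, the status wording, and the functions build_key and write_key are hypothetical.

```python
import csv

# Hypothetical column names for the key
KEY_FIELDS = [
    "number", "thruyou_video", "kutiman_label",
    "youtube_url", "video_title", "status",
]


def build_key(entries, downloaded_titles):
    """Combine the numbered source list with the titles of downloaded files.

    `entries` are dicts with number, thruyou_video, label, and url keys;
    `downloaded_titles` maps a source number to the title of its video.
    A source with no downloaded title is recorded as missing.
    """
    rows = []
    for entry in entries:
        title = downloaded_titles.get(entry["number"])
        rows.append({
            "number": entry["number"],
            "thruyou_video": entry["thruyou_video"],
            "kutiman_label": entry["label"],
            "youtube_url": entry["url"],
            "video_title": title or "",
            "status": "collected" if title else "missing",
        })
    return rows


def write_key(rows, handle):
    """Write the key as CSV to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=KEY_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

When writing to disk, opening the file with newline="" (as the csv module documentation recommends) avoids doubled line endings on Windows.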

I divided the documents into four series folders: 1. ThruYou videos, 2. Source videos, 3. Web pages and screenshots, and 4. Contextual information. As a final step, I included a PDF finding aid briefly describing the collection and the arrangement. In the long run, I would want to prepare a more detailed finding aid with a full series and file listing and more detailed information on each type of file and on the project as a whole.
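For anyone wanting to set up a similar arrangement, the series structure can be created with a few lines of Python. This is only a sketch: the folder names below are hypothetical stand-ins for the four series, and the per-video subfolders inside the source video series follow the arrangement described earlier.

```python
from pathlib import Path

# Hypothetical folder names standing in for the four series
SERIES = [
    "1_ThruYou_videos",
    "2_Source_videos",
    "3_Web_pages_and_screenshots",
    "4_Contextual_information",
]


def create_series_folders(root, thruyou_titles=()):
    """Create the four series folders under `root`.

    One numbered subfolder per ThruYou video is added inside the
    source video series. Returns the list of folders in creation order.
    """
    root = Path(root)
    created = []
    for name in SERIES:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    for index, title in enumerate(thruyou_titles, start=1):
        subfolder = root / SERIES[1] / f"{index:02d}_{title}"
        subfolder.mkdir(exist_ok=True)
        created.append(subfolder)
    return created
```

Because the folders are created with exist_ok=True, the function can be run again safely if a video title needs to be added later.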


Overall, the process of finalizing the AIP was a useful reminder of how often unforeseen challenges appear once you begin to work with the collection in practice rather than theory. However, I found that thinking through the project in such detail beforehand not only minimized issues, but also provided a framework to turn to when I needed to make changes to the original plan.

My AIP can be viewed and downloaded here. Note that this is not actually the full collection; while I downloaded the full collection, the final zipped file was well over 1.5 GB, so I created a version with only the source videos for the first ThruYou video included. (I left the folders for the other seven videos, but deleted the files.) This gives an idea of the structure to be used, but is a much smaller file (although still around 400 MB). In practice, I would envision my collecting institution maintaining an AIP similar to this for preservation purposes, and then producing an access copy for research use and general access.

Exploration at the intersection of material and digital

A researcher, looking to discover more information about an antique object, does a close analysis of the object’s original medium, ultimately finding enlightening evidence about the object’s creation in the traces of an erased text. This summary could apply equally well to the Library of Congress’s work discovering clues to the editing process of the Declaration of Independence, and to Matthew Kirschenbaum’s investigation into a diskette containing a 1980s computer program, as described in his book Mechanisms. One case involves a physical object, viewed through a digital representation, while the other centers on a born-digital work, studied through the lens of its physical storage medium. As this example demonstrates, many of the issues involved with examining (and preserving) born-digital works carry over to the process of digitizing physical objects. In both cases, the intersection of the physical and digital leads to similar challenges, as well as offering interesting possibilities.

PreservingYou: A Preservation Plan for Kutiman’s ThruYou

ThruYou, a music and video mashup project created by the Israeli musician and producer Kutiman (and described in more detail here), is a set of eight videos (seven songs, plus a brief making-of video describing Kutiman’s process), hosted on YouTube and also available on the ThruYou website. Preserving ThruYou presents an interesting and complex preservation challenge. Unlike some musical or video works, ThruYou’s significance comes not only (or even primarily) from its inherent musical or visual qualities, but also from its place as a notable work in the context of YouTube and emerging developments in social media and user-contributed content. Without a sense of this context, a future viewer might enjoy the music or the visuals but miss significant aspects of the work’s importance. In order to fully reflect the significant aspects of the work, I will try to preserve not simply the work itself, but also the source videos used to create it, some important and relevant reactions to the videos, and a sense of the YouTube context from which ThruYou was created and in which it was presented.

Image of ThruYou on YouTube and on the website

Preserving video and audio materials:

Preserving the core of the work itself may actually be the simplest task. Many YouTube video downloaders are available. For this project, I’ll be working with the youtube-dl downloader. This is a command line program for downloading video from YouTube and other sites. While it appears to be more difficult to use than the simpler YouTube download extensions I’ve used in the past, it offers a vast range of specifications and options. It’s recommended by ArchiveTeam, who have experience downloading YouTube videos for archival purposes, and who’ve conveniently listed detailed recommended settings for high-quality video and audio. The program also allows the user to capture description metadata and annotations (those pop-ups that appear over videos on YouTube); both are preserved in XML format. In general, my priority will be getting the best quality audio and video available (in the case of ThruYou proper, the highest resolution available is only 360p, though some of the source videos may include higher resolutions).
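To give a sense of what a youtube-dl invocation along these lines might look like, the sketch below assembles a command as a Python list. The specific choices here are my own assumptions and are not ArchiveTeam's recommended settings: the format selector, the output file name template, and the function name build_youtube_dl_command are illustrative only. The options themselves (--format, --write-description, --write-info-json, --write-annotations, --output, --batch-file) are standard youtube-dl flags.

```python
def build_youtube_dl_command(batch_file, output_dir):
    """Assemble a youtube-dl command line as a list of arguments.

    The quality and naming choices are illustrative assumptions.
    """
    return [
        "youtube-dl",
        # Prefer the best separate video and audio streams, else the best single file
        "--format", "bestvideo+bestaudio/best",
        # Save the video description next to the video file
        "--write-description",
        # Save the full metadata record as JSON
        "--write-info-json",
        # Save the pop-up annotations, where the video has any
        "--write-annotations",
        # Name each file by title and YouTube ID (hypothetical template)
        "--output", f"{output_dir}/%(title)s-%(id)s.%(ext)s",
        # Read the URLs to download from a text file, one per line
        "--batch-file", batch_file,
    ]
```

The resulting list can be passed to subprocess.run to start the download. Building it as a list rather than a single string avoids quoting problems with the percent signs and parentheses in the output template.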

The full list of videos to be preserved from YouTube includes the seven music videos and the making-of video that make up the initial ThruYou project. In order to further preserve the full context of the videos’ creation, I’ll download and preserve the original source videos as well, when possible. Most of these still exist, but some have since been removed from YouTube. Lists of the source videos are included on each individual video’s YouTube page.

In the future, it will likely be necessary to migrate the video files into new formats. For this project, however, the specific encoding format is not essential to the work; as long as the resolution and quality are maintained, they may be transferred into new file formats as necessary to best allow for compatibility, preservation, and access.

Preserving the YouTube and website context:

The close integration of the work within the context of YouTube is critical to the meaning and significance of the work, so efforts will be made to preserve a sense of the visual and functional context of both the YouTube and website access points for the video. Setting up a fully functioning mirror of the YouTube context would be effectively impossible; so much of the context is dynamic, from the comments to the automatically-generated suggested videos on the right side of the page. While a simpler version that embeds the video in a YouTube-formatted page would be easier, it would still be difficult and could become complicated to maintain and access over time. The importance of the YouTube context lies in showing the video within the social media setting of YouTube. As such, a combination of crawled and saved YouTube pages and screenshots should provide a good sense of the original context. Downloading the pages may present some difficulties. The Heritrix crawler is open-source, but seems to be inconsistently supported on Windows. I’ll give it a try to see if I can get it to work. If not, since a limited number of pages are involved, using Chrome or Firefox’s “Save Page As” functionality might suffice. The full comments for each page will also be captured using the Firefox extension Complete YouTube Saver, which uses the YouTube API to access and download comments (it can also save a complete copy of the page, which is another option for downloading the YouTube web materials).

The full original ThruYou website pages will each be downloaded as well. Since the visual setup and context are important, screenshots will be taken of each page on the ThruYou site as well as each YouTube page. While this remains a somewhat inaccurate representation in the latter case (since YouTube’s design has already changed since ThruYou’s release), it does give an idea of how the video and the original component videos appeared in the full YouTube context. Likewise, the crawled web pages will provide only a snapshot of the page: the dynamic generation of content in the related videos field of the page, as well as page view numbers, ratings, and comments, may have limited or no functionality or renderability outside the live YouTube context. While retaining this functionality would be prohibitively difficult in this situation, the combination of crawled pages and screenshots should provide a framework for future researchers to understand the original context of the project materials. The ThruYou website itself is more static, but its design (a distressed, parodied version of the YouTube layout) relies on knowledge of the YouTube context as well.

Kutiman’s own making-of video also includes video of him using YouTube, searching and clicking through pages and viewing some of the videos used in the project. So in addition to being a part of the project itself, this can serve as another contextual resource for future researchers, providing a (low-res) glimpse of YouTube at the time of the project’s creation and showing the site itself in action.

Preserving reactions to ThruYou:

Given the centrality of the broader context and conversation around the piece in demonstrating its significance, I also want to collect some external materials to show some of this conversation. Much of the reaction to the piece (news articles or tweets about the project, for example) is likely to be preserved elsewhere. A few items, however, warrant preserving along with the project. Most notably, I’ll download and preserve the Researching ThruYou website. Originally created by a fan when YouTube temporarily took down ThruYou, it collects a full list of credits, with links to the original videos. In most cases this site provides additional information about the source video creators (links to personal websites or Twitter accounts, for example). For two of the source creators, the site links to statements from the creator about their reaction to ThruYou. Since I wasn’t able to find much information on the reaction of the source video creators, and since it seems central to the collaborative aspect of the project, I’ll download and preserve both of these (one is a Reddit thread, the other a YouTube video; both will be collected using the same tools as above). In addition, as noted above, the full YouTube comment threads will be downloaded and preserved.


Much of the challenge in preserving this project lies in the conflict created by trying to fit a work created in a dynamic web environment into a more static preservation context. In general with this collection, I try to deal with this by collecting both the raw original materials (such as the videos and comments) and elements and representations of the dynamic context (crawled web pages and screenshots). Hopefully the combination of these materials will allow future researchers to reconstruct a sense of the active context of the original project.

It will be interesting (and hopefully enlightening) to see how these preservation intentions will be challenged or changed by actually carrying out the process of creating an archival information package.

Preserving and clarifying collaborative contexts

The launch of YouTube in 2005 was quickly recognized as a watershed moment in the growth of social media and user-contributed content online. The ease of uploading and embedding video provided by YouTube made online video accessible to a much wider non-specialist audience. Kutiman’s 2009 music and video project ThruYou builds on the subsequent explosion of homemade video content, using YouTube as its source material. Kutiman (aka Ophir Kutiel), an Israeli musician and producer, combed through YouTube, tracking down dozens of clips, musical and non-musical: homemade guitar lessons, piano recitals, amateur freestyle raps, random people screwing around with theremins or synthesizers. He then used this raw material to create a set of seven original songs, looping and layering audio and video clips from a dozen or more sources to create each song and its accompanying video.
