An AIP for a Digital Deep Cut: Kutiman’s Off Grid

Currently, my AIP for Off Grid is 100% “make believe,” so unfortunately there is nothing yet to download. Still, I will describe each folder series (web pages, videos, documentary materials, and working files) and its subseries. Every series includes a readme document that provides contextual information to users. (Excuse the lack of good normalized file names! I simply overlooked this, as I was only creating dummy files to populate the model AIP.)


Web Pages
This is where the sites preserved using Archive-It will be housed. The folder includes:
• The current instance of Kutiman’s personal website (dedicated to Off Grid)
• The YouTube page for Off Grid
• The YouTube pages for all of Off Grid’s 95 component videos
• Websites that embed, critique, or provide write-ups on Off Grid

As discussed in my statement of preservation intent, the YouTube pages will be crawled at regular intervals in order to show change over time. Kutiman’s personal website will only be captured once since it is a temporary showcase for Off Grid. Of course, I’ll intermittently peek at the site just in case Kutiman develops the site in an unexpected direction.

The fourth bullet point gave me a bit of difficulty. Originally I had placed these websites in the Documentary Materials folder because I figured that they helped a future user understand how Off Grid was received and spread throughout the internet. Yes, these sites do serve that function, but sticking them in a folder separate from the “official” Off Grid sites would betray too much focus on the individual author. I am interested in preserving Off Grid as a window into participatory internet culture, not just as a set of cool videos. For this reason, I feel that my series and subseries need to walk the walk, even if it means that the Web Pages folder may look slightly intimidating to a user at first blush. But, hey, that’s what readme files are for.

Videos
This folder contains the access copies of the YouTube videos for Off Grid and all of its component videos. Using ClipGrab, all the videos will be saved as MPEG-4 files, as the format is the standard for streaming media and looks likely to remain well supported. The videos are saved at the highest quality available, and the original format of each video will be documented in its metadata (ClipGrab makes it easy to identify the original format). Metadata is generated for these videos and stored in a separate file. I went with a PBCore application profile since it is well suited, recommended, and I am familiar with it.
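
As a rough illustration of what one of those separate metadata files might contain, here is a minimal sketch that builds a PBCore-style XML record for a single access copy. The element names follow PBCore 2.x as I understand it, but the record has not been validated against the PBCore schema. The function name, its parameters, and the "Original format" annotation label are my own placeholders, not part of any official profile.

```python
import xml.etree.ElementTree as ET

# PBCore 2.x XML namespace
PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
ET.register_namespace("", PBCORE_NS)


def _q(tag):
    """Qualify a tag name with the PBCore namespace."""
    return f"{{{PBCORE_NS}}}{tag}"


def pbcore_record(video_id, title, description, original_format, access_file):
    """Build a minimal PBCore description document for one access copy.

    Hypothetical helper: a sketch, not a validated application profile.
    """
    doc = ET.Element(_q("pbcoreDescriptionDocument"))
    ET.SubElement(doc, _q("pbcoreIdentifier"), source="YouTube").text = video_id
    ET.SubElement(doc, _q("pbcoreTitle")).text = title
    ET.SubElement(doc, _q("pbcoreDescription")).text = description

    # One instantiation: the MPEG-4 access copy saved with ClipGrab
    inst = ET.SubElement(doc, _q("pbcoreInstantiation"))
    ET.SubElement(inst, _q("instantiationIdentifier"), source="File name").text = access_file
    ET.SubElement(inst, _q("instantiationDigital")).text = "video/mp4"
    ET.SubElement(inst, _q("instantiationLocation")).text = access_file
    ET.SubElement(inst, _q("instantiationMediaType")).text = "Moving Image"
    # Record the format YouTube originally served (placeholder annotation label)
    ET.SubElement(
        inst, _q("instantiationAnnotation"), annotationType="Original format"
    ).text = original_format
    return ET.tostring(doc, encoding="unicode")
```

Writing one record per video keeps each sidecar small and easy to regenerate if a video is re-downloaded.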

Documentary Materials
This folder contains materials that help provide additional context to future users and, as such, holds the largest potential variety of formats. The folder includes:
• PDFs of interview questions and responses sent to creators of the component videos via email form (thanks to Alice for the tip! P.S. would this need to go through IRB?)
• “Making of” videos (likely stored as MPEG4s, though none currently exist so I can’t be 100% sure at this time)

The only lingering doubt I have with this folder is related to the earlier problem I noted regarding the fourth bullet point in the Web Pages folder. Materials that document Off Grid are also materials that tend to embed the work and spread it throughout the internet. This means that a website featuring an interview with Kutiman about the making of Off Grid could be included in both the Web Pages and Documentary Materials folders. I decided that the Documentary Materials folder would be best suited for static documents (non-web pages) that discuss the work. Most of these documents don’t exist yet and will likely be generated through the efforts of my institution, so that’s another way to look at them.

Working Files
The working files can only be obtained through Kutiman and, as such, they aren’t here! I am under the impression that the files are created in Sony Vegas Pro, but this has yet to be confirmed. This folder series is thus a placeholder until more information can be obtained. Still, knowing how the affordances of the various platforms used to produce YouTube videos come into play is important to understanding participatory internet culture.

Two of the big challenges in designing this AIP were my inability to contact Kutiman and the relatively small amount of buzz it has gotten online—especially when compared to Thru You (*raises fist toward sky* Eriiiiiiiiiiiiiiiiiic!!!!!!!). As I mentioned in my statement of significance, it is essentially viral-proof with its long running time and “out there” music. Also, the work was only released in February, so it simply hasn’t been out very long. However, I think that my planning for the AIP allows things to be added quite easily in the future, such as the working files and interviews.

Preservation Polandball: Archival Information Package

Herein lies the obscenity-laden banter of Germanyball, Swedenball, Americaball (‘MURICA!), and all your other favorite national stereotypes. Knowing that I could never capture and preserve Polandball in its entirety, I designed this AIP to preserve just a small sliver of Polandball materials and culture. The aim was to grab enough documentation from major Polandball sites (including the structure of the sites, the guidelines surrounding Polandball creation, and discussion among Polandball users) to accurately represent its culture. Additionally, I wanted to preserve the comics, which in many ways become supplementary to the actual discussions that take place.

The file structure of my AIP is fairly self-explanatory. My AIP contains three top-level folders (Webrecorded Sites, Supplemental Materials, and Selected Comics), accompanied by an introductory “read me” file and several additional documents describing Polandball (using items such as Wikipedia’s Polandball page as a more popular descriptive source) and the preservation intent for the collection. With these, I hope that the user will understand the significance of the collection as well as its organization.

Preservation Polandball File Structure 1

Webrecorded Sites

As indicated in the Preservation Intent Statement, I decided to use Rhizome’s Webrecorder tool to capture the structure of Polandball’s three major community sites: Reddit, Facebook, and Wikia. I also saved PDFs, PNGs, JPEGs, etc. of the sites as a form of quality assurance. Should the code not render properly, the static images of the sites will help interpret the material more accurately. Because the sites were recorded via Webrecorder, I’ve included a web archive player so that the code can be rendered with the necessary tool. Obviously, this could become a problem in the future as the tools and platforms evolve, but for now, at least the player is included. One major note on using Webrecorder: the idea behind the tool is to browse each captured page as thoroughly as possible in order to record the most material. Otherwise, it’d be just as worthwhile to simply save the webpage as HTML. However, the examples included in this AIP are limited given the time constraints of the project. They stand as a prototype of what would exist in the full AIP.
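
To make sure every recorded site actually has both a capture and a static fallback, a small check like the one below could be run over the Webrecorded Sites folder. It is a sketch under some assumptions. First, each site (Reddit, Facebook, Wikia) is assumed to have its own subfolder. Second, captures are assumed to end in `.warc` or `.warc.gz`. Finally, the control images are assumed to be PDF, PNG, or JPEG files. The function name is hypothetical.

```python
from pathlib import Path

# Static "control image" formats used for quality assurance (assumed list)
CONTROL_IMAGE_TYPES = {".pdf", ".png", ".jpg", ".jpeg"}


def check_site_folders(root):
    """Report site folders missing a WARC capture or a static control image.

    Hypothetical QA helper: returns {site folder name: [missing items]}.
    """
    problems = {}
    for site in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = [f for f in site.rglob("*") if f.is_file()]
        has_warc = any(f.name.lower().endswith((".warc", ".warc.gz")) for f in files)
        has_control = any(f.suffix.lower() in CONTROL_IMAGE_TYPES for f in files)
        missing = []
        if not has_warc:
            missing.append("WARC")
        if not has_control:
            missing.append("control image")
        if missing:
            problems[site.name] = missing
    return problems
```

An empty result means every site folder has at least one capture and one static image to fall back on.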

Initially, I planned on using an annotation tool (Hypothesis) to add context to the recorded material. However, Hypothesis annotations did not work well on images and did not seem to be captured well by Webrecorder; my notes did not display correctly. Had I saved PDFs or JPEGs (or PNGs) of the annotations, they would have distracted from the item being preserved, since they would dramatically alter its presentation. I would have had to save the annotated PDFs separately from the material already being preserved. This is a great issue to tackle in future Polandball preservation efforts, as there could be tremendous value in describing the historical and political context for the conversations happening on Polandball. However, the annotation tool didn’t fit well with the overall AIP for this particular project, as my goal was to capture pieces of the Polandball world instead of augmenting them. Furthermore, sufficiently annotating the conversations happening in the Polandball community would take an enormous amount of time and scholarly research, which was outside the scope of this project.


Supplemental Materials

One of the most important things to preserve from the world of Polandball was the set of rules and boundaries defined by its users and moderators. The most elaborate and well-defined set of standards was outlined on Reddit and has been included in this section. This also includes the FAQ, the usage statistics, and the “joke life preserve,” which documents punchlines, themes, and techniques that have been either banned or temporarily retired due to overuse. As in the previous folder, the pages have been recorded and the code stored, with PDFs included for quality assurance.

The conversations that I’ve had with Polandball users have been of immense importance to this project. In every interaction, users emphasized the critical role of the Polandball community conversation. In this folder, I saved some of those conversations between myself and the users, as well as PDFs of the activity in chat rooms. Because these conversations took place with no direct relationship to the comics, I’ve saved them in Supplemental Materials rather than with the comic materials.

In addition to materials produced by the user community itself, I wanted to add external commentary and document the reception of Polandball in popular culture. For this reason, I included articles about Polandball written by unaffiliated journalists. Of all the pieces in my AIP, this is the section I would most likely remove for copyright reasons. I’m not certain how copyright would apply to preservation on an international level. Given that some of this material comes from international magazines and newspapers, I suspect the rights to preserve it would be limited. Nevertheless, I’ve included the articles here until further action is taken with the AIP. To preserve them, I took my now-standard approach: storing the WARC and creating control images of each article.


Selected Comics

The goal of this section of the AIP was twofold: to preserve the comics in order to retain the artwork itself, and to capture the community conversation in its natural environment. The substantial conversations between Polandball users do not happen in a vacuum. Rather, they develop from the spark a comic ignites. Preserving these conversations is no easy task, as hundreds of comments can springboard off a single comic depending on its content. Furthermore, preserving the comics in their entirety would be impossible, as no one knows precisely how many exist.

I took a practical approach to the task. Ideally, a set of ten comics would be selected by surveying Polandball users. Given the time constraints of the project, I selected ten myself, drawing on comics mentioned in conversations with individual users and on those highlighted on Reddit and Wikia (the two of the three preserved sites most focused on the original intent of Polandball itself).

Preservation Polandball File Structure 2

In order to preserve the comics and their commentary, I took a three-pronged approach. I wanted to grab the conversation as it developed alongside each comic, and the most likely place to find well-organized, thriving conversation was the Polandball subreddit. Therefore, I once again used Webrecorder to capture the code for each of the ten comics’ pages, along with control PDFs. In addition, I saved the comics by themselves. While the WARC and the PDFs will help preserve the conversation, the images of the comics on those pages are rather small unless expanded through a link. Because I wanted to preserve the selected comics just as much as the comment threads, I saved the larger versions of the comics separately from the commentary.



Although this AIP contains only a small slice of the Polandball world, it’s enough to capture the essence of Polandball as it currently exists. The comics and the culture continue to evolve, but these few files are a snapshot of this moment in time. If given the opportunity, I’d like to see Preservation Polandball grow and be refined. Before the AIP could be archived in something like the Internet Archive, I’d want to explore annotation with scholarly sources and description attached, obtain permissions from all journalists and news outlets whose articles are included, and conduct a true survey of the community to collect the most important Polandball comics (hopefully a larger sample!). Still, until now no efforts had been made to preserve Polandball at all, so this is progress.

Polandball can into archive, but still cannot into space

ThruYou – Archival Information Package

In creating the archival information package for ThruYou, I had several types of material to collect: the final ThruYou videos, the source videos used to create each video, the YouTube page for each ThruYou video, the ThruYou webpage, screenshots of each YouTube and ThruYou page, the YouTube comments for each video, the Researching ThruYou webpage, and two commentaries from source video creators (a Reddit thread and a YouTube video).

For the eight ThruYou project videos, I used Complete YouTube Saver, a Firefox extension, to download the videos in MP4 format, along with each YouTube page, its description, and its annotations. This was a slight change from my original plan, which was to use the youtube-dl tool for the video downloads. However, since this extension offered similarly extensive download quality options, as well as the ability to download the pages, I used it for the ThruYou videos instead. I had also intended to use Complete YouTube Saver to collect all YouTube comments for each of the ThruYou videos. Unfortunately, recent changes in the way YouTube loads its pages broke this functionality in the extension. A smaller selection of the most popular comments does appear in the screenshots of each YouTube page, so this aspect is preserved to some degree; however, I would ultimately prefer to find a way to pull the full comments list via the YouTube API.
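
Pulling the full comments list through the YouTube API might look roughly like the sketch below. It pages through the YouTube Data API v3 `commentThreads` endpoint until no `nextPageToken` is returned. It assumes you have an API key. It collects only top-level comments; replies would need a further request. The function name and the fields kept in each record are my own choices. The `fetch` parameter exists only so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

# YouTube Data API v3 endpoint for comment threads
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def _fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all_comments(video_id, api_key, fetch=_fetch_json):
    """Page through commentThreads.list and return every top-level comment.

    Hypothetical helper; replies to comments are not collected here.
    """
    comments, page_token = [], None
    while True:
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,          # the API's per-page maximum
            "textFormat": "plainText",
            "key": api_key,
        }
        if page_token:
            params["pageToken"] = page_token
        data = fetch(API_URL + "?" + urllib.parse.urlencode(params))
        for item in data.get("items", []):
            top = item["snippet"]["topLevelComment"]["snippet"]
            comments.append({
                "author": top.get("authorDisplayName"),
                "published": top.get("publishedAt"),
                "likes": top.get("likeCount", 0),
                "text": top.get("textDisplay"),
            })
        page_token = data.get("nextPageToken")
        if not page_token:
            return comments
```

The resulting list could be written out as JSON or CSV alongside each video's screenshots, so the fuller comment record sits next to the visual one.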

For the source videos, I did use youtube-dl, a command-line tool that offers a wide range of options. To streamline the process of downloading over 100 videos, I created a batch file containing the URLs for all source videos, which allowed me to download the full collection without having to restart the process for each video. Arranging and organizing these files was a challenge. The source videos are linked from the description section of each ThruYou video, with a brief descriptive note (“drums 1”, “toy piano”, etc.). However, these descriptive notes do not match the video titles, potentially making it tricky to tell which video is which once the YouTube links no longer function. In addition, some of the source videos have been removed from YouTube and are missing from the collection. I wanted a researcher to be able to connect the ThruYou videos to the source videos listed in the YouTube descriptions, so I numbered the source videos 1-109 based on the order they are listed in the descriptions (which is roughly the order of their appearance in each video).
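
The numbering and batch-file steps could be scripted along these lines. This is a sketch that assumes each ThruYou description has already been copied into a string and that each source line holds Kutiman's label followed by a URL. That line layout is a hypothetical simplification, so real descriptions may need a different pattern. The script numbers the sources in description order across all videos. It then writes one URL per line, which is the format youtube-dl's `--batch-file` option reads.

```python
import re

# A source line is assumed to look like "drums 1 - https://..." (hypothetical layout)
URL_RE = re.compile(r"(https?://\S+)")


def number_sources(descriptions):
    """Number source videos 1..N in the order they appear across descriptions.

    `descriptions` maps each ThruYou video title to its description text,
    in the order the ThruYou videos were released.
    """
    entries, n = [], 0
    for thruyou_title, text in descriptions.items():
        for line in text.splitlines():
            match = URL_RE.search(line)
            if not match:
                continue  # skip lines without a source link
            n += 1
            label = line[:match.start()].strip(" -:\t")
            entries.append({"number": n, "thruyou": thruyou_title,
                            "label": label, "url": match.group(1)})
    return entries


def write_batch_file(entries, path):
    """Write one URL per line, the format youtube-dl's --batch-file expects."""
    with open(path, "w", encoding="utf-8") as fh:
        for e in entries:
            fh.write(e["url"] + "\n")
```

The download itself could then be started with something like `youtube-dl --batch-file sources.txt -o "%(id)s.%(ext)s"`. Naming files by YouTube ID makes it straightforward to match each download back to its number and label.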

View of the source video folder for the first ThruYou video.

I then separated the source videos into a folder for each ThruYou video, so each folder contains the relevant source videos in the order they appear in the YouTube description. To keep all this straight, and to note missing videos, I created a “key” to these source videos listing each one’s original YouTube link, its video title, and the label Kutiman used for it in his description.
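
Sorting the downloads into per-video folders and producing the key could be combined in one pass, roughly as sketched here. The sketch assumes a few things:

- Each source has already been numbered, with its ThruYou video, Kutiman's label, and URL.
- Downloaded files are known by URL.
- Titles have been gathered separately, for example from youtube-dl's `--write-info-json` output.

Sources with no download are marked as missing. The function name, column names, and status wording are placeholders.

```python
import csv
import shutil
from pathlib import Path


def build_source_key(entries, downloads, titles, out_dir):
    """Copy downloads into per-video folders and write a CSV key.

    entries   -- numbered sources: dicts with number, thruyou, label, url
    downloads -- maps a source URL to its downloaded file (absent if removed)
    titles    -- maps a source URL to its YouTube title, where known
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = []
    for e in entries:
        src = downloads.get(e["url"])
        status = "missing from YouTube"
        if src is not None:
            folder = out / e["thruyou"]
            folder.mkdir(exist_ok=True)
            # Prefix the global number so folder order matches the description
            shutil.copy2(src, folder / f"{e['number']:03d}_{Path(src).name}")
            status = "downloaded"
        rows.append([e["number"], e["thruyou"], e["label"], e["url"],
                     titles.get(e["url"], ""), status])
    with open(out / "source_video_key.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["number", "thruyou_video", "kutiman_label",
                         "youtube_url", "video_title", "status"])
        writer.writerows(rows)
    return rows
```

Copying rather than moving leaves the raw downloads untouched, which makes it easy to rerun the sort if the numbering changes.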

source video key

I divided the documents into four series folders: 1. ThruYou videos, 2. Source videos, 3. Web pages and screenshots, and 4. Contextual information. As a final step, I included a PDF finding aid briefly describing the collection and the arrangement. In the long run, I would want to prepare a more detailed finding aid with a full series and file listing and more detailed information on each type of file and on the project as a whole.
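
A fuller file listing for that future finding aid could be generated rather than typed. The sketch below walks each top-level series folder and lists every file beneath it, with its size. It assumes the series are the AIP's top-level folders, so loose files at the root (such as the PDF finding aid itself) are skipped. The function name and output layout are my own.

```python
from pathlib import Path


def file_listing(aip_root):
    """List every file under each top-level series folder, with its size.

    Hypothetical helper for drafting a series/file listing.
    """
    root = Path(aip_root)
    lines = []
    for series in sorted(p for p in root.iterdir() if p.is_dir()):
        lines.append(series.name)
        for f in sorted(q for q in series.rglob("*") if q.is_file()):
            size_mb = f.stat().st_size / 1_000_000
            lines.append(f"    {f.relative_to(series).as_posix()}  ({size_mb:.1f} MB)")
    return "\n".join(lines)
```

The plain-text output could be pasted into the finding aid and then annotated with descriptions for each series.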


Overall, the process of finalizing the AIP was a useful reminder of how often unforeseen challenges appear once you begin to work with the collection in practice rather than theory. However, I found that thinking through the project in such detail beforehand not only minimized issues, but also provided a framework to turn to when I needed to make changes to the original plan.

My AIP can be viewed and downloaded here. Note that this is not actually the full collection: while I downloaded the full collection, the final zipped file was well over 1.5 GB, so I created a version that includes only the source videos for the first ThruYou video. (I kept the folders for the remaining ThruYou videos but deleted their files.) This gives an idea of the structure to be used, but is a much smaller file (although still around 400 MB). In practice, I would envision my collecting institution maintaining an AIP similar to this for preservation purposes, and then producing an access copy for research use and general access.

The Fixity of Transformation

The “Transforming” series is a set of four digital paintings that loop every 2-3 hours, meant to reward sustained engagement by changing subtly over the course of the work. To preserve the series, I decided to focus on the conversation around the works and to document the process of their creation, as detailed at greater length here and here. In short, the pieces themselves are fairly well taken care of, but the supporting documentation gets less attention and will still be useful in the future.


The end result is an Archival Information Package (AIP) divided into three larger sections: Web Articles, Audio and Video, and Viewer Reaction. Each section has a text file named after the folder heading with “_description” appended, which describes the content of the folder.
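
Setting up that structure by hand is easy, but a short script keeps the naming consistent. This sketch creates the three section folders, each with a stub description file. It assumes the description file lives inside its folder and uses a `.txt` extension. The source only says "text file", so both details are assumptions, and the function name is hypothetical.

```python
from pathlib import Path

# Top-level sections of the AIP
SECTIONS = ["Web Articles", "Audio and Video", "Viewer Reaction"]


def create_aip_skeleton(root, sections=SECTIONS):
    """Create each section folder with a "<folder name>_description.txt" stub.

    Existing description files are left untouched.
    """
    created = []
    for name in sections:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        desc = folder / f"{name}_description.txt"
        if not desc.exists():
            desc.write_text(f"{name}\n\nDescribe the contents of this folder here.\n",
                            encoding="utf-8")
        created.append(desc)
    return created
```

Because existing description files are never overwritten, the script can safely be rerun after the descriptions have been written.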

Web Articles

One part of documenting the discourse about these works is saving the media coverage. Using links available on the creators’ websites and internet searches, I assembled a list of articles on the internet that covered the works in exhibitions, provided commentary, interviewed the authors, or documented the creation of the works.

After downloading a copy of each article’s HTML as an access copy, I also searched for its URL in the Wayback Machine and used the “Save Page Now” option to ensure there was also a preservation copy on the Internet Archive’s servers. In all, I saved the HTML and related files for 48 articles, 16 of which had not yet been saved in the Internet Archive.
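
With 48 articles, the Wayback Machine lookup could be automated roughly as follows. The sketch asks the Wayback Availability API (`archive.org/wayback/available`) whether a snapshot exists. If none does, it requests a capture by fetching the `web.archive.org/save/` URL. The Internet Archive may rate-limit these requests or change how Save Page Now works, so treat this as a sketch rather than a reliable client. The function name is hypothetical, and the `get` parameter is only there so the logic can be tested offline.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_API = "https://archive.org/wayback/available?url="
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def _get(url):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def ensure_archived(article_url, get=_get):
    """Return ("existing", snapshot URL) or ("requested", save URL).

    Hypothetical helper: checks for a snapshot first, then asks for a capture.
    """
    raw = get(AVAILABILITY_API + urllib.parse.quote(article_url, safe=""))
    closest = json.loads(raw).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return "existing", closest["url"]
    save_url = SAVE_PAGE_NOW + article_url
    get(save_url)  # triggers a new capture
    return "requested", save_url
```

Tallying the "requested" results over the article list would reproduce a count like the 16 articles that had not yet been saved.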

Audio and Video

The next section of the AIP contains 21 videos (and some audio) describing artistic themes in the works, how the works were made, visual effects breakdowns, and short excerpts of most of the final products. About half of these files I could simply “save as” from the Motion Picture Company website after inspecting the page source.

The other half of the videos were either on sites like YouTube or Vimeo or were streaming Flash video. I used simple, readily available tools to download them: ClipGrab for videos on hosting sites and the “Flash Video Downloader” browser extension for the Flash videos. I also downloaded two files from a museum audio tour. While I had pie-in-the-sky ideas about using open-source software and command-line tools, there just wasn’t enough time to dig into the documentation and figure them out.


Once the videos were downloaded as MPEG-4 files at their highest picture quality, I organized them by work, or put them in a separate category if they covered multiple works. I used MediaInfo to generate technical metadata sidecar files for the audio and video, which will be useful both to scholars doing research now and in case the video files get corrupted in the future. I exported the metadata as PBCore 2.0, since it was a recommended schema in the FADGI report on preserving born-digital video.
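
Generating those sidecars one file at a time in the MediaInfo GUI works, but a batch run over the folder is less error-prone. The sketch below calls the MediaInfo command-line tool for every audio or video file and saves the output next to the original as `<file>.pbcore.xml`. `--Output=PBCore2` is the PBCore 2 output option as I understand the MediaInfo CLI; check `mediainfo --help` for your version. The suffix list, sidecar naming, and function names are assumptions.

```python
import subprocess
from pathlib import Path

# Audio/video extensions to describe (assumed list)
MEDIA_SUFFIXES = {".mp4", ".m4a", ".mp3"}


def mediainfo_command(media_file):
    """Command line asking the MediaInfo CLI for PBCore 2 XML output."""
    return ["mediainfo", "--Output=PBCore2", str(media_file)]


def write_sidecars(folder, run=subprocess.run):
    """Write <file>.pbcore.xml next to every audio/video file under `folder`."""
    written = []
    for media in sorted(Path(folder).rglob("*")):
        if not media.is_file() or media.suffix.lower() not in MEDIA_SUFFIXES:
            continue
        result = run(mediainfo_command(media), capture_output=True, text=True, check=True)
        sidecar = media.with_name(media.name + ".pbcore.xml")
        sidecar.write_text(result.stdout, encoding="utf-8")
        written.append(sidecar)
    return written
```

Keeping the full original file name in the sidecar name (`clip.mp4.pbcore.xml`) avoids collisions between, say, an MP4 and an MP3 that share a base name.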

Viewer Reaction

In this section of the AIP, I gathered viewers’ reactions that were not expressed through articles. I divided the section into two parts: interviews, and comments and social media. While I did not end up having time to conduct the interviews I planned, this is where they would go.

I did save, through screenshots, an extensive comment section from one of the web articles that wasn’t saving correctly with the Wayback Machine. I also saved a YouTube video of someone erasing the belly button of the woman in “Transforming Nude Painting.” They claimed that a goddess as depicted in the painting would not have one (in fact, this was only one of many such videos on other paintings). The internet is weird…

Final Product


I used the Data Accessioner tool to generate technical and preservation metadata for the full collection, including a checksum for each digital object, in an XML file. This makes it possible to check fixity and detect whether any files change in the future. It also provides an easy way to browse the collection abstractly as a single document.
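
A later fixity check against those stored checksums could work like the sketch below. It assumes the checksums have already been pulled out of the Data Accessioner XML into a simple `{relative path: checksum}` dictionary; parsing that XML is not shown. It also assumes MD5, which can be swapped for any algorithm that `hashlib` supports. Function names are hypothetical.

```python
import hashlib
from pathlib import Path


def checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large videos don't need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_fixity(root, manifest, algorithm="md5"):
    """Compare current checksums with a stored {relative path: checksum} manifest."""
    root = Path(root)
    report = {"ok": [], "changed": [], "missing": []}
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            report["missing"].append(rel)
        elif checksum(path, algorithm) != expected.lower():
            report["changed"].append(rel)
        else:
            report["ok"].append(rel)
    return report
```

Anything listed under "changed" or "missing" would point to a file that needs to be restored from another copy.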

Finally, I zipped up the AIP and uploaded it to the Internet Archive. You can download it here.

Archiving for Theatre: a Production of Its Own

Files and file structure

I’m going to direct you to an updated version of my lovely flowchart, which has a couple of new additions and a few more callout notes explaining the intent behind certain categories. The same breakdown of item responsibilities and locations would also define the file structure for saved items and, by extension, the finding aid. The list of desired elements is framed so that it also acts as a checklist for a digital finding aid. Any non-digital item held only in a physical archive would still have an entry containing its archival metadata, with a physical location noted instead of a URI. In this way, as in the production itself, the digital and the analog continue to work side by side.
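
The rule that every finding-aid entry points either to a URI or to a physical location can be captured in a small data structure. The sketch below is one way to express it. The class name and field names are hypothetical, and the `metadata` dictionary stands in for whatever schema the holding archive uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FindingAidEntry:
    """One finding-aid entry; digital and analog items share the same shape."""
    title: str
    department: str                          # e.g. "Lighting", "Stage Management"
    responsible: str                         # who creates or maintains the item
    uri: Optional[str] = None                # location of a digital item
    physical_location: Optional[str] = None  # box/shelf for an analog item
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Each entry must point to exactly one place: a URI or a physical location
        if (self.uri is None) == (self.physical_location is None):
            raise ValueError("give exactly one of uri or physical_location")

    @property
    def is_digital(self):
        return self.uri is not None
```

Because analog and digital entries share one structure, they can sit side by side in the same listing, mirroring the flowchart.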

Shown here are examples of some of the file types and backup files. In this instance, while the stage manager’s files follow proper file naming conventions, the master electrician’s files need some work.

Files will follow a common naming convention, determined by the metadata requirements of the holding archive. If the holding archive does not specify one, Dublin Core standards will apply.
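
Whatever convention the archive settles on, checking and generating names by script helps catch files like the master electrician’s. The sketch below uses a purely hypothetical pattern, `show_department_YYYYMMDD_description_vNN.ext`, which is not a Dublin Core rule. It shows the mechanics of a name builder paired with a validator.

```python
import re
from datetime import date

# Hypothetical convention: show_department_YYYYMMDD_description_vNN.ext
NAME_RE = re.compile(r"[a-z0-9]+_[a-z0-9]+_\d{8}_[a-z0-9-]+_v\d{2}\.[a-z0-9]+")


def _token(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def _slug(text):
    """Lowercase, with runs of other characters collapsed to single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_name(show, department, day: date, description, version, ext):
    """Assemble a file name that follows the hypothetical convention above."""
    return (f"{_token(show)}_{_token(department)}_{day:%Y%m%d}_"
            f"{_slug(description)}_v{version:02d}.{_token(ext)}")


def is_conforming(filename):
    """True if an existing file name already follows the convention."""
    return NAME_RE.fullmatch(filename) is not None
```

Running `is_conforming` over a department’s folder would flag the files that need renaming before upload.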

Structuring the files in the same way as the finding aid will help with accessibility, but proper metadata will also be important, because some files and aspects overlap in key ways. For example, which design choices are requirements of the original creator of the work, and which are choices by the designer? Cue lights are set up and maintained by the master electrician, but controlled during a performance by the stage manager, and recorded only in the stage manager’s cuing script. The ability to sort and filter objects not only by file type or contributor, but also by design area or aspect of intent (practical, timing, emotional, etc.), will be useful not only to those recreating a performance, but also to those studying the working intent of the designers and directors producing it.
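
Faceted filtering of that kind is simple once each item records its facets, including multi-valued ones like the cue lights’ two contributors. The sketch below assumes a hypothetical item record with set-valued `contributors` and `intents` fields. A query matches an item only if it satisfies every facet, and set-valued facets match on membership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedItem:
    """Hypothetical item record with the facets discussed above."""
    name: str
    file_type: str
    contributors: frozenset   # several people may touch one item
    design_area: str
    intents: frozenset        # e.g. {"practical", "timing", "emotional"}


def filter_items(items, **facets):
    """Keep items matching every facet; set-valued facets match on membership."""
    def matches(item):
        for facet, wanted in facets.items():
            value = getattr(item, facet)
            if isinstance(value, frozenset):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True
    return [item for item in items if matches(item)]
```

For example, `filter_items(items, contributors="stage manager")` would return the cue lights alongside the calling script. A query for lighting items with a timing intent would also surface the cue lights.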

Accessibility of files

The original files will be saved and treated as the preservation copy, while additional, simpler access versions of the files will also be created. In some cases these will serve as the web-available access copies, but they will also ensure that a readable version remains available if the specialized software the theatre used to create the originals is no longer accessible. To that end, CAD files will be exported (at full scale) to PDF/A-2, cue stacks will be converted to database or spreadsheet form, and any specialized visual effects that cannot be exported to other forms will be accompanied by a written artist’s statement describing not only the effect (and how it was created, where possible), but also the intent behind it. Designers’ statements should be included wherever possible, but they are most important for works that cannot be guaranteed to be sustainable. When created, these access files will automatically inherit the performance information in the parent file’s metadata; access rights and file information metadata will, of course, differ.
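
The inheritance rule can be expressed as a small derivation step. Performance-level fields are copied from the preservation parent, while format and access rights are set fresh for the access copy. The list of inherited fields, the `_access` identifier suffix, and the function name below are all assumptions for illustration.

```python
# Performance-level fields that access copies inherit unchanged (assumed list)
INHERITED_FIELDS = ("production", "venue", "opening_date", "department", "creator")


def derive_access_record(parent, access_format, access_rights):
    """Build metadata for an access copy from its preservation parent.

    Hypothetical helper: the parent record is not modified.
    """
    record = {key: parent[key] for key in INHERITED_FIELDS if key in parent}
    record.update({
        "identifier": parent["identifier"] + "_access",  # assumed naming rule
        "derived_from": parent["identifier"],
        "format": access_format,           # e.g. "application/pdf" for a PDF/A-2 export
        "access_rights": access_rights,    # differs from the preservation copy
    })
    return record
```

Keeping a `derived_from` link means the access copy can always be traced back to its preservation original.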

Metadata & archive selection

Which archive this would go into depends on a number of factors. Ideally it’d go into something like GloPAD or ECLAP, but it could also go into a local database like WAPAVA, in Big’s case, or into the theatre company’s own archive, though in that case they would be less likely to fully exploit the available metadata. However, by basing metadata extraction on open-source tools and shared standards, collecting and displaying as much metadata as possible should be a simple matter.

Fortunately, both GloPAD and ECLAP come with suggested metadata models, which employ standards from a variety of common schemas, most notably Dublin Core and VRA Core, but several others as well. ECLAP also makes use of Linked Open Data (LOD) and generally seems to be more advanced and more actively developed, no doubt in great part due to its status as part of Europeana. However, both models should be scalable (or adjustable in terms of OAI-PMH) so that, even while the holdings records of the online archives are shared with larger Europeana-type collections, the items themselves may not be, depending on the reproduction rights, which are carefully documented in the metadata schemas. Ideally, in addition to reproduction and transmission rights, general dissemination permissions of various levels would be attached to the files, so that different DIPs could be created for the general public, scholars, and the individuals involved in the creation of the work.
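
Tiered dissemination permissions could drive DIP creation along the lines of this sketch. It assumes each item carries a `min_audience` level, meaning the least-privileged audience allowed to receive the file. Items above an audience’s level are still shared as descriptive records, just without the file. This mirrors sharing holdings records while withholding the items themselves. The three levels and all names are hypothetical.

```python
from enum import IntEnum


class Audience(IntEnum):
    """Dissemination levels, ordered from least to most privileged (assumed)."""
    PUBLIC = 1
    SCHOLARS = 2
    PRODUCTION_TEAM = 3


def build_dip(items, audience):
    """Split items into those delivered in full and those shared as records only.

    Each item is a dict with a "min_audience" level and a "file" reference.
    """
    full, records_only = [], []
    for item in items:
        if audience >= item["min_audience"]:
            full.append(item)
        else:
            # Share the descriptive record, but withhold the file itself
            records_only.append({k: v for k, v in item.items() if k != "file"})
    return full, records_only
```

Since higher levels include everything available to lower ones, the production team’s DIP is always a superset of the scholars’ DIP, which in turn contains the public one.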

To collect the metadata from the various digital files, a combination of traditional archival metadata extractors would be used. I had hoped to find some theatre-specific tools, but the biggest one I found, Rekall (mentioned in my last post), while promising at first, failed to recognize common lighting-specific file types, including CAD program files, instead lumping anything it didn’t recognize into an ‘octet stream’ category. Additionally, I could see no obvious way to export the information out of the program, and there seems to be little documentation or support for the application; it looks like another case of a promising application that dried up with its funding. At least it’s open source, though, so if someone wanted to take it up they’d have a good foundation to build on.
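
One way around the ‘octet stream’ problem is to layer a small lighting-specific lookup in front of generic identification, and to flag anything unknown for human review instead of silently labeling it. In the sketch below, the extension-to-format map is my own assumption and should be checked against the theatre’s actual files. Python’s standard `mimetypes` module serves as the generic fallback, and the function name is hypothetical.

```python
import mimetypes
from pathlib import Path

# Assumed extension map for common lighting files; verify against real files
LIGHTING_FORMATS = {
    ".dwg": "AutoCAD drawing",
    ".vwx": "Vectorworks drawing",
    ".lw6": "Lightwright 6 paperwork database",
    ".asc": "USITT ASCII cue export",
}


def identify(path):
    """Name a file's format, flagging unknowns instead of calling them octet streams."""
    suffix = Path(path).suffix.lower()
    if suffix in LIGHTING_FORMATS:
        return LIGHTING_FORMATS[suffix]
    guessed, _ = mimetypes.guess_type(str(path))
    if guessed:
        return guessed
    return "UNIDENTIFIED"  # flag for review rather than 'application/octet-stream'
```

A purpose-built registry tool would be more robust, but even this simple layer keeps lighting files from disappearing into a catch-all category.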

Rekall pulls some amazing metadata out of the files on my hard drive, but it doesn't know any of the standard lighting file types (CAD, paperwork databases, or light board exports).

Additional materials

Alongside the show files, but archived separately from them, is general documentation on the metadata schema, on the various software packages used to create the performance, and on those used to create or acquire the metadata. These should all be incorporated somewhere into the larger archive, either in an ‘about’ section or a technical metadata section. Once each show is completely uploaded, its general file structure should be saved as a PDF/A-2 to act as a finding aid, in addition to the general searchability of a digital archive.

Other useful items to archive would be the various union contracts, also mentioned in my last post. In a large-scale archive covering more than one theatre, a section tracing the various contracts over time and by department would be an invaluable resource. Of course, as with the detailed information in the stage manager’s and directors’ portions of the archived works, privacy concerns would probably mean that a generalized standard contract, rather than one with concessions specific to a particular theatre, would be most appropriate to archive.

Hopefully, though, some aspects of the archive would continue to grow over time. A user-contribution section, like the one in CircusOZ’s Living Archive, would allow reviews from outside the professional theatre world, along with commentary on all aspects of the endeavor, to be added at any time after the initial upload. That way, new connections made by people working on the project or simply viewing it are not lost. The show may have been struck, but how the show strikes you will never go away.