Hopes and Dreams: An Undertale AIP

In my last post, I outlined some of the ways I hoped to structure an Undertale collection. Here, I have an actual image of what that might look like. Because of a number of ownership issues with this material, I have only a mock-up with a few files rather than the whole collection.

Undertale Model Archival Information Package

The three main folders are as follows:

  • “Datamined”: Information gathered from datamining, as presented on Mirrawr’s website. This includes a number of HTML pages and at least one PDF.
  • “Videos”: These are videos of Let’s Plays of Undertale. Includes a full Neutral/Pacifist run and a full Genocide run, with selected and comparative clips in the “Consequences” folder.
  • “Wiki”: Contains all pages from the Undertale Wiki.

Aside from the issues I raised in my last post, here are some considerations I made about this material.

File Types and Acquisition

For the purposes of this AIP, I acquired the files simply by going to each page, right-clicking, and hitting “Save Page as…” While this has its problems (indeed, certain elements like ads broke), it saves the pages quickly and with a good bit of extra data. Because I am less concerned here with the look of the wiki than with the text on it, these HTML documents do well even in a text editor to give users/readers the information needed to understand Undertale. A tool like Archive-It would be ideal, but it is not crucial. PDFs could also be obtained, but I rather like having a bit more of the HTML available for readers, even if that isn’t my purpose in creating this.

A point raised in a number of the other posts was about how often such a website would need to be checked, to make sure the archive had current information. Although the game is still somewhat new, much of it has been rabidly consumed by fans, and so I do not anticipate great changes in coming years. So, perhaps once a year for the next five years, and then once every five years after that, an archivist could check for changes and update if necessary.

For the videos, I used a website to download the .MP4 of a Let’s Play. While this isn’t the best quality, the graphics of Undertale are MEANT to be seen as somewhat lo-fi/throwback, so the lack of video quality is not a huge issue. Still, getting the original files from the gamer would be ideal.

Link to Other Places

One great point Amy raised in a comment was how this could be a great opportunity to link from one section of the package to another, and you’ll see I did that here. In the Genocide video folder, I linked to the wiki entry on a Genocide run of the game; that way, some of the materials in the collection can more easily serve as documentation for other bits of the collection without taking up more storage space. Primarily, I see the videos having links to materials from the datamining and the wiki, not the other way around. The exception would be videos that show all the fates that can befall one character (put in the “Consequences” folder); those could be linked to the wiki entries on individual characters.

This allows viewers to see the collection’s connections between different elements, rather than requiring that they make such connections themselves.

Documentation

I have included a documentation Word document in each folder. This would likely include a few things to start: an index of sorts, notes on authorship, and dates of access/acquisition of the materials. The last is particularly important for the websites, which could change at any time; if they did change, the documentation could help archivists keep track of how many versions they have.

This kind of data would be crucial for the videos, which, when downloaded the way I did, lose much of the metadata YouTube stores about them (the user who uploaded them, date of upload, etc.). Having some idea of when the videos were uploaded to YouTube also helps place the LPer within the span of “Undertale’s history”: how they play the game depends a bit on how well they know it. If they made the LP in the early days, they might have been shocked or surprised by things, and that can be good data for the user to have, even if the LP does not contain commentary.
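
Much of that lost upload metadata can be recovered at acquisition time and copied into the documentation. Here is a minimal sketch, assuming access to the YouTube Data API v3; the API key and video ID below are placeholders, not real values from this collection:

```python
import requests

# Placeholders: a free API key from the Google developer console, and the
# YouTube ID of the Let's Play video being accessioned.
API_KEY = "YOUR_API_KEY"
VIDEO_ID = "LETS_PLAY_VIDEO_ID"

resp = requests.get(
    "https://www.googleapis.com/youtube/v3/videos",
    params={"part": "snippet", "id": VIDEO_ID, "key": API_KEY},
)
resp.raise_for_status()
snippet = resp.json()["items"][0]["snippet"]

# These two fields help place the LPer within the span of Undertale's history.
print("Uploaded by:", snippet["channelTitle"])
print("Uploaded on:", snippet["publishedAt"])
```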

The set of items that needs documentation the most is the Mirrawrs datamined material. While some of it is self-explanatory, much is not, and it seems imperative to describe, at some level, how the data/numbers presented here work. While I do not have such information myself, it might be worth conducting a quick interview with Mirrawrs, including it in this section, and using it to inform the documentation of the datamining section.

Conclusion: Is this the final battle?

I think there’s a good bit of room to tweak elements here. I like that this structure allows for users to enter into the videos or wiki section without knowing much about Undertale, and they’ll quickly get a good understanding of the material, but the datamined section is much more challenging to interpret, and even with detailed documentation, I think it would be a challenge. Still, I think this is a place to start documenting the video game known as Undertale, and hopefully additional resources will contribute to popular and scholarly understandings of this game.

The Lizzie Bennet Diaries- a Social Preservation

The TLBD end

Intro

The preservation plan for The Lizzie Bennet Diaries will consist of the main characters’ social media accounts, including but not limited to Twitter, Facebook, Tumblr, and other related accounts for each character. I decided not to preserve the videos themselves, because they have already been properly preserved on YouTube, on streaming services, and on DVD. In addition to creating videos directed at their audience, the “characters” interacted with that audience in real time. The majority of the social media accounts are no longer active; however, they still remain live. I have chosen to use the company ArchiveSocial to capture and preserve the social media accounts.


Bot Preservation: Two Headlines AIP

To Begin…

I’m going to try not to repeat myself too much, as I’ve already written a lot about Two Headlines here and here. So, before I get into the archival information package: Two Headlines is a small bit of programming that combines two headlines from the Google News API and posts them to Twitter through the Twitter API, with the help of some bits of code that are freely accessible to programmers through Node.js. Two Headlines has been used to teach programmers about creating Twitter bots, and it is a form of social commentary. Its tweets are also funny and entertaining.
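
For readers unfamiliar with how such a mashup works, here is a heavily simplified Python sketch. The real bot is a Node.js program that pulls live headlines and uses smarter entity detection; the headlines and the first-capitalized-word heuristic below are illustrative stand-ins only:

```python
import random
import re

# Stand-in headlines (the real bot pulls these live from the Google News API).
headlines = [
    "Apple unveils its thinnest laptop yet",
    "Godzilla stomps through downtown Tokyo",
]

def leading_proper_noun(headline):
    # Crude heuristic: treat the leading run of capitalized words as the
    # subject. The real bot does something smarter with named entities.
    match = re.match(r"[A-Z][\w']*(?: [A-Z][\w']*)*", headline)
    return match.group(0) if match else None

a, b = random.sample(headlines, 2)
subject_a, subject_b = leading_proper_noun(a), leading_proper_noun(b)
if subject_a and subject_b:
    # Swap one subject for the other, e.g. "Godzilla unveils its thinnest
    # laptop yet". The real bot then posts the result via the Twitter API.
    print(a.replace(subject_a, subject_b, 1))
```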

As the AIP includes software that needs to be installed, readmes with instructions on how to install and operate the programs should be created. It does no one any good to include software without instructions, especially as the software is not designed to be used by people who have little to no programming experience.

Since this is just a model AIP, only a few files are represented. The AIP will consist of three main folders: one for the bot’s source code and the software and documentation needed to edit that code, one for any interviews or comments about the bot, and a final one for the tweets themselves, along with the software to read them and its documentation. While the file types for things like the bot’s source code and the installer files are already dictated by their creators, any new files created will conform to current preservation best practices: PDF/A for text files and TIFF for images.

  1. Code
Folder structure of the AIP, highlighting the source code for Two Headlines

This folder contains the source code of the bot, downloaded from GitHub, along with the software used to create and edit the code. The documentation for Two Headlines’ code and for the software that created it will also be included, as will additional documentation for the Google News API and the Twitter API, since both APIs are used in running the code. A readme file with instructions for installing and using the various software, compiled mostly from the instructions and readme files associated with the programs, is also in the folder.

Folder structure of the AIP, highlighting the software

  2. Interviews

Anyone who responds to questions about their interactions with the bot, or about its significance and influence, will have their responses preserved in this folder. News articles and blog posts will also be included here.

  3. Tweets

The tweets occupy the third and final folder. As archiving the tweets will require special software to collect them and different software to read them, both programs and their documentation will also be included. Another readme file will be added so that users know how to install and use the included software to view the tweets. The folder will also include metadata about the collection of the tweets, including the time of collection, the code that collected them, and a record of any modifications made to them post-collection. A few screenshots will also be provided to show the original Twitter interface, which will not be archived with the tweets themselves.

Folder structure of the AIP, highlighting the tweets
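
As a sketch of what that collection step could look like, the snippet below pulls the bot’s recent tweets from Twitter’s REST API v1.1 user_timeline endpoint. The credential strings are placeholders for keys from a registered Twitter app, and whatever software the AIP finally adopts may work differently:

```python
import json
import requests
from requests_oauthlib import OAuth1

# Placeholder credentials from a registered Twitter application.
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

resp = requests.get(
    "https://api.twitter.com/1.1/statuses/user_timeline.json",
    auth=auth,
    params={"screen_name": "TwoHeadlines", "count": 200},
)
resp.raise_for_status()

# Keep the raw JSON; it preserves the tweet text plus Twitter's own metadata.
with open("twoheadlines_tweets.json", "w") as f:
    json.dump(resp.json(), f, indent=2)
```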

 

Moving Forward

While this is a good start to preserving an entertaining bot, there is more work that could be done. The next steps for this project would be to actually conduct the interviews, acquire permissions for the news articles and blog posts, and submit the AIP to the Internet Archive. There would also need to be a mechanism in place to collect new tweets from the bot, as it posts every few hours, and add them to the preserved files.

Homestar Runner Archive AIP

Introduction

Since my core content is already well preserved on the Internet Archive, YouTube, and the original site, since the contextualizing information has been meticulously captured by dedicated fans and posted to the HR Wiki, and since the community remains active through the Homestar subreddit, doing the actual preservation work would in the end have been highly redundant. The work has been done, and done well. So instead, here is the intellectual exercise of how to organize the data were it to be captured and preserved by a different kind of organization: a major research university.

While the argument has been made for preservation in original order, homestarrunner.com is organized by category and then largely, but not purely, chronologically. The content here is organized to reflect as closely as possible the organization of the content as it appeared on the site (major groups first, then release order) while still being easy to navigate for novices and scholars. In addition, I’ve broken out the creative and auxiliary content discussed below.

HRA1

Archive Organization

Part 1: Administrative Data

The first part of the Archive is the Administrative Data, organized under the ReadMe folder. This folder contains the information that establishes intellectual control of the materials in parts 2-4 of the collection.

The files in this section include:
HRA2

Parts 2 & 3: Content

Sections Two and Three house the creative content of the Homestar Runner Archive.

Part 2: Primary Content. This folder includes the featured content from homestarrunner.com, the areas of the site that were updated most often, featured most heavily, or most often seen by new fans:

HRA3

Each of the individual folders for a primary video contains both a master copy of the video and a data file (to accompany the embedded metadata). The Master Video file will be the best available copy of the most recent and accessible version, the least compressed; in short, the version from which future migrations should be made. The Data File will contain up-to-date curated content from the HR Wiki, plus additional information as relevant for each file: transcripts, references, links, Easter egg lists, and routing. For example, Teen Girl Squad #1 is actually a Strong Bad Email, so instead of duplicating the content, the folder for TGS #1 will route to the appropriate SBE folder.
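
As a sketch of what such routing could look like in practice, the TGS #1 folder might hold nothing but a small pointer file. The relative path and file name below are hypothetical, chosen only to illustrate the idea:

```python
import json
from pathlib import Path

# Hypothetical routing stub: instead of a duplicate master video, the
# TGS #1 folder contains only a pointer to the canonical SBE folder.
routing = {
    "title": "Teen Girl Squad #1",
    "routes_to": "../StrongBadEmails/comic/",  # hypothetical relative path
    "note": "TGS #1 originally appeared within a Strong Bad Email.",
}
Path("TGS_01_routing.json").write_text(json.dumps(routing, indent=2))
```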

HRA4
Part 3: Secondary Content. This folder contains everything else produced for the site.

Like the Primary content, this will consist mostly of master copies of AV content, such as the website homepages or music videos, and the associated data files. However, it also includes free downloads, collaborative materials, the live-action puppet videos, and merchandise lists to help collectors track licensed products.

 

Part 4: Auxiliary Materials

Section Four comprises materials relevant to a full understanding of the Homestar Runner Archive, but not a part of the creative content.

HRA5
HRA6

Next Steps

With the content so well preserved in so many locations, the next logical steps for a research institution interested in not only the social impact but also the research potential of this collection would be to approach the creators to see what interest exists in preserving the creation tools, what oral histories could be captured about the creation and creative process, and what original media still remains that might bolster a digital collection for the edification of future generations.

An AIP for a Digital Deep Cut: Kutiman’s Off Grid

Currently, my AIP for Off Grid is 100% “make believe,” so unfortunately there is nothing yet to download. Still, I will provide details regarding each folder series—web pages, videos, documentary materials, and working files—and subseries. Every series includes a readme document that provides contextual information to users. (Excuse the lack of good normalized file names! This factor was simply overlooked, as I was only creating dummy files to populate the model AIP.)

AIP

Web Pages
This is where the sites preserved using Archive-It will be housed. The folder includes:
• The current instance of www.kutiman.com (dedicated to Off Grid)
• The YouTube page for Off Grid
• The YouTube pages for all of Off Grid’s 95 component videos
• Websites that embed, critique, or provide write-ups on Off Grid

As discussed in my statement of preservation intent, the YouTube pages will be crawled at regular intervals in order to show change over time. Kutiman’s personal website will only be captured once since it is a temporary showcase for Off Grid. Of course, I’ll intermittently peek at the site just in case Kutiman develops the site in an unexpected direction.

The fourth bullet point gave me a bit of difficulty. Originally I had placed these web sites in the Documentary Materials folder because I figured that they helped a future user understand how Off Grid was received and spread throughout the internet. Yes, these sites do provide that function, but sticking them in a folder separate from the “official” Off Grid sites is betraying too much focus on the individual author. I am interested in preserving Off Grid as a window into participatory internet culture, not just as a set of cool videos. For this reason, I feel that my series and subseries need to walk the walk, even if it means that the Web Pages folder may look slightly intimidating to a user at first blush. But, hey, that’s what Readme files are for.

Videos
This folder contains the access copies of all the YouTube videos for Off Grid and its component videos. Using ClipGrab, all the videos will be saved as MPEG-4s, as the format is the standard for streaming media and looks to remain well supported. The original format of each video will be documented in its metadata (ClipGrab makes it easy to identify the original format), and the videos are saved at the highest quality available. Metadata is generated for these videos and stored in a separate file. I went with a PBCore application profile since it is well suited, recommended, and familiar to me.
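
As a rough sketch of what one of those sidecar metadata files might contain, here is a minimal PBCore 2.0 record generated in Python. The element names follow the PBCore 2.0 vocabulary, but all values are placeholders, and a production record would be fuller and validated against the official schema:

```python
import xml.etree.ElementTree as ET

NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
ET.register_namespace("", NS)

def el(parent, name, text=None, **attrs):
    # Small helper to build namespaced PBCore elements.
    node = ET.SubElement(parent, f"{{{NS}}}{name}", attrs)
    node.text = text
    return node

doc = ET.Element(f"{{{NS}}}pbcoreDescriptionDocument")
el(doc, "pbcoreIdentifier", "offgrid_component_001", source="local")  # placeholder
el(doc, "pbcoreTitle", "Off Grid component video 001")                # placeholder

inst = el(doc, "pbcoreInstantiation")
el(inst, "instantiationIdentifier", "offgrid_component_001.mp4", source="local")
el(inst, "instantiationLocation", "Videos/offgrid_component_001.mp4")
el(inst, "instantiationDigital", "video/mp4")  # original YouTube format noted here

ET.ElementTree(doc).write("offgrid_component_001_pbcore.xml",
                          encoding="utf-8", xml_declaration=True)
```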

Documentary Materials
This folder contains materials that help provide additional context to future users and, as such, holds the largest potential variety of formats. The folder includes:
• PDFs of interview questions and responses sent to creators of the component videos via email form (thanks to Alice for the tip! P.S. would this need to go through IRB?)
• “Making of” videos (likely stored as MPEG-4s, though none currently exist so I can’t be 100% sure at this time)

The only lingering doubt I have with this folder is related to the earlier problem I noted regarding the fourth bullet point in the Web Pages folder. Materials that document Off Grid are also materials that tend to embed the work and spread it throughout the internet. This means that a website featuring an interview with Kutiman about the making of Off Grid could be included in both the Web Pages and Documentary Materials folders. I decided that the Documentary Materials folder would be best suited for static documents (non-web pages) that discuss the work. Most of these documents don’t exist yet and will likely be generated through the efforts of my institution, so that’s another way to look at them.

Working Files
The working files are something that can only be obtained through Kutiman and, as such, they aren’t here! I am under the impression that the files are created in Sony Vegas Pro, but this has yet to be confirmed. This folder series is thus a placeholder until more information can be obtained. Still, knowing how the affordances of the various platforms used to produce YouTube videos come into play is important toward understanding participatory internet culture.

Conclusion
Two of the big challenges in designing this AIP were my inability to contact Kutiman and the relatively small amount of buzz it has gotten online—especially when compared to Thru You (*raises fist toward sky* Eriiiiiiiiiiiiiiiiiic!!!!!!!). As I mentioned in my statement of significance, it is essentially viral-proof with its long running time and “out there” music. Also, the work was only released in February, so it simply hasn’t been out very long. However, I think that my planning for the AIP allows things to be added quite easily in the future, such as the working files and interviews.

Preservation Polandball: Archival Information Package

Herein lies the obscenity-laden banter of Germanyball, Swedenball, Americaball (‘MURICA!), and all your other favorite national stereotypes. With the understanding that I would never be able to capture and preserve Polandball in its entirety, this AIP is an attempt to preserve just a small sliver of Polandball materials and culture. The aim was to grab enough documentation from major Polandball sites, including the structure of the sites, the guidelines surrounding Polandball creation, and discussion among Polandball users, to accurately represent its culture. Additionally, I wanted to preserve the comics, which in many ways have become supplementary to the actual discussions that take place.

The file structure of my AIP is fairly self-evident. My AIP contains three top-level folders (Webrecorded Sites, Supplemental Materials, and Selected Comics) with an introductory “read me” file containing several additional documents describing Polandball (using items such as Wikipedia’s Polandball page as a more popular descriptive source) and the preservation intent for the collection. In doing so, I hope that the user will understand the significance of the collection as well as its organization.

Preservation Polandball File Structure 1

Webrecorded Sites

As indicated in the Preservation Intent Statement, I decided to use Rhizome’s webrecording tool to capture the structure of Polandball’s three major community sites: Reddit, Facebook, and Wikia. I also saved PDFs, PNGs, JPEGs, etc. of the sites as a way of maintaining some quality assurance. Should the code not render properly, the static images of the sites will help interpret the material more accurately. As part of preserving the sites recorded via webrecorder.io, I’ve included the webarchive player so that the code can be rendered using the necessary tool. Obviously, this could become a problem in the future as tools and platforms evolve, but for now, at least the player is included.

One major note on the use of the webrecorder tool: the idea is to browse each captured page as much as possible in order to record the most material. Otherwise, it would be just as worthwhile to simply save the webpage as HTML. However, the examples included in this AIP are limited given the time constraints of the project. They stand as a prototype of what would exist in the full AIP.

Initially, I planned on using an annotation tool (Hypothesis) to add context to the recorded material. However, Hypothesis annotation did not work well on images and did not seem to download well as part of webrecorder; my notes did not display correctly. If I had saved PDFs or JPEGs (or PNGs) of the annotations, they would have distracted from the item to be preserved by dramatically altering its presentation, and I would have had to save the annotated PDFs separately from the material already being preserved. This is a great issue to tackle in future Polandball preservation efforts, as there could be tremendous value in describing the historical/political context for the conversations happening on Polandball. However, the annotation tool didn’t flow well with the overall AIP for this particular project, as my goal was to capture pieces of the Polandball world instead of augmenting them. Furthermore, sufficiently annotating the conversations happening across the Polandball community would take an enormous amount of time and scholarly research, which was outside the scope of this project.

Supplemental Materials

One of the most important items to preserve from the world of Polandball was the set of rules and boundaries defined by its users and moderators. The most elaborate and well-defined set of standards is outlined on Reddit and has been included in this section. This also includes the FAQ, usage statistics, and the “joke life preserve,” which documents punchlines, themes, and techniques that have been banned or temporarily retired due to overuse. As in the previous folder, the pages have been recorded and the code stored; PDFs have been included for quality assurance.

The conversations that I’ve had with Polandball users have been of immense importance to this project. With every interaction, users have emphasized the critical nature of the Polandball community conversation. In this folder, I saved some of those conversations between myself and the users, as well as PDFs of the activity in chat rooms. Because these conversations took place outside of a direct relationship to the comics, I’ve saved them in supplemental materials rather than with the comic materials.

In addition to materials produced by the user community itself, I wanted to add external commentary and the reception of Polandball in popular society. For this reason, I included articles written by unaffiliated journalists about Polandball. Of all the pieces in my AIP, this is the section I would most likely remove for copyright purposes. I’m not certain how preservation copyright would operate on an international level. Given that some of this material comes from international magazines/newspapers, I suspect the rights to preserve it would be limited. Nevertheless, I’ve included the articles here until further action is taken with the AIP. To preserve them, I took my now-standard approach: storing the WARC and creating control images of each article.

Selected Comics

The goal of this section of the AIP was twofold: to preserve the comics in order to retain the artwork itself, and to capture the community conversation in its natural environment. The substantial conversations between Polandball users do not happen in a void. Rather, they develop following the spark a comic ignites. Preserving these conversations is no easy task, as hundreds of comments can springboard off a single comic based on its content. Furthermore, preserving the comics in their entirety would literally be an impossible task, as no one knows precisely how many exist.

I took a practical approach to the task. Ideally, a series of ten comics would be selected by surveying Polandball users. Given the time constraints of the project, I instead selected ten myself from comics mentioned in conversations with individual users and from those highlighted on Reddit and Wikia (of the three sites I’ve preserved, the two most focused on the original intent of Polandball itself).

Preservation Polandball File Structure 2

In order to preserve the comics and their commentary, I took a three-pronged approach. I wanted to grab the conversation as it developed alongside the comic, and the most likely site to find well-organized and thriving conversation was the Polandball Reddit. Therefore, I once again used webrecorder to capture the code for each of the ten comics’ pages, along with control PDFs. In addition, however, I also saved the comics by themselves. While the WARC and the PDFs preserve the conversation, the images of the comics within them are rather small unless expanded through a link. Because I wanted to maintain the selected comics just as much as the comment threads, I saved the larger versions of the comics separately from the commentary.

Conclusion

Despite the fact that this AIP contains such a small slice of the Polandball world, it’s enough to capture the essence of Polandball as it currently exists. The comics and the culture continue to evolve, but these few files are a snapshot of this moment in time. If given the opportunity, I’d like to see Preservation Polandball grow and be refined. For the AIP to be archived in something like the Internet Archive, I’d want to explore annotation with scholarly sources and description attached, permissions granted from all journalists and news outlets whose articles are included in the AIP, and a true survey done to collect the most important Polandball comics (hopefully a larger sampling!) according to the community. Yet until now no efforts had been made to preserve Polandball at all, so this is progress.

Polandball can into archive, but still cannot into space

ThruYou – Archival Information Package

In creating the archival information package for ThruYou, I had several types of material to collect: the final ThruYou videos, the source videos used to create each video, the YouTube webpage for each ThruYou video, the ThruYou webpage, screenshots of each YouTube and ThruYou page, YouTube comments for each video, the Researching ThruYou webpage, and two commentaries from source video creators (a Reddit thread and a YouTube video).

For the eight ThruYou project videos, I used Complete YouTube Saver, a Firefox extension, to download the videos in MP4 format, along with each YouTube page, its descriptions, and annotations. This was a slight modification of my original plan, which was to use the youtube-dl tool for the video downloads; however, since the extension offered similarly extensive download-quality options, as well as the ability to download the pages, I used it for the ThruYou videos instead. I had also originally intended to use the extension to collect all YouTube comments for each of the ThruYou videos. Unfortunately, recent changes in the way YouTube loads its pages broke this functionality in the extension. A smaller selection of the most popular comments does appear in the screenshots of each YouTube page, so this aspect is preserved to some degree; however, I would ultimately prefer to find a way to pull the full comment list via the YouTube API.

For the source videos, I did use the youtube-dl program, a command-line tool that offers a wide range of options. In order to streamline the process of downloading over 100 videos, I created a batch file containing the URLs for all the source videos, which allowed me to download the full collection without having to restart the process for each video. Arranging and organizing these files was a challenge. The source videos are linked from the description section of each ThruYou video, with a brief descriptive note (“drums 1”, “toy piano”, etc.). However, these descriptive notes do not match the video titles, potentially making it tricky to line up which video is which once the YouTube links no longer function. In addition, some of the source videos have been removed from YouTube and are missing from the collection. I wanted a researcher to be able to connect the ThruYou videos to the source videos listed in the YouTube description, so I ended up numbering the source videos 1-109 based on the order they are listed in the YouTube descriptions (which is roughly the order of their appearance in each video).
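
For anyone reproducing this step, the batch download boils down to a single youtube-dl invocation; here is a sketch (wrapped in Python for consistency), assuming the URL list is saved as source_urls.txt, a file name I’ve made up:

```python
import subprocess

# youtube-dl's --batch-file flag reads one URL per line; --ignore-errors keeps
# the run going past source videos that have been removed from YouTube.
# The %(autonumber)s template numbers files in download order.
subprocess.run([
    "youtube-dl",
    "--batch-file", "source_urls.txt",   # hypothetical batch file name
    "--ignore-errors",
    "-f", "best",
    "-o", "%(autonumber)s - %(title)s.%(ext)s",
])
```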

View of the source video folder for the first ThruYou video.

I then separated the source videos into folders for each ThruYou video, so each folder contains the relevant source videos in the order they appear in the YouTube description. To keep all this straight, and to note missing videos, I created a “key” to these source videos listing their original YouTube link, the video title, and the label Kutiman used for each in his description.
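
A simple, machine-readable way to build that key is a CSV with one row per source video, following the fields just described. The rows below are illustrative placeholders, not real entries from the collection:

```python
import csv

# One row per source video: assigned number, original YouTube URL, the
# video's own title, and the label Kutiman used in his description.
rows = [
    (1, "https://www.youtube.com/watch?v=XXXXXXXXXXX", "funky drum solo", "drums 1"),
    (2, "https://www.youtube.com/watch?v=YYYYYYYYYYY", "my toy piano", "toy piano"),
    (3, "[removed from YouTube]", "[unknown]", "bass"),
]

with open("source_video_key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["number", "youtube_url", "video_title", "kutiman_label"])
    writer.writerows(rows)
```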

source video key

I divided the documents into four series folders: 1. ThruYou videos, 2. Source videos, 3. Web pages and screenshots, and 4. Contextual information. As a final step, I included a PDF finding aid briefly describing the collection and the arrangement. In the long run, I would want to prepare a more detailed finding aid with a full series and file listing and more detailed information on each type of file and on the project as a whole.

toplevel

Overall, the process of finalizing the AIP was a useful reminder of how often unforeseen challenges appear once you begin to work with the collection in practice rather than theory. However, I found that thinking through the project in such detail beforehand not only minimized issues, but also provided a framework to turn to when I needed to make changes to the original plan.

My AIP can be viewed and downloaded here. Note that this is not actually the full collection; while I downloaded everything, the final zipped file was well over 1.5 GB, so I created a version that includes only the source videos for the first ThruYou video. (I left the folders for the other seven videos, but deleted the files.) This gives an idea of the structure to be used, but is a much smaller file (although still around 400 MB). In practice, I would envision my collecting institution maintaining an AIP similar to this for preservation purposes, and then producing an access copy for research use and general access.

The Fixity of Transformation

The “Transforming” series is four digital paintings, each looping every 2-3 hours, meant to reward the viewer for sustained engagement by subtly changing over the course of the work. To preserve the series, I decided to focus on the conversation around the works and to document the process of their creation, as detailed at greater length here and here. In short, the pieces themselves are fairly well taken care of, but the supporting documentation gets less attention and will still be useful in the future.

aip_structure

The end result is an Archival Information Package (AIP) divided into the larger sections of Web Articles, Audio and Video, and Viewer Reaction. Each larger section has a text file, titled the same as the folder heading but appended with “_description”, describing the contents of the folder.

Web Articles

One part of documenting the discourse about these works is through saving the media coverage. Using links available on the creator’s websites and using internet searches, I assembled a list of articles on the internet that covered the works in exhibitions, provided commentary, interviewed the authors, or documented the creation of the works.

After downloading a copy of the HTML for an access copy, I also searched the URL in the Wayback Machine and used the “Save Page Now” option to ensure there was also a preservation copy somewhere on the IA servers. In all, I saved the HTML and related files from 48 articles, of which 16 were not yet saved in the Internet Archive.
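
Both steps can be scripted against the Wayback Machine’s public endpoints: the availability API reports whether a capture already exists, and a GET request to the save endpoint triggers “Save Page Now.” A minimal sketch, with a placeholder article URL:

```python
import requests

article_url = "http://example.com/transforming-article"  # placeholder

# Check whether the Wayback Machine already holds a capture of this URL.
check = requests.get("https://archive.org/wayback/available",
                     params={"url": article_url})
snapshots = check.json().get("archived_snapshots", {})

if snapshots:
    print("Already archived:", snapshots["closest"]["url"])
else:
    # Trigger "Save Page Now" to create a preservation copy on IA's servers.
    save = requests.get("https://web.archive.org/save/" + article_url)
    print("Save Page Now status:", save.status_code)
```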

Audio and Video

The next section of the AIP contains 21 videos (and some audio) describing artistic themes in the works, how the works were made, visual effects breakdowns, and short excerpts of most of the final products. About half of these files I could simply “save as” from the Motion Picture Company website after inspecting the page source.

The other half of the videos were either on sites like YouTube or Vimeo or were streaming Flash video. I used simple, readily available tools to download them: ClipGrab for videos on hosting sites and the “Flash Video Downloader” web extension for the Flash videos. I also downloaded two files from an audio tour at a museum. While I had pie-in-the-sky ideas about using open-source software and command-line tools, there just wasn’t enough time to dig deep into the documentation and figure them out.

tzTimeAtLast

Once the videos were downloaded as MPEG-4 files at their highest picture quality, I organized them by work, or put them in a separate category if they covered multiple works. I used MediaInfo to generate technical metadata sidecar files for the audio and video, which will be useful both to scholars looking to do research now and in case the video files become corrupted in the future. I exported the metadata as PBCore 2.0, as it is a recommended schema in the FADGI report on preserving born-digital video.
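
Because the MediaInfo command-line tool can emit PBCore 2.0 directly via its --Output option, generating the sidecars can be scripted. A sketch, assuming the videos live under an “Audio and Video” folder as described above:

```python
import subprocess
from pathlib import Path

# For every video, ask MediaInfo for a PBCore 2.0 record and write it as an
# XML sidecar file next to the video itself.
for video in Path("Audio and Video").rglob("*.mp4"):
    result = subprocess.run(
        ["mediainfo", "--Output=PBCore2", str(video)],
        capture_output=True, text=True, check=True,
    )
    sidecar = video.parent / (video.name + ".pbcore.xml")
    sidecar.write_text(result.stdout)
```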

Viewer Reaction

In this section of the AIP, I gathered viewers’ reactions not told through articles. I divided the section into interviews, and comments and social media. While I did not end up having the time to do the interviews I planned, this is where they would go.

I did save an extensive comment section through screenshots from one of the web articles that wasn’t saving correctly with the Wayback Machine. I also saved a YouTube video of someone erasing the belly button of the woman in “Transforming Nude Painting”; they claimed that a goddess as depicted in the painting would not have one (in fact, this was only one of many more videos on this theme in other paintings). The internet is weird…

Final Product

Internet_Archive_logo_and_wordmark

I used the Data Accessioner tool to generate full-collection technical and preservation metadata, plus checksums for each digital object, in an XML file. This ensures the ability to check fixity and determine if anything in the files changes in the future. It also provides an easy way to browse the collection in an abstract way, as one document.
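
For readers unfamiliar with the fixity idea, the core of what a checksum manifest provides can be approximated in a few lines: hash every file in the AIP and keep the manifest, so a future audit can re-run the hashes and diff the results. A sketch, with a placeholder AIP folder name:

```python
import hashlib
from pathlib import Path

def md5sum(path, chunk_size=1 << 20):
    # Hash the file in chunks so large videos don't have to fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Walk the AIP and record one checksum line per file. Re-running this later
# and diffing the manifests reveals any file that has changed or corrupted.
with open("manifest-md5.txt", "w") as manifest:
    for item in sorted(Path("Transforming_AIP").rglob("*")):  # placeholder root
        if item.is_file():
            manifest.write(f"{md5sum(item)}  {item}\n")
```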

Finally, I zipped up the AIP and uploaded it to the Internet Archive. You can download it here.

Archiving for Theatre: a Production of Its Own

Files and file structure

I’m going to direct you to an updated version of my lovely flowchart, which has a couple of new additions and a few more callout notes to explain the intent behind certain categories. The same breakdown of item responsibilities and locations would be reflected in the file structure for saved items and, by extension, in the finding aid. The list of desired elements is framed in such a way that it also acts as a checklist for a digital finding aid. Any non-digital item located only in a physical archive would still have an entry, containing the archival metadata and noting a physical location instead of a URI. In this way, like the production itself, the digital and the analog continue to work side by side.

Shown here are some examples of some of the file types and backup files. In this instance while the stage manager’s files are in proper file naming conventions, the master electrician’s files need some work.

Files will have a common naming convention, determined as required by the metadata used by the holding archive. If the holding archive does not specify, Dublin Core standards will apply.

Structuring the files in the same way as the finding aid will help with accessibility, but proper metadata will also be important, as some files or aspect types are key in how they overlap: which design choices are requirements of the original creator of the work versus a choice by the designer; cue lights are set up and maintained by the master electrician, but controlled during a performance by the stage manager and recorded only in their cuing script. The ability to sort and filter objects not only by file type or contributor, but by design area or intent aspect (practical, timing, emotional, etc.), is the kind of detail that will be useful not only to those recreating a performance, but to those studying the working intent of the designers and directors who produced it.

Accessibility of files

The original files will be saved and treated as the preservation copies, while additional lower-level access versions of the files will also be created. These will serve in some cases as the web-available access copies, but also ensure that some readable version will be available if the specialized software on which the theatre created its originals is no longer accessible. To that end, CAD files will be exported (at full scale) to PDF/A-2, cue stacks will be converted to database or spreadsheet form, and any specialized visual effects that cannot be exported to other forms will be accompanied by a written artist’s statement describing not only the effect (and how it was created, where possible) but also the intent behind it. Designers’ statements should be included wherever possible, but they are most important for works that cannot be guaranteed to be sustainable. When created, these access files will automatically inherit the metadata of the parent file in terms of performance information. Access rights and file-information metadata will, of course, differ.

Metadata & archive selection

In terms of what archive this would go into, that depends on a number of factors. Ideally it’d go into something like GloPAD or ECLAP, but it could also go into a local database like WAPAVA, in Big’s case, or the theatre company’s own archive, though in that case the chances of the company being able to fully exploit the available metadata are lower. However, by basing the metadata and its extraction on open-source tools and shared standards, collecting and displaying as much metadata as possible should be a simple matter.

Fortunately, both GloPAD and ECLAP come with suggested metadata models, which employ standards from a variety of regular schemata, most notably Dublin Core and VRA Core, but several other models as well. ECLAP also makes use of Linked Open Data (LOD) and generally seems to be more advanced and actively developed, no doubt in great part due to its status as part of Europeana. However, both models should be scalable (or adjustable in terms of OAI-PMH) to ensure that even while the holdings records of the online archives may be shared with larger Europeana-type collections, the items themselves may not be, depending on the reproduction rights, which are carefully documented in the metadata schemata. Ideally, in addition to reproduction and transmission rights, general dissemination permissions of various levels would be attached to the files, so that different DIPs could be created for the general public, scholars, and the individuals involved in the creation of the work.

In terms of collecting the metadata from the various digital files, a combination of traditional archival metadata extractors would be used. I had hoped to find some theatre-specific tools, but the biggest one I found, Rekall (mentioned in my last post), while at first promising, failed to recognize common lighting-specific file types, including CAD program files, instead lumping anything it didn’t recognize into an ‘octet stream’ category. Additionally, I could see no obvious way to export the information out of the program, and documentation and support for the application don’t seem to exist — it looks like another case of a promising application that dried up with its funding. At least it’s open source, though, so if someone wanted to take it up they’d have a good foundation to build on.

Rekall pulls some amazing metadata out of the files on my hard drive, but it doesn’t know any of the standard lighting file types (CAD, paperwork databases, or light board exports).

Additional materials

In addition to, but archived separately from, the show files is general documentation on the metadata schema, the various software packages used in the creation of the performance, and those used in the creation or acquisition of the metadata. These should all be incorporated somewhere into the larger archive, either in an ‘about’ section or a technical metadata section. The general file structure for each show, once completely uploaded, should be saved as a PDF/A-2 to act as a finding aid, in addition to the general searchability of a digital archive.

Other useful items to archive would be the various union contracts, also mentioned in my last post. If it were a large-scale archive, covering more than one specific theatre, having a section covering the various contracts longitudinally and departmentally would be an invaluable resource. Of course, in a similar fashion to the detailed information in the stage manager’s and directors’ portions of the archived works, privacy concerns would probably mean that a generalized standard contract, rather than one with any specific concessions for a specific theatre, would be most appropriate to archive.

Hopefully, though, some aspects of the archive would continue to grow over time. Providing a user-contribution section, as CircusOZ’s Living Archive does, will allow for additional reviews outside the scope of the professional theatre world, and for commentary on all aspects of the endeavor to be added at any time, even after the initial upload, ensuring that as new connections are made, by people working on the project or simply viewing it, they are not lost. The show may have been struck, but how the show strikes you will never go away.
