Plotting it out: Preserving a Big production

Part I: The What (and the Who)

So when it came to making a list of what files and types of items needed to be kept, I immediately geeked out (just a little) and spent several hours working up a little flowchart.

Not the most straightforward flow chart

…it got complicated pretty quickly. For now, it's broken down into three categories of items to be collected, based on creator intention. I'll go into some amount of detail here, but I'm going to try not to make this a massive essay, so if I gloss over something you're curious about, please feel free to comment and I'll happily go into more detail. The three areas overlap and cross over in places, but they basically break down into 'the production as art,' 'the production as profit vehicle,' and 'the production as published by the creator.' Suggestions for cooler names also welcome.

Product as Art

Here we're discussing the specific production to be archived and the aspects designed for it: the set, costumes, lights, sound, props, and so on. These are the things worked out in concert between the director and the designers to bring the director's intended message to life, with the talent selected and the funds available. Not all of these will have digital assets, and in some cases the analogue outputs of digital processes may be the aspect chosen for archiving, but the production as a whole must be considered when creating a preservation plan. The actions in this document, though, will focus on the digital elements, taking current standards for archiving the analogue as given.

Fortunately, the digital aspects of theatre design tend to be built for portability and reproducibility, so in most cases I could think of, getting the files themselves shouldn't be a problem. File type and playback are more of an issue, which I will get into below. But first, the files themselves.

section of a light plot

Most design paperwork comes in drafts, and ideally several drafts would be collected:
-The initial concept drawings or descriptions, from the bid/first production meeting. These would be received from the designers. If the designers were unable to supply them, we would look to the production meeting notes to see if a detailed description was recorded.
-The first draft of the working renderings, as delivered to the shop. This could be retrieved from shop staff or the designer.
-The final post-opening draft with all changes/updates. This would be retrieved from theatre staff, as once load-in has taken place they, rather than the designers, are in the best position to track changes to the physical implementation. Examples of such staff members would be the technical director, master electrician, and sound engineer. If any additional in-between drafts are available, especially if they outline significant changes (e.g. cutting a door), they could also have potential value for archiving, but the start and end points are the most key.
-Playback files such as lighting or sound cue stacks would be retrieved from the relevant department staff member, at any point after official opening, to ensure the final version is retrieved. Like other design work, multiple drafts would be useful, but here the final product is the most important thing.

Any digitally-based files would likely have been emailed or uploaded to a file-sharing service like Dropbox, so retrieval after the fact should be simple. In fact, after setting up a standardized listing of items to be archived, a copy could be sent to the archivist in the same fashion at the same time, or the archivist could establish periodic automated backups of the Dropbox. Playback files may require more work to add to the archive: as mentioned elsewhere, Adventure's light board only exports to floppy disk, so getting the file onto a computer, even before verifying it, would already take a little doing in today's computing world.

Other elements that are more abstract, but equally important, include the paperwork associated with people such as the director(s), dramaturg, and stage manager. These would be retrieved directly from the source creator, and may need to be filtered for privacy issues more so than other material. The director's notes on both the play and the production will provide essential insight into the motivations of actions onstage, while the dramaturg's role is to provide illustrative research to the director, designers, and actors on key events of the setting. Sometimes basic information from these players is recorded in ephemera such as programs, but more detailed information must be collected from the source directly.

The stage manager’s book forms the key bible to recreating the production from night to night. It is designed such that if something were to happen to the stage manager, someone less familiar with the show could take the book and run the show, with no noticeable difference from the audience. It may also contain contents that don’t need to be archived, such as contact information or contracts, so the stage manager should be somewhat selective in what gets archived from their book, and they will be the one best equipped to make this decision.

Any physical artifacts/ephemera that are being kept or digitized:
-maquettes/white models (or 3D renderings)
-fabric samples (especially for instances where custom fabrics had to be made – see that one TD&D article)
-any analogue designs
-product info. This might also be digital, but in either case it is essential to record anything that ties into the affordances of the final design, such as limitations due to electrical resistance in the materials (this was the case for the LED tape, which limited how long a section could be), or simple availability of parts (how many color scrollers are available, how many dimmers are available, weight capacity of casters, etc.).

Product of the Creator

One important thing to note with the design work is which design elements are due to the designer, and which ones are requirements of the playwright (homages to earlier designs are also possible, but we will discard that option for now, treating it as part of the first category). While some playwrights' descriptions of location and action are minimal or were mostly placed by editors, as is the case with Shakespeare (Thomson, 1988), other playwrights, such as Samuel Beckett, are notoriously exacting over elements of design, especially scenic (McDonagh, 2014). To that end, artistic statements, or even a listing of elements specifically called out in the script by the playwright, are another useful addition to the list of design elements. This could probably be compiled as a separate document, or possibly flagged as a metadata element or class type in the individual design works.

It can also be useful to archive any communication with the creator in instances where special exceptions to the general license have been made, such as a major change that had to be cleared, so that proof of the special circumstances can be provided in case of later litigation. Exact records of which version of the script was used can also be useful, if it was a newer version than the commonly published one (this happened with a Canadian play I stage managed in college), or if it's a new play being produced for the first time. If the playwright habitually works with a certain theatre company, the two may become generally associated, and having that information available to researchers of the playwright would be an additional benefit.

Production as Profit

In this section we come back to the reality that you're not just making art; you also have to get an audience. You have to sell the production. The key element here for our purposes is production stills, but they're far from the only element that could be archived.

Production stills are either staged shots of key moments in the play, or photos taken of a running production, often a final dress rehearsal. They are used to advertise the production, and distributed to designers for their portfolios. These would be retrieved either from the photographer or a selected person in the publicity department.

Some theatres may also put together a press packet during the course of the run, which collects all the reviews of the show, to be given to any cast or crew member who desires one. Adventure doesn't do this, but one or two other DC theatres I've worked with do. One copy could be sent to be archived as well.

One element that I didn’t mention in the first section, but could potentially fit there as well as here, is the filmed production. Filming a performance for strictly archival reasons is relatively common practice in many places, though filming for distribution less so.

The publicity department might also have logos and merchandising designs to archive.

Something that might be useful for larger production companies to keep (I'm not sure about this, so if anyone has input I'd love to hear it) is cost information: production bids and the like. This would probably be more useful in a general archive of the workings of a theatre company rather than in the archives of a specific show, but they could potentially be cross-linked, and it would still be useful information for restaging purposes, or when deciding which elements to save versus recycle or scrap.

Part II: The How

File formats:

Once we have the files, we then have the issue of determining whether or not the files are in a proprietary format, and how stable those formats are. Is the file type readable by other programs? How much does the software cost? How backwards-compatible is it? Does the proprietary format need to be kept, or can simplified versions be kept, or possibly both? Where is it more important to keep the intent or the final outcome of the tool rather than the manner in which it was created?

For example, in Adventure Theatre's production of Big, the sound designer put all his sound cues for playback into a program called QLab. In the program you can do things like set sound levels, set the start and stop times within a larger file, choose which speakers the sound will go to, set fade level and rate, and more. But if the show were being done in a more low-tech environment, with say a CD player and a manual board, then having all those QLab files perfectly archived wouldn't do as much good as a listing of fade times, levels, and so on. On the other hand, while having the CAD files for the set drawings is certainly useful, as digital versions of physical blueprints they can also be converted to and stored as simple PDFs, which presents much less risk. Keeping the vector files of any logos created by marketing is useful, but if any future production would be recast (a necessity with shows with children), then keeping the full set of files for the printed program is less useful.
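To make that idea of a simplified derivative concrete, here is a minimal sketch of the kind of plain-text cue sheet that could sit alongside the proprietary QLab workspace. The cue numbers, file names, and values are invented for illustration; nothing here relies on QLab's actual file format.

    import csv

    # Hypothetical, hand-entered cue data: the kind of information a sound
    # designer or engineer could read straight out of the playback software.
    cues = [
        {"cue": "1",   "label": "Preshow music",  "file": "preshow.wav",  "level_db": -6, "fade_in_s": 3, "fade_out_s": 5},
        {"cue": "2.5", "label": "Doorbell",       "file": "doorbell.wav", "level_db": 0,  "fade_in_s": 0, "fade_out_s": 0},
        {"cue": "14",  "label": "Carnival blend", "file": "carnival.wav", "level_db": -3, "fade_in_s": 8, "fade_out_s": 10},
    ]

    # A plain CSV cue sheet is readable in any spreadsheet, or by a future
    # operator running the show off a CD player and a manual board.
    with open("big_sound_cue_sheet.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(cues[0].keys()))
        writer.writeheader()
        writer.writerows(cues)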

Copyright/ethics:

While up to now we've primarily been talking about the people behind the scenes, we can't forget those onstage. If the actors are Equity members, they have rights over their image. The standard Actors' Equity Association (AEA) contract has special waivers for recording, either for archival purposes or for distribution, as mentioned above. There are several kinds of contracts available, depending on the type of theatre, and Adventure Theatre is not a union house, but we will use the League of Resident Theatres handbook for our purposes here. The handbook should also be referred to for general usage of an actor's image, and of course AEA always has representatives available to contact if there are still uncertainties. All the potentially involved unions (directors, dramaturgs, playwrights, stage hands, etc.) can be contacted as necessary.

Draft of lit set pieces

Designs remain the copyright of the designers, so even in cases where we get files from theatre staff, approval to archive must still be negotiated with the designers. Advertising material, programs, etc., however, would remain the copyright of the theatre.

Big logo

There are a couple of options for dealing with copyright in the archive: wherever possible, copyright and access to archived files should be included in the language of the initial designer contract, with standard exceptions made. Files should be given copyright/access restriction metadata, so that different levels of DIPs can be created. Distribution can also be controlled by requiring a login for certain types of information, to which registered accounts can be granted access, with supporting documentation if needed. This way content can be distributed at a level the designer feels comfortable with: final renderings and static images only to the public, and working files only accessible to people associated with the original production, for example.
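As a rough sketch of what that restriction metadata might look like, each archived file could carry an access level, and different DIPs could be generated by filtering on it. The level names, field names, and example records below are my own invention, not an existing standard or system.

    # Hypothetical access levels for files, least to most restricted.
    ACCESS_LEVELS = {"public": 0, "production_staff": 1, "designer_only": 2}

    # Invented example records; in practice these would live in the archive's metadata store.
    files = [
        {"filename": "big_set_final_rendering.pdf", "creator": "scenic designer",   "access": "public"},
        {"filename": "big_set_working_draft.dwg",   "creator": "scenic designer",   "access": "production_staff"},
        {"filename": "big_lx_cue_stack_export.txt", "creator": "lighting designer", "access": "designer_only"},
    ]

    def build_dip(records, viewer_level):
        """Return only the records a viewer at the given access level may see."""
        allowed = ACCESS_LEVELS[viewer_level]
        return [r for r in records if ACCESS_LEVELS[r["access"]] <= allowed]

    print(build_dip(files, "public"))            # final renderings only
    print(build_dip(files, "production_staff"))  # adds the working files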

Next steps:

I’ve been considering contacting some larger theatres that already have archives, and are known to have technically complex productions, to see if they have a set procedure for archiving shows. For example, the National Theatre in the UK has its own archive, and a highly complicated technical system that goes well with their complex rep staging setup. I discovered this after I stumbled across their page on iTunes U, which has a lot of great introductory videos for their technical accomplishments.

I also need to do some brushing up on the FRBR model, and think about the best way to organize the archived information. And I have some software to explore. Software such as Rekall, which was created with the performing arts in mind, would be ideal; Rekall isn't the only performance-oriented option I'm looking at, but it's the most intriguing. Traditional archival software is another option I'll look into, as it might integrate better with the larger archival structure.

Works cited:

McDonagh, L. (2014). Plays, Performances and Power Struggles – Examining Copyright’s “Integrity” in the Field of Theatre. The Modern Law Review, 77(4), 533–562. http://doi.org/10.1111/1468-2230.12078

Thomson, L. (1988). Broken Brackets and 'Mended Texts: Stage Directions in the Oxford Shakespeare. Renaissance Drama, 19, 175–193. Retrieved from http://www.jstor.org/stable/41917434

Preserving Homestar

Warning: There is a Spoiler in this Post.

 

Statement of Intent

          It is the intent of this project to keep Homestar Runner safe. As one of the contributors to the Internet Archive commented when uploading a large quantity of HR .swf files, the hiatus was making them paranoid and they just wanted the files to be safe (see below; sorry about the quality). Homestar Runner is good clean fun that swept the nation and beyond. Its absurdity was also its access point. It was sustained by fan support alone. And so I intend to preserve it in the same way: keeping it safe, keeping it accessible, and keeping the fans involved. The project is intended to remain open and amendable for as long as there is new material to be added.

[Screenshot: Internet Archive contributor's comment on uploading the HR files]

Content and Priorities

          The files released by the Brothers Chaps on the Homestar Runner website are not confined simply to the Flash-born .swf files or HTML5 files, but include numerous other file types for the multitude of content they produced. While it is important to the project to capture all of the content of the site, it must be recognized that the releases fall into two main categories: core narrative content and auxiliary materials.

           While the project will seek to maintain all the releases, the core narrative content will receive top priority. Auxiliary materials such as downloads, games and merchandise specs, while interesting, must be prioritized as secondary to the core narrative content.

          The content of the site is extremely well preserved and has extensive contextualizing information created by fans. Barring access to additional original materials from the creators, capturing these two things constitutes the top two priorities for the project. The third priority is to capture how the site caught the world's attention and spread in the years before social media.

          It is also a priority (though I initially saw it as the death of the project) to keep the fans involved. The fans who have kept interest alive for fifteen-plus years and bought the merchandise that keeps the site running are the same fans who have curated such a thorough wiki that it will form the core of the contextualizing data described in the next section. Preserving the content for these fans and then discounting their curatorial efforts would be arrogant: they are the experts on the subject of Homestar Runner, and all we need to do is filter their work into the world of GLAM.

Fan-created wiki.

Content Capture

          Since we have prioritized the core narrative content as the files that must be saved and contextualized before the other auxiliary content, the first step would be to encapsulate the content, as well as possible, in a Flash emulation bubble (as the Wayback Machine does now) that could render the original content in its original format without requiring the casual viewer to download outdated software. This would moreover allow all the hidden content to remain reachable even where subsequent interfaces, such as DVD players, render it unreachable.

          With the death of Flash and the site's acknowledged migration to HTML5, the creators established a directive of migration. Following this directive, and wishing to continue the trend of non-proprietary migration for maximum accessibility, the content should be migrated through a content management system; ShareStream in particular was recommended because it can handle the additional content described below.

          The importance of the Homestar Runner collection cannot be overstated. Huge amounts of curation have already been accomplished by volunteers on the Homestar Runner Wiki, and this content should not be discounted. Just as the site itself grew organically, this content was created over the course of years by dedicated fans. It includes detailed transcripts of both the original content and the DVD commentaries, any Easter eggs and how to activate them, and copious lists annotating trivia, glitches, and internal and external references, as well as how the DVDs differ from the original online content. Having examined a few of the transcripts, I find the thoroughness more than adequate for inclusion in a professional setting. Updates and spiffing up could be done on a case-by-case basis, as time allows or changes make necessary.

          By using ShareStream's features, content can be locked for licensing purposes, transcripts can be synced with video, and standardized output can be produced, simplifying content migration moving forward, a must for content of long-term value. Combining in a single CMS the video, metadata, and robust contextualizing information will create not only an academic resource, but a user-developed time capsule of what was feeding into the creation of each video and where each is referenced in the larger world.

           

Content Creation

          As the bulk of the content to capture already exists, much of the work will be to migrate and merge it into a single interface, standardizing inconsistencies where necessary. Moreover, each of the videos in the core content would need to be catalogued separately for the purposes of future academic research, so that searches and filtering could be easily accomplished based on specific needs.
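To give a sense of what one such catalog record might carry once the wiki material is folded in, here is a rough sketch. The field names and example values are my own working guesses, not an established schema or actual ShareStream fields.

    # A hypothetical catalog record for one piece of core narrative content.
    record = {
        "identifier": "core_0001",
        "title": "Example Strong Bad Email",
        "category": "core narrative content",
        "original_format": "Flash (.swf)",
        "migrated_format": "HTML5 video",
        "transcript": "(full transcript pulled from the fan wiki)",
        "easter_eggs": ["(how to activate each hidden extra)"],
        "references": {"internal": [], "external": []},
        "dvd_differences": "(how the DVD cut differs from the online original)",
        "wiki_source_url": "(link back to the originating wiki page)",
    }

    def matches(rec, **criteria):
        """Simple filter so researchers can search and filter on any field."""
        return all(rec.get(key) == value for key, value in criteria.items())

    print(matches(record, category="core narrative content"))  # True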

          As new content is released (which, according to one half of the Brothers Chaps, is likely, though intermittent), the system should remain open to new additions across core narrative content, auxiliary materials, and contextualizing content.

           For any of the auxiliary materials that are known to be in particularly vulnerable file formats, an alternative format should be immediately sought or a video and written description of the item created and archived in its place.

          To meet the third priority of the project, a data collection form will have to be created and, like many things on the internet, it will have to run on the honor system. I have chosen to call this part of the project the Leotard-Postcard project because it combines two aspects of the Strong Bad emails: the postcards from his vacation email and the weapon of mass destruction that will wipe out the zombies from the funeral email. Unlike the HRWiki, it cannot be re-edited once incorrect information has been submitted. To chart the spread of Homestar Runner, the Leotard-Postcard project will be a form similar to Where's George, the dollar bill tracking website. Linking from the HR archive, interested fans can share information about how they heard about Homestar Runner: where they lived, in what year, and how they heard. This will track the spread of the site like the zombie virus that Strong Bad inevitably contracts.
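As a sketch of the data each honor-system submission would collect (the field names and the example entry are placeholders of my own invention, not the final form design):

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class LeotardPostcard:
        """One honor-system submission about how a fan first found Homestar Runner."""
        where_lived: str          # free-text location, e.g. "Columbus, OH"
        year_discovered: int      # the year they first saw the site
        how_heard: str            # e.g. "a friend emailed me a Strong Bad cartoon"
        submitted_at: datetime = field(default_factory=datetime.now)

    submissions = [
        LeotardPostcard("Columbus, OH", 2003, "a friend emailed me a Strong Bad cartoon"),
    ]

    # Counting submissions per year gives a rough picture of the spread over time.
    by_year = {}
    for s in submissions:
        by_year[s.year_discovered] = by_year.get(s.year_discovered, 0) + 1
    print(by_year)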

https://www.youtube.com/watch?v=smWyKu1q0Cc

The Preservation Plan

  1. Acquire a ShareStream license and dedicated server space to host content at the Internet Archive.
  2. Run Homestar Runner content from WayBack Machine through ShareStream interface to pull standardized files.
  3. Develop a file structure to mimic the homepage of Homestar Runner so that navigation of the content is as close to the original as possible – should the original site and the WayBack machine sites have both failed.
  4. Copy data from the HR Wiki into bespoke metadata fields in each video's catalog record.
  5. Link to additional content
    1. The Original Site & Store
    2. The WayBack Machine
    3. Leotard-Postcard Project
    4. The HR Wiki
    5. HR Reddit 

Preservation of Twitch Ecosystem

[Image: the Twitch ecosystem]

WHY PRESERVE TWITCH?
Twitch, a series of minimal one-button games, is an epitome of the evolution of Processing. Preserving Twitch therefore allows the game to serve as a gateway to the history of Processing. Moreover, preserving Twitch can be a pilot project for the growing number of software artworks that incorporate Processing.

The latter rationale is especially important given the immediate stakeholders of Processing. According to Casey Reas, co-creator of Processing and the creator of Twitch, the creative community is the primary audience of Processing. In a 2008 interview, Reas spoke about Processing as follows:

It’s not very common for artists and designers to be the primary authors of programming environment, but this is changing. I hope Processing has helped to demonstrate that we don’t need to rely only on what software companies market to us and what engineers think we need. As a creative community, we can create our own tools for our specific needs and desires.

As a matter of fact, within seven years of Reas and Ben Fry releasing Processing under an open-source license, the developer community has built 70+ libraries. Processing users work in fields including K-12 and higher education, the music industry, journal publishing, and the design and art industries. As a result, the language initially developed to teach computational graphic design literacy can now handle audio, electronics, and animation. (You can learn more about the diverse user base in this Vimeo recording of a Carnegie Mellon lecture delivered by Reas and Fry.)

WHAT TO COLLECT OF TWITCH?
Preserving the source code of Twitch, together with the historical narratives surrounding its production, would serve as documentation of how an open-source project comes to be and thrives. Here is a preservation package for the Twitch ecosystem.

1. Source Code: Twitch is composed of (at least) 13 files*: 3 JavaScript files and 10 HTML files. They need to be stored in a directory titled "play." The following are the existing files that are accessible from any user's web browser.

[Image: Twitch source code file structure]

* see the ACQUISITION PLAN below

File names and notes on credits:

  • init.js: Searches for all the <script type="application/processing" target="canvasid"> tags in a page and loads each script into the target canvas with the proper id. It is useful for smoothing the process of adding Processing code to a page and starting the Processing.js engine.
  • processing.js: Build script for generating processing.js. The version used for Twitch is 1.4.8. Written by John Resig (http://ejohn.org/), MIT licensed (http://ejohn.org/blog/processingjs/).
  • windowScript.js: Prompts the size and position of the windows as they open in sequence.
  • window0.html: Pattern example written by Casey Reas and Ben Fry, saved at http://ejohn.org/apps/processing.js/examples/topics/pattern.html. TWITCH written by REAS (www.reas.com). Ported from Processing to Processing.js.
  • window1.html through window9.html: Pattern examples written by Casey Reas and Ben Fry, saved at the same address. Each hosts one game, in order: FLOW, PERILOUS BELT, SLENDER VINES, BOOMING CANNON, ELECTRIC PYRAMID, SMUGGLING, SLIPPERY STREAM, BATTLE FRONT, and EERIE LABYRINTH, all written by REAS (www.reas.com) and ported from Processing to Processing.js.

2. Hosting Environment: In order to allow users to interact with Twitch, the source files need to be rendered in a web browser, and the browser needs to support the HTML5 <canvas> element. As of April 2016, the browsers (and minimum versions) that qualify are: Chrome 4.0, Internet Explorer 9.0, Firefox 2.0, Safari 3.1, and Opera 9.0.

[Image: browser support for the HTML5 <canvas> element]

3. Creators & Contributors: There are at least five key figures who contributed to the fruition of Twitch. While there are many more developers, artists, and educators who shape the Processing community, the following persons have played the crucial roles.

  • Casey Reas: See my Statement of Significance
  • Ben Fry: See my Statement of Significance
  • John Resig: In 2010, Resig, the author of jQuery, developed Processing.js (a JavaScript port of Processing) to enable better implementation of visualization and animation. Written in JavaScript, Processing.js converts Processing code written in Java syntax and uses the HTML5 <canvas> element to render images. The idea of translating Java to JavaScript suggests a wonderful adaptation strategy for a changing programming environment. Owing to Resig's work, Processing users do not need to abandon Processing (which runs on now-close-to-obsolete Java). All Processing users need to do is include Resig's JavaScript file "processing-1.0.0.min.js" in the page alongside their .pde file, which otherwise works just as it always has.
  • John Maeda: See my Statement of Significance
  • Muriel Cooper: See my Statement of Significance

ACQUISITION PLAN
1. Source Code: It is best to acquire the source code from Casey Reas. This approach would allow me to acquire the .pde source files. *Otherwise, a web browser's "View Page Source" function (such as the one Google Chrome offers) lets me look at most of the source code. With the latter method, I can reconstruct the initial Twitch page (window0.html) and the first game window (window1.html). However, this approach only allows me to inspect the source code generated in the web browser; the files listed in the table above are not complete enough to proceed to the second and subsequent game windows. (A minimal sketch of this fallback approach appears after this plan.)

2. Hosting Environments: The current common web browsers support the HTML5 <canvas> element. Should future browser updates result in Twitch malfunctioning, a web browser emulation project such as oldweb.today may be of use.

3. Documentation of Creators & Contributors: Documenting each individual's contribution requires some research. Reas and Fry's rationale behind the development of Processing has become part of the core philosophy of the Processing community and is thus well documented on official websites such as Processing.org and the Processing Foundation. So, too, are the anecdotes about and praise for John Resig. His blog posts, interviews, and the Reddit reaction to Processing.js are some of the records that can be preserved. Additionally, Resig's role as a leading figure in Khan Academy's computer science program, and how he uses Processing.js as part of his pedagogy, are relevant; these records should assist us in understanding the climate of the Twitch ecosystem.
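The fallback approach in item 1 of this plan could look something like the following minimal sketch, which simply downloads the publicly served files into a local "play" directory. The base URL is a placeholder; the real location would have to come from Reas's site, and, as noted above, some files may not be reachable this way.

    import os
    import urllib.request

    # Placeholder: substitute the actual location of the "play" directory.
    BASE_URL = "http://example.com/twitch/play/"

    FILES = (
        ["init.js", "processing.js", "windowScript.js"]
        + [f"window{i}.html" for i in range(10)]
    )

    os.makedirs("play", exist_ok=True)
    for name in FILES:
        try:
            with urllib.request.urlopen(BASE_URL + name) as resp:
                data = resp.read()
            with open(os.path.join("play", name), "wb") as out:
                out.write(data)
            print(f"saved {name} ({len(data)} bytes)")
        except OSError as exc:
            # Some files may not be directly reachable, matching the limitation noted above.
            print(f"could not fetch {name}: {exc}")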

John Resig. Image Credit: https://twitter.com/jeresig

Saving XKCD for the Future: Statement of Preservation and Acquisition

After taking into account the cultural importance of the webcomic XKCD, created and authored by Randall Munroe, it has been concluded that an effort should be made to preserve the comic and related materials. The following statement of preservation and acquisition plan have been created to clarify and guide the preservation process.

Statement of Preservation

The purpose of this preservation project is to preserve as much of the comic XKCD as possible, at the best quality we can ensure. It has been concluded that XKCD should be preserved because of its significant cultural value. XKCD's unique content provides valuable insight into multiple communities in addition to capturing certain facets of internet culture, making it valuable both to future researchers and to the community it serves. It has been decided that the best way to preserve the comic is to work with groups that are already working towards preserving it, and the group this project has decided to work with is the Internet Archive.

The Internet Archive and XKCD

The Internet Archive is an organization whose goal is to preserve as much of the internet as possible, for cultural reasons, in a variety of ways. The method this project is concerned with is their general web archive, which they call the Wayback Machine. The Wayback Machine is an archive of website pages that records what a site looked like on a certain day. For example, if I wanted to, I could look up what the XKCD webpage looked like on November 1st, 2010 in the archive. This is accomplished by taking the webpage's URL and making a permanent copy of it. As a result, this system allows the Internet Archive to give users a reasonably authentic experience of the website at that period of time, meeting the quality standards for the project.

In addition, the Internet Archive has already archived a significant portion of the XKCD webcomic, including over 800 saved pages and counting. However, this collection is not complete and is missing a number of entries. Numerous comics are missing from the Internet Archive, most notably a period of three months in 2009 during which no comics were recorded and entered into the archive. For this reason, the goal of this project is to fill in any gaps in the XKCD collection at the Internet Archive, ensure that any future missed content is swiftly added to the archive, and make sure the entries function properly. Doing this would successfully preserve XKCD for the future and fulfill the original intent of the project.

If issues arise while entering the missing XKCD comics into the Internet Archive, the project will preserve those pages using the Archive-It service. Archive-It is a sister program of the Wayback Machine and is also operated by the Internet Archive. It is stronger, more compatible, and more secure than the Wayback Machine; however, it is a paid service. If it becomes necessary to use Archive-It, the project will seek the required funds in order to preserve the problematic entries.

Acquisition Plan

The project's plan for acquiring permission to preserve is rather simple: we operate on the assumption that we already have it. Because the webcomic is in the Internet Archive's collection already and the XKCD homepage notes that permanent URL, it is safe to assume that Munroe has already decided to permit people to archive the comic. This is doubly so when you consider how the Internet Archive actually acquires things. The Internet Archive acquires webpages in two ways: crawlers and personal submission. It uses crawlers to regularly crawl both the internet at large and the websites selected for preservation; when a crawler encounters a webpage that is not in the Internet Archive, it will submit the page's URL automatically. Personal submission works just like it sounds: people directly submit a site's URL to the Internet Archive, which preserves it by making a permanent copy. For this reason it can be concluded that the project has ethical and moral permission to submit XKCD webpages to the Internet Archive, since literally anyone is able to do so. However, if it becomes necessary to use the Archive-It service provided by the Internet Archive, explicit permission from Munroe will be sought.

As for how the project will acquire the actual comic, that too is rather straightforward. Because the goal of the project is to ensure the Internet Archive's collection of XKCD is complete and has no gaps in content, the method of acquisition is the same as the Archive's, but focused solely on the webcomic itself. The project would set up a dedicated crawler that regularly crawls the XKCD website and compiles a list of new URLs as they appear. Additionally, a person (or persons) chosen by the project will go over both the website and the crawler-generated list to make sure no entries were missed. The results will then be compared to the Internet Archive's collection, and if any comics are missing we will submit the appropriate copy or copies to the archive. Finally, if any of the entries do not function properly in the Internet Archive, they will be submitted to Archive-It. Overall, this method should guarantee the complete preservation of the webcomic.
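A rough sketch of that gap check follows. It leans on XKCD's public JSON endpoint to enumerate comic URLs, the Wayback Machine's availability API to see which are already captured, and the Save Page Now endpoint to submit the rest; these endpoints are as I understand them at the time of writing and would need to be verified before the project relies on them.

    import json
    import time
    import urllib.parse
    import urllib.request

    def latest_comic_number():
        """XKCD publishes the current comic's metadata as JSON."""
        with urllib.request.urlopen("https://xkcd.com/info.0.json") as resp:
            return json.load(resp)["num"]

    def is_archived(url):
        """Ask the Wayback Machine's availability API whether a snapshot of the URL exists."""
        query = "https://archive.org/wayback/available?url=" + urllib.parse.quote(url, safe="")
        with urllib.request.urlopen(query) as resp:
            return bool(json.load(resp).get("archived_snapshots"))

    def submit(url):
        """Ask the Wayback Machine to capture the page now."""
        urllib.request.urlopen("https://web.archive.org/save/" + url)

    for n in range(1, latest_comic_number() + 1):
        comic_url = f"https://xkcd.com/{n}/"
        try:
            if not is_archived(comic_url):
                print("missing:", comic_url)
                submit(comic_url)
        except OSError as exc:
            print(f"error checking {comic_url}: {exc}")
        time.sleep(1)  # be polite to both services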

Conclusion

In conclusion, XKCD is considered worth preserving for the future, and that is best done by working with and assisting preexisting efforts to do so. Not only is the webcomic rich with the culture of its community, it also acts as an excellent record of that community's values and interests, making it very valuable to future researchers. The best way for the project to accomplish this is to work with the Internet Archive, which is not only already trying to preserve the webcomic but also has all of the tools, services, and permission to do so. This project can assist in that effort by acting as both a back-up and a form of quality control, catching and submitting any missing entries and ensuring they function properly. Overall, this project fills a niche in the effort to preserve XKCD that needed to be filled.

Snow Fall: A Preservation Plan

Snow Fall: The Avalanche at Tunnel Creek, the online story by New York Times staff, clearly set an example for online journalism. The article itself won a Pulitzer for reporter John Branch and the overall online story won a Peabody and a Webby. The reaction from readers, journalists, designers and programmers has remained strong since its release in 2012. Considering the story's impact, it will be important to retain as many features of the multimedia project as possible.

The Website

The online experience of Snow Fall is an immersive multimedia show with numerous moving parts. It includes photographs, slideshows, motion graphics, video interviews, and additional looping videos that are used as graphic elements. However, all of these components may behave differently, or possibly not work at all, depending on the platform you are using. Based on interviews with New York Times staff, it is clear that the main experience on modern web browsers was the primary concern. In order to experience the full effect, a user needs to be on a desktop or laptop.

As a result, the need to maintain a complete view of how the multimedia pieces work within the overall story is critical. A useful first step in this direction is to check how successful the Internet Archive's Wayback Machine has been at preserving the entire website. The project has been crawled numerous times over the last several years, and the Internet Archive has done an incredible job of capturing the majority of the multimedia elements. It will also provide a way for the article itself to be read into the future. The only items not working, however, are the videos interspersed throughout the story.

In order to capture the site in its entirety with the videos intact, web recording seems like the best choice to preserve the overall experience on a user’s screen. The beta version of Webrecorder, an open source web archiving platform developed by Rhizome, is one option worth exploring. A user can type in a URL and either record the site immediately or preview how it will appear. The preview of Snow Fall reveals that only the individual photographs appear along with the article text. Slideshows and videos do not work and the motion graphics don’t render at all.

The biographical text appearing next to mug shots of the skiers does not display properly on Webrecorder.

Additionally, there is some biographical text that appears next to mug shots of the skiers, which does not display properly. A simple solution to this issue would be to use the screen capture function of QuickTime to record how the Snow Fall site currently renders as a user moves through the story. This will record all the multimedia features, including the videos, and will preserve the ability to see the videos full screen and how they are linked within the story.

Individual Multimedia Items

An additional possibility for preserving the multimedia items within the website would be to maintain each of them on their own, as final published individual elements. A web recording of the site itself wouldn’t necessarily do justice to each individual element and a user may want to focus in on a few specific multimedia items. It is a common practice for newspapers to save the photographs from every assignment and eventually maintain them in their own archive or donate them to an outside organization. However, the final destination for videos and motion graphics is less clear. At a minimum, it would be useful to preserve the final published versions of the video interviews and the motion graphics as MPEG files. This would allow a user to view and spend more time on select items.

One consideration regarding the multimedia elements however, is copyright. New York Times staff were responsible for the majority of the videos and motion graphics, but some of the photographs and videos were provided by the skiers, their families, or other entities. Unless these were permanently donated to the New York Times, the ownership and resulting ability to archive them is questionable. A potential future project to obtain rights for preserving them is a possibility.

Website Comments

When Snow Fall was released on December 20, 2012, it generated 1,155 comments through December 28th. These comments, posted directly to the project site, will be important for capturing the variety of reactions from readers, and also for gauging the impact of the story around the United States and the world.

PDF of online comments and responses from the New York Times.

Additionally, there were comments selected by the New York Times, which John Branch and one of the skiers responded to. These responses will be especially important to preserve as they add additional context to the published story.

The simplest way of providing access to these will be to output the webpage containing all of the comments and the webpage with official responses as two individual PDFs. Maintaining them in the original website format is not necessary as the content itself is the most meaningful and will be searchable as a PDF.
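A minimal sketch of that output step, assuming the wkhtmltopdf command-line tool (or a similar HTML-to-PDF renderer) is installed; the URLs here are placeholders standing in for the actual comments and responses pages:

    import subprocess

    # Placeholder URLs for the two comment pages; the real ones would be the
    # nytimes.com comments view and the page of curated responses.
    pages = {
        "snow_fall_comments.pdf": "https://example.com/snow-fall-comments",
        "snow_fall_responses.pdf": "https://example.com/snow-fall-responses",
    }

    for outfile, url in pages.items():
        # wkhtmltopdf renders a URL to a paginated, text-searchable PDF.
        subprocess.run(["wkhtmltopdf", url, outfile], check=True)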

Online Documentation

The online documentation surrounding the production of Snow Fall is fairly significant. Several media outlets conducted interviews with the staff involved in creating the story, a designer outlined how he produced one of the motion graphics on his personal website, and a blogger detailed how most of the code behind the project worked. These outside sources documenting how Snow Fall was conceived and finally constructed are especially valuable because there is no known documentation originating from the New York Times. A designer who worked on Snow Fall explained to me in an email that these types of projects are not generally seen as a tool they are building, but rather as unique approaches to specific stories. As a result, they might build off of successful previous ideas, but there is no "Snow Fall maker document" or procedure.

One method for archiving these outside resources would be to save the websites as PDF files. Another option is to create a document that points to these sources on the Internet Archive. The latter option seems the most useful, as it would possibly maintain the links included in the articles, thereby providing additional context. One problem area is capturing the designer's personal website, which contains videos hosted on Vimeo. These videos, documenting the design behind one of the motion graphic elements, do not play on the Internet Archive, so using QuickTime to screen-capture the videos is a viable alternative.

Conclusion

Online journalism has radically changed over the last twenty years and continues to develop and grow. Preserving this work for the future seems to have finally gained some traction, including one effort at the Reynolds Journalism Institute at the University of Missouri. They have begun to explore this issue by conducting research, leading collaborative projects, and generating communication between various stakeholders. The goal of preserving Snow Fall through these various methods is to provide the most usable and accessible elements to users, and it will hopefully serve as a valuable example of what is possible when archiving individual online stories.