Digital Preservation Policy: Web Archiving for the Washingtoniana Collection

Introduction:

In my previous posts on this blog I have surveyed the digital preservation state of the District of Columbia Public Library’s Washingtoniana collection. This survey was performed via an interview with Digital Curation Librarian Lauren Algee, using the NDSA Levels of Digital Preservation as a reference point.

In our survey we discovered that the DCPL Washingtoniana collection has very effective digital preservation which, through a combination of knowledgeable practices and the Preservica service (an OAIS-compliant digital preservation service), nearly reaches the 4th level in every category of the NDSA Levels of Digital Preservation. With this in mind, my next-step plan for the archive looks at a number of areas the archive has been interested in expanding into and presents some thoughts on where it could begin taking steps towards preservation of those materials.

Of particular interest in this regard is the collecting of website materials. Because websites are dynamic objects in a relatively new medium, collecting them can be fairly complex: it is hard to pin down precisely the extent to which a website has been sufficiently collected. Websites may appear differently in different browsers, they may contain many links to other websites, they change rapidly, and they often contain multimedia elements. The policy outlined below discusses these issues and offers a digital preservation plan specifically for websites.

Website Digital Preservation Policy for the Washingtoniana collection

The Washingtoniana collection was founded in 1905 when library director Dr. George F. Bowerman began collecting materials on the local community. The collection stands as one of the foremost archives on the Washington, D.C. area, its community, history, and culture. With the increasing movement of D.C. social life and culture to online or born-digital platforms, it naturally makes sense that the Washingtoniana collection would consider collecting websites.

Selection

The same criteria used to select other Washingtoniana materials should apply here. Websites should be considered if they pertain to Washington, D.C. or its surrounding areas, to events that take place in or discuss that area, to prominent Washington, D.C.-related persons or institutions, or otherwise to the Washington, D.C. community, arts, culture, or history.

As with any physical preservation decision, triage is an essential process. In a sense all web content is at risk, but websites that are most likely to be at risk should be high priority. Websites that serve a specific purpose or pertain to a specific event may have a limited operational window. Websites for defunct businesses, political election sites, and even an existing website as it appears on a specific day may be vulnerable and thus candidates for capture. In addition, the materials in question should not be materials that are being collected elsewhere, and they should be considered in relation to the rest of the collection.

Although automation tools may be used for identification, discretion over selection rests in the librarian’s hands. In addition, suggestions from patrons relevant to the collection should be considered, and a system for managing and encouraging such suggestions may be put in place.

Metadata

A metadata standard such as MODS (Metadata Object Description Schema) should be used to describe the website. MODS is a flexible schema expressed in XML, is fairly compatible with library records, and allows more complex metadata than Dublin Core, and thus may work well. Metadata should include, but not be limited to, website name, content producers, URL, access dates, and fixity, as well as technical information that may be generated automatically by web crawlers, such as timestamps, URI, MIME type, size in bytes, and other relevant metadata. Extraction information, file format, and migration information should also be maintained.
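As a rough illustration of the fields listed above, here is a minimal sketch of building a MODS record with Python’s standard library. The function name, the sample values, and the choice to record fixity in a typed note are all assumptions made for illustration; they are not part of the Washingtoniana’s actual workflow, and a real record would follow the library’s own cataloging guidelines.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods_record(title, producer, url, accessed, mime_type, size_bytes, sha256):
    """Return a minimal MODS element describing one captured website.

    Hypothetical helper: the field selection mirrors the policy text only.
    """
    mods = ET.Element(q("mods"))

    # Website name
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title

    # Content producer
    name = ET.SubElement(mods, q("name"))
    ET.SubElement(name, q("namePart")).text = producer

    # Capture date, as reported by the crawler
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateCaptured"), encoding="iso8601").text = accessed

    # Technical details: MIME type and size in bytes
    physical = ET.SubElement(mods, q("physicalDescription"))
    ET.SubElement(physical, q("internetMediaType")).text = mime_type
    ET.SubElement(physical, q("extent")).text = f"{size_bytes} bytes"

    # URL and access date
    location = ET.SubElement(mods, q("location"))
    ET.SubElement(location, q("url"), dateLastAccessed=accessed).text = url

    # Assumption: fixity is kept in a typed note; other placements are possible.
    ET.SubElement(mods, q("note"), type="fixity").text = f"sha256:{sha256}"
    return mods
```

Calling `ET.tostring(record, encoding="unicode")` on the returned element gives XML text that could be stored alongside the capture.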

Collection

A variety of collection tools exist for web archiving. The tool selected should be capable of the tasks below, as outlined on the Library of Congress web archiving page:

  • Retrieve all code, images, documents, media, and other files essential to reproducing the website as completely as possible.
  • Capture and preserve technical metadata from both web servers (e.g., HTTP headers) and the crawler (e.g., context of capture, date and time stamp, and crawl conditions). Date/time information is especially important for distinguishing among successive captures of the same resources.
  • Store the content in exactly the same form as it was delivered. HTML and other code are always left intact; dynamic modifications are made on-the-fly during web archive replay.
  • Maintain platform and file system independence. Technical metadata is not recorded via file system-specific mechanisms.

A variety of tools are capable of this task. A web crawler such as Heritrix, the open source archival web crawler, or the subscription service Archive-It should be used. Both come from the Internet Archive; the first is an open source tool, while the second is a subscription-based service that offers storage on Internet Archive servers.

Upon initial collection, fixity information should be generated using a checksum. This can be automated either with a staff-written script or with a BagIt tool, which automatically generates fixity information. This information should be maintained with the rest of the metadata for the digital object.
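A staff-written fixity script of the kind mentioned above could be quite short. The sketch below is one possible version, not an established DCPL tool: it assumes SHA-256 as the checksum algorithm, and the manifest file name is only loosely modeled on BagIt’s naming and does not produce a valid bag.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="manifest-sha256.txt"):
    """Write one 'checksum  relative/path' line per file under folder."""
    folder = Path(folder)
    lines = []
    for path in sorted(folder.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path.name != manifest_name:
            relative = path.relative_to(folder).as_posix()
            lines.append(f"{sha256_of(path)}  {relative}")
    manifest = folder / manifest_name
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Running `write_manifest` on a folder of WARC files right after a crawl would give the checksums to copy into the metadata record.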

Websites should be kept in the most stable web archival format available. At the time of this post’s writing, that format is the WARC (Web ARChive) file format. This format allows the combination of multiple digital resources into a single file, which is useful as many web resources are complex and contain many items. Other file formats may be accepted if archived webpages are received from donors.

Preservation

Upon initial ingestion, items may be kept on internal drives and copied to at least one other location. Before an item is moved into any further storage system, the file should be scanned for viruses, malware, or any other undesirable or damaging content using safety standards agreed upon with the division of IT services. At this point fixity information should be taken as described above and entered into the metadata record.

Metadata should be created as soon as possible, at which point the object, with its attached metadata, should be uploaded into the Washingtoniana’s instance of Preservica.

Although Preservica automates much of the preservation process, a copy of the web archive should be kept on external hard drives. At a yearly interval, a selection of the items on the hard drives should be checked against the items in Preservica to ensure that the Preservica fixity checks and obsolescence monitoring are working as desired.
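The yearly spot check could be sketched roughly as follows. This is only an illustration under stated assumptions: the `recorded` dictionary stands in for checksums obtained from Preservica by whatever export method staff use (not covered here), and the sample size and function names are hypothetical.

```python
import hashlib
import random
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(manifest_path):
    """Parse 'checksum  relative/path' lines into a {path: checksum} dict."""
    recorded = {}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel_path = line.split(maxsplit=1)
            recorded[rel_path] = checksum
    return recorded


def audit_sample(drive_root, recorded, sample_size=25, seed=None):
    """Re-hash a random sample of files on the drive; return paths that fail.

    A path fails if the file is missing or its checksum differs from the
    recorded value (assumed here to come from the preservation system).
    """
    rng = random.Random(seed)
    paths = sorted(recorded)
    chosen = rng.sample(paths, min(sample_size, len(paths)))
    failures = []
    for rel_path in chosen:
        target = Path(drive_root) / rel_path
        if not target.is_file() or file_sha256(target) != recorded[rel_path]:
            failures.append(rel_path)
    return failures
```

An empty list from `audit_sample` means every sampled file on the drive still matches its recorded checksum; anything else is a prompt to investigate both copies.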

References

Jack, P. (2014, February 27). Heritrix-Introduction. Retrieved November 14, 2016, from https://webarchive.jira.com/wiki/display/Heritrix/Heritrix#Heritrix-Introduction
Web Archiving-Collection development. (n.d.). Retrieved November 16, 2016, from https://library.stanford.edu/projects/web-archiving/collection-development
The Washingtoniana Collection. (n.d.). Retrieved November 16, 2016, from http://www.dclibrary.org/node/35928
Web Archiving at the Library of Congress. (n.d.). Retrieved November 16, 2016, from https://www.loc.gov/webarchiving/technical.html
Niu, J. (2012). An Overview of Web Archiving. Retrieved November 16, 2016, from http://www.dlib.org/dlib/march12/niu/03niu1.html
AVPreserve » Tools. (n.d.). Retrieved November 17, 2016, from https://www.avpreserve.com/avpsresources/tools/
Kunze, J., Bokyo, A., Vargas, A., Littman, B., & Madden, L. (2012, April 2). Draft-kunze-bagit-07 – The BagIt File Packaging Format (V0.97). Retrieved November 17, 2016, from http://www.digitalpreservation.gov/documents/bagitspec.pdf
MODS: Uses and Features. (2016, February 1). Retrieved November 17, 2016, from http://loc.gov/standards/mods/mods-overview.html
About Us. (2014). Retrieved November 17, 2016, from https://archive-it.org/blog/learn-more/

 

Pottermore – the Archival Information Package

I was able to put my Preservation Plan into action by uploading a Pottermore Collection to the Internet Archive in addition to saving the collection on my laptop. Here’s a brief recap of my Preservation Plan:

  • Capture this YouTube video that announced the launch of Pottermore in 2011, saved by the youtube-dl downloader.
  • Archive the Pottermore Wikia, using their own archiving tools to download the xml files.
  • Download the images from the Pottermore Wikia separately, since the xml files don’t include them. This was going to involve the command line method or, if that didn’t work, curating a selection of images from the collection.
  • Save this Pottermore entry from the Harry Potter Wikia, which details the description and history of the site.
  • Save Let’s Play videos that can be found on YouTube to capture the interactivity of Pottermore, using the youtube-dl downloader.

I’ve officially uploaded what I’ve collected so far to the Internet Archive; check it out here: https://archive.org/details/Pottermore.

What my Internet Archive collection looks like!

The first file I included was a PDF of the Pottermore entry from the Harry Potter Wikia. This entry gives a full description and history of Pottermore. I concluded that since it was only one entry, and the text matters more than anything else, a PDF would suffice. The next folder includes a selection of images from the Pottermore Wikia. This is what I was really happy about, since these images are a feature that a lot of people enjoyed in the first Pottermore and that isn’t as present in the newer version. Since I couldn’t figure out the command line method that I had written about in my Preservation Intent Statement, which was supposed to capture all of the images from a wiki, I had to go through the Pottermore Wikia image directory and download them one by one. Since there are 51 pages of images, with each page containing at least 40 images, I will be uploading one page’s worth of images at a time (as of this post, I have two pages’ worth of images uploaded to the Internet Archive). I saved all of the images in their original format, either .jpg or .png. The final folder contains the XML files of the Pottermore Wikia, which I downloaded using the tools provided by the Wikia itself.

What I did not upload to the Internet Archive (due to copyright uncertainties) but have saved to my Pottermore folder on my computer are the videos.  I used the youtube-dl downloader to save the Pottermore launch video from 2011 as well as some Let’s Play videos to capture the experience of playing Pottermore.  All of the videos were saved in .mp4 format.

Below is a screenshot of the collection I have on my computer:

Screenshot of my Pottermore collection on my laptop.

I arranged the folders according to the different aspects of Pottermore that were saved. The first folder contains the history of Pottermore, which includes the Harry Potter Wikia entry. The second folder contains the Let’s Play videos, which capture the experience of playing Pottermore. The next folder contains the Pottermore images, which are in either .jpg or .png format. Some of the images are labeled with descriptions, usually the names of the characters in the images (for example, “Hokey” or “Hooch”). However, most of the images are named after their location within Pottermore. For example, B1C11M1 = Book 1 (Harry Potter and the Sorcerer’s/Philosopher’s Stone), Chapter 11 (“Quidditch”), Moment 1 (“Charms Homework”). This will help orient the viewer as to the order of images within Pottermore. The next folder is Pottermore Launch, which includes the 2011 YouTube video that announced the coming of Pottermore. The final folder contains the Pottermore Wikia pages in xml format.
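Because the location-based names follow a regular pattern, they can be read by a script. The sketch below is only an illustration: it assumes every such name has exactly the form B, C, and M each followed by a number, as in the B1C11M1 example, and the function names are made up for this post.

```python
import re
from pathlib import Path

# Assumed pattern, based on the single example B1C11M1 (Book, Chapter, Moment).
MOMENT_NAME = re.compile(r"^B(?P<book>\d+)C(?P<chapter>\d+)M(?P<moment>\d+)$")


def parse_moment_name(filename):
    """Return {'book', 'chapter', 'moment'} as integers, or None if no match."""
    match = MOMENT_NAME.match(Path(filename).stem)
    if match is None:
        return None  # e.g. descriptive names such as "Hokey.png"
    return {key: int(value) for key, value in match.groupdict().items()}


def sort_by_location(filenames):
    """Order location-named images by book, chapter, and moment."""
    located = [name for name in filenames if parse_moment_name(name)]
    return sorted(
        located,
        key=lambda name: tuple(parse_moment_name(name).values()),
    )
```

Sorting this way puts B1C2M1 before B1C11M1, which a plain alphabetical sort would get wrong.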

What this collection really comes down to is trying to capture the essential elements of a website that, for our present purposes, no longer exists. I am hoping that this goal was accomplished with the xml files of the wiki, the images that provided the interactive layers, and the Let’s Play videos that show how the game was played.

Gone in 6 Seconds: “Transforming” Preservation Intent Statement

The preservation plan for Rob and Nick Carter’s “Transforming,” a series of 4 digital paintings created as an homage to centuries-old artworks (detailed here), is, much like the works themselves, more complicated than it appears at first glance.

While I had some concern about legal protections written into these works, according to an email from Rob Carter there is no digital rights management. Instead, the artists rely mainly on certificates of authenticity that accompany the 12 editions and 5 artist’s “proofs” (which can be given to museums for display to the public). The certificate also entitles the owner to another copy should anything happen to the original.

The cost of obtaining this certificate and the original work is huge, with one “Transforming Still Life” selling for $105,000 and one “Transforming Nude Painting” selling for 100,000 pounds. Even though there are artist’s proofs, I believe that they are only for display in exhibitions. Acquiring all of these items and their constituent parts for a permanent collection would be prohibitively expensive for most institutions, even before considering preservation costs or the inability to make these items widely accessible due to copyright.

Beyond the cost, Rob and Nick Carter, and whoever else may be involved, are fairly attentive to their pieces currently.  They have backups in three separate locations and also in a “data safe.” Additionally, they actively upgrade the technology and software to improve display quality and to keep them working into the future. The artists also recently had to upgrade their “Transforming Diptych” work so that it would run on a new OS.

With the works under fairly good control and somewhat unattainable, it seems both less pressing and unrealistic to focus on preserving the works themselves (at least the finished video product and the application code). Additionally, there are shortened example versions of these videos available online, which I plan to preserve to serve as representative stand-ins. What is more important, then, is to document the process of creating this new genre of art and the conversation about and reaction to these pieces.

At the end of my statement of significance I said that:

“Therefore, documenting “Transforming” means documenting the cultural conversation around media consumption in the early 21st century.”

This was a summation of the Carters’ goal to create works that reward viewers for engagement beyond the 6 seconds an average museum patron spends looking at an artwork. To reach the goal of documenting this cultural conversation, I plan to preserve the various video interviews in which art historian Kate Bryan comments on the themes in the paintings. These videos provide rich and invaluable context and a present-day scholarly perspective on the works, which will be valuable to art scholars in the future.

Additionally, there are several video interviews with the artists themselves about what inspired them to create these works. All of these will be essential in preserving the scholarly communication about these pieces and the originals that they were inspired by. I have contacted Rob and Nick once and they are too busy to do an extensive interview, so these will have to suffice.  The information is the key part about these videos, rather than their look and feel. Therefore, it is not that important to ensure their visual quality or that they remain in the same format.

Furthermore, I plan to preserve all of the videos from the Moving Picture Company (MPC) explaining its role in creating the artworks. These videos provide insight into the technological processes and difficulties behind the scenes that will be valuable to scholars of new media, animation, and design in the future. Because the visual nature is more important in these videos, I will preserve and maintain them in the highest quality formats available.

I plan to reach out to some people who worked on the team to see if they will do more interviews about the challenges and their experiences creating these pieces. I will compile these together as a complete document (most likely a PDF) that scholars and future artists will be interested in for years to come. This will further the goal of documenting the creation of this new genre of art without having to deal with any concerns about providing access to the original constituent digital objects.


Most of the videos mentioned above are either embedded directly on a webpage or hosted in video players like Vimeo or YouTube. Thus, I will download them either by using a simple “save as” or by using youtube-dl. In addition, I will use tools like MediaInfo to extract technical metadata (most likely in PBCore 2.0) to accompany these videos. Furthermore, I will use FFmpeg (and the handy ffmprovisr guide to decrease the learning curve) to generate MD5 hashes for future fixity checks and also to transcode any videos that may be in an at-risk format.
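For the MD5 step, one way to script FFmpeg is sketched below. It assumes `ffmpeg` is installed and on the PATH, and the helper names are hypothetical. Note that FFmpeg’s md5 output hashes the decoded audio and video rather than the file’s bytes, so it is a different value from a plain file checksum.

```python
import subprocess


def ffmpeg_md5_command(video_path):
    """Build the ffmpeg call that prints an MD5 of the decoded streams."""
    return ["ffmpeg", "-loglevel", "error", "-i", str(video_path), "-f", "md5", "-"]


def stream_md5(video_path):
    """Run ffmpeg (must be installed) and return the hex digest it reports."""
    result = subprocess.run(
        ffmpeg_md5_command(video_path),
        capture_output=True,
        text=True,
        check=True,  # raise if ffmpeg exits with an error
    )
    # The md5 muxer prints a single line of the form "MD5=<hex digest>".
    return result.stdout.strip().split("=", 1)[1]
```

Storing the value from `stream_md5` next to each video gives something to compare against later, for example after a transcode that is supposed to leave the picture and sound untouched.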

Much of the online press coverage of these works, whether video or web articles, has not been saved in the Internet Archive. I will ensure that these pages are preserved as part of the record of reaction and scholarly communication, mainly by using the “Save Page Now” functionality of the Wayback Machine. Additionally, I will save a copy of the HTML as it is delivered to my screen and create a collection that can be accessed in a central location.
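If there turn out to be many articles, the “Save Page Now” requests could be scripted instead of submitted one by one in the browser. The sketch below assumes the public `https://web.archive.org/save/` address still accepts a plain request with the page URL appended, which may change or be rate limited; the function names and the User-Agent string are made up for illustration.

```python
from urllib.request import Request, urlopen

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Return the Save Page Now address for a page (assumed URL pattern)."""
    return SAVE_ENDPOINT + page_url


def request_capture(page_url, timeout=60):
    """Ask the Wayback Machine to capture a page; return the HTTP status code."""
    request = Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "press-coverage-archiver (hypothetical)"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

It would be polite to pause between calls to `request_capture` and to check afterwards in the Wayback Machine that each capture actually exists.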

Actual viewer reaction is much more difficult to ascertain, with only a few mentions in news articles or in generic postings from gallery attendees saying something similar to “this is cool” along with a picture on social media. In order to preserve the audience’s experience, I will go beyond the limited reactions and try to interview some of the people who posted on social media to see if they will expand further (hopefully they remember!). These interviews, as above, will most likely be saved as a compiled PDF document.

In the end, I plan on uploading all of these objects as a collection in the Internet Archive to ensure long-term access.

Project Reflection: stereomap

Instead of boring you with every inane detail of my project, this post will weave a narrative of the most important trials, tribulations, and things I learned from constructing my project: stereomap, a site devoted to geocoding animated stereographs.


Trial 1: Overcoming a Dead End

Many (or should I say the few?) of you who read this blog outside of the students in the class might be thinking “hey, isn’t that the guy that was doing that project of mapping unbuilt spaces in Washington, D.C.?” Yes, you are right, it was me, but shortly after starting the project I discovered a number of distressing details that made me switch my topic. First, it turns out the Histories of the National Mall site is in the process of doing a number of explorations on my very subject and will be releasing them sometime soon. To make matters worse, I learned the National Building Museum did an exhibit called “Unbuilt Washington” in 2011 and created an online map for it detailing the unbuilt spaces. My exact project idea! This was my lowest point in the process; I had no clue where to go from here.


Enter: the Stereogranimator

Having attended a MITH Digital Dialogue earlier this year, I learned about the Stereogranimator, a tool from NYPL Labs for animating stereographs, and it came back to me when I was racking my brain for a new project idea. In an “MTV Cops” moment I thought, “wouldn’t it be cool if you could take these animated stereographs and map them in the style of HistoryPin?” These images typically feature a distinct location and could benefit from the context of geographic space. I chose NYC for the ease of using the over 3,000 stereographs focused on the city and held by the NYPL. With my crisis averted by deciding to create a map and website to fulfill this project idea, I started figuring out the logistics of its implementation.

Trial 2: You Can Map GIFs, Right?

While there is a glut of mapping software out there, few tools handle animated GIFs well in their information boxes, often cutting off images, making them static again, or not displaying the images at all. Finding a tool that overcame these challenges became my top priority in making this project feasible. Along with my main goal, I hoped to find an easy-to-use, mobile-friendly, free, and still decently attractive interface. Looking through many map options (Google My Maps, Mapbox, OpenStreetMap, CartoDB, WorldMap, Scribble Maps, and on and on), I finally found one that actually would work: ZeeMaps. While not gaining full points on the attractive interface scale, this site fulfilled the rest of my requirements mentioned above. In finding the right mapping service, I learned a lot about evaluating digital tools, compromising, and understanding practical limitations. With this crucial element decided, I started building the map and the website to host it.

stereomap in action, GIFception!

Trial 3: Building Diversity

As I began constructing my site and its elements, I started to learn more about the collections themselves. It was difficult to create a diverse mix of selected points due to the biases towards certain subjects and areas. If historians were to look at the collection as a documentary example of the late 19th to early 20th century, then it could be summed up as a white man wearing a bowler hat in lower Manhattan.

While lower Manhattan was a cultural center then as it is today, the collection overlooks important segments of the Black population in Harlem and other parts of the city. Even in stereographs focused outside of New York City where Blacks are subjects, they are depicted in racist ways as minstrel characters. Women and the lower classes were also seldom depicted other than to emphasize their need of saving from destitution. These characteristics made it difficult for me to create a wide-ranging selection of subjects; however, they drove home the point of the photographers’ biases and the frequent inadequacy of the documentary record.

Trial 4: Becoming a Bot

As I was building the site, I was also promoting it. Taking an idea from Trevor Owens, I decided to “curate in the open” and publicly share each image I made and considered using as I went. This was both to generate interest and to aggregate all the links to use in the project. I chose Twitter as my main sharing platform because I already had an account (although not too many followers) and all my tweets were open to the public. Overall, judging from my Twitter analytics, my tweets were mainly seen by my followers, but some of them did seem interested. Some of them seemed disturbed:

He’s normally a nice guy.

I realized that Twitter may not have been the best platform for this part of my project. In sending out multiple tweets in rapid succession, it seemed to my followers that I was becoming a bot, taking over their timelines like the bots of conviction we read about earlier this semester. Certainly some were alright with this, but I’m sure many did not appreciate having these images forced upon them. Perhaps a more image-focused site like Tumblr would have served this purpose better. Whether or not I chose the right social media platform, I do believe the effort was worthwhile and drew more attention to my project than simply keeping it behind closed doors until a big reveal at the end.

Conclusion

From all my trials I learned how to weigh options, choose between resources, and create a deliverable product. In the end, I overcame my trials and created a usable website that met the goals I set when beginning this journey. Thank you, dear reader, for following along with me throughout the semester and in this post. I hope you take a look at the site and send me your thoughts.

 

History Unmade: Physical Space Reimagined in Washington D.C.

Historians place emphasis on revealing a part of the past by showing not only what was, but also what could have been. In particular, many focus on how different groups had agency in their situations and the possibility to shape outcomes very different than what actually occurred. What if we bring this notion of agency to the history of the built environment? Few people realize how different the world around them could have been had one building design been chosen over another. These decisions are often contested battlegrounds and the history of Washington, D.C.’s design is no different.

A Very Different Capital City

The designers of D.C. conceived it as a monumental city to represent America to the world. The decisions made about where and what was built were each scrutinized tremendously, and the structures that came out of these decisions have become the iconic symbols associated with this country. Notwithstanding their current greatness, wouldn’t it be cool if this was the Lincoln Memorial sitting at the end of the Mall?

John Russell Pope’s Competition Proposal for a Pyramid with Porticoes Style Monument to Abraham Lincoln. Credit: National Archives

The Library of Congress has highlighted some of these designs and their history in the book, Capital Drawings: Architectural Designs for Washington, D.C., from the Library of Congress. While this book does a good job of explaining the context of these drawings in history, I think that placing them in the context of the space they would have occupied through the visualization on a map is much more powerful. The Center for History and New Media has created a great interactive map site called Histories of the National Mall where users can interact and learn the history of the mall as they walk around. While this site is excellent for actual histories that have taken place, it still leaves room for the histories of the imagined spaces on the mall that never were.

Deliverable

Similar to HistoryPin and PhilaPlace, I will use the Google My Maps application to create an interactive map that places designs never built into the landscape, using images from the Library of Congress, National Archives, and Maryland Historical Society, among others. I will start with the monuments and public buildings surrounding the National Mall and expand to other locations should time and resources permit. Building off the map, I will create an exhibit site for this topic using the Omeka content management system and embed the map on it. The images used on the map will be placed on this site as well, making them browsable and usable in online exhibits on each building. Through the exhibit pages, I will provide the context of each design’s history, found in Capital Drawings and other books on the subject.

Current Status of Omeka Site

Audience

So there will be a map and website, but who will use it? This idea percolated in my mind for a while and oddly enough, at the beginning of February, the History Channel website posted an article called The Lincoln Memorial’s Bizarre Rejected Designs. This article received 24,000 likes and 8,500 recommends on Facebook. Clearly, there is a sizable audience for this topic in the wider community of amateur history buffs, local Washington, D.C. residents, or even the general populace that has grown up with the iconic monuments. Scholars of architecture, historic preservation, and history would also be interested in examining and learning about the possibilities of a cityscape that does not exist in reality today.

Publicity

To gain interest in the project, I will contact the repositories whose collections I am utilizing, in hopes that they will advertise it on their websites, on social media, and to patrons. Furthermore, the Center for History and New Media is a good partner to spread the word, as its Histories of the National Mall site is closely connected and its staff know the constituents who would be interested in this type of project. Beyond these routes, I will contact local media outlets and use personal social media accounts to publicize the project.

Evaluation

Once the site is active, I will solicit feedback from users on the user experience and content of the site. Suggestions for future places would be useful both for providing new material to post and for tailoring the website to what the users want. Ultimately, there is no way to know whether users learn from the site, only that it has reached them, as shown by usage numbers. Hopefully, this site will give users an understanding that the space they inhabit is not static and encourage them to imagine what can be.

Project Idea Brainstorming

1) For the past two semesters I’ve been working with patient file records from a nineteenth century asylum in Washington D.C. One of my main ideas for a project would be to digitize the patient files and photos, as well as other records from the asylum, and create an online exhibit of the daily lives/experiences of the female patients during their commitment. I would also want to include newspaper articles that discussed when patients had medical trials or escaped from the asylum and analyze the characterizations of the mentally ill and poor in print media.

2) Another idea I have also relates to print media. I would like to analyze advertisements for abortifacients and “feminine hygiene” products before and during the Comstock Act and compare/contrast more modern advertisements since Roe v. Wade. I could focus on concepts/perceptions of femininity and public health, as well as medical knowledge and advice.

3) Create an archive of old newspaper and almanac articles, political cartoons and commentary concerning women and voting in New Jersey between the Constitution and 1808.

Bridget Sullivan Project Ideas

1. Create a digital exhibit surrounding the expansion of the slave trade in Newport, RI prior to the American Revolution. This exhibit would reflect the recent scholarship concerning the state of slavery in the Northern states.

2. Compile a digital archive of documents concerning brothers John and Moses Brown. These men were civic and political leaders in Providence, RI in the early republic period and embody the beginning of the abolition conflict in the United States.

3. Work with Smithsonian Gardens to further develop their online presence through digital programming as well as social media outlets such as Facebook and Twitter.

4. Analyze the current state of museum and archive digitization efforts and how these efforts are affecting the way that the public interacts with historical artifacts.

5. Compare, contrast, and analyze how major historical institutions (e.g., the Smithsonian and the National Archives) use social media for development and marketing purposes and how the advent of the digital age has affected their target audience.

 

Corey’s Project Ideas

I am currently interning with the Smithsonian Gardens and we are looking for a way to virtually map out various community gardens around the nation. It would be great to model this off a similar open source program (perhaps Ushahidi?), creating an interactive map where the public could share stories of how community gardens preserve their cultural heritage. This outreach platform would allow people to engage in discussions about the relationship between community gardening and history at a national level.

Smithsonian Gardens is also expanding its social media objectives. One possible project I could work on would be to analyze and evaluate the effectiveness of Facebook and Twitter in getting people to the gardens. We are currently designing panels to be placed in the gardens highlighting our newest program, Let’s Move! with Smithsonian Gardens. It may be interesting to see how people are responding to weekly challenges on Facebook and Twitter by having them visit these panels and use a provided text service to respond to the questions.

Another Digital Project – History of Children

Our class might be over, but I’ve really enjoyed the possibilities of creating digital projects to aid researchers in history. Recently, I’ve been trying to do some research that deals with the history of children and childhood. Unfortunately, I’ve found it difficult to find very many sources on this subject, but I’ve compiled those that I have found. Rather than just keeping it all in my own records, I thought, why don’t we keep more of the sources that we find in some online directory to help others who may have the same interests? Well, I already have my DreamHost account, and it’s only another $10 to register another site, so I went ahead and registered a new website on the History of Children and Childhood.

 

I don’t know if anyone from my original digital history course will come back to this website, but I’d be interested in any feedback anyone might have on this new project of mine. So, if you have a moment, check out www.childhistory.org and let me know what you think.

Best,

Dennis