digital objects and determining value

This week’s readings were wide-ranging and at times had me feeling a bit at sea with the detailed descriptions of hard drive technology, digital forensics, file formats, etc. (There’s nothing like reading these kinds of things to remind me that I’m nowhere near as technologically proficient as I’d like to think.) I’m grateful for Prof. Owens’ book since it describes digital media and their structures in an accessible, understandable way. I’ll briefly recap the three key points he lays out in chapter two, since I saw these ideas echoed throughout the other readings.
1. “All digital information is material.”
Such a basic fact, and yet (as the book mentions) I generally think of my personal digital files in abstract terms, like being lost “in the cloud” or behind this mysterious wall, because my technological know-how is limited.
2. The logic of digital media and computational systems is “the logic of database.”
People interact with digital objects much differently than they engage with analog media. Since databases are ordered based on the query asked of them, digital information can and will always be presented in a myriad of arrangements.
3. “Digital systems are platforms layered on top of each other.”
This one took me a little longer to understand, but I take it to mean that every digital object has multiple informational layers which people are often unaware of. Depending on what someone is studying or looking for, they are going to care about preserving certain layers of the object over others. And these layers are often interdependent.

While reading, I kept thinking of how much we take for granted as we use all of our various devices to function in the world, and the enormous amounts of data and media that will be left behind once we are gone. This quote from the Kirschenbaum article sums up my questions perfectly: “[…] how do these accumulations, these massive drifts of data, interact with irreducible reality of lived experience?” Within the digital preservation field, how do we reconcile that tension between the materiality of our digital footprints and the ephemeral, intangible stuff of life? I’m personally not convinced that you can fully capture someone’s working or personal environment through their digital papers, even with emulation of their computer (thinking of the Salman Rushdie anecdote from the Digital Forensics report). Nor am I convinced, from an ethical standpoint, that it’s always advisable. How do we know what digital information is worth saving or recovering, and who deserves access to it?

As the Digital Forensics report points out, it is not immediately clear what digital items are going to have historical or cultural value in the future, making it harder to know what to preserve. And then how can professionals adequately preserve relationships between different items, events, and media? (A random aside: the Jackson Citizen Patriot is my hometown’s newspaper. I did a big double take when I read about the photo of the snowmobilers’ accident and its significance.) This reminded me of our conversation last week about authorial intent, as well. If a creator doesn’t wish for their entire digital footprint to be saved indefinitely (or saved at all), but there is potential cultural value to their information, whose concerns are prioritized? I have a lot of mixed feelings about this. As mentioned before, Trump would love to cover his tracks; he tries daily, either by literally tearing up memos or obfuscating and lying. But the office of the Presidency is bound by laws that prevent this (or try to), and I doubt anyone would argue that these records are not necessary for future generations. Perhaps the question should be, at what point does a person become so culturally or historically influential that their wishes about their data are overridden by other, more pressing concerns?

The Chan & Cope article addresses these questions from an institutional standpoint. As a museum studies student I was both fascinated by their argument and struggled with it. I definitely like “the stuff” of museums. While I visit museums to be engaged and to relax, oftentimes what draws me to an exhibit (particularly in art museums) is a particular piece or an artist whose works I love. I agree with Chan & Cope that collection strategies should serve a different purpose today; there should be a real intention behind acquisition that goes beyond prestige or a hoarding mentality.

However, I’m not quite convinced that a “post-objects curatorial practice” is the natural solution. And is it really “post-objects” if a museum instead exhibits the contextual documents surrounding a systems design? The piece that was missing for me was, does a “post-objects” approach reflect the needs of a museum’s community? Collecting a contemporary, provocative item (digital or analog) might generate a lot of buzz, but will it mean something to the average museum-goer beyond taking a selfie with said object? Relevance isn’t necessarily about what’s trending in the moment (btw, The Art of Relevance, a book by Nina Simon, is an excellent read that explores this topic in the museum field in depth).

I realize I have more questions than definitive ideas or opinions in this post. Interested to share more thoughts and discussion with everyone in the week to come!

My Life as a Warning to Others

Hello, everyone. My name is Andy Cleavenger, and I am beginning my fourth year of this two-year program.

My life up to this point has been spent as a photographer and multimedia specialist at a government contractor. I work in their Communications department. My interest in this class stems from my role as the sole caretaker of our department’s image collection. For over 17 years I have been the only one capable of performing image searches, and the only one concerned with the preservation of those images. I’m in the Digital Curation track to learn how to effectively turn my collection into a self-service resource available to all employees. And I’m in this class specifically to make sure I’m doing everything possible to ensure the long-term preservation of our image collection.

I must admit that the first axiom listed in Owens – “A repository is not a piece of software” – just about made me stand up from my chair and shout “see, I told you!” at my former boss. We have always treated the image collection as a problem that can be solved with a magic-bullet purchase of DAM software.

“We bought it… we’re done!”

This is, of course, extremely common. Like most offices, mine forgets about the systems that will come after the present one, and about the unceasing march of technological progress that dictates both the increasing complexity of the images and the expanding diversification of their use. This was nicely summed up in Owens’ last axiom: “Doing digital preservation requires thinking like a futurist.” I fear that my colleagues may regret some of the decisions they’ve made, such as stripping all filenames from their videos, throwing everything into a single directory, and then depending on an external proprietary catalog file to save all related metadata.

We are now married to that system… and it’s failing us.

The remaining articles on either side of the digital dark age debate made some equally compelling points. Ultimately, I felt that Lyons and Tansey both came closest to hitting the mark on what form a digital dark age would take, as well as the forces that would drive it. Lyons frames the problem as one of cultural blindness. That is to say that institutions that exist within and serve a particular society tend to have difficulty in recognizing the value in – or even being aware of – the records of other communities. As such, the digital dark age will manifest itself in the silence of these socio-politically disadvantaged communities within the archival record.

This is not an unfamiliar argument, but I tend to think the cause is less a conspiratorial omission than a sad pragmatism driven by extremely finite resources. This point was reflected well in Tansey. She argues that the long trend of cuts to budgets and staff forces institutions to set priorities that inevitably leave gaps in the archival record. In other words, even if an institution has an awareness of fringe communities, and possibly even has a sympathetic collections policy for including those records, the pragmatism of limited resources may still dictate their omission as the institution focuses on its highest priorities.

I have certainly seen this in my position in the Communications department. I’m curious if others in class have seen examples like this in their own workplaces?

Digital Preservation Policy: Web Archiving for the Washingtoniana Collection

Introduction:

In my previous posts on this blog I have surveyed the digital preservation state of the District of Columbia Public Library’s Washingtoniana collection. This survey was performed via an interview with Digital Curation Librarian Lauren Algee, using the NDSA Levels of Digital Preservation as a reference point.

In our survey we discovered that the DCPL Washingtoniana collection has very effective digital preservation: through a combination of knowledgeable practices and the Preservica service (an OAIS-compliant digital preservation service), it nearly reaches the 4th level in every category of the NDSA Levels of Digital Preservation. With this in mind, my next-step plan for the archive looks at a number of areas the archive has been interested in expanding into and presents some thoughts on where it could begin taking steps toward preserving those materials.

Of particular interest in this regard is the collecting of website materials. Because websites are dynamic objects in a relatively new medium, collecting them can be fairly complex: it is hard to pin down precisely when a website has been sufficiently collected. Websites may appear differently on different browsers, they may contain many links to other websites, they change rapidly, and they often contain multimedia elements. The policy outlined below therefore discusses these issues and offers a digital preservation plan specifically for websites.

Website Digital Preservation Policy for the Washingtoniana collection

The Washingtoniana collection was founded in 1905 when library director Dr. George F. Bowerman began collecting materials on the local community. The collection stands as one of the foremost archives on the Washington, D.C. area, community, history, and culture. Naturally, with the increasing movement of DC social life and culture to online or born-digital platforms, it makes sense that the Washingtoniana collection would consider collecting websites.

Selection

The same criteria used to select other Washingtoniana materials should apply here. Websites should be considered if they pertain to Washington, D.C. or its surrounding areas, to events that take place in or discuss that area, to prominent Washington, D.C.-related persons or institutions, or otherwise to the Washington, D.C. community, arts, culture, or history.

As with any physical preservation decision, triage is an essential process. Websites that are likely to be at risk should be high priority. In a sense all web content is at risk. Websites that serve a specific purpose or pertain to a specific event may have a limited operational window. Websites for defunct businesses, political election sites, and even an existing website as it appears on a specific day may be vulnerable and thus candidates for capture. In addition, the materials in question should not be materials which are being collected elsewhere, and should be considered in relation to the rest of the collection.

Although automation tools may be used for identification, selection decisions rest in the librarians’ hands. In addition, suggestions from patrons relevant to the collection should be considered, and a system for managing and encouraging such suggestions may be put in place.

Metadata

A metadata standard such as MODS (Metadata Object Description Schema) should be used to describe the website. MODS is a flexible schema expressed in XML, is fairly compatible with library records, and allows more complex metadata than Dublin Core, and thus may work well. Metadata should include but not be limited to website name, content producers, URL, access dates, and fixity, as well as technical information which may be generated automatically by web crawlers, such as timestamps, URI, MIME type, size in bytes, and other relevant metadata. Extraction information, file format, and migration information should also be maintained.
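To make the field list above a bit more concrete, here is a minimal sketch of a MODS record built with Python's standard library. The function name `build_mods_record`, the sample field choices, and the use of a `note` element with `type="fixity"` to carry the checksum are illustrative assumptions, not part of the Washingtoniana policy or a recommended MODS profile; a real record would follow the library's own cataloging rules.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace


def build_mods_record(title, url, producer, captured, mime_type, size_bytes, sha256):
    """Return a small MODS XML string describing one captured website.

    Hypothetical helper: the mapping of policy fields to MODS elements
    is an assumption made for illustration only.
    """
    ET.register_namespace("mods", MODS_NS)

    def el(parent, tag, text=None, **attrs):
        # Create a namespaced child element with optional text and attributes.
        node = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}", attrs)
        if text is not None:
            node.text = text
        return node

    mods = ET.Element(f"{{{MODS_NS}}}mods")
    el(el(mods, "titleInfo"), "title", title)                    # website name
    el(el(mods, "name", type="corporate"), "namePart", producer)  # content producer
    el(el(mods, "originInfo"), "dateCaptured", captured, encoding="iso8601")
    el(el(mods, "location"), "url", url)
    phys = el(mods, "physicalDescription")
    el(phys, "internetMediaType", mime_type)                      # MIME type
    el(phys, "extent", f"{size_bytes} bytes")                     # size in bytes
    # Assumption: fixity is carried in a typed note; local practice may differ.
    el(mods, "note", f"sha256:{sha256}", type="fixity")
    return ET.tostring(mods, encoding="unicode")
```

Calling `build_mods_record(...)` with values taken from a crawl log would give a record that can be stored alongside the captured files; richer fields such as migration history would need additional elements.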

Collection

A variety of collection tools exist for web archiving. The tool selected should be capable of the tasks below, as outlined on the Library of Congress web archiving page:

  • Retrieve all code, images, documents, media, and other files essential to reproducing the website as completely as possible.
  • Capture and preserve technical metadata from both web servers (e.g., HTTP headers) and the crawler (e.g., context of capture, date and time stamp, and crawl conditions). Date/time information is especially important for distinguishing among successive captures of the same resources.
  • Store the content in exactly the same form as it was delivered. HTML and other code are always left intact; dynamic modifications are made on-the-fly during web archive replay.
  • Maintain platform and file system independence. Technical metadata is not recorded via file system-specific mechanisms.

A variety of tools are capable of these tasks. A web crawler such as Heritrix, the open source archival web crawler, or a subscription solution such as Archive-It should be used. Both are from the Internet Archive; however, the first is an open source tool, while the second is a subscription-based service which offers storage on Internet Archive servers.

Upon initial collection, fixity information should be taken using a checksum system. This can be automated either with a staff-written script or with a program like BagIt, which automatically generates fixity information. This information should be maintained with the rest of the metadata for the digital object.
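As an illustration of what a staff-written fixity script might look like, the sketch below computes SHA-256 checksums for every file in a capture directory and writes them to a manifest. The function names, the choice of SHA-256, and the manifest file name are assumptions for the example; the line layout loosely imitates a BagIt-style manifest, but this script does not produce a valid BagIt bag.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory, manifest_name="manifest-sha256.txt"):
    """Write one '<checksum>  <relative path>' line per file and return the manifest path.

    Hypothetical helper: the manifest name and layout are assumptions.
    """
    directory = Path(directory)
    lines = []
    for path in sorted(directory.rglob("*")):
        # Skip directories and any manifest left over from an earlier run.
        if path.is_file() and path.name != manifest_name:
            relative = path.relative_to(directory).as_posix()
            lines.append(f"{sha256_of(path)}  {relative}")
    manifest = directory / manifest_name
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Running `write_manifest("captures/2016-11-17")` (a made-up path) right after a crawl gives a baseline that later checks can be compared against.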

Websites should be kept in the most stable web archival format available. At the time of this post’s writing, that format is the WARC (Web ARChive) file format. This format allows multiple digital resources to be combined into a single file, which is useful because many web resources are complex and contain many items. Other file formats may be accepted if archived webpages are received from donors.

Preservation

Upon initial ingestion, items may be kept on internal drives and copied to at least one other location. Before an item is moved into any further storage system, the file should be scanned for viruses, malware, or any other undesirable or damaging content using safety standards agreed upon with the division of IT services. At this point fixity information should be taken as described above and entered into the metadata record.

Metadata should be created as soon as possible, at which point the object with its attached metadata should be uploaded into the Washingtoniana’s instance of Preservica.

Although Preservica automates much of the preservation process, a copy of the web archive should be kept on external hard drives. Once a year, a selection of the items on the hard drives should be checked against the items in Preservica to ensure that Preservica’s fixity checks and obsolescence monitoring are working as desired.
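The yearly spot check could be sketched roughly as follows. This assumes that a list of checksums recorded for the same items has already been obtained by some means and loaded into a dictionary; how that list is exported from Preservica is outside this sketch, and the function names and sample size are hypothetical.

```python
import hashlib
import random
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file on the external drive."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_sample(local_dir, recorded, sample_size=10, seed=None):
    """Check a random sample of files against recorded checksums.

    recorded: dict mapping relative file path -> expected SHA-256 digest
              (assumed to come from the preservation system's records).
    Returns a dict of relative path -> problem description; empty if all match.
    """
    rng = random.Random(seed)
    names = sorted(recorded)
    chosen = rng.sample(names, min(sample_size, len(names)))
    problems = {}
    for name in chosen:
        path = Path(local_dir) / name
        if not path.is_file():
            problems[name] = "missing"
        elif file_sha256(path) != recorded[name]:
            problems[name] = "checksum mismatch"
    return problems
```

An empty result suggests the sampled copies still agree with the recorded values; any entry in the result is a prompt for staff to investigate which copy has changed.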

References

Jack, P. (2014, February 27). Heritrix-Introduction. Retrieved November 14, 2016, from https://webarchive.jira.com/wiki/display/Heritrix/Heritrix#Heritrix-Introduction
Web Archiving-Collection development. (n.d.). Retrieved November 16, 2016, from https://library.stanford.edu/projects/web-archiving/collection-development
The Washingtoniana Collection. (n.d.). Retrieved November 16, 2016, from http://www.dclibrary.org/node/35928
Web Archiving at the Library of Congress. (n.d.). Retrieved November 16, 2016, from https://www.loc.gov/webarchiving/technical.html
Niu, J. (2012). An Overview of Web Archiving. Retrieved November 16, 2016, from http://www.dlib.org/dlib/march12/niu/03niu1.html
AVPreserve » Tools. (n.d.). Retrieved November 17, 2016, from https://www.avpreserve.com/avpsresources/tools/
Kunze, J., Bokyo, A., Vargas, A., Littman, B., & Madden, L. (2012, April 2). Draft-kunze-bagit-07 – The BagIt File Packaging Format (V0.97). Retrieved November 17, 2016, from http://www.digitalpreservation.gov/documents/bagitspec.pdf
MODS: Uses and Features. (2016, February 1). Retrieved November 17, 2016, from http://loc.gov/standards/mods/mods-overview.html
About Us. (2014). Retrieved November 17, 2016, from https://archive-it.org/blog/learn-more/