Challenging traditional archival principles

Our readings this week covered description and arrangement in digital preservation and challenged the effectiveness of the archival principles of respect des fonds and provenance for new media objects.

Database nature of new media objects

Lev Manovich details how new media objects are essentially databases. Digital objects are a layered collection of items. Users can interact with the same digital object in a variety of ways, meaning the objects lack a linear narrative.

Manovich introduces video games as an apparent exception. On the surface level, players interacting with the game follow a narrative and pursue defined goals. However, Manovich goes on to clarify that to create a digital object is to create “an interface to a database” and that the content of the work and its interface are actually separate. Even while playing a video game, which seems to follow a narrative, players are only going to points mapped out by the database creators. The database nature of new media objects contrasts with the narratives often provided by analog objects, meaning new methods for describing and arranging digital objects are needed.

Describing New Media Objects

Professor Owens details Greene and Meissner’s suggestion of More Product, Less Process (MPLP). Greene and Meissner believe that organizations should avoid putting preservation concerns before access concerns. Collections should be minimally processed so that they can be accessed by researchers sooner. Item-level description should be provided rarely. For arrangement and description, archivists should strive for the “golden minimum.”

Owens provides the 4Chan Archive at Stanford University as an example of using the MPLP approach for digital objects. The archive is available as a 4 GB download, an example of quick and easy access. Stanford opted to include limited but informative description, including the scope of the collection and metadata for the format, date range, and contributor.

Owens also states that digital objects are semi-self-describing due to containing machine-readable metadata. Owens uses tweets as an example. Underneath the surface, tweets contain a lot of informative metadata, such as the time and time zone.
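To make "semi-self-describing" a little more concrete, here is a minimal sketch in Python. It assumes a simplified, tweet-like JSON record; the field names and values are hypothetical and are not the platform's actual export schema. The point is only that descriptive metadata travels inside the object and can be read out by a machine.

```python
import json
from datetime import datetime

# Hypothetical, simplified record; a real platform export uses its own field names.
SAMPLE_TWEET = """
{
  "text": "Reading about MPLP this week.",
  "created_at": "2019-02-11T14:30:00-05:00",
  "lang": "en",
  "source": "example-client"
}
"""

def describe(record_json):
    """Return the metadata embedded in a record, separate from its visible text."""
    record = json.loads(record_json)
    # The timestamp carries both the local time and the offset from UTC.
    created = datetime.fromisoformat(record["created_at"])
    return {
        "date": created.date().isoformat(),
        "time": created.time().isoformat(),
        "utc_offset_hours": created.utcoffset().total_seconds() / 3600,
        "language": record["lang"],
        "client": record["source"],
    }

if __name__ == "__main__":
    for field, value in describe(SAMPLE_TWEET).items():
        print(f"{field}: {value}")
```

None of this description had to be written by an archivist, which is the sense in which the object describes itself.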

In an effort to describe Web Archives, Christie Peterson tested Archivists’ ToolKit, Archive-It, DACS, and EAD. Peterson found that the “units of arrangement, description, and access typically used in web archives simply don’t map well onto traditional archival units of arrangement and description.” Discussing Archive-It, Peterson describes the break-down of the tool. Archive-It uses three categories: collections, seeds, and crawls. An accession of a collection of websites would be a crawl. Peterson found that there were no good options for describing a crawl. She could not say what the scope of the crawl was or explain why certain websites were left out. This means current tools and methods leave archivists unable to document their activity, creating a lack of transparency.

Challenging Archival Principles

Owens defines original order as “the sequence and structure of records as they were used in their original context.” Original order maintains context and saves time and effort from being spent reorganizing and arranging content, leading to faster access. However, maintaining original order can be difficult for digital objects.

Jefferson Bailey describes an issue with following traditional archival principles with digital objects. Since every interaction with a digital object leaves a trace of that interaction, there is no original order. Bailey explains that with new media objects, context can “be a part of the very media itself” since digital objects can be self-describing. Attempting to preserve original order is unnecessary as meaning can be found “through networks, inter-linkages, modeling, and content analysis.”

Bailey also gives a history of respect des fonds. This principle comes from an era of, and thus is designed for, analog materials. Respect des fonds made the organization of records focus on the creating agencies. Some critiques of the principle are that there is not always a single creator, those who structured the documents may not be the creators, and that original order “prioritizes unknown filing systems over use and accessibility.”

Jarrett Drake asserts that provenance is an “insufficient principle” for preserving born-digital and socially inclusive records due to its origins rooted in colonialism. The provenance principle asserts that records of different origins should not mix. The principle became popular in the United States in the early 20th century, when few were able to own and control their records.

When it comes to digital objects, Drake states “the fonds of one creator are increasingly less distinct from the fonds of other creators.” He provides the example of Google Drive, which allows multiple people to collaborate on document creation. Another change in the times that affects provenance is the rise in people who are able to create and own their records. Nowadays, people are able to name and describe themselves. According to Drake, archivists should support this and name creators in archival description according to their self-assertion.

According to Owens, using community-provided descriptions is becoming popular. To create the online exhibition The Reaction GIF: Moving Image as Gesture, Jason Eppink asked the Reddit community for canon GIFs and descriptions of them. Eppink wanted to mark what GIFs meant to those who used them, and getting the description directly from the community enabled him to do that.

Our readings also assert that, when dealing with multiple copies, it’s easier to keep all of them. As Catherine Marshall states, “Our personal collections of digital media become rife with copies, exact, modified, and partial.” One copy may have better metadata, another better resolution, and so on. We have so many copies that the “archival original” is decentralized and not straightforward to determine. Marshall states that it is better to keep these copies than delete them. This is due to people having too many copies, storage being so cheap, and people not knowing which copy they’ll want in the future.

Discussion Questions

Our readings lately have been asserting the value in allowing communities to describe their records. In chapter 7, Owens points out that giving description over to the end user can “easily result in spotty and inconsistent data.” How can archives maintain a balance between empowering communities and keeping quality, consistent data?

What are your thoughts on permitting anonymity in archives? Do you think that it’ll lead to doubt over the validity of the record later on? How can archives demonstrate truthfulness in a record while protecting the creator’s identity?

Preservation intent without the racism

Preservation intent

This week’s readings all build off the concept of preservation intent. In Chapter 5, Owens raises the two key questions of preservation intent: What about the object do you want to preserve? What do you need to do to preserve this aspect of the object? These questions should not be asked just at the beginning of the process, but continually during the processing of the content.

Some files will not be kept or preserved after considering preservation intent. The Rose Manuscript, Archives, and Rare Book Library of Emory University opted to remove such unnecessary files after acquiring Salman Rushdie’s laptop. Owens notes that, after establishing preservation intent, archivists may choose to preserve not the object itself but documentation of the object’s use. This was the case with Nicole Contaxis of the National Library of Medicine choosing to preserve a how-to on using the Grateful Med database, rather than the entire database itself. Owens also uses the example of preserving a screenshot for the Form Art website, rather than an emulation. The National Library of Australia (NLA) opted to use this preservation method for their PANDORA web archive as well.

This week’s readings also introduced the concept of significant properties, which are the aspects of digital objects that must be preserved if the object is to have continued significance. Webb, Pearson, and Koerbin note that this concept can be more hindering than helpful for approaches to digital preservation. The authors assert that preservation intent should be declared before determining what the significant properties are, meaning the significant properties would be subjective rather than objective and universal. Yeo affirms this notion that there are no objective measures for value. There is no single method for digital preservation, and preservation approaches should depend on the situation.

Archives and marginalized communities

Community participation is important for the preservation of objects from marginalized communities. If archival repositories intend to preserve the events and movements surrounding marginalized communities, such participation should be considered when creating the preservation intent.

Shilton, Srinivasan, Jules, and Drake note the foundation of white supremacy instilled in traditional archives – archival processing has historically been done without the input of marginalized communities, leading to finding aids that lack proper context and collections without diverse perspectives.

Shilton and Srinivasan promote participatory archival processing, which means seeking and listening to input from the actual community creators of the objects. Having “narrative and thick descriptions” from the community leads to contextual knowledge for the archival collection. Shilton and Srinivasan refer to these as empowered narratives, where the community is no longer being spoken for by archivists.

Jules addresses the significance of social media as it relates to protesting and causes. Social media helps promote awareness, allows news to spread faster, and helps movements grow. The preservation of social media is an obvious choice if an archive is interested in such topics, but it also leads to ethical and legal issues. Jules raises the concerns of protecting people in the social media collection, such as from repercussions from the police, and potentially being sued by the social media platform.

Drake promotes building allyship with marginalized communities should archives want to preserve their memory. Allyship involves genuinely wanting to learn about the lived experience of people in these communities and aid them, not just using them for collection development. Drake suggests that archivists help build up community archives and use their existing collections “to host dialogues, programs, and exhibits” around the issues faced by the community.

Affordances of media

In Chapter 3, Owens asserts that we can easily make quality informational copies with computing, but that artifactual qualities of these objects are lost. This is due to the platform layers involved with digital objects, à la the cake analogy used last week by Margaret Rose. Owens provides the example of the Rent script, where only the final version was visible in Word 5.1, while Jonathan Larson’s edits were visible in other text editors. According to Owens, the digital preservation field has essentially given up on preserving these additional layers and artifactual qualities.

With a focus on informational preservation and ease of use, Arms and Fleischhauer detail the sustainability, quality, and functionality factors that the Library of Congress considers when selecting digital formats for preservation. The 7 sustainability factors are disclosure (documentation of the format), adoption (how widely used it is), transparency, self-documentation, external dependencies and how much work it will be to preserve them, impact of patents, and technical protection mechanisms like encryption.

The format selection is also dependent on who the primary audience is and how they will use the object. This determines what quality and functionality factors are given priority when choosing a digital format. The authors provide an example of the factors for the preservation of a still image: normal rendering (on-screen viewing and printing), clarity (high resolution), color maintenance, support for graphic effects and typography (filters, shadows, etc.), and functionality beyond normal rendering (layers and 3-D modeling).

Arms and Fleischhauer’s article offers technical insight on the outcome of considering media affordance and preservation intent. I found the article more difficult to understand but appreciated its technicality.

The media affordance available to the NLA led to hurdles with their web archiving. They deal with incorrect renderings due to their software, difficulty in preserving websites with many file types, unpredictable issues during batch preservation actions due to the idiosyncratic structures of websites, and some access issues due to file obsolescence. For the NLA, preserving the website “content, connections, and context” is of primary importance, while the preservation of the site is secondary. The master copy is preserved to the bit level, but the display copy, which undergoes preservation actions such as migration for long-term access, is of greater importance. The snapshots that the NLA preserves also retain only limited functionality of the original websites.

Discussion Questions

Have you noticed a lack of diversity or evidence of white supremacy in collections or records that you have worked with? How do you intend to address these issues when working as an information professional?

How can information professionals effectively assist digital community archives, when digital preservation is permeated with quick obsolescence and continual need for migration, documentation, auditing, and so on?

When creating their preservation intent, the National Library of Australia digital preservation team considers what “adequate access” for the digital objects is and how long that access needs to be maintained. This is a new concept for me because I generally think “forever” or “as long as possible.” I suppose an object only needs to be kept as long as it’s useful, but how do we determine what will be useful 25, 50, or 100 years from now? How do we know when to deaccession a digital object?

digital objects and determining value

This week’s readings were widespread in their content and at times had me feeling a bit at sea with the detailed descriptions of hard drive technology, digital forensics, file formats, etc. (There’s nothing like reading these kinds of things to remind me that I’m nowhere near as technologically proficient as I’d like to think.) I’m grateful for Prof. Owens’ book since it describes digital media and their structures in an accessible, understandable way. I’ll briefly recap his three key points laid out in chapter two, since I saw these ideas echoed throughout the other readings.
1. “All digital information is material.”
Such a basic fact, and yet (as the book mentions) I generally think of my personal digital files in abstract terms, like being lost “in the cloud” or behind this mysterious wall, because my technological know-how is limited.
2. The logic of digital media and computational systems is “the logic of database.”
People interact with digital objects much differently than they engage with analog media. Since databases are ordered based on the query asked of them, digital information can and will always be presented in a myriad of arrangements.
3. “Digital systems are platforms layered on top of each other.”
This one took me a little longer to understand, but I take it to mean that every digital object has multiple informational layers which people are often unaware of. Depending on what someone is studying or looking for, they are going to care about preserving certain layers of the object over others. And these layers are often interdependent.

While reading, I kept thinking of how much we take for granted as we use all of our various devices to function in the world, and the enormous amounts of data and media that will be left behind once we are gone. This quote from the Kirschenbaum article sums up my questions perfectly: “ […] how do these accumulations, these massive drifts of data, interact with irreducible reality of lived experience?” Within the digital preservation field, how do we reconcile that tension between the materiality of our digital footprints and the ephemeral, intangible stuff of life? I’m personally not convinced that you can fully capture someone’s working or personal environment through their digital papers, even with emulation of their computer (thinking of the Salman Rushdie anecdote from the Digital Forensics report). Nor am I convinced, from an ethical standpoint, that it’s always advisable. How do we know what digital information is worth saving or recovering, and who deserves access to it?

As the Digital Forensics report points out, it is not immediately clear what digital items are going to have historical or cultural value in the future, making it harder to know what to preserve. And then how can professionals adequately preserve relationships between different items, events, and media (a random aside–the Jackson Citizen Patriot is my hometown’s newspaper. I did a big double take when I read about the photo of the snowmobilers’ accident and its significance.)? This reminded me of our conversation last week about authorial intent, as well. If a creator doesn’t wish for their entire digital footprint to be saved indefinitely (or saved at all), but there is potential cultural value to their information, whose concerns are prioritized? I have a lot of mixed feelings about this. As mentioned before, Trump would love to cover his tracks–he tries daily, either by literally tearing up memos or obfuscating and lying. But the office of the Presidency is bound by laws that prevent this (or try to), and I doubt anyone would argue that these records are not necessary for future generations. Perhaps the question should be, at what point does a person become so culturally or historically influential that their wishes about their data are overridden by other, more pressing concerns?

The Chan & Cope article addresses these questions from an institutional standpoint. As a museum studies student I was both fascinated by their argument and struggled with it. I definitely like “the stuff” of museums. While I visit museums to be engaged and to relax, oftentimes what draws me to an exhibit (particularly in art museums) is a particular piece or an artist whose works I love. I agree with Chan & Cope that collection strategies should serve a different purpose today; there should be a real intention behind acquisition that goes beyond prestige or hoarding mentality.

However, I’m not quite convinced that a “post-objects curatorial practice” is the natural solution. And is it really “post-objects” if a museum instead exhibits the contextual documents surrounding a systems design? The piece that was missing for me was, does a “post-objects” approach reflect the needs of a museum’s community? Collecting a contemporary, provocative item (digital or analog) might generate a lot of buzz, but will it mean something to the average museum-goer beyond taking a selfie with said object? Relevance isn’t necessarily about what’s trending in the moment (btw, The Art of Relevance, a book by Nina Simon, is an excellent read that explores this topic in the museum field in depth).

I realize I have more questions than definitive ideas or opinions in this post. Interested to share more thoughts and discussion with everyone in the week to come!

FUNdamentals

During the course of this week’s readings, I kept coming back to the fourth axiom in the intro to our text – “Nothing has been preserved. There are only things being preserved.” (p. 5) The title of our text is Theory and Craft of Digital Preservation, and in Chapter 4, Owens fleshes out what he means by “craft.” Digital preservation is an ongoing process. The frameworks in our readings can be used as tools in directing this approach, but there isn’t a manual per se that explains what to do. It’s about asking the right questions to develop something that is sustainable for your situation. The word “craft” is evocative of an artisan, a glass blower for example – the end products might look similar but each piece is unique.

Image from Wikimedia Commons; CC BY-SA

In “The Emperor’s New Repository,” Chudnov suggests not stressing too much at the beginning about doing everything just right. You can start small and build, change tools, change how you think about the content, and draw on user feedback to guide your changes. My sense was that he didn’t want people to be paralyzed by the possibility of having to scrap or redo the work because they didn’t make the right decision at the beginning. In fairness, when you’re spending someone else’s money, it’s a difficult prospect to have to explain that this is all part of the process. As we discussed last week, we can’t assume that everyone understands what’s involved in archiving or digital preservation.

Trying and learning from mistakes is still better than nothing though. Our readings last week presented an urgency to this. Setting aside how well you feel professional archivists have this in hand, there’s a lot out there and archivists can’t preserve it all or even anticipate everything that is worth preserving; so if you think something important is slipping through the cracks, you might be the last recourse.

 

Practitioners don’t have to start from scratch

We can make informed decisions based on traditional archival and preservation practices. People are sharing their experiences and putting their heads together to try to make this attainable even if you don’t do this for a living. Reading, sharing, and talking it out is how we develop the craft.

Oh and another area where archivists can help – documenting decisions. What you did, why you did it, what worked, what didn’t. Owens writes, “Preservation happens because of institutions.…individuals alone can’t do digital preservation.” (p. 78) If an individual tries to preserve a collection alone and doesn’t pass it on to anyone, then it’s not being preserved anymore. When those responsibilities get passed on, either to or within an organization, documentation gives us a context and affects future decision-making. The most frustrating aspects of jobs I’ve had in the past all point to a lack of context to make informed decisions. It means taking the time to ask something I could have figured out myself, or trying something that someone else has already determined doesn’t work, or following the wrong path based on a misunderstanding.

 

Digital preservation frameworks

This week’s readings focused on two frameworks – Levels of Digital Preservation (LoDP) and the Open Archival Information System (OAIS). LoDP was developed by the National Digital Stewardship Alliance. It takes five concerns of digital preservation (storage and geographic location, file fixity and data integrity, information security, metadata, file formats) and provides recommendations for the types of activities at each level. Level 1 pertains to what are considered the most urgent activities and serves as a prerequisite to the later levels.
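Since "file fixity" was the term I worried might lose people, here is a minimal sketch in Python of what a fixity check can look like. This is an illustration under my own assumptions, not a procedure taken from the LoDP document: the function names are hypothetical, and SHA-256 is simply one commonly used checksum algorithm.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Compute a SHA-256 checksum, reading in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder):
    """Record a checksum for every file under folder (the fixity information)."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def check_fixity(folder, manifest):
    """Return the names of files that are missing or whose contents have changed."""
    root = Path(folder)
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

The idea is to build the manifest when content comes in, store it somewhere safe, and rerun the check later; an empty list means every recorded file still matches its checksum. Files added after the manifest was built are not reported by this sketch.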

In keeping with the idea of digital preservation as a craft, LoDP is a work in progress. An update is underway. LoDP is conceptual. The authors discourage thinking of a preservation program as being at one level. The different concerns listed above can fall at different levels or only partially meet the recommendations at a particular level.
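One way to picture this is as a self-assessment that records a level per concern rather than one level for the whole program. The sketch below is in Python; the concern names are the five listed above, while the level numbers are invented for illustration (with 0 standing in for "level 1 not yet met") and the function name is hypothetical.

```python
# Level numbers are made up for illustration; 0 means level 1 is not yet met.
SELF_ASSESSMENT = {
    "storage and geographic location": 2,
    "file fixity and data integrity": 1,
    "information security": 3,
    "metadata": 1,
    "file formats": 0,
}

def summarize(assessment):
    """Report the spread of levels instead of collapsing them into one number."""
    lowest = min(assessment.values())
    highest = max(assessment.values())
    # The concerns sitting at the lowest level are the likeliest next priorities.
    needs_attention = sorted(
        concern for concern, level in assessment.items() if level == lowest
    )
    return {"lowest": lowest, "highest": highest, "needs_attention": needs_attention}

if __name__ == "__main__":
    print(summarize(SELF_ASSESSMENT))
```

Keeping the concerns separate makes it visible that a program can be strong in one area and just starting in another.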

OAIS is more specific than LoDP and deals with repository design from submission to dissemination. Despite the fact that OAIS is now an ISO standard, the report written by the Digital Preservation Coalition still describes it more as a concept than a standard (pp. 3, 31). This means that there’s no official way to tell if a repository is “OAIS-compliant.” (Side note: I didn’t go directly to the source for this because the ISO standard costs $200.)

 

Theory versus practice

The fact that these frameworks are conceptual didn’t stop me from wanting to harness this theory to something a little more concrete – to think about what Level 3 might look like or how realistic Level 4 is. I work on a digitization program so I have some idea of what goes into the repository, but I don’t work on ingest or design user interfaces. As a student I’ve accessed digital repositories so I understand what I might want, at least right now, from a repository as a user. In this way, I could think about my own experiences to put some shape to the theory. I suppose there’s a danger of oversimplifying when doing that and we’ve seen examples of this in our readings.

Chudnov warns not to “fetishize” software because what works for one situation won’t necessarily work for another, but examples help. One of our optional readings described DSpace. Even though “[a] repository is not a piece of software” (Owens, p. 4), the author describes it as a digital repository built from open-source software. Still I appreciated the example because it presented specific scenarios for how you could use the software. The article doesn’t mention OAIS, but the description seemed similar to that model. In googling “DSPACE” and “OAIS-compliant” however, I came across this quote from a white paper:

“Digital preservation is a process, not a technology. I’m not quite sure where claims that DSpace is ‘OAIS compliant’ came from, but since OAIS talks about processes, communities and responsibilities, DSpace itself can no more be ‘OAIS compliant’ than a set of pliers can be a certified electrician.” (p. 18)

 

Conclusion

This week’s readings opened us up to the idea that we have a lot of choices in our digital preservation activities, but I wondered if the theoretical basis for some of this would be off-putting for those who have no previous experience. I found LoDP understandable, but still question if readers would shut down at the mention of “fixity” or “metadata.” One thing I like about LoDP is that it uses the language that you need to know to make those decisions.

I know we come from different backgrounds and that we all have different levels of experience with digital preservation. I’m curious to read your impressions and what you responded to.

Tina

Introduction and Thoughts on Readings

Hi everyone. I’m Maya Reid. I’m starting my second year in the MLIS program and my specialization is Archives and Digital Curation. Preservation is an important aspect of digital curation, so I’m taking this class to learn more about the work and concept.

I’m currently working on my field study at the NASA Goddard Library as a Digital Collections Intern and am also a student assistant in the Special Collections and University Archives on campus. I worked previously as a digitization assistant in the Digitization Center in Hornbake Library. I have experience with digitizing analog materials and screening for file corruption, but still have a lot to learn about preservation.

Professor Owens’ statement that “Preservation is the result of ongoing work of people and commitments of resources. The work is never finished” stood out to me. In my other class this week, Curation in Cultural Institutions, we discussed how digital objects are even more prone to alteration and corruption than analog materials. I had not considered the amount of labor and resources that goes into preserving digital collections. Working in the Special Collections, I suppose I had an idea that a box of records could be shelved in a climate-controlled room and be good to go as far as preservation is concerned (and I realize preservation of analog materials is more detailed than that), so I think I had the same perception for digital objects: that there could be some digital “climate-controlled room” where objects could be placed to be properly preserved. I appreciate the wake-up call that digital preservation is more laborious than that.

Lyons brings up a good point when he states archivists are “hidden in the public narrative.” Whenever I tell someone about my studies, the majority of people ask me what an archive or archivist is. It was grating but unsurprising to read about Cerf’s perspective, wherein he seems to think no one preserves digital artifacts. From prior coursework I have garnered that the answer to this dilemma is always “advocacy.” GLAM institutions can empower themselves by advocating to their greater institutions, their user group, wider community, and so on. Thus it was disheartening for me to read Tansey’s contrasting point of view that advocacy doesn’t always work, especially in more bureaucratic institutions. Tansey describes the “cycle of poverty that afflicts archives” and how lack of funding can lead to a digital dark age. I had a lot of faith in the power of advocacy due to previous classes, but now have my doubts about its effectiveness and what the reality of budgetary constraints looks like. I’m interested in what other classmates have to say about Tansey’s essay.