We started out talking about the theory, but this week’s readings really got into the nitty gritty of how to initiate and sustain digital preservation projects.

Where do I start? What’s involved?

Owens’ chapter points out three major elements of preservation required to save “the bits.” We need to create and maintain multiple copies of digital objects, use fixity checks to ensure we can account for all the information in those digital objects, and ensure the security of those digital objects so that they can’t be corrupted or deleted by some unsavory sort.

These are our basic elements, and the folks from POWRR (“From Theory to Action: Good Enough Digital Preservation”) want to emphasize that when you’re starting out, it’s best to focus on the basics. It’s easy to get overwhelmed by all the technical aspects of digital preservation, but it’s really an incremental process that you can work up to. Before maintaining those multiple copies, running fixity checks, and working on security, it’s a good idea to take stock of your institution’s strengths and abilities, consider what kind of resources it can devote to digital preservation, start planning ingest workflows, and create a basic inventory of your collection.

Owens reiterates this last suggestion: start out by creating an inventory of what you have, and start thinking about policies and practices that will help you manage that collection. (Make sure you keep that inventory up to date!)
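
Just to make that concrete, here’s a minimal sketch of what a starter inventory could look like in practice. This is my own illustration, not anything prescribed by Owens or POWRR; the “collection” folder name and the columns I chose are assumptions, and a real inventory would capture whatever metadata your institution decides it needs.

import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical location of the files you're taking stock of.
collection_root = Path("collection")

# One row per file: where it lives, how big it is, and when it last changed.
with open("inventory.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["path", "size_bytes", "last_modified_utc"])
    for path in sorted(collection_root.rglob("*")):
        if path.is_file():
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            writer.writerow([str(path), info.st_size, modified.isoformat()])

Rerunning something like this on a schedule and comparing it against the previous run is one low-tech way of keeping that inventory up to date.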

So actually, how do I start “doing” digital preservation?

You’ve got a sick inventory now, so you can get started on preserving those bits. Owens suggests running a fixity check to take stock of each digital object at the start, and then moving on to making copies. Both Owens and the NDSA indicate that it’s generally best practice to keep at least two to three copies, and to store each copy in a different way and location, so that each one faces a different type of disaster risk. How do you do that, though? A lot of institutions collaboratively form consortia like MetaArchive and Data-PASS, where “one institution [hosts] a staging server, to which the other partner institutions transfer their digital content” (From Theory to Action). So multiple organizations can help each other out with storing their digital content. Sweet. Let’s be friends. (You send them some copies.)
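
Here’s a rough sketch of the “multiple copies” idea at its most basic, just to make it concrete: the same files copied to more than one storage location. The destination paths are made up; in real life one might be an external drive in another building and another might be the staging server your consortium partner hosts.

import shutil
from pathlib import Path

source = Path("collection")

# Hypothetical destinations: ideally on different media, in different places,
# facing different disaster risks.
destinations = [
    Path("/mnt/external_drive/collection_copy"),
    Path("/mnt/partner_staging/collection_copy"),
]

for dest in destinations:
    # copytree replicates the whole directory tree;
    # dirs_exist_ok lets us refresh a copy that already exists (Python 3.8+).
    shutil.copytree(source, dest, dirs_exist_ok=True)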

Oh, but that first fixity check wasn’t enough. You’re not done yet. You just made a bunch of copies of your files and transferred them to your bud to store! Run another fixity check (maybe using a sweet cryptographic hash or checksum) to make sure that all your files got copied correctly. Any time you make new copies or transfer those copies, you’ve gotta check those files to see if they’re still identical to the originals! Also, it’s probably a good idea to run fixity checks periodically to make sure everything’s chill.
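
For what that check can look like under the hood, here’s a minimal sketch using SHA-256 (one of those sweet cryptographic hashes): hash each original, hash the corresponding copy, and complain if they don’t match. The paths are placeholders carried over from the earlier sketch.

import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Read a file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

original_root = Path("collection")
copy_root = Path("/mnt/external_drive/collection_copy")

# Compare every original against the corresponding file in the copy.
for original in sorted(p for p in original_root.rglob("*") if p.is_file()):
    duplicate = copy_root / original.relative_to(original_root)
    if not duplicate.exists() or sha256(original) != sha256(duplicate):
        print(f"FIXITY PROBLEM: {duplicate}")

In practice you’d record those digests in a manifest (packaging tools like BagIt do exactly this) so that later checks compare against the stored values rather than re-hashing a possibly-degraded original every time.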

But say— what if everything’s not chill?

You’ve got some numbers that just aren’t adding up. Could it be that some of your files got corrupted? You’ve gotta fix those. Using the results of your fixity check, you can identify which files aren’t totally correct and either try to make new, better copies or attempt to repair the files. “This is done by replacing corrupted data with the distributed, replicated, and verified data held at ‘mirroring’ partner repositories in multi-institutional, collaborative distributed networks. The consortia groups MetaArchive and Data-PASS use LOCKSS (‘Lots of Copies Keep Stuff Safe’) for this kind of distributed fixity checking and repair” (NDSA Storage Report).
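
LOCKSS does this at a whole other level of sophistication, but the core repair move can be sketched pretty simply: if a local file no longer matches the digest recorded in your manifest, pull a replacement from a mirror copy that does match. Everything here (the paths, the manifest format, the placeholder digest) is hypothetical.

import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    # Same chunked-hashing helper as in the earlier sketch.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

local_root = Path("collection")
mirror_root = Path("/mnt/partner_staging/collection_copy")

# Relative path -> known-good SHA-256, recorded at ingest (placeholder value).
manifest = {
    "papers/letter_1998.tif": "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
}

for rel_path, good_hash in manifest.items():
    local_file = local_root / rel_path
    if local_file.exists() and sha256(local_file) == good_hash:
        continue  # this copy is fine
    mirror_file = mirror_root / rel_path
    if mirror_file.exists() and sha256(mirror_file) == good_hash:
        local_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(mirror_file, local_file)  # repair from the verified mirror copy
        print(f"Repaired {rel_path} from the mirror")
    else:
        print(f"No good copy of {rel_path} found here -- check your other mirrors")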

So remember those copies you sent to your friends? Because you have multiple copies of your stuff, you can use those to help fix all your broken ones! Sweet, geographic redundancy really pays off.

Am I done?

NO!

We still gotta think about security and access!

Security could be its own whole thing, but really this involves determining who has access to your files and controlling what they can do with those files. Keep logs of who accessed files and what they did to them. If you don’t have any fancy database software to keep track of and control access to those original files, Owens suggests you could simply keep them on a hard drive in a locked drawer, and there you go: no one’s deleting that stuff.
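
On the low-tech end of that spectrum, here’s a small sketch of two habits that advice points toward: making the preservation masters read-only so casual edits and deletions are harder, and appending a line to a log whenever someone is given access to something. The log format and names are my own assumptions, not anything from the readings.

import csv
import os
import stat
from datetime import datetime, timezone
from pathlib import Path

masters = Path("collection")

# Make every master file read-only (for the owner and everyone else).
for path in masters.rglob("*"):
    if path.is_file():
        os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

def log_access(person: str, file_path: str, action: str) -> None:
    """Append a when/who/what row to a simple access log."""
    with open("access_log.csv", "a", newline="") as log:
        csv.writer(log).writerow(
            [datetime.now(timezone.utc).isoformat(), person, file_path, action])

log_access("j.smith", "collection/papers/letter_1998.tif", "copied to reading-room laptop")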

And access is the whole reason we’re doing any of this! How will you provide people with those files? Will anything be restricted? Certainly, some of your digital files will have information that shouldn’t just be publicly accessible, or maybe your donor doesn’t want anyone to read those files for a while. If that’s the case, it may be a good idea to stick that into a dark archive, which will preserve your stuff, but no one will be able to read it. Or, if your stuff is less sensitive, maybe it could just be made available online. Your organization should probably develop policies specifically for security and access to your collections.

So we’ve covered maintaining multiple copies, running fixity checks, and security! I think we’re good.

Questions I guess?

So I know I really glossed over these processes, but I wanted to talk more about the preservation of specific file formats, which I think both Owens and the “Party like it’s 1999” reading about emulation touch on. How do you determine the feasibility of saving a particular file? Hundreds of different proprietary file formats have come and gone over the years, so how do you determine whether you should migrate a file to a more common, modern format, or whether it’s necessary to emulate an environment that lets you experience the file as it was originally intended?

Are there risks of losing some of the affordances of a specific format when migrating to a new file format? If it’s possible to preserve an original file bit for bit, would it be more authentic to keep it as is and provide access through an emulated environment? Or are we less concerned with the authentic, artifactual experience of that file and more concerned with the information?

I know that the answer to these questions is probably “it depends” or “it’s contextual,” but I’d really like to hear people’s personal thoughts on emulation. I know it’s a complex process to create emulators, but once we’re able to successfully emulate past operating systems, can you see emulation becoming “best practice” for digital preservation and access?

Digital Preservation from Both Sides

I haven’t done much digital preservation. I’ve been in the bit trenches. The work I’ve done is to digital preservation as community sandbagging is to the Army Corps of Engineers. I need to reword my resume.

Bit preservation is our most urgent set of tasks: managing multiple copies, managing and using fixity information, and ensuring our data is secure. All these activities are directed at long-term usability, but digital preservation is broader. It’s concerned with the future viability of file formats and software; with future renderability. Having done the basic bit-level work, we might consider migrating files en masse to more sustainable formats, or we might “leave the bits alone” by emulating or virtualizing earlier computing environments.

Our digital preservation decisions will not be identical across institutions. Best practices are like recipes; they’re frameworks. “Approaches to copies and formats should fit the overall contours of your institution’s collecting mission and resources. This is about the digital infrastructure you establish to enable the more tailored work that is done around individual collections” (Owens, p. 105). A museum that collects video art or games might attempt emulation to preserve as much of the experience of the work as possible, seeing the artifactual aspect as more intrinsic to its mission than the informational. But if those bits aren’t safe, these considerations will never arise. Sandbag first.

Much of our study this week is focused on fixity and storage. In my bit preservation experience, fixity checks can get lost in the mix. Eighty percent of NDSA member organizations reported that they use some sort of fixity checking. I think that members and non-members alike have mostly heard the urgent call to action to get the bits off the floor and, maybe, while they’re at it, make multiple copies. Then, of course, they have to store them somewhere, so they’re forced to make storage decisions. But I’m not so sure that organizations often understand the necessity of maintaining bit-level integrity or how they’d go about it. Then again, they could be taking it for granted that their storage solution is a fixity solution as well. And it might be, but I think that’s something of an afterthought.

We’re going to talk about access later in this course, but my impression has been that access, as a buzzword, can cloud our perspective on preservation. I’m concerned that when that scan hits the web, it’s tempting to feel that our preservation work is done. We’d never take that approach to analog media. We wouldn’t hang a painting in a gallery, throw our hammer in the truck, and head home. Well, we might, but we’d be invested in maintaining the integrity of that work for future shows. Accessibility now isn’t the same as access over the long term. This might be obvious to us, but just this week I was speaking to someone about born-digital material and they asked me if I was also interested in endangered media. It’s still a hard sell.

That brings me to one final thought I had while reading the case studies in the POWRR group white paper. The experience at Chicago State University was illuminating. “The defining moment when several library staff members recognized the importance of digital preservation activities occurred when they realized that grant activities digitizing library collections included no provision for storage or preservation” (p. 21). That might be because the grants themselves don’t allow for appropriate storage solutions. I was investigating grants for an in-house digitization project I was working on and had determined that cloud storage was the best, and most affordable, offsite storage solution for my organization. Once I found a grant that wouldn’t exclude in-house digitization projects, I realized it had seemingly arbitrary restrictions excluding “subscription-based” or, simply, “cloud” storage.

Our reading this week helped me recontextualize my own work as a novice in this field. I wonder if anyone else had a similar experience.

I Find Your Lack of Faith… Well, Prudent.

Join the Dark Side

I couldn’t help feeling that the underlying subtext to this week’s readings was an embracing of distrust and uncertainty. Distrust in physical media, file formats, third-party cloud services… even the computer’s ability to do what many take for granted: create an exact copy. Uncertainty manifested itself in issues such as the future adoption levels of formats, the continuity of tools, and even the motives and competency of our own staff. Rather than being dismayed by this somewhat dour outlook, I found it to be a heartening confirmation of my belief that pessimism can indeed be used as a force for good.

I guess I’m weird like that.

Owens Chapter 6 kicked off this theme of distrust with the recurring phrase “hedge your bets.” The phrase was applied repeatedly to the first of three core processes for bit preservation: 1) creating and managing multiple copies, 2) managing and using fixity information, and 3) establishing information security protocols to keep accidents and attacks from occurring at the hands of staff or users. In the context of the first process, managing multiple copies, the “hedge your bets” approach necessarily results in a proliferation of file types, storage media, and a geographically sprawling network of storage locations. The point of this push for diversity is that no one disaster, bad actor, or system failure is likely to wipe out all the copies.

The distrust also extended to seemingly mundane processes like the act of transferring data, and to minimizing the number of people capable of accessing the objects. But the issue that interested me most was the emphasis on not putting too much faith in any one tool. As Owens notes, vendor lock-in is a real concern that necessitates forming an exit strategy before acquisition is even complete (p. 115). I have seen this happen in my own career and know how dangerous it can be. Indeed, it was one of the catalysts that inspired me to seek this degree.

The theme of distrust continued in the NDSA Storage Report. The survey found that most NDSA members’ desire for control over their own holdings tended to dissuade them from embracing commercial cloud services. The perception (or reality) of greater control over their data led the majority to prefer institutional cooperatives, in which each member shares its data with a partner organization in order to establish geographic diversity in its storage plan. Of particular concern among the group was the lack of transparency in the fixity checks performed behind the scenes by commercial cloud services: there was no proof that hashes provided at the time of access weren’t simply being replayed from the time of upload, offering a false sense of safety.
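
One practical way to act on that distrust: don’t just accept the hash the service reports back, but periodically retrieve the object and recompute the hash yourself against the value you recorded at upload time. Here’s a rough sketch of that spot check. The retrieval step depends entirely on which service you use, so I’ve assumed the object has already been downloaded to a local file, and the recorded digest shown is only a placeholder.

import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Digest we recorded locally at upload time (placeholder value).
recorded_at_upload = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

# The object as just retrieved from the cloud service.
retrieved_copy = Path("downloads/retrieved_object.bin")

if sha256(retrieved_copy) == recorded_at_upload:
    print("Independent check passed: the stored object still matches what we uploaded.")
else:
    print("Mismatch: what came back is not what we sent.")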

Again, I was struck by how issues of uncertainty and distrust could be harnessed to realize positive and productive ends. Perhaps I’ve finally found my people?

A New Hope

Not all the readings were gloom and doom. “From Theory to Action” in particular revisited many of the themes we’ve touched on in previous weeks, emphasizing a simple and incremental approach to beginning a preservation program. As the subtitle of the piece indicates, the authors emphasize embracing the concept of “good enough” and then building on it. Digital preservation is not a binary status requiring that an institution either be moving at light speed or standing completely still. Institutions should focus on near-term goals that can immediately improve preservation, however small and simple they might be. But probably the biggest takeaway from this piece was the degree of confidence and self-efficacy the POWRR group members instilled in each other simply by choosing to assess and tackle their collective issues in a cooperative fashion. The creation of communities of practice is particularly effective at helping the entire group identify solutions to common problems.

Discussion questions

In Chapter 6, Owens notes the importance of differentiating working files from files that have become static and are thus ready for long-term storage. I have found that in practice this is more difficult than it would seem, particularly for video. In our multimedia department the concept of finality has been elusive at best, to the point that our manager gets angry if a file is actually labeled “final,” because the label becomes untrue almost the moment it’s saved. Our company’s trend of editing-by-committee basically guarantees at least a few more rounds of edits no matter what. Even an extended passage of time is no indication of finality. Customers will come back and ask for changes to a video many years after the first version, usually because they want to remove a staff member who has left or change someone’s job title. Saving an editable version requires saving the non-linear editor project files and all associated source files. This is the most complicated version to save and the first to become obsolete. So, my question for the class is: how should we as archivists respond to such a dynamic situation, where the concept of finality is tenuous and fluid?

And lastly, I didn’t discuss Dietrich’s “Emulation for Everyone” above because it seemed like something of an outlier relative to the others. I find myself fascinated with emulation as a practice, but I wonder about its feasibility for all but the most extreme cases. For example, it was mentioned at the end of the piece that researchers looking at the Jeremy Blake Papers actually preferred using a modern OS and were primarily interested in the informational value of the objects. Authenticity and fidelity were less of a priority. This seems like a lot of effort to have gone to for an experience that no one really needed. So my question for the class is: where do you see emulation fitting into the list of preservation options? Should it require a more rigorous examination of preservation intent to make sure the considerable extra effort is justified?

I’m also curious about the extent to which an emulated system becomes a digital object in itself, one that then becomes exponentially more complicated to preserve. At what point do we decide that these towering platform stacks held together with scotch tape and shoe goo are no longer worth the expense of maintaining?


I formally apologize* for all Star Wars references.

*Not really.

Before the “How?” of Digital Preservation Are the Questions of “What?” and “Why?”

Before taking this class, I thought of digital preservation only as a skill set I needed to learn. I wanted to know: how do I accomplish this thing that so many job ads expect me to be able to do? I realize now that by focusing exclusively on the “how?” of digital preservation, I had skipped over the important steps of first determining the “what?” and the “why?”. When these questions are taken into consideration, it becomes clear that digital preservation isn’t just a checklist of methods or tools students need to learn; it’s also a set of theories about what preservation means. The readings this week stressed that before creating a plan for preservation, it is important to first ask: what do we want to preserve? And why do we want to preserve it (what aspects of it are worth saving)? Specific preservation goals help ensure that what we deem most valuable is saved, and (to paraphrase Professor Owens) if compromises must be made for practical reasons, having clear intentions will at least ensure that we are making those decisions deliberately.

Owens writes that the “future of digital preservation does not lie in the maintenance of old computing systems and hardware…[but in] making copies and moving those copies forward to new systems.” This may, upon first reading, seem simple enough, and yet it raises a multitude of questions about what it means to say something is the “same” as something else or that it is an “authentic” copy. Increasingly, archivists are coming to see that preservation judgments are (in the words of Herb Stovel) “necessarily relative and contextual.”

Geoffrey Yeo interrogates the idea of “sameness” in his article, arguing that what makes something the “same” as something else depends upon our definition of “sameness,” and describing scenarios that roughly correlate to the artifactual and informational frameworks we first encountered in Professor Owens’s book. As the title “Nothing is the same as something else” suggests, however, Yeo is ultimately skeptical of any claim to “sameness” with copies, because “attempts to determine the significant characteristics of records are problematic, not least because judgements about significance will vary from one user community to another.”

I was particularly interested in the comparisons that Yeo draws between historical manuscripts and digital objects. Yeo suggests that scholars may one day scrutinize digital objects in the same way that they now examine handwritten manuscripts for clues about the author’s intent in their handwriting, the spacing of the words, or smudges on the paper. We must, therefore, be mindful of what gets preserved with digital copies since, according to Yeo, there is no detail so inconsequential that we can assume it will never be of value to some researcher. He also proposes that we may one day fetishize old forms of media like CD-ROMs or floppy disks in the same way we do other objects in museums or special collections. For this reason, Yeo proposes saving both the original and a copy (or copies) when possible; if that isn’t possible, we need to at least give careful consideration to who may be using the digital objects in the future and what their needs or expectations will be.

In a way that parallels Yeo’s questioning of “sameness,” Edward M. Bruner uses the example of Lincoln’s New Salem, a historic site in Illinois, to examine different definitions of “authenticity.” He proposes four main types: verisimilitude (the appearance of authenticity), genuineness (historically accurate authenticity), originality (the original and not a copy), and authority (an authorized replica). In New Salem, Bruner sees a mixture of the different types, identifying aspects that correspond to each of the four markers of authenticity. He also notices that the desire for authenticity can be very idiosyncratic and selective, pointing out all the ways in which New Salem has been deliberately made inauthentic (such as paving the roads or adding bathrooms and gutters) without anyone seeming to give those details much notice. Bruner argues for transcending the inauthentic/authentic dichotomy and recognizing that tourists are not really there as judges of authenticity, but to “construct a past that is meaningful to them and that relates to their lives and experiences.” The lesson a digital preservationist might take from Bruner’s article is that what makes a recreation or a copy meaningful is often highly personal and will vary from person to person. Unlike Yeo’s article, which emphasized the importance of details, Bruner seems to suggest that in some instances the details aren’t as important as recreating an experience that people can relate to. This reminded me of video game simulations: getting the game’s exact code correct may be less important than other, more personal factors that people associate with the game.

When it comes to what makes something the “same” or “authentic,” both Yeo and Bruner would say that it depends on the person. This connects to a second theme of this week’s readings: the importance of participatory or community-based archiving, especially for marginalized communities. While it is generally acknowledged that archives need to become more diverse in their representation, Katie Shilton and Ramesh Srinivasan write that archives “have appropriated the histories of marginalized communities, creating archives about rather than of the communities,” which can create distorted narratives. Shilton and Srinivasan propose changing the way archivists acquire and process new collections so that community members are involved throughout appraisal, arrangement, and description, ensuring that the archives represent marginalized communities the way that they want to be understood.

These ideas have become especially relevant recently in discussions of how to archive Black Lives Matter and other protest movements. Jarrett M. Drake argues that participatory archiving may not be enough, and that perhaps traditional archivists should avoid archiving the movement altogether and instead allow independent, community-based archives to do the job. Drake does, however, offer advice to traditional archives that still want to be involved. He tells traditional archivists to first look at their existing holdings to “see whether or not black lives matter there,” and to “confront [their own] complicity in white supremacy, patriarchy, and other structural inequalities that the movement is seeking to eradicate.” Moreover, they need to build trust among the “people, communities, and organizations around whose lives the movement is centered, a trust they should pursue not under the guise of collection development but under the practice of allyship.”

I’m curious: what did you think of Yeo’s and Bruner’s discussions of “sameness” and “authenticity”? Do you foresee a future in which digital objects are scrutinized for minute details in the same way manuscripts are today? What about the ideas presented by Drake: do you think that traditional archivists should have a role in documenting marginalized communities? If so, what steps do you think they should take to ensure they are doing so responsibly?


Preservation intent without the racism

Preservation intent

This week’s readings all build off the concept of preservation intent. In Chapter 5, Owens raises the two key questions of preservation intent: What about the object do you want to preserve? What do you need to do to preserve that aspect of the object? These questions should be asked not just at the beginning of the process but continually throughout the processing of the content.

Some files will not be kept or preserved at all once preservation intent has been considered. The Rose Manuscript, Archives, and Rare Book Library at Emory University opted to remove such unnecessary files after acquiring Salman Rushdie’s laptop. Owens notes that, after articulating preservation intent, archivists may choose to preserve not the object itself but documentation of the object’s use. This was the case when Nicole Contaxis of the National Library of Medicine chose to preserve a how-to on using the Grateful Med database rather than the entire database itself. Owens also uses the example of preserving a screenshot of the Form Art website rather than an emulation. The National Library of Australia (NLA) opted to use this preservation method for their PANDORA web archive as well.

This week’s readings also introduced the concept of significant properties: the aspects of digital objects that must be preserved if the object is to have continued significance. Webb, Pearson, and Koerbin note that this concept can be more hindering than helpful as an approach to digital preservation. The authors assert that preservation intent should be declared before determining what the significant properties are, meaning the significant properties would be subjective rather than objective and universal. Yeo affirms this notion that there are no objective measures of value. There is no single method for digital preservation, and preservation approaches should depend on the situation.

Archives and marginalized communities

Community participation is important for the preservation of objects from marginalized communities. If archival repositories intend to preserve the events and movements surrounding marginalized communities, such participation should be considered when creating the preservation intent.

Shilton, Srinivasan, Jules, and Drake note the foundation of white supremacy instilled in traditional archives: archival processing has historically been done without the input of marginalized communities, leading to finding aids that lack proper context and collections without diverse perspectives.

Shilton and Srinivasan promote participatory archival processing, which means seeking and listening to input from the communities that actually created the objects. Having “narrative and thick descriptions” from the community leads to contextual knowledge for the archival collection. Shilton and Srinivasan refer to these as empowered narratives, in which the community is no longer being spoken for by archivists.

Jules addresses the significance of social media as it relates to protests and causes. Social media helps promote awareness, allows news to spread faster, and helps movements grow. Preserving social media is an obvious choice if an archive is interested in such topics, but it also raises ethical and legal issues. Jules points to the need to protect the people who appear in a social media collection, for example from police repercussions, as well as the possibility of being sued by the social media platform.

Drake promotes building allyship with marginalized communities if archives want to preserve their memory. Allyship involves genuinely wanting to learn about the lived experience of people in these communities and to aid them, not just using them for collection development. Drake suggests that archivists help build up community archives and use their existing collections “to host dialogues, programs, and exhibits” around the issues faced by the community.

Affordances of media

In Chapter 3, Owens asserts that we can easily make quality informational copies with computing, but that the artifactual qualities of these objects are lost. This is due to the platform layers involved with digital objects, à la the cake analogy used last week by Margaret Rose. Owens provides the example of the Rent script, where only the final version was visible in Word 5.1, while Jonathan Larson’s edits were visible in other text editors. According to Owens, the digital preservation field has essentially given up on preserving these additional layers and artifactual qualities.

With a focus on informational preservation and ease of use, Arms and Fleischhauer detail the sustainability, quality, and functionality factors that the Library of Congress considers when selecting digital formats for preservation. The seven sustainability factors are disclosure (documentation of the format), adoption (how widely the format is used), transparency, self-documentation, external dependencies (and how much work it will be to preserve them), impact of patents, and technical protection mechanisms like encryption.

Format selection also depends on who the primary audience is and how they will use the object. This determines which quality and functionality factors are given priority when choosing a digital format. The authors provide an example of the factors for the preservation of a still image: normal rendering (on-screen viewing and printing), clarity (high resolution), color maintenance, support for graphic effects and typography (filters, shadows, etc.), and functionality beyond normal rendering (layers and 3-D modeling).

Arms and Fleischhauer’s article offers technical insight on the outcome of considering media affordance and preservation intent. I found the article more difficult to understand but appreciated its technicality.

The affordances of the media the NLA works with led to hurdles in their web archiving. They deal with incorrect renderings due to their software, difficulty in preserving websites with many file types, unpredictable issues during batch preservation actions due to the idiosyncratic structures of websites, and some access issues due to file obsolescence. For the NLA, preserving the website “content, connections, and context” is of primary importance, while preservation of the site itself is secondary. The master copy is preserved at the bit level, but the display copy, which undergoes preservation actions such as migration for long-term access, is of greater importance. The snapshots that the NLA preserves also retain only limited functionality of the original websites.

Discussion Questions

Have you noticed a lack of diversity or evidence of white supremacy in collections or records that you have worked with? How do you intend to address these issues when working as an information professional?

How can information professionals effectively assist digital community archives, when digital preservation is permeated with quick obsolescence and continual need for migration, documentation, auditing, and so on?

When creating their preservation intent, the National Library of Australia digital preservation team considers what “adequate access” to the digital objects means and how long that access needs to be maintained. This is a new concept for me because I generally think “forever” or “as long as possible.” I suppose an object only needs to be kept as long as it’s useful, but how do we determine what will be useful 25, 50, or 100 years from now? How do we know when to deaccession a digital object?