We started out talking about the theory, but this week's readings really got into the nitty-gritty of how to initiate and sustain digital preservation projects.

Where do I start? What’s involved?

Owens' chapter points out three major elements of preservation required to save "the bits". We need to create and maintain multiple copies of digital objects, use fixity checks to ensure we can account for all the information in those digital objects, and ensure the security of those digital objects so that they can't be corrupted or deleted by some unsavory sort.

These are our basic elements, and the folks from POWRR (From Theory to Action: Good Enough Digital Preservation) want to emphasize that when you're starting out, it's best to focus on the basics. It's easy to get overwhelmed by all the technical aspects of digital preservation, but it's really an incremental process that you can work up to. Before maintaining those multiple copies, running fixity checks, and working on security, it's a good idea to take stock of your institution's strengths and abilities, consider what kind of resources it can devote to digital preservation, start planning ingest workflows, and create a basic inventory of your collection.

Owens reiterates this last suggestion: start out by creating an inventory of what you have, and start thinking about policies and practices that will help you manage that collection. (Make sure you keep that inventory up to date!)
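Just to make that concrete, here's a minimal sketch of what a starting inventory could look like in Python. The "collection" folder and "inventory.csv" file names are placeholders I made up, and this is a starting point rather than a substitute for a real inventory tool: it just walks your files and records each one's path, size, and a SHA-256 hash.

```python
# A minimal inventory sketch (assumes Python 3 and a made-up "collection"
# folder): walk the collection and record each file's path, size, and
# SHA-256 hash in a CSV you can keep up to date.
import csv
import hashlib
from pathlib import Path

def sha256(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_inventory(collection_dir="collection", manifest="inventory.csv"):
    with open(manifest, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "size_bytes", "sha256"])
        for path in sorted(Path(collection_dir).rglob("*")):
            if path.is_file():
                writer.writerow([path.as_posix(), path.stat().st_size, sha256(path)])

if __name__ == "__main__":
    build_inventory()
```

That CSV is also the thing you keep updating as stuff gets added, and it comes in handy again for the fixity checks below.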

So actually, how do I start “doing” digital preservation?

You've got a sick inventory now, so we can get started on preserving those bits. Owens suggests running a fixity check to take stock of each digital object at the start, and then moving on to making copies. Both Owens and the NDSA indicate that it's generally best practice to keep at least 2-3 copies, and to store those copies in different ways and locations, so that each copy faces a different type of disaster risk. How do you do that, though? A lot of institutions collaboratively form consortia like MetaArchive and Data-PASS where "one institution [hosts] a staging server, to which the other partner institutions transfer their digital content" (From Theory to Action). So multiple organizations can help each other out with storing their digital content. Sweet. Let's be friends. (You send them some copies.)
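If you're just spreading copies across storage you control yourself, the mechanics can start as simply as this sketch. The destination paths below are placeholders standing in for "different locations" (a local RAID, a partner's staging server, an offsite drive); real consortial transfers obviously involve more than a copy command.

```python
# A toy sketch of "keep at least 2-3 copies in different places."
# The destination paths are placeholders; in practice each would sit on
# different media in a different location (or at a partner institution).
import shutil

DESTINATIONS = [
    "/mnt/local_raid/collection",       # on-site copy
    "/mnt/partner_staging/collection",  # copy sent to a consortium partner
    "/mnt/offsite_drive/collection",    # offsite copy
]

for dest in DESTINATIONS:
    shutil.copytree("collection", dest, dirs_exist_ok=True)
```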

Oh, but that first fixity check wasn't enough. You're not done yet. You just made a bunch of copies of your files and transferred them to your bud to store! Run another fixity check (maybe using a sweet cryptographic hash or checksum) to make sure that all your files got copied correctly. Any time you make new copies or transfer those copies, you gotta check those files to see if they're still identical to the originals! Also, it's probably a good idea to run fixity checks periodically to make sure everything's chill.
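Here's roughly what that post-transfer check could look like, assuming the inventory CSV from earlier and a hypothetical "backup_copy" folder holding the transferred files under the same relative paths. It just recomputes each hash and compares it to what the manifest says it should be.

```python
# A rough sketch of a post-transfer fixity check: recompute each file's
# SHA-256 in the copy and compare it to the manifest made from the
# originals. "inventory.csv" and "backup_copy" are hypothetical names,
# and manifest paths are assumed to be relative to each storage root.
import csv
import hashlib
from pathlib import Path

def sha256(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(manifest="inventory.csv", copy_root="backup_copy"):
    problems = []
    with open(manifest, newline="") as f:
        for row in csv.DictReader(f):
            copy = Path(copy_root) / row["path"]
            if not copy.is_file():
                problems.append((row["path"], "missing from copy"))
            elif sha256(copy) != row["sha256"]:
                problems.append((row["path"], "hash mismatch"))
    return problems

if __name__ == "__main__":
    for path, issue in verify_copy():
        print(f"{issue}: {path}")
```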

But say— what if everything’s not chill?

You've got some numbers that just aren't adding up. Could it be that some of your files got corrupted? You gotta fix those. Using the results of your fixity check, you can identify which files aren't totally correct and try to make new, better copies, or you can attempt to repair the file. "This is done by replacing corrupted data with the distributed, replicated, and verified data held at 'mirroring' partner repositories in multi-institutional, collaborative distributed networks. The consortia groups MetaArchive and Data-PASS use LOCKSS ('Lots of Copies Keep Stuff Safe') for this kind of distributed fixity checking and repair" (NDSA Storage Report).

So remember those copies you sent to your friends? Because you have multiple copies of your stuff, you can use those to help fix all your broken ones! Sweet, geographic redundancy really pays off.
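As a sketch of what that repair looks like mechanically (assuming your mirror is just another directory you can read from, which glosses over how a LOCKSS network actually coordinates this), the key idea is that you only copy back a replacement that itself passes the fixity check.

```python
# A toy repair sketch: if the local file fails its fixity check, pull a
# replacement from a mirror, but only after confirming the mirror's copy
# matches the expected hash. Directory names are placeholders and the
# expected hash comes from your manifest; this is not how LOCKSS itself
# works internally.
import hashlib
import shutil
from pathlib import Path

def sha256(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def repair_from_mirror(rel_path, expected_hash, local_root, mirror_root):
    local = Path(local_root) / rel_path
    mirror = Path(mirror_root) / rel_path
    if local.is_file() and sha256(local) == expected_hash:
        return "ok"                          # nothing to repair
    if mirror.is_file() and sha256(mirror) == expected_hash:
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(mirror, local)          # replace the bad copy
        return "repaired"
    return "no good copy here"               # try another partner
```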

Am I done?

NO!

We still gotta think about security and access!

Security could be its own whole thing, but really it involves determining who has access to your files and controlling what they can do with those files. Keep logs of who accessed files and what they did to those files. If you don't have any fancy database software to track and control access to those original files, Owens suggests you could simply keep those files on a hard drive in a locked drawer, and there you go: no one's deleting that stuff.
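Even without fancy software, "keep logs" can start as something this small. The log file name and the idea of typing in a username by hand are my own placeholder assumptions, not Owens' prescription; it just gives you an append-only record of who did what to which file.

```python
# A bare-bones access log sketch: append one line per action so there's
# a record of who touched which file and what they did. The file names
# and the manually supplied username are placeholder assumptions.
import datetime

def log_access(user, action, file_path, log_path="access_log.txt"):
    timestamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as log:
        log.write(f"{timestamp}\t{user}\t{action}\t{file_path}\n")

log_access("mlee", "viewed", "collection/letters/1999-03-12.pdf")
```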

And access is the whole reason we’re doing any of this! How will you provide people with those files? Will anything be restricted? Certainly, some of your digital files will have information that shouldn’t just be publicly accessible, or maybe your donor doesn’t want anyone to read those files for a while. If that’s the case, it may be a good idea to stick that into a dark archive, which will preserve your stuff, but no one will be able to read it. Or, if your stuff is less sensitive, maybe it could just be made available online. Your organization should probably develop policies specifically for security and access to your collections.

So we’ve covered maintaining multiple copies, running fixity checks, and security! I think we’re good.

Questions I guess?

So I know I really glossed over these processes, but I wanted to talk more about the preservation of specific file formats, which I think both Owens and the “Party like it’s 1999” reading about emulation seemed to touch on. How do you determine the feasibility of saving a particular file? There are hundreds of different types of proprietary file formats that have come and gone over the years, but how do you determine if you should migrate a file to a more common, modern format, or if it’s necessary to emulate an environment that enables you to experience the file as it was originally intended?

Are there risks of losing some of the affordances of a specific format when migrating to a new file format? If it's possible to preserve an original file bit-for-bit, would it be more authentic to keep it as is and provide access through an emulated environment? Or are we less concerned with the authentic, artifactual experience of that file and more concerned with the information?

I know that the answer to these questions is probably "it depends" or "it's contextual," but I'm more interested in people's personal thoughts on emulation. I know it's a complex process to create emulators, but once we are able to successfully emulate past operating systems, can you see emulation becoming "best practice" for digital preservation and access?

Digital Preservation from Both Sides

I haven’t done much digital preservation. I’ve been in the bit trenches. The work I’ve done is to digital preservation as community sandbagging is to the Army Corps of Engineers. I need to reword my resume.

Bit preservation is our most urgent set of tasks: managing multiple copies, managing and using fixity information, and ensuring our data is secure. All these activities are directed at long-term usability, but digital preservation is broader. It's concerned with the future viability of file formats and software, with future renderability. Having done the basic bit-level work, we might consider migrating files en masse to more sustainable formats or "leave the bits alone" by emulating or virtualizing earlier computing environments.

Our digital preservation decisions will not be identical across institutions. Best practices are like recipes; they're frameworks. "Approaches to copies and formats should fit the overall contours of your institution's collecting mission and resources. This is about the digital infrastructure you establish to enable the more tailored work that is done around individual collections" (Owens, p. 105). A video art or game collecting museum might attempt emulation to preserve as much of the experience of the work as possible, seeing the artifactual aspect as more intrinsic to its mission than the informational. But if those bits aren't safe, these considerations will never arise. Sandbag first.

Much of our study this week is focused on fixity and storage. In my bit preservation experience, fixity checks can get lost in the mix. 80% of NDSA member organizations reported that they use some sort of fixity checking. I think that members and non-members alike have mostly heard the urgent call to action to get the bits off the floor and, maybe while they're at it, make multiple copies. Then, of course, they have to store them somewhere, so they're forced to make storage decisions. But I'm not so sure that organizations often understand the necessity of maintaining bit-level integrity and how they'd go about it. Then again, they could be taking it for granted that their storage solution is a fixity solution as well. And it might be, but I think that's something of an afterthought.

We're going to talk about access later in this course, but my impression has been that access, as a buzzword, can cloud our perspective on preservation. I'm concerned that when that scan hits the web, it's tempting to feel that our preservation work is done. We'd never take that approach to analog media. We wouldn't hang a painting in a gallery, throw our hammer in the truck, and head home. Well, we might, but we'd be invested in maintaining the integrity of that work for future shows. Accessibility now isn't access for the long term. This might be obvious to us, but just this week I was speaking to someone about born-digital material and they asked me if I was also interested in endangered media. It's still a hard sell.

That brings me to one final thought I had while reading the case studies in the POWRR group white paper. The experience at Chicago State University was illuminating. "The defining moment when several library staff members recognized the importance of digital preservation activities occurred when they realized that grant activities digitizing library collections included no provision for storage or preservation" (p. 21). That might be because the grants themselves don't allow for appropriate storage solutions. I was investigating grants for an in-house digitization project I was working on and had determined that cloud storage was the best, and most affordable, offsite storage solution for my organization. Once I found a grant that wouldn't exclude in-house digitization projects, I realized it had seemingly arbitrary restrictions excluding "subscription-based" or simply "cloud" storage.

Our reading this week helped me recontextualize my own work as a novice in this field. I wonder if anyone else had a similar experience.

I Find Your Lack of Faith… Well, Prudent.

Join the Dark Side

I couldn’t help feeling that the underlying subtext to this week’s readings was an embracing of distrust and uncertainty. Distrust in physical media, file formats, third-party cloud services… even the computer’s ability to do what many take for granted: create an exact copy. Uncertainty manifested itself in issues such as the future adoption levels of formats, the continuity of tools, and even the motives and competency of our own staff. Rather than being dismayed by this somewhat dour outlook, I found it to be a heartening confirmation of my belief that pessimism can indeed be used as a force for good.

I guess I’m weird like that.

Owens' chapter 6 kicked off this theme of distrust with the recurring phrase "hedge your bets." This one phrase was applied repeatedly to the first of three core processes for bit preservation: 1) creating and managing multiple copies, 2) managing and using fixity information, and 3) establishing information security protocols to keep accidents and attacks from occurring at the hands of staff or users. In the context of the first process (managing multiple copies), the "hedge your bets" approach necessarily results in a proliferation of file types, storage media, and geographically sprawling storage locations. The point of this push for diversity is that no one disaster, bad actor, or system failure is likely to wipe out all copies.

The distrust also extended to seemingly mundane processes like the act of transferring data, and to measures like minimizing the number of people capable of accessing the objects. But the issue that interested me most was the emphasis on not putting too much faith in any one tool. As Owens notes, vendor lock-in is a real concern that necessitates forming an exit strategy before acquisition is even complete (p. 115). I have seen this happen in my own career and know how dangerous it can be. Indeed, it was one of the catalysts that inspired me to seek this degree.

The theme of distrust continued in the NDSA Storage Report. This survey found that most NDSA members' desire for control over their own holdings tended to dissuade them from embracing commercial cloud services. The perception (or reality) of greater control over their data caused the majority to prefer joining institutional cooperatives in which each member shares their data with partner organizations in order to establish geographic diversity in their storage plan. Of particular concern among the group was the lack of transparency in the fixity checks performed behind the scenes by commercial cloud services. There was no proof offered that hashes provided at the time of access weren't simply being replayed from the time of upload, thus providing a false sense of safety.

Again, I was struck by how issues of uncertainty and distrust could be harnessed to realize positive and productive ends. Perhaps I’ve finally found my people?

A New Hope

Not all the readings were gloom and doom. "From Theory to Action" in particular revisited many of the themes we've touched on in previous weeks, emphasizing a simple and incremental approach to beginning a preservation program. As the subtitle of the piece indicates, they emphasize embracing the concept of "good enough," and then building on it. Digital preservation is not a binary status requiring that an institution either be moving at light speed or standing completely still. Institutions should focus on near-term goals that can immediately improve preservation, however small and simple they might be. But probably the biggest takeaway from this piece was the degree of confidence and self-efficacy the POWRR group members instilled in each other simply by choosing to assess and tackle their collective issues in a cooperative fashion. The creation of communities of practice is particularly effective at helping the entire group identify solutions to common problems.

Discussion questions

In Chapter 6, Owens notes the importance of differentiating working files from files that have become static and thus ready for long-term storage. I have found that in practice this is more difficult than it would seem, particularly for video. In our multimedia department the concept of finality has been elusive at best, to the point that our manager gets angry if a file is actually labeled "final", because it becomes untrue almost the moment it's saved. Our company's trend of editing-by-committee basically guarantees at least a few more rounds of edits no matter what. Even an extended passage of time is no indication of finality. Customers will come back and ask for changes to a video many years after the first version, usually because they want to remove a staff member who has left, or change someone's job title. Saving an editable version requires saving the non-linear editor project files and all associated source files. This is the most complicated version to save and the first to become obsolete. So, my question for the class is: how should we as archivists respond to such a dynamic situation, where the concept of finality is tenuous and fluid?

And lastly, I didn’t discuss Dietrich’s “Emulation for Everyone” above because it seemed like something of an outlier relative to the others. I find myself fascinated with emulation as a practice, but wondering about its feasibility for all but the most extreme cases. For example, it was mentioned at the end of this piece that researchers looking at the Jeremy Blake Papers actually preferred using the modern OS and were really primarily interested in the informational value of the objects. Authenticity and fidelity were less of a priority. This seems like a lot of effort to have gone to for an experience that no one really needed. So my question for the class is, where do you see emulation fitting into the list of preservation options? Should it require a more rigorous examination of preservation intent to make sure the considerable extra effort is justified?

I'm also curious about the extent to which an emulated system becomes a digital object in itself, one which then becomes exponentially more complicated to preserve. At what point do we decide that these towering platform stacks held together with scotch tape and shoe goo are no longer worth the expense of maintaining?

 

I formally apologize* for all Star Wars references.

*Not really.

Before the "How?" of Digital Preservation Are the Questions of "What?" and "Why?"

Before taking this class, I thought of digital preservation only as a skill set I needed to learn. I wanted to know: how do I accomplish this thing that so many job ads expect me to be able to do? I realize now that by focusing exclusively on the "how?" of digital preservation, I had skipped over the important steps of first determining the "what?" and the "why?". When these questions are taken into consideration, it becomes clear that digital preservation isn't just a checklist of methods or tools students need to learn; it's also a set of theories about what preservation means. The readings this week stressed that before creating a plan for preservation, it is important to first ask: what do we want to preserve? And why do we want to preserve it (what aspects of it are worth saving)? Specific preservation goals will make it so that what we deem most valuable is saved, and (to paraphrase Professor Owens) if compromises must be made for practical reasons, having clear intentions will at least ensure that we are making those decisions deliberately.

Owens writes that the “future of digital preservation does not lie in the maintenance of old computing systems and hardware…[but in] making copies and moving those copies forward to new systems.” This may, upon first reading, seem simple enough, and yet it raises a multitude of questions about what it means to say something is the “same” as something else or that it is an “authentic” copy. Increasingly, archivists are coming to see that preservation judgments are (in the words of Herb Stovel) “necessarily relative and contextual.”

Geoffrey Yeo interrogates the idea of "sameness" in his article, arguing that what makes something the "same" as something else depends upon our definition of "sameness," and describing scenarios that roughly correlate to the artifactual and informational frameworks we first encountered in Professor Owens' book. As the title "Nothing is the same as something else" suggests, however, Yeo is ultimately skeptical of any claim to "sameness" with copies, because "attempts to determine the significant characteristics of records are problematic, not least because judgements about significance will vary from one user community to another."

I was particularly interested in the comparisons that Yeo draws between historical manuscripts and digital objects. Yeo suggests that scholars may one day scrutinize digital objects in the same way that they now examine handwritten manuscripts for clues about the author's intent in their handwriting, the spacing of the words, or smudges on the paper. We must, therefore, be mindful of what gets preserved with digital copies since, according to Yeo, there is no detail so inconsequential that we can assume it will never be of value to some researcher. He also proposes that we may one day fetishize old forms of media like CD-ROMs or floppy disks in the same way we do other objects in museums or special collections. For this reason, Yeo proposes saving both the original and a copy (or copies) when possible, but if not, we need to at least give careful consideration to who may be using the digital objects in the future and what their needs or expectations will be.

In a way that parallels Yeo's questioning of "sameness," Edward M. Bruner uses the example of Lincoln's New Salem, a historic site in Illinois, to examine different definitions of "authenticity." He proposes four main types: verisimilitude (the appearance of authenticity), genuineness (historically accurate authenticity), originality (the original and not a copy), and authority (an authorized replica). In New Salem, Bruner sees a mixture of the different types, identifying aspects that correspond to each of the four markers of authenticity. He also notices that the desire for authenticity can be very idiosyncratic and selective, pointing out all the different ways in which New Salem has been deliberately made inauthentic (such as paving the roads or adding bathrooms or gutters), yet no one seems to give these details much notice. Bruner argues for transcending the inauthentic/authentic dichotomy and recognizing that tourists are not really there as judges of authenticity, but to "construct a past that is meaningful to them and that relates to their lives and experiences." The lesson that a digital preservationist might take from Bruner's article is that what makes a recreation or a copy meaningful is often highly personal and will vary from person to person. Unlike Yeo's article, which emphasized the importance of details, Bruner seems to suggest that in some instances the details aren't as important as recreating an experience that people can relate to. This reminded me of video game simulations: getting the game's exact code correct may be less important than other, more personal factors that people associate with the game.

When it comes to what makes something the “same” or “authentic,” both Yeo and Bruner would say that it depends on the person. This connects to a second theme of this week’s readings–the importance of participatory or community-based archiving, especially for marginalized communities. While it is generally acknowledged that archives need to become more diverse in their representation, Katie Shilton and Ramesh Srinivasan write that archives “have appropriated the histories of marginalized communities, creating archives about rather than of the communities,” which can create distorted narratives. Shilton and Srinivasan propose changing the way archivists acquire and process new collections so that they are involving community members throughout the process of appraisal, arrangement, and description, ensuring that the archives represent marginalized communities the way that they want to be understood.

These ideas have become especially relevant recently in discussions of how to archive Black Lives Matter and other protest movements. Jarrett M. Drake argues that participatory archiving may not be enough, and that perhaps traditional archivists should avoid archiving the movement altogether and instead allow independent, community-based archives to do the job. Drake does, however, offer advice to traditional archives who still want to be involved. He tells traditional archivists to first look at their existing holdings to "see whether or not black lives matter there," and to "confront [their own] complicity in white supremacy, patriarchy, and other structural inequalities that the movement is seeking to eradicate." Moreover, they need to build trust among the "people, communities, and organizations around whose lives the movement is centered, a trust they should pursue not under the guise of collection development but under the practice of allyship."

I'm curious: what did you think of Yeo's and Bruner's discussions of "sameness" and "authenticity"? Do you foresee a future in which digital objects are scrutinized for minute details in the same way manuscripts are today? What about the ideas presented by Drake: do you think that traditional archivists should have a role in documenting marginalized communities? If so, what steps do you think they should take to ensure they are doing so responsibly?

What matters most and how do you make it last?

I found this week's readings overwhelming. This is primarily because they not only drew on a lot of the themes we've covered in class so far but, for me, really get at the heart of what it means to be an information professional. The subject was preservation intent, authenticity, and selection, which, quite honestly, seemed like everything to me. It turns out it's all interrelated.

What does it mean to be authentic?

Bruner describes four meanings of authenticity (verisimilitude, genuineness, originality, and authority) using New Salem Historic Site, a reconstruction of the village where Abraham Lincoln lived in his 20s. I couldn't help but think of our discussion of artifactual identity in one of our earlier classes, since it referenced another historic site, Mount Vernon (Owens, 15-17). Using Bruner's terminology, the Mount Vernon mansion is authentic because it is the original. According to its website, "restoration efforts aim to represent the estate as it appeared in 1799, the last year of George Washington's life and the culmination of his designs for Mount Vernon." This description also conforms to Bruner's idea of genuineness, because someone from that period could believe the restored estate to be from that period.

Verisimilitude seems one step removed from genuineness. It may pass as believable for visitors, but it isn't picture perfect. Bruner's description of New Salem and Mount Vernon's website both include descriptions of modern-day conveniences for tourists and for upkeep. Bruner describes gutters on the log cabins that would not have existed at the time, and Mount Vernon has accessible pathways for wheelchairs. Presumably both have bathrooms somewhere on the grounds.

Bruner's last meaning refers to an authority that certifies something as authentic. For example, the State of Illinois has the authority to approve New Salem Village as the official reconstructed site.

So what’s an authentic digital object?

We've learned in class that "digital information is material" (Owens, 34). Just like words on a page in a book, it's written on something like a hard drive. However, as a storage medium, hard drives are much less reliable than books. In order to preserve a digital object, you have to transfer it to something stable and be ready to do it again before the storage conditions fail. This concept is outlined in the storage component of the Levels of Digital Preservation, which we read about in our first week of class. The idea of an authentic digital object therefore precludes Bruner's third meaning of "original" because we won't be able to open and use files under the exact same conditions on the same hardware forever.

So how close can we get to the original, and what does that even mean? Last week we learned about platform layers. Digital objects are constructed within a certain context related to several factors such as software, operating system, file formats, etc. When this context changes, it affects how the object appears to us, if it can appear at all. In order to recreate the object, then, we have to consider what's important about it and which parts of the object we need to hold onto to approach that ideal. This brings us to the idea of preservation intent.

Owens presents several examples in our readings of preservation work that speaks more to creating an authentic experience of the object, but in order to get there, you have to think about which aspects are worth preserving. In one case, it might be the appearance recreated through a screenshot; in another, it might be worthwhile to emulate the platforms that were originally used in order to present an interactive experience.

However, you don't necessarily have control over all the aspects needed to recreate the experience faithfully. Owens uses the example of Grateful Med, a software interface for searching medical information. In order to recreate the experience of using Grateful Med, you would need to emulate all the platform layers required to run the software as well as preserve the external medical databases it searched. Because of all the variations in platforms involved, this approach was considered impracticable. Instead of preserving the software, preserving the tutorial served to fulfill the preservation intent, which was to capture how the software worked.

This reminded me of our readings last week on Documenting Dance because it showed how an experience can be documented without being strictly representational. You don't have to make a direct copy. You just have to drill down to what you think is important to remember.

Who decides what’s authentic?

Bruner's last meaning of authenticity dealt with authority. I think this idea was captured in two of our readings: Preserving Social Media Records of Activism (Jules, 2015) and Expanding #ArchivesForBlackLives: Building a Community Archives of Police Violence in Cleveland (Drake, 2016). Both of these articles have to do with social media, but they're also about who has historically had the authority to save or neglect the history of marginalized people. Drake especially tackles this head-on and describes how alienating the archival profession has been for black people. Archivists don't have a place in preserving this story unless they acknowledge complicity in maintaining the white patriarchal structure.

Social media has been described as a way to give people a voice to tell their own story, but it's complicated by issues of privacy and ownership, as well as by the challenge of "authentically" capturing an experience from what may amount to millions of different perspectives.

I have to digress a little here because my feelings about social media are complicated. As an information professional, it's not my place to direct how the public creates records. It's even questionable whether it's my place to preserve them. I have to say, as my own opinion, I question the value of social media as a way of authentically preserving an experience. Jules acknowledges the limitations of Twitter, but I think there's a suggestion that these limitations can be overcome, and I'm not sure I believe that. The essence of Twitter is, after all, a means of surveillance: not sneaky government surveillance, but marketing. Owens gave the example of Documenting the Now, an effort to ethically collect and preserve social media content. I have to hope that if smart people are putting their heads together to ethically preserve this, then maybe they can come up with a better alternative to current social media platforms altogether.

There was so much more in our readings this week, so I'll look forward to reading your impressions.