“But it’s the way we’ve always done it!”: Challenging Traditional Archival Arrangement and Description

This week’s readings painted an excellent picture of how digital and analog archives must be treated differently, especially when it comes to arrangement and description. Key archival traditions, such as provenance and original order, do not always adapt smoothly to born-digital material. In the words of Peterson, “the units of arrangement, description and access typically used in web archives simply don’t map well onto traditional archival units of arrangement and description, particularly if one is concerned with preserving information about the creation of the archive itself” (“Archival Description for Web Archives”).

Owens’ chapter, “Arranging and Describing Digital Objects,” defines arrangement and description as “the process by which collections are made discoverable, intelligible, and legible to their future users” (129). An archivist’s main job is to provide access to materials, and arrangement and description play an integral role in making that information as smooth and painless to find as a finding aid can make it. Since the 1898 publication of the Manual for the Arrangement and Description of Archives, or simply the Dutch Manual, by archivists Muller, Feith, and Fruin, the principle of respect des fonds has dominated archival description and arrangement (Bailey). Within this framework, the ideas of provenance and original order govern how archivists should handle physical materials. Provenance refers to “the origin or source of something,” and original order is “the organization and sequence of records established by the creator of the records.” Respect des fonds uses these two concepts to impose the rule that materials made by one creator should not be intermingled with materials from another creator, and that when an archive receives materials, they should remain in the order the creator kept them, to prevent the loss of contextual information. Yep, that’s a lot to wrap your mind around, and even archivists sometimes struggle with this method. In the digital realm, Drake and Bailey argue that these concepts do not transfer easily to born-digital objects, for multiple reasons.

Drake’s article “RadTech Meets RadArch: Towards A New Principle for Archives and Archival Description” argues that provenance is rooted in colonialist and imperialist ambitions and should be replaced with a new principle that recognizes the communities impacted by those materials as part of their provenance. In short, Drake believes that determining provenance is a grey area, especially when “only a sliver of Western society had 1) the legal privilege to create and own, and 2) the legal protection of that privilege.” Because of this segregation and exclusion of many demographics from archives and history, Drake believes that when it comes to digital materials, “users should be able to obtain […] 1) the person(s) who had access to a particular file or folder, 2) their level of access, and 3) the log of changes to these access permissions.” By recording who had access to a file and how much access they had (i.e., who could change parts of the file), Drake starts to blur the clear-cut distinction of creator. Since it is easy for a file to have multiple editors, creators, and contributors, there is no longer a single person to enter into the provenance statement, but multiple creators.

Speaking to the practice of original order, Bailey, in his article “Disrespect des Fonds: Rethinking Arrangement and Description in Born-Digital Archives,” comments that with born-digital material there is no physical order to where the bits are written on the storage device, and that order changes because the file, or at least its metadata (e.g., a file’s “last opened” date), changes every time the file is opened. He states that “a new order [is composed] as new bits are assigned to other available areas of the disk.” He continues, “In a database, objects are related but not ordered. The database logic is non-linear and there is no original order because order is dependent upon query.” What does all this mean? It means that it is almost impossible to preserve an original order with born-digital materials because of the nature of digital objects. “Digital objects will have an identifier, yes, but where they ‘rest’ in intellectual space is contingent, mutable.” Because original order does not exist in a database structure (a structure opposite to a narrative structure, as explained in Manovich’s article), the concept of original order is impractical for the arrangement of digital materials. Marshall adds to this conversation in her article “Digital Copies and a Distributed Notion of Reference in Personal Archives,” discussing the authenticity of duplicate file copies made by various “creators.” She mentions that people make copies of their files for many reasons, including to prevent loss and to make changes without affecting the original. So where do the multiple copies fit into original order, and to some extent, provenance?

So how do we arrange and describe our digital materials if we can’t use the traditional archival methods? Owens suggests sticking to the More Product, Less Process approach of Greene and Meissner (132). He says that because there is usually a sizable amount of information about the arrangement and creation of a digital object within its metadata, one can take that information to “create a collection-level record and provide whatever level of access [one] can legally and ethically offer” (135). But are there other options besides leaving the materials largely unarranged?

Discussion questions:

From Drake’s article: How can archivists revisit this core principle [provenance] to learn of its limitations and envision a post-colonial archive, free of these oppressive forces and equipped to meet the challenges of contemporary born-digital archival records?

How can we improve software like Archive-It to make it more compatible with born-digital materials? What is it missing? (Based on Peterson’s observations or your own experiences with metadata or cataloging software.)

Challenging traditional archival principles

Our readings this week covered description and arrangement in digital preservation and challenged the effectiveness of the archival principles of respect des fonds and provenance for new media objects.

Database nature of new media objects

Lev Manovich details how new media objects are essentially databases. Digital objects are a layered collection of items. Users can interact with the same digital object in a variety of ways, meaning the objects lack a linear narrative.

Manovich introduces videogames as an exception. On the surface, players interacting with a game follow a narrative and pursue defined goals. However, Manovich goes on to clarify that to create a digital object is to create “an interface to a database” and that the content of the work and its interface are actually separate. Even while playing a video game, which seems to follow a narrative, players are only moving between points mapped out by the database’s creators. The database nature of new media objects contrasts with the narratives often provided by analog objects, meaning new methods for describing and arranging digital objects are needed.

Describing New Media Objects

Professor Owens details Greene and Meissner’s suggestion of More Product, Less Process (MPLP). Greene and Meissner believe that organizations should avoid putting preservation concerns before access concerns. Collections should be minimally processed so that researchers can access them sooner. Item-level description should be provided only rarely. For arrangement and description, archivists should strive for the “golden minimum.”

Owens provides the 4Chan Archive at Stanford University as an example of using the MPLP approach for digital objects. The archive is available as a 4 GB download, an example of quick and easy access. Stanford opted to include limited but informative description, including the scope of the collection and metadata for the format, date range, and contributor.

Owens also states that digital objects are semi-self-describing due to containing machine-readable metadata. Owens uses tweets as an example. Underneath the surface, tweets contain a lot of informative metadata, such as the time and time zone.
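To make that concrete, here is a minimal sketch of pulling descriptive metadata out of a single saved tweet, assuming it was exported in Twitter’s older v1.1-style JSON (the file name is hypothetical and the exact field names may differ in other exports):

```python
import json

# A minimal sketch of a tweet "describing itself": much of the metadata an
# archivist would otherwise supply by hand is already machine-readable
# inside the object. Field names assume the older v1.1 JSON export.
with open("tweet.json", encoding="utf-8") as f:   # hypothetical file
    tweet = json.load(f)

record = {
    "identifier": tweet.get("id_str"),
    "created": tweet.get("created_at"),           # timestamp with time zone offset
    "creator": tweet.get("user", {}).get("screen_name"),
    "language": tweet.get("lang"),
    "client": tweet.get("source"),                # the app used to post
}
print(record)
```

The point is simply that identifier, date, and creator information often travels with the digital object itself.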

In an effort to describe web archives, Christie Peterson tested Archivists’ Toolkit, Archive-It, DACS, and EAD. Peterson found that the “units of arrangement, description, and access typically used in web archives simply don’t map well onto traditional archival units of arrangement and description.” Discussing Archive-It, Peterson describes how the tool breaks content down into three categories: collections, seeds, and crawls. An accession of a collection of websites would be a crawl. Peterson found that there were no good options for describing a crawl: she could not say what the scope of the crawl was or explain why certain websites were left out. This means current tools and methods leave archivists unable to document their activity, creating a lack of transparency.

Challenging Archival Principles

Owens defines original order as “the sequence and structure of records as they were used in their original context.” Original order maintains context and saves the time and effort of reorganizing and arranging content, leading to faster access. However, maintaining original order can be difficult for digital objects.

Jefferson Bailey describes an issue with following traditional archival principles with digital objects. Since every interaction with a digital object leaves a trace of that interaction, there is no original order. Bailey explains that with new media objects, context can “be a part of the very media itself” since digital objects can be self-describing. Attempting to preserve original order is unnecessary as meaning can be found “through networks, inter-linkages, modeling, and content analysis.”

Bailey also gives a history of respect des fonds. This principle comes from an era of, and thus is designed for, analog materials. Respect des fonds made the organization of records focus on the creating agencies. Some critiques of the principle are that there is not always a single creator, those who structured the documents may not be the creators, and that original order “prioritizes unknown filing systems over use and accessibility.”

Jarrett Drake asserts that provenance is an “insufficient principle” for preserving born-digital and socially inclusive records due to its origins rooted in colonialism. The provenance principle asserts that records of different origins should not mix. The principle became popular in the United States in the early 20th century, when few were able to own and control their records.

When it comes to digital objects, Drake states “the fonds of one creator are increasingly less distinct from the fonds of other creators.” He provides the example of Google Drive, which allows multiple people to collaborate on document creation. Another change in the times that affects provenance is the rise in people who are able to create and own their records. Nowadays, people are able to name and describe themselves. According to Drake, archivists should support this and name creators in archival description according to their self-assertion.

According to Owens, using community-provided descriptions is becoming popular. To create the online exhibition The Reaction GIF: Moving Image as Gesture, Jason Eppink asked the Reddit community for canonical GIFs and descriptions of them. Eppink wanted to capture what the GIFs meant to those who used them, and getting the descriptions directly from the community enabled him to do that.

Our readings also assert that, when dealing with multiple copies, it’s easier to keep all of them. As Catherine Marshall states, “Our personal collections of digital media become rife with copies, exact, modified, and partial.” One copy may have better metadata, another better resolution, and so on. We have so many copies that the “archival original” is decentralized and not straightforward to determine. Marshall states that it is better to keep these copies than delete them, because people have too many copies to sort through, storage is cheap, and people don’t know which copy they’ll want in the future.

Discussion Questions

Our readings lately have been asserting the value in allowing communities to describe their records. In chapter 7, Owens points out that giving description over to the end user can “easily result in spotty and inconsistent data.” How can archives maintain a balance between empowering communities and keeping quality, consistent data?

What are your thoughts on permitting anonymity in archives? Do you think that it’ll lead to doubt over the validity of the record later on? How can archives demonstrate truthfulness in a record while protecting the creator’s identity?

We started out talking about the theory, but this week’s readings really got into the nitty gritty of how to initiate and sustain digital preservation projects.

Where do I start? What’s involved?

Owens’ chapter points out three major elements of preservation required to save “the bits”: we need to create and maintain multiple copies of digital objects, use fixity checks to ensure we can account for all the information in those digital objects, and ensure the security of those digital objects so that they can’t be corrupted or deleted by some unsavory sort.

These are our basic elements, and the folks from POWRR (From Theory to Action: Good Enough Digital Preservation) want to emphasize that when you’re starting out, it’s best to focus on the basics. It’s easy to get overwhelmed by all the technical aspects of digital preservation, but it’s really an incremental process that you can work up to. Before maintaining those multiple copies, running fixity checks, and working on security, it’s a good idea to take stock of your institution’s strengths and abilities, consider what kind of resources it can devote to digital preservation, start planning ingest workflows, and create a basic inventory of your collection.

Owens reiterates this last suggestion: start out by creating an inventory of what you have, and start thinking about policies and practices that will help you manage that collection. (Make sure you keep that inventory up to date!)
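As a rough illustration of what that first pass could look like, here’s a minimal Python sketch that walks a collection directory and records each file’s path, size, and SHA-256 hash into a CSV, which also doubles as a first fixity snapshot. The directory and file names are my own assumptions, not anything prescribed by Owens or POWRR:

```python
import csv
import hashlib
from pathlib import Path

# A rough sketch of a starting inventory, assuming the collection lives
# under a single directory (paths and column choices are illustrative).
COLLECTION = Path("collection")          # hypothetical directory
INVENTORY = Path("inventory.csv")

def sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with open(INVENTORY, "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["path", "size_bytes", "sha256"])
    for path in sorted(COLLECTION.rglob("*")):
        if path.is_file():
            writer.writerow([path.as_posix(), path.stat().st_size, sha256(path)])
```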

So actually, how do I start “doing” digital preservation?

You’ve got a sick inventory now, and you can get started on preserving those bits. Owens suggests running a fixity check to take stock of each digital object at the start, and then moving on to making copies. Both Owens and the NDSA indicate that it’s generally best practice to keep at least 2-3 copies, and to store each of those copies in different ways and locations, so that each copy faces a different type of disaster risk. How do you do that, though? A lot of institutions collaboratively form consortia like MetaArchive and Data-PASS, where “one institution [hosts] a staging server, to which the other partner institutions transfer their digital content” (From Theory to Action). So multiple organizations can help each other out with storing their digital content. Sweet. Let’s be friends. (You send them some copies.)

Oh, but that first fixity check wasn’t enough. You’re not done yet: you just made a bunch of copies of your files and transferred them to your bud to store! Run another fixity check (maybe using a sweet cryptographic hash or checksum) to make sure that all your files got copied correctly. Any time you make new copies or transfer those copies, you gotta check those files to see if they’re still identical to the originals! Also, it’s probably a good idea to run fixity checks periodically to make sure everything’s still chill.
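Here’s one way that verification step might look: a hedged sketch that re-hashes the files in a copy and compares them against the inventory/manifest from the sketch above (the paths and CSV layout are assumptions carried over from that sketch):

```python
import csv
import hashlib
from pathlib import Path

# A sketch of the "did my copies land intact?" check, reusing the
# inventory.csv manifest from the earlier sketch (paths are illustrative).
def sha256(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_copy(manifest_path, original_root, copy_root):
    """Report files in the copy that are missing or don't match the manifest."""
    mismatches = []
    with open(manifest_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rel = Path(row["path"]).relative_to(original_root)
            target = Path(copy_root) / rel
            if not target.is_file() or sha256(target) != row["sha256"]:
                mismatches.append(str(rel))
    return mismatches

# Hypothetical locations: the original collection and a copy on another volume.
print(verify_copy("inventory.csv", "collection", "/mnt/offsite_copy"))
```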

But say— what if everything’s not chill?

You’ve got some numbers that just aren’t adding up. Could it be that some of your files got corrupted? You gotta fix those. Using the results of your fixity check, you can identify which files aren’t totally correct and try to make new, better copies, or you can attempt to repair the file. “This is done by replacing corrupted data with the distributed, replicated, and verified data held at ‘mirroring’ partner repositories in multi-institutional, collaborative distributed networks. The consortia groups MetaArchive and Data-PASS use LOCKSS (‘Lots of Copies Keep Stuff Safe’) for this kind of distributed fixity checking and repair” (NDSA Storage Report).

So remember those copies you sent to your friends? Because you have multiple copies of your stuff, you can use those to help fix all your broken ones! Sweet, geographic redundancy really pays off.
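To be clear, LOCKSS itself handles this automatically across its network; the toy sketch below just illustrates the underlying idea of replacing a failed copy from a location whose hash still matches (every path and the hash value are placeholders, not real data):

```python
import hashlib
import shutil
from pathlib import Path

# Not LOCKSS -- just a toy illustration of the idea: with several independent
# copies, a file that fails its fixity check in one location can be replaced
# from a location where the hash still matches the manifest.
def sha256(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def repair_from_mirrors(rel_path, expected_hash, copy_roots):
    """Find any copy that still matches, then overwrite the ones that don't."""
    copies = [Path(root) / rel_path for root in copy_roots]
    good = [c for c in copies if c.is_file() and sha256(c) == expected_hash]
    if not good:
        raise RuntimeError(f"No intact copy of {rel_path} found; restore from elsewhere.")
    for c in copies:
        if c not in good:
            c.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(good[0], c)   # replace the corrupted or missing copy

# Hypothetical example: three storage locations holding the same collection.
repair_from_mirrors(
    "folder/letter_1998.pdf",
    "0f3a...placeholder-sha256...",   # expected hash from your manifest
    ["collection", "/mnt/offsite_copy", "/mnt/partner_copy"],
)
```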

Am I done?

NO!

We still gotta think about security and access!

Security could be its own whole thing, but really this involves determining who has access to your files and controlling what they can do with those files. Keep logs of who accessed files and what they did to them. If you don’t have any fancy database software to track and control access to those original files, Owens suggests you could simply keep them on a hard drive in a locked drawer, and there you go: no one’s deleting that stuff.
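If you don’t have repository software, even a tiny script (or honestly, a spreadsheet) can stand in for an access log. A bare-bones sketch, with the file name and columns purely illustrative, and noting that this only records access rather than enforcing it:

```python
import csv
import getpass
from datetime import datetime, timezone
from pathlib import Path

# A minimal access log: one row per action you choose to record.
LOG = Path("access_log.csv")   # hypothetical location

def log_access(file_path, action, note=""):
    """Append one row: timestamp, user, file, action, optional note."""
    new_file = not LOG.exists()
    with open(LOG, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp_utc", "user", "file", "action", "note"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(),
            getpass.getuser(),
            str(file_path),
            action,
            note,
        ])

log_access("collection/folder/letter_1998.pdf", "read", "research request")
```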

And access is the whole reason we’re doing any of this! How will you provide people with those files? Will anything be restricted? Certainly, some of your digital files will have information that shouldn’t be publicly accessible, or maybe your donor doesn’t want anyone to read those files for a while. If that’s the case, it may be a good idea to put them in a dark archive, which will preserve your stuff without making it publicly accessible. Or, if your stuff is less sensitive, maybe it can just be made available online. Either way, your organization should develop policies specifically for security and access to your collections.

So we’ve covered maintaining multiple copies, running fixity checks, and security! I think we’re good.

Questions I guess?

So I know I really glossed over these processes, but I wanted to talk more about the preservation of specific file formats, which both Owens and the “Party like it’s 1999” reading about emulation touch on. How do you determine the feasibility of saving a particular file? Hundreds of proprietary file formats have come and gone over the years, so how do you determine whether you should migrate a file to a more common, modern format, or whether it’s necessary to emulate an environment that lets you experience the file as it was originally intended?

Are there risks of losing some of the affordances of a specific format when migrating to a new one? If it’s possible to preserve an original file bit-for-bit, would it be more authentic to keep it as is and provide access through an emulated environment? Or are we less concerned with the authentic, artifactual experience of that file and more concerned with the information?

I know that the answer to these questions is mostly “it depends” or “it’s contextual,” but I’d like to hear people’s personal thoughts on emulation. I know it’s a complex process to create emulators, but once we are able to successfully emulate past operating systems, can you see emulation becoming “best practice” for digital preservation and access?

Digital Preservation from Both Sides

I haven’t done much digital preservation. I’ve been in the bit trenches. The work I’ve done is to digital preservation as community sandbagging is to the Army Corps of Engineers. I need to reword my resume.

Bit preservation is our most urgent set of tasks: managing multiple copies, managing and using fixity information, and ensuring our data is secure. All these activities are directed at long-term usability, but digital preservation is broader. It’s concerned with the future viability of file formats and software, with future renderability. Having done the basic bit-level work, we might consider migrating files en masse to more sustainable formats, or “leave the bits alone” by emulating or virtualizing earlier computing environments.

Our digital preservation decisions will not be identical across institutions. Best practices are like recipes; they’re frameworks. “Approaches to copies and formats should fit the overall contours of your institution’s collecting mission and resources. This is about the digital infrastructure you establish to enable the more tailored work that is done around individual collections” (Owens, p. 105). A museum that collects video art or games might attempt emulation to preserve as much of the experience of the work as possible, seeing the artifactual aspects of those works as more intrinsic to its mission than the informational ones. But if those bits aren’t safe, these considerations will never arise. Sandbag first.

Much of our study this week focused on fixity and storage. In my bit preservation experience, fixity checks can get lost in the mix. Eighty percent of NDSA member organizations reported that they use some sort of fixity checking. I think that members and non-members alike have mostly heard the urgent call to action to get the bits off the floor and, maybe while they’re at it, make multiple copies. Then, of course, they have to store them somewhere, so they’re forced to make storage decisions. But I’m not so sure that organizations often understand the necessity of maintaining bit-level integrity or how they’d go about it. Then again, they could be taking it for granted that their storage solution is a fixity solution as well. And it might be, but I think that’s something of an afterthought.

We’re going to talk about access later in this course, but my impression has been that access, as a buzzword, can cloud our perspective on preservation. I’m concerned that when that scan hits the web, it’s tempting to feel that our preservation work is done. We’d never take that approach with analog media. We wouldn’t hang a painting in a gallery, throw our hammer in the truck, and head home. Well, we might, but we’d be invested in maintaining the integrity of that work for future shows. Accessibility now doesn’t guarantee access later. This might be obvious to us, but just this week I was speaking to someone about born-digital material and they asked me if I was also interested in endangered media. It’s still a hard sell.

That brings me to one final thought I had while reading the case studies in the POWRR group white paper. The experience at Chicago State University was illuminating. “The defining moment when several library staff members recognized the importance of digital preservation activities occurred when they realized that grant activities digitizing library collections included no provision for storage or preservation” (p. 21). That might be because the grants themselves don’t allow for appropriate storage solutions. I was investigating grants for an in-house digitization project I was working on and had determined that cloud storage was the best, and most affordable, offsite storage solution for my organization. Once I found a grant that didn’t exclude in-house digitization projects, I realized it had seemingly arbitrary restrictions excluding “subscription-based” or simply “cloud” storage.

Our reading this week helped me recontextualize my own work as a novice in this field. I wonder if anyone else had a similar experience.

I Find Your Lack of Faith… Well, Prudent.

Join the Dark Side

I couldn’t help feeling that the underlying subtext to this week’s readings was an embracing of distrust and uncertainty. Distrust in physical media, file formats, third-party cloud services… even the computer’s ability to do what many take for granted: create an exact copy. Uncertainty manifested itself in issues such as the future adoption levels of formats, the continuity of tools, and even the motives and competency of our own staff. Rather than being dismayed by this somewhat dour outlook, I found it to be a heartening confirmation of my belief that pessimism can indeed be used as a force for good.

I guess I’m weird like that.

Owens Chapter 6 kicked off this theme of distrust with the recurring phrase “hedge your bets.” The phrase is applied repeatedly to the first of three core processes for bit preservation: 1) creating and managing multiple copies, 2) managing and using fixity information, and 3) establishing information security protocols to keep accidents and attacks from occurring at the hands of staff or users. In the context of the first process, managing multiple copies, the “hedge your bets” approach necessarily results in a proliferation of file types, storage media, and a geographically sprawling network of storage locations. The point of this push for diversity is that no one disaster, bad actor, or system failure is likely to wipe out all copies.

The distrust also extended to seemingly mundane processes like transferring data and to minimizing the number of people capable of accessing the objects. But the issue that interested me most was the emphasis on not putting too much faith in any one tool. As Owens notes, vendor lock-in is a real concern that necessitates forming an exit strategy before acquisition is even complete (p. 115). I have seen this happen in my own career and know how dangerous it can be. Indeed, it was one of the catalysts that inspired me to seek this degree.

The theme of distrust continued in the NDSA Storage Report. The survey found that NDSA members’ desire for control over their own holdings tended to dissuade them from embracing commercial cloud services. The perception (or reality) of greater control over their data led the majority to prefer joining institutional cooperatives, in which each member shares its data with a partner organization in order to establish geographic diversity in its storage plan. Of particular concern among the group was the lack of transparency in the fixity checks performed behind the scenes by commercial cloud services: there was no proof that hashes provided at the time of access weren’t simply being replayed from the time of upload, providing a false sense of safety.

Again, I was struck by how issues of uncertainty and distrust could be harnessed to realize positive and productive ends. Perhaps I’ve finally found my people?

A New Hope

Not all the readings were gloom and doom. “From Theory to Action” in particular revisited many of the themes we’ve touched on in previous weeks, emphasizing a simple and incremental approach to beginning a preservation program. As the subtitle of the piece indicates, the authors emphasize embracing the concept of “good enough” and then building on it. Digital preservation is not a binary status requiring that an institution either be moving at light speed or standing completely still. Institutions should focus on near-term goals that can immediately improve preservation, however small and simple they might be. But probably the biggest takeaway from this piece was the degree of confidence and self-efficacy the POWRR group members instilled in each other simply by choosing to assess and tackle their collective issues cooperatively. Creating communities of practice is particularly effective at helping an entire group identify solutions to common problems.

Discussion questions

In Chapter 6, Owens notes the importance of differentiating working files from files that have become static and are thus ready for long-term storage. I have found that in practice this is more difficult than it would seem, particularly for video. In our multimedia department the concept of finality has been elusive at best, to the point that our manager gets angry if a file is actually labeled “final,” because it becomes untrue almost the moment it’s saved. Our company’s trend of editing-by-committee basically guarantees at least a few more rounds of edits no matter what. Even an extended passage of time is no indication of finality. Customers will come back and ask for changes to a video many years after the first version, usually because they want to remove a staff member who has left or change someone’s job title. Saving an editable version requires saving the non-linear editor project files and all associated source files. This is the most complicated version to save and the first to become obsolete. So, my question for the class is: how should we as archivists respond to such a dynamic situation, where the concept of finality is tenuous and fluid?

And lastly, I didn’t discuss Dietrich’s “Emulation for Everyone” above because it seemed like something of an outlier relative to the others. I find myself fascinated with emulation as a practice, but I wonder about its feasibility for all but the most extreme cases. For example, the piece mentions at its end that researchers looking at the Jeremy Blake Papers actually preferred using a modern OS and were primarily interested in the informational value of the objects; authenticity and fidelity were less of a priority. This seems like a lot of effort to have gone to for an experience that no one really needed. So my question for the class is: where do you see emulation fitting into the list of preservation options? Should it require a more rigorous examination of preservation intent to make sure the considerable extra effort is justified?

I’m also curious about the extent to which an emulated system becomes a digital object in itself, which then becomes exponentially more complicated to preserve. At what point do we decide that these towering platform stacks, held together with scotch tape and shoe goo, are no longer worth the expense of maintaining?

 

I formally apologize* for all Star Wars references.

*Not really.