Challenging traditional archival principles

Our readings this week covered description and arrangement in digital preservation and challenged the effectiveness of the archival principles of respect des fonds and provenance for new media objects.

Database nature of new media objects

Lev Manovich details how new media objects are essentially databases. Digital objects are a layered collection of items. Users can interact with the same digital object in a variety of ways, meaning the objects lack a linear narrative.

Manovich introduces video games as an apparent exception. On the surface, players interacting with the game follow a narrative and pursue defined goals. However, Manovich goes on to clarify that to create a digital object is to create “an interface to a database” and that the content of the work and its interface are actually separate. Even while playing a video game, which seems to follow a narrative, players are only going to points mapped out by the database creators. The database nature of new media objects contrasts with the narratives often provided by analog objects, meaning new methods for describing and arranging digital objects are needed.

Describing New Media Objects

Professor Owens details Greene and Meissner’s suggestion of More Product, Less Process (MPLP). Greene and Meissner believe that organizations should avoid putting preservation concerns before access concerns. Collections should be minimally processed so that researchers can access them sooner. Item-level description should rarely be provided. For arrangement and description, archivists should strive for the “golden minimum.”

Owens provides the 4Chan Archive at Stanford University as an example of using the MPLP approach for digital objects. The archive is available as a 4 GB download, an example of quick and easy access. Stanford opted to include limited but informative description, including the scope of the collection and metadata for the format, date range, and contributor.

Owens also states that digital objects are semi-self-describing due to containing machine-readable metadata. Owens uses tweets as an example. Underneath the surface, tweets contain a lot of informative metadata, such as the time and time zone.
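As a rough illustration of what “semi-self-describing” can mean in practice, the Python sketch below reads a creation time and UTC offset out of a tweet-like JSON record. The sample record is invented for this example; only its `created_at` layout follows the timestamp format used by Twitter’s v1.1 API. The function name `describe_tweet` and the output field names are hypothetical, not drawn from Owens.

```python
import json
from datetime import datetime

# Invented sample record; only the field layout mimics a Twitter v1.1 payload.
SAMPLE_TWEET = """
{
  "id_str": "1",
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "lang": "en",
  "text": "Example tweet text"
}
"""


def describe_tweet(raw_json):
    """Pull basic descriptive metadata out of a tweet-like JSON string."""
    record = json.loads(raw_json)
    # The timestamp carries its own UTC offset, so %z yields an aware datetime.
    created = datetime.strptime(record["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "identifier": record["id_str"],
        "created": created.isoformat(),
        "utc_offset": created.strftime("%z"),
        "language": record.get("lang"),
    }


if __name__ == "__main__":
    print(describe_tweet(SAMPLE_TWEET))
```

None of these values had to be typed in by an archivist; they travel with the object itself, which is the point Owens is making.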

In an effort to describe web archives, Christie Peterson tested Archivists’ Toolkit, Archive-It, DACS, and EAD. Peterson found that the “units of arrangement, description, and access typically used in web archives simply don’t map well onto traditional archival units of arrangement and description.” Peterson describes how Archive-It is organized: it uses three categories, namely collections, seeds, and crawls. An accession of a collection of websites would be a crawl. Peterson found that there were no good options for describing a crawl. She could not say what the scope of the crawl was or explain why certain websites were left out. This means current tools and methods leave archivists unable to document their activity, creating a lack of transparency.

Challenging Archival Principles

Owens defines original order as “the sequence and structure of records as they were used in their original context.” Original order maintains context and saves time and effort from being spent reorganizing and arranging content, leading to faster access. However, maintaining original order can be difficult for digital objects.

Jefferson Bailey describes an issue with following traditional archival principles with digital objects. Since every interaction with a digital object leaves a trace of that interaction, there is no original order. Bailey explains that with new media objects, context can “be a part of the very media itself” since digital objects can be self-describing. Attempting to preserve original order is unnecessary as meaning can be found “through networks, inter-linkages, modeling, and content analysis.”

Bailey also gives a history of respect des fonds. This principle comes from an era of, and thus is designed for, analog materials. Respect des fonds made the organization of records focus on the creating agencies. Some critiques of the principle are that there is not always a single creator, those who structured the documents may not be the creators, and that original order “prioritizes unknown filing systems over use and accessibility.”

Jarrett Drake asserts that provenance is an “insufficient principle” for preserving born-digital and socially inclusive records due to its origins rooted in colonialism. The provenance principle asserts that records of different origins should not mix. The principle became popular in the United States in the early 20th century, when few were able to own and control their records.

When it comes to digital objects, Drake states “the fonds of one creator are increasingly less distinct from the fonds of other creators.” He provides the example of Google Drive, which allows multiple people to collaborate on document creation. Another change in the times that affects provenance is the rise in people who are able to create and own their records. Nowadays, people are able to name and describe themselves. According to Drake, archivists should support this and name creators in archival description according to their self-assertion.

According to Owens, using community-provided descriptions is becoming popular. To create the online exhibition The Reaction GIF: Moving Image as Gesture, Jason Eppink asked the Reddit community for canon GIFs and descriptions of them. Eppink wanted to capture what GIFs meant to those who used them, and getting the descriptions directly from the community enabled him to do that.

Our readings also assert that, when dealing with multiple copies, it’s easier to keep all of them. As Catherine Marshall states, “Our personal collections of digital media become rife with copies, exact, modified, and partial.” One copy may have better metadata, another better resolution, and so on. We have so many copies that the “archival original” is decentralized and not straightforward to determine. Marshall states that it is better to keep these copies than delete them. This is due to people having too many copies, storage being so cheap, and people not knowing which copy they’ll want in the future.
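To make the distinction between exact and modified copies concrete, here is a minimal Python sketch that groups files by a SHA-256 digest of their bytes. It is an illustration under stated assumptions, not a method taken from Marshall: the file names and contents are invented, and `group_exact_copies` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict


def group_exact_copies(files):
    """Group file names by the SHA-256 digest of their bytes.

    `files` maps a (hypothetical) file name to its content as bytes.
    Names that share a digest are exact copies. Nothing is deleted.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values()]


if __name__ == "__main__":
    # Invented stand-ins for a personal collection with duplicate files.
    sample = {
        "photo.jpg": b"original image bytes",
        "photo (1).jpg": b"original image bytes",
        "photo_cropped.jpg": b"cropped image bytes",
    }
    for names in group_exact_copies(sample):
        print(names)
```

A digest only identifies byte-for-byte duplicates. Modified and partial copies hash differently, so a check like this cannot say which version has the better metadata or resolution, which is part of why keeping everything is the simpler choice.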

Discussion Questions

Our readings lately have been asserting the value in allowing communities to describe their records. In chapter 7, Owens points out that giving description over to the end user can “easily result in spotty and inconsistent data.” How can archives maintain a balance between empowering communities and keeping quality, consistent data?

What are your thoughts on permitting anonymity in archives? Do you think that it’ll lead to doubt over the validity of the record later on? How can archives demonstrate truthfulness in a record while protecting the creator’s identity?

6 Replies to “Challenging traditional archival principles”

  1. Thanks for your summary of the readings this week, Maya. You really covered the major topics and your writing was a great refresh for me. I found the focus on arrangement and description this week weirdly well-timed for my other library science classwork. I’m taking the Arrangement, Description, and Access class right now and I’ve been thinking through the principles I’ve learned in that class and how the readings this week challenge their underlying logic and necessity. For me, this week’s readings have reemphasized the need to understand those principles thoroughly on a theoretical level and how they work functionally, else any critique or suggestion for improvement could be easily dismissed (I’m thinking of Bailey’s article in particular).

    Anyways, I wanted to respond to your post in particular because I like the discussion question you posed about anonymity in the archive. My knee-jerk reaction was to think that that principle could never work. If we couldn’t provide the specifics, how could we possibly verify our records’ evidentiary value? Then I thought about how people use archival records. Practically, records are almost never used in isolation. Historians, for example, need to collect multiple primary resources (or at least significantly contextualize the one or two they have) to build an argument. Thus, I feel like the veracity of an anonymous record could maintain its use as a source of evidence through its relationship to similar records. This could be more difficult for genealogists who would be looking for specifics like names and other identifying features. How to get around that issue with anonymity in the mix would be more difficult.

    1. Hi Gwen,

      Could you expand on your thoughts about Bailey’s article?

      Also, I think you raise a good point about researchers being able to learn and construct arguments through contextualization, even with anonymity. That reminds me of the Digital Harlem project (http://digitalharlem.org). It’s a database that shows users events (pulled from archives) that occurred in Harlem from 1915 to 1930. The names used in the project are pseudonyms in order to protect the identities of the people involved. Even with anonymity, the project is still very valuable for researchers and still appears credible.

  2. Hey Maya,

    Thank you for the great synopsis of all the readings for this week!

    With regard to your question about allowing creators to describe their own records, I think that of course this still makes sense to do, since the creator knows their records best. However, I think it’s also prudent to develop a system wherein trained archivists “quality check” the metadata that the creator provided. In doing this, they can standardize the way dates are formatted, maybe make the descriptive tags or subject terms more normalized (or just add new ones, and leave the creator’s tags alone!) and maybe beef up description.

    Alternatively, some automated tools could be added to the platforms creators use to submit their content, like suggested terms when a creator starts typing tags for their material, or automatic “semi-self-describing” metadata about the date, file format, etc. of the file. I think there are a lot of options here, but I see your point about being concerned that creator-made metadata may be insufficient.

    With regard to your question about anonymity, I think the context will be really important for judging authenticity. Like with 4Chan, most of the users are already anonymous, and that’s part of the culture of that specific site. Having the users be anonymous doesn’t really jeopardize the validity of the information, because that’s the way the information was originally conveyed. But if it’s in the case of tweets, where an author’s name is redacted, I would assume that the archive would retain that information, and validity of the information should be ensured because that author information still exists, it’s just not available to the public. As Gwen mentioned, the tweet could still be verified or made stronger if it’s used in relation to other tweets on a similar topic. However, I’m not really sure how this will specifically impact researchers, or what this practice looks like for analog records.

    1. Hi Maggie,

      I think having professionals “groom” the data or possibly add new data so that it’s standardized is a nice idea – at least for institutional archives. As Emily mentioned below, such interference would likely be inappropriate for community archives.

      I do think the practice of redaction already happens with analog records. I work in SCUA. Some items are restricted and need to be screened by an archivist before a researcher can view the material. For one of my projects at my job, I’m creating a preliminary inventory for an accession. Part of what I do is mark if the folder has any restricted content. I once had to mark a folder that actually included a social security number. Once this inventory is complete, in practice (ideally) if a box with restricted items is requested, the archivist will actually screen the box first to determine if the researcher can access it. I’m not sure what happens at this step, though, and if the archivist will remove the restricted information or not.

  3. The question about consistency in community-driven description is an interesting one and I think it depends on your vantage point. If the project is developed and funded by an established institution, there are likely policies and standards in place that can guide the creation of the metadata. In this context, archivists need to be able to share relevant guidelines and explain why things should be formatted a particular way. As Maggie mentions, you’d need to be able to invest a certain amount of time to do quality control and to provide additional training as needed. But for community-driven projects that aren’t tied to a particular institution, it might be harder to determine what quality, consistent data should look like and it may be more appropriate for the communities to come up with their own standards. For a project like A People’s Archive of Police Violence in Cleveland, it wouldn’t be right for me to nitpick someone else’s description that is based on their lived experience. We can offer formatting suggestions and explain why it might be beneficial to standardize terms, but the professional archivists should not be making all of the decisions in that type of situation.

  4. One of the most interesting things that I got out of the readings was the idea of crowdsourcing collection development. In the chapter from Professor Owens’s book, he describes web archiving projects that have begun collecting content based on which URLs are included in tweets mentioning a particular term. The British Library developed an open-source tool for this purpose called TwitterVane. The Internet Archive also embraced this idea and created a web archive using the URLs pulled from the 13 million tweets including the Ferguson hashtag. Dr. Owens writes, “In these cases, one digital collection becomes the basis for scoping another, and the two functionally annotate and serve as context for each other.” (156)

    I am curious: what do other people think about using the URLs in tweets to construct a web archive? Do you think it is an effective method of collection development? Can you think of any potential downsides to this approach? What types of Twitter-generated web archives would you like to see created?
