Digital Preservation Reflections and Report

Digital Preservation is a course I had been looking forward to since I first looked at the iSchool Two-Year Plan. I even moved classes around to make sure I could fit in Digital Preservation in Fall 2018. My specialization is Archives and Digital Curation, and I’m greatly interested in this field. I felt strongly that once I took this class, I would just “know” digital preservation.

I’m happy to have taken this course because it sort of lifted the veil for me. Digital preservation is not just one thing; there are many methods and approaches to preserving digital objects. I did not expect to learn so much theory, and at first I did not really enjoy it, but now, at the end of the semester, I realize why it is so important.

Here are a few of the most important lessons I learned this semester:

Variability in digital preservation approaches

There is no one-size-fits-all approach to digital preservation. It always “depends”: on the institution’s preservation intent, its resources, its audience, and other factors. These influences all shape the actual digital preservation plan. I enjoyed learning about different preservation approaches and the reasons behind them, like the benefits of emulating Salman Rushdie’s laptop or why screenshots can suffice for web archiving. Knowing about the different approaches makes me confident that I will be able to address unique digital preservation issues I may come across in the future.

Scalability of digital preservation plans

Institutions don’t have to immediately follow every step in the OAIS model to practice digital preservation. I really appreciated learning about the NDSA Levels of Digital Preservation because they make it much easier to scale a plan. It’s nice to know that organizations of every size are able to work on preserving their digital content. I also enjoyed class discussions of our consulting projects for this reason: we were all learning the same lessons and were able to apply them differently to suit the needs and resources of our institutions.

Digital preservation is ongoing and constant

Digital media and objects are at risk of obsolescence and corruption, so preservation actions never stop. I was surprised to learn this, as I had imagined there would be a one-and-done software application to ensure records were safe. As it turns out, digital preservation is more complicated than that. It’s not even wise to rely on a single software application, because it too can become obsolete and unsupported. Monitoring, migration, storing multiple copies, and other habits are all important for ensuring the longevity of digital objects.

My favorite thing about this course was definitely the consulting project. Although we learned a lot from class and readings, actually putting what we learned to use made me feel more confident about my digital preservation knowledge and capabilities. This was also my first time learning about and writing policy, so that has been valuable for me, too.

Discussion Question

Now that we have taken this course and learned so much about the practice, how can we contribute to the field of digital preservation?


Geneva Historical Society Digital Preservation Policy

Purpose

The purpose of this policy is to set rules and guidelines for the management and preservation of digital objects in the Geneva Historical Society. This policy aims to address the following concerns and risks:

  1. Duplication of digital items
  2. Difficulty locating digital items
  3. Deletion and corruption of files

The procedures described in this policy are intended to mitigate these risks and concerns as much as possible.

Scope

The scope of this policy covers born-digital items and digitized objects in the Society’s collection.

Born-digital refers to items originally created in digital form, such as podcasts or the Society’s promotional or educational materials.

Digitized objects are the scanned copies the Society makes of its physical materials.

Standards

This policy is written to conform to the National Digital Stewardship Alliance’s Levels of Digital Preservation. Accordingly, it focuses on improving metadata quality, storage maintenance, file fixity procedures, file format management, and information security.

Metadata

An inventory of the collection will be created and maintained in a Microsoft Excel workbook. Headers for the spreadsheet may include directory location, file format, file size, resolution, physical location of the item if digitized, or any other headings the Society finds useful. A backup of the inventory will be kept on an external service such as Dropbox or Google Sheets and updated daily. A regularly updated backup will minimize damage if the workbook is accidentally deleted or tampered with. To ensure the inventory contains every item in the collection, employees will add object information immediately after digitization.
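As a rough illustration of how a first draft of the inventory could be generated rather than typed by hand, the Python sketch below walks a folder and writes one row per file to a CSV file, which Excel can open. The column names, the `build_inventory` function, and any folder paths are hypothetical. Resolution is left out because reading it would require an imaging library such as Pillow, and the physical location column is left blank for staff to fill in.

```python
import csv
from pathlib import Path

# Hypothetical column names based on the headers suggested above.
FIELDS = ["directory_location", "file_name", "file_format",
          "file_size_bytes", "physical_location"]

def build_inventory(root: str, out_csv: str) -> int:
    """Write one row per file under `root` to `out_csv`; return the number of rows."""
    base = Path(root)
    out_path = Path(out_csv).resolve()
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for path in sorted(base.rglob("*")):
            if not path.is_file() or path.resolve() == out_path:
                continue  # skip folders and the inventory file itself
            writer.writerow({
                "directory_location": str(path.parent.relative_to(base)),
                "file_name": path.name,
                "file_format": path.suffix.lower().lstrip("."),
                "file_size_bytes": path.stat().st_size,
                "physical_location": "",  # entered by staff for digitized items
            })
            rows += 1
    return rows
```

A generated draft like this would still need to be reviewed and completed by staff before it replaces a hand-built inventory.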

A file plan document will be created. This file plan will set out the file naming and directory location policies. The file naming policy will be determined by the Society. General guidelines for the naming policy include using underscores instead of spaces and using only lowercase letters. Legacy holdings must be renamed to conform to the naming policy. Legacy holdings are digitized and born-digital objects created before the implementation of this policy. A tutorial for batch renaming files in Windows 10 is included in the Related Documents section.
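Alongside the batch renaming tutorial, a scripted approach is another option for legacy holdings. The Python sketch below applies only the two general guidelines above (lowercase letters, underscores instead of spaces); the Society’s actual naming policy would likely add more rules. The function names are hypothetical. The sketch renames files but not folders, defaults to a dry run that only lists planned changes, and skips any file whose new name is already taken rather than overwriting it.

```python
from pathlib import Path

def normalized_name(name: str) -> str:
    """Apply the two general guidelines: lowercase letters, underscores instead of spaces."""
    return name.lower().replace(" ", "_")

def rename_legacy_files(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and optionally perform) renames; return (old, new) path pairs."""
    planned: list[tuple[str, str]] = []
    claimed: set[Path] = set()  # targets already planned in this run
    # Collect the full list first so renaming does not disturb the directory walk.
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        new_name = normalized_name(path.name)
        if new_name == path.name:
            continue  # already follows the guidelines
        target = path.with_name(new_name)
        # On case-insensitive drives (the Windows default), a case-only change
        # makes `target` point at the same file, which is safe to rename.
        collides = target.exists() and not target.samefile(path)
        if collides or target in claimed:
            # Never overwrite: a clash may be a duplicate that needs human review.
            print(f"skipped, name already taken: {path}")
            continue
        claimed.add(target)
        planned.append((str(path), str(target)))
    if not dry_run:
        for old, new in planned:
            Path(old).rename(new)
    return planned
```

Running it once with the default `dry_run=True` lets staff review the planned names before anything changes on the shared drive.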

Digital objects should be placed on a shared drive in the proper location, following the Society’s directory location policy. Directories will be based on the physical locations of objects, unless the Society finds a more suitable method. Legacy items should be moved to their proper place in the directory structure.

Employees will refer to the file plan when naming and storing digital objects.

For quality control, the Society will hold “file clean-up days” twice a year, so that every employee can dedicate time to making sure the metadata protocol is followed.

Storage Management

The Society currently uses a Synology NAS system. This server will be kept in a cool, temperature-controlled environment to prevent heat damage. The server should require administrative access so that only a limited number of people are able to alter files.

Two additional copies of the Society’s collection will be created to meet the standards of NDSA’s Levels of Digital Preservation. For the first copy, the Society will use Preservica to store their collection. A second copy will be given to a partnering institution. The Society will document all storage systems in use and the information needed to access and use them.

Changes to a file must be replicated on all storage copies.

File Fixity

Using AVP’s Fixity application, the Society will perform fixity checks annually. These fixity checks must be performed on all three copies of the Society’s collection in order to be considered complete. Fixity information, such as checksums (hash values), should be recorded in a spreadsheet with restricted access.

File fixity refers to digital objects remaining as intended through time and transfer, meaning that details such as content and file size do not change.
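To show what a fixity check does in principle, here is a minimal Python sketch that records a SHA-256 checksum for every file and later compares a copy of the collection against those stored values. It is not how AVP’s Fixity application is implemented; the function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFFs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    base = Path(root)
    return {p.relative_to(base).as_posix(): sha256_of(p)
            for p in sorted(base.rglob("*")) if p.is_file()}

def check_fixity(root: str, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare one copy of the collection against a stored manifest."""
    current = make_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "added": sorted(set(current) - set(manifest)),
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),
    }
```

Running `check_fixity` against each of the three copies with the same stored manifest would flag files that are missing, added, or changed in that copy; an empty result for all three lists means the copy still matches what was recorded.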

File Formats

All file formats in use will be documented in the Excel inventory. The Society will limit the formats in use, so that only known, open, and widely used formats are chosen for preservation. At-risk formats will be migrated to safer formats when needed.

At-risk formats refer to formats that are becoming obsolete or inaccessible.

Information Security

The Society will identify who has read, write, move, and delete authorization to individual files. These authorizations and access restrictions will be documented.

To protect the data recorded in the Excel inventory and reduce the risk of deletion, only one person at a time will have permission to edit the workbook. While the inventory is open and in use, others will be able to read it but not edit it. Because the workbook is also backed up daily, any damage caused by accidental deletion will be limited.

If possible, unauthorized attempts to access files will be logged and reviewed daily.

Every computer will be upgraded to the same operating system. The Society will schedule days to manually update the operating systems and run virus scans. It is important to manually update operating systems in case an automatic update does not occur.

Review

This digital preservation policy will be reviewed annually to check on the effectiveness of the policy and whether it meets the needs of the Society.

Related Documents

NDSA Levels of Digital Preservation

How to batch rename files in Windows 10

Digital Formats: Factors for Sustainability, Functionality, and Quality

Next Steps for the Geneva Historical Society

Introduction

The digital holdings of the Geneva Historical Society, located in Geneva, New York, are spread out over various websites and institutions. The Society has partnerships with Hobart and William Smith Colleges, NewYorkHeritage.org, and the NYS Historic Newspapers website. The Society’s digital holdings can be found on these websites, on its own Synology NAS storage system, and on Google Photos and Dropbox. These holdings include materials such as photographs, postcards, and microfilmed newspapers.

One of the largest problems the Society faces with its digital items stems from a lack of organization. When an employee digitizes an item, that item is saved and stored according to that employee’s own personal naming and filing system. Staff often do not know what other employees have scanned, which leads to extensive duplication and makes digitized materials difficult to find.

Before writing a preservation policy for the Society, I will first review the Society’s current standing in the NDSA Levels of Digital Preservation (LoDP) and then recommend next steps the Society can take to improve its standing.

Metadata

Although metadata is ranked as the fourth priority on the NDSA LoDP, it is a critical concern for the Society. To address its duplication and disorganization issues, it is essential for the Society to create an inventory of its digital holdings. This inventory can be created in a Microsoft Excel spreadsheet and should have headers for information such as directory location, file format, and file size. Resolution would also be a beneficial header so that, in the future, staff members can tell whether a digitized image has the resolution they need. There should be a backup of this inventory in case the original file is accidentally deleted or corrupted. Simple ways to back up the inventory include using Dropbox or Google Sheets.

Beyond creating the inventory, the Society would also benefit from standardizing the documentation of digital materials. This standardization should include a file naming policy so that items are always named in predictable and understandable ways, increasing findability. While the inventory is being created, files should be renamed to conform to the standard. Another way to standardize the documentation of digital materials is to restructure the directories so that everyone knows where items are located, instead of items being confined to a single employee’s desktop. One option is to base the directory structure on the physical locations of the materials. This way, if employees know where to find a physical object, they can use the same process to find its digitized version. The inventorying process would therefore also involve relocating digital objects to fit the new directory structure.

So that every employee will know how to name and locate items, the Society should create a “file plan” document. The purpose of this document will be to explain the standards for naming and storing items and why the rules exist. For future digitization, staff members should record file information in the inventory right after scanning.

Creating the inventory will be a high resource requirement. Because of the time the inventory will take, the Society can use community volunteers and dedicate staff hours to the project. The Society can also hold periodic “file clean-up” days so that every staff member can devote time to inventorying and cleaning up the collection.

Storage and Geographic Location

The Society currently uses a Synology NAS storage system. To satisfy the storage requirements of the NDSA LoDP, an organization needs to have multiple copies of its collection in different geographic locations. To meet the level 1 requirement of LoDP, the Society needs to create a second copy of its digital collection. The Society could use an application such as Preservica to store this copy. To reach level 2, the Society will need to document the storage systems it uses and what is needed to use them (such as access information). The Society will also need to create a third copy of its digital collection and find a storage location for it. This third copy could potentially be given to one of the Society’s partners.

Completing these steps will be a medium resource requirement for the Society. Improving the Society’s standing in this area could cost both time and money (should the Society decide to use an application like Preservica).

File Fixity and Data Integrity

File fixity refers to the digital object remaining as intended through time and transfer, meaning that details such as content and file size do not change. The purpose of fixity checks is to ensure that digital objects have not changed. No fixity checks are currently being done at the Society. To reach level 1 of LoDP, the Society needs to create fixity information when it creates or acquires a new digital object. To do this, I recommend that the Society record the file size and file count. This information can be recorded in the inventory. To reach level 2 of LoDP, the Society needs to check file fixity for all acquired digital materials. The Society can do this using AVP’s free application Fixity. I would also recommend that the Society perform annual fixity checks to make sure none of its content has become corrupted.

For materials already in the Society’s collection, the Society can record file size and count while creating the inventory, for future reference. Working on file fixity will be a low resource requirement for the Society. Fixity information for existing files can be documented while working on the inventory, and the procedure for adding fixity information for new digital objects can be added to the file plan. File fixity can be checked with free applications, so there need not be a financial cost to the Society.

Information Security

The Society already implements some information security in the Photo Archive. Although the public may use the computer there, they cannot access all the files that staff can. These access restrictions can be extended to further protect the digital collection. To reach level 1 of LoDP’s information security section, the Society needs to identify who has read, write, move, and delete authorization for individual files. This should be done after the inventory has been created and existing materials have been renamed and relocated. The benefit of these restrictions is that unauthorized people will not be able to tamper with or misplace files, maintaining the order and efficiency of the organization. The Society should also document who holds these authorizations and what access restrictions apply to the content.

Implementing this level of information security will be a medium resource requirement for the Society. The Society will have to decide and document who will be able to access, alter, move, and delete which collection materials, which may require a lot of time.

File Formats

Although employees at the Society already know that they use formats such as pdf, tiff, and jpg, it is still important to document all of the file formats in use. This can be done once the inventory is created. After documenting the format of each item in the collection, the Society will be able to see every format in use. The Society should limit its formats to “known open formats,” meaning formats that are widely used and non-proprietary. This lessens the risk of a format becoming obsolete. To reach levels 3 and 4 of LoDP, the Society will need to monitor these file formats for obsolescence and migrate at-risk formats when needed.
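Once formats are being documented, a quick tally can show which ones fall outside the preferred set. The Python sketch below counts files by extension; the preferred set and the alias mapping (treating tiff as tif and jpeg as jpg) are hypothetical choices the Society would replace with its own. Extensions can be wrong or missing, so a signature-based identification tool such as DROID or Siegfried would give more reliable results.

```python
from collections import Counter
from pathlib import Path

# Hypothetical choices: which formats count as preferred, and which
# extensions are treated as the same format.
PREFERRED = {"tif", "jpg", "pdf"}
ALIASES = {"tiff": "tif", "jpeg": "jpg"}

def format_report(root: str) -> tuple[Counter, dict[str, int]]:
    """Count files by extension and list formats outside the preferred set."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower().lstrip(".") or "(none)"
            counts[ALIASES.get(ext, ext)] += 1
    outliers = {fmt: n for fmt, n in counts.items() if fmt not in PREFERRED}
    return counts, outliers
```

Formats that show up in `outliers` (for example, working files from design software) are the ones to evaluate for migration or for a documented exception.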

Working on the file format section of LoDP will be a low resource requirement from the Society, as it should be relatively easy to do once the inventory has been created.

Conclusion

The most pressing need for the Society is the creation of an inventory. Once the inventory is created, the Society will be able to advance in the other areas of LoDP with lower resource requirements.

Geneva Historical Society

Introduction

The Geneva Historical Society is located in Geneva, New York. The Historical Society is committed to preserving Geneva history and to using digital and analog materials to educate the community and sustain community interest in Geneva’s memory. The Society is made up of a small group of employees who wear several hats. The Society’s greatest digital preservation concerns stem from too many duplicates, the lack of a content management system, and the absence of an inventory of digitized items.

Digital Content

The Society’s digital content includes historical materials that are part of its archival collection, promotional and educational materials, and digital items generated through business activities. The historical collection includes materials available on the Society’s website, on NewYorkHeritage.org, and on the Rochester Regional Library Council’s NYS Historic Newspapers website. Photographs, videos, and audio are available for public access on the Society’s websites. The items available on NewYorkHeritage.org include postcards, papers, and photographs. NYSHistoricNewspapers.org hosts the Society’s microfilmed newspapers. The Society no longer has the original newspapers and now relies on this website for both internal and public research. The Society also has StoryMaps, which are hosted by Hobart and William Smith Colleges.

The Director of Education and Public Information uses digital content for promotional and educational purposes. The Educator uses photographs and maps from the Society’s collection, public domain materials from websites such as the Library of Congress and Wikimedia, and photographs taken at the Society’s events. The Educator uses Illustrator, InDesign, and Photoshop files for publicity and programming. The Curator uses Publisher files for exhibits.

The Society’s digital holdings also include image collections on the website media gallery through WordPress, a MailChimp media gallery, content on Google Photos and Dropbox, and other institution-based materials.

Content Management

Much of the Society’s current struggle with its digital content comes from the lack of management for these materials. The staff at the Society practice independent digitization: when staff members need something digitized, they do it themselves. Employees are unaware of what items others have already scanned. These digitized items are kept in folders on each employee’s computer, organized by that employee’s own filing system. This process has led to extensive duplication.

This duplication continues on the shared computer in the Photo Archive. Using an Epson V300 scanner, staff scan items and store them on this computer. The materials saved on this computer are accessible to all staff from their own computers, which leads to further duplication of staff members’ individual files.
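Exact copies of this kind can be surfaced automatically. The Python sketch below, with hypothetical function names and folder paths, groups byte-identical files across several folders (for example, employee desktops and the Photo Archive computer) by comparing SHA-256 checksums, and only hashes files that share a size with another file. It finds exact copies only: a rescan of the same photograph at a different resolution produces a different file and will not be matched.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_exact_duplicates(*roots: str) -> list[list[str]]:
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    seen = set()
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            real = path.resolve()
            if real in seen:
                continue  # the same file reached through overlapping folders
            seen.add(real)
            by_size[path.stat().st_size].append(path)
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact copy
        for path in paths:
            by_hash[sha256_file(path)].append(str(path))
    return sorted(sorted(group) for group in by_hash.values() if len(group) > 1)
```

Each returned group lists files that could be reduced to one preserved copy, after staff confirm which location the file plan designates.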

The employees do not all have the same computer or operating system. They use Windows but not everyone has Windows 10. The digital formats used by the Society are generally jpg, tiff, and pdf. The digital copies have varying resolutions and file sizes and there is no standardized filing system, so finding objects that other employees have saved is difficult and time-consuming. If the Educator is looking for a certain image and knows it has already been digitized, she will ask the Archivist or Curator if they know the location. If the image isn’t digitized or easily accessible, the Educator will use the Epson V300 scanner and save the image to her desktop.

The staff agree that improving the state of their digital content is a priority. The staff believe that the digital content is too unorganized and that time is wasted looking for items and scanning materials already digitized. Getting their digital content under control is critical to the Society’s mission, as both staff and members of the public are negatively impacted by the disorganization. Organization is needed for more efficient access and use.

NDSA Levels of Digital Preservation

In terms of the NDSA Levels of Digital Preservation, the Society is at Level 1 in all five areas. Although the Society has many copies, these duplicates are unintentional and uncontrolled, and many differ in characteristics such as resolution and size. The Society uses a Synology NAS storage system. Each employee’s desktop is backed up daily to the Synology NAS over a P2P network using Syncback Free. The Society currently rests at Level 1 for the Storage section. When it comes to file fixity, there does not seem to be a process in place for checking the fixity of digital objects. For the Information Security section, employees do not know what files other staff members are using, as they digitize items independently, use different software, and have their own filing systems. There is no inventory of the Society’s digital content, so the Metadata section is also at Level 1. The Society does seem to use a limited set of formats (jpg, tiff, and pdf), but without an inventory of the digital objects, a complete list of the formats in use is unknown.

Further Collecting and Resources

There are more digital objects that the Society would like to collect. There are 50,000 photographs and business and family records that still need to be digitized. The archivist would actually like to digitize the entire archive but does not have the resources. To improve the state of the digital content, staff members could devote extra hours to digital preservation and the Society could accept community volunteers. The Society has previously worked with a yearly budget of $2000 for office equipment, including computers and software licenses.

Conclusion

The Society needs to get the content management of its digital materials under control, especially since it intends to add thousands more objects to its holdings. Uncoordinated filing systems and the lack of an inventory contribute to mass duplication and the current struggle with inefficiency. Fortunately, the Society’s resources and staff commitment make an improved digital preservation process likely.

Challenging traditional archival principles

Our readings this week covered description and arrangement in digital preservation and challenged the effectiveness of the archival principles of respect des fonds and provenance for new media objects.

Database nature of new media objects

Lev Manovich details how new media objects are essentially databases. Digital objects are a layered collection of items. Users can interact with the same digital object in a variety of ways, meaning the objects lack a linear narrative.

Manovich introduces video games as an exception. On the surface, players interacting with a game follow a narrative and pursue defined goals. However, Manovich clarifies that to create a digital object is to create “an interface to a database” and that the content of the work and its interface are actually separate. Even while playing a video game, which seems to follow a narrative, players only go to points mapped out by the database’s creators. The database nature of new media objects contrasts with the narratives often provided by analog objects, meaning new methods for describing and arranging digital objects are needed.

Describing New Media Objects

Professor Owens details Green and Meissner’s proposal of More Product, Less Process (MPLP). Green and Meissner argue that organizations should avoid putting preservation concerns before access concerns. Collections should be minimally processed so that researchers can access them sooner. Item-level description should be provided only rarely. For arrangement and description, archivists should strive for the “golden minimum.”

Owens provides the 4Chan Archive at Stanford University as an example of using the MPLP approach for digital objects. The archive is available as a 4 GB download, an example of quick and easy access. Stanford opted to include limited but informative description, including the scope of the collection and metadata for the format, date range, and contributor.

Owens also states that digital objects are semi-self-describing because they contain machine-readable metadata. He uses tweets as an example: beneath the surface, tweets contain a lot of informative metadata, such as the time and time zone.

In an effort to describe web archives, Christie Peterson tested Archivists’ Toolkit, Archive-It, DACS, and EAD. Peterson found that the “units of arrangement, description, and access typically used in web archives simply don’t map well onto traditional archival units of arrangement and description.” Discussing Archive-It, Peterson describes how the tool divides its content. Archive-It uses three categories: collections, seeds, and crawls. An accession of a collection of websites would be a crawl. Peterson found that there were no good options for describing a crawl: she could not say what the scope of the crawl was or explain why certain websites were left out. This means current tools and methods leave archivists unable to document their activity, creating a lack of transparency.

Challenging Archival Principles

Owens defines original order as “the sequence and structure of records as they were used in their original context.” Original order maintains context and saves time and effort from being spent reorganizing and arranging content, leading to faster access. However, maintaining original order can be difficult for digital objects.

Jefferson Bailey describes an issue with following traditional archival principles with digital objects. Since every interaction with a digital object leaves a trace of that interaction, there is no original order. Bailey explains that with new media objects, context can “be a part of the very media itself” since digital objects can be self-describing. Attempting to preserve original order is unnecessary as meaning can be found “through networks, inter-linkages, modeling, and content analysis.”

Bailey also gives a history of respect des fonds. This principle comes from, and was thus designed for, an era of analog materials. Respect des fonds centered the organization of records on the creating agencies. Some critiques of the principle are that there is not always a single creator, that those who structured the documents may not be the creators, and that original order “prioritizes unknown filing systems over use and accessibility.”

Jarrett Drake asserts that provenance is an “insufficient principle” for preserving born-digital and socially inclusive records due to its origins rooted in colonialism. The provenance principle asserts that records of different origins should not mix. The principle became popular in the United States in the early 20th century, when few were able to own and control their records.

When it comes to digital objects, Drake states “the fonds of one creator are increasingly less distinct from the fonds of other creators.” He provides the example of Google Drive, which allows multiple people to collaborate on document creation. Another change in the times that affects provenance is the rise in people who are able to create and own their records. Nowadays, people are able to name and describe themselves. According to Drake, archivists should support this and name creators in archival description according to their self-assertion.

According to Owens, using community-provided descriptions is becoming popular. To create the online exhibition The Reaction GIF: Moving Image as Gesture, Jason Eppink asked the Reddit community for canon GIFs and descriptions of them. Eppink wanted to capture what GIFs meant to those who used them, and getting the descriptions directly from the community enabled him to do that.

Our readings also assert that, when dealing with multiple copies, it’s easier to keep all of them. As Catherine Marshall states, “Our personal collections of digital media become rife with copies, exact, modified, and partial.” One copy may have better metadata, another better resolution, and so on. We have so many copies that the “archival original” is decentralized and not straightforward to determine. Marshall states that it is better to keep these copies than delete them. This is due to people having too many copies, storage being so cheap, and people not knowing which copy they’ll want in the future.

Discussion Questions

Our readings lately have been asserting the value in allowing communities to describe their records. In chapter 7, Owens points out that giving description over to the end user can “easily result in spotty and inconsistent data.” How can archives maintain a balance between empowering communities and keeping quality, consistent data?

What are your thoughts on permitting anonymity in archives? Do you think that it’ll lead to doubt over the validity of the record later on? How can archives demonstrate truthfulness in a record while protecting the creator’s identity?