National Trust for Historic Preservation: Next Steps in Digital Preservation

Introduction

The National Trust for Historic Preservation (NTHP) runs both its business operations and its collections operations from a central headquarters in Washington, D.C. The collections staff manages the institutional documents for the collections and works with the staff at sites across the country. These sites fall into three categories: Stewardship Sites, Co-Stewardship Sites, and Affiliated Sites. Most of their digital collections consist of documents and image files. This Preservation Plan is designed to identify the needs of the individual sites and to centralize the NTHP's file structure so that files can be found easily and accessed by both the sites and the national headquarters.

User Needs Gathering Phase

Goal: This phase allows the national headquarters to understand the sites' needs so that the preservation plan can integrate them with the needs of the national headquarters.

The staff at NTHP headquarters will conduct a short survey to determine what the sites know they need to preserve. The survey should ask which files the sites feel need to be preserved, why they need to be preserved, which files they think headquarters needs access to, how many staff members they have, and what resources for digital preservation they have. This process needs to be a conversation between the individual sites and headquarters so that both sides get the information and support they need. Their needs may be at odds: there may be files that a site does not want to share with the national organization (in the case of the Co-Stewardship Sites) but that headquarters must have access to for legal reasons. This process will also help the sites feel that they have a role and a voice in the process, and it will keep the national organization from implementing unhelpful policies.

Education and Training Phase

Goal: Provide the individual sites with enough support and information, both before and during implementation, for the process to go smoothly.

Survey results and detailed workflows will be distributed to the sites as they prepare to manage and preserve their digital material. Any questions the sites have will be answered before the implementation date so that misinformation does not lead to accidental mishandling of files. Answers to questions that were asked by, or may be relevant to, multiple sites should be shared widely among the sites. Any technical or site-specific processes should be described in detail, with screenshots if necessary.

Pilot Program Phase

Goal: This phase is meant to test the policies and steps deemed necessary by the national headquarters and the individual sites.

The steps defined in the Implementation Phase should first be adopted by a few sites to test their effectiveness and the sites' ability to implement them. The sites admitted to the Pilot Program should be a representative sample of Stewardship and Co-Stewardship Sites, including sites with larger and smaller staffs. This will show headquarters where the plan may need to be adjusted or where extra support might be needed. During the Pilot Program and the Implementation Phase, different data-loss scenarios, such as server loss, natural disaster, or deletion, will be described to show why these processes matter. The Pilot Program will be conducted in conjunction with the Education and Training Phase: training will occur before sites implement the steps and will continue throughout the process as new pieces are added.

Implementation

Goal: The purpose of the Implementation Phase is to begin preserving and organizing the digital files held by the national headquarters and the sites.

After the Pilot Program has been completed, training will be given to the rest of the Stewardship and Co-Stewardship Sites. Feedback from the sites should be solicited at multiple points during the process to ensure that the sites are getting enough support. If a site cannot complete the Minimal Plan, extra support should be allocated, which may include seeking grant funding.

Minimal Plan

Goal: To gather information on the files that currently exist and organize them so that the national headquarters and the individual sites know what they have.

Audience: These steps need to be implemented both at Headquarters and at each of the individual sites.

File Management

  1. Going forward, all files will be named and organized according to an established file plan. Both file names and folder names will be standardized. This will prevent future confusion about what information a folder or file contains.

  2. Files that are scanned for researchers will be integrated into the file management system and the file preservation plan as they are scanned.

  3. The file management plan will be saved as a .pdf and pinned to the top of the shared file directory. This keeps the rules in an easily accessible place where they can be found and followed.

  4. A File Migration workflow will be established to move important files from individuals' drives to the shared drive so that no important documents are lost. It will mirror any existing plans for the accession and preservation of paper documents: if a file would not be kept long-term as a paper document, it should not be kept long-term in digital form.

  5. Once a year, on a pre-determined day, staff should spend a few hours catching any files that entered the system without being arranged according to the file management plan. A recurring group event in Google Calendar or Outlook can remind staff members that the file management system needs to be maintained.
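The standardized naming in step 1 lends itself to an automated check during the annual clean-up in step 5. The sketch below assumes a hypothetical convention, SITE_YYYY-MM-DD_short-description.ext (for example, HQ_2021-04-15_board-minutes.pdf); the NTHP file plan would define the real pattern, and the function names are illustrative only.

```python
import re
from pathlib import Path

# Hypothetical naming convention: SITE_YYYY-MM-DD_short-description.ext
# The real pattern would come from the NTHP file plan.
NAME_PATTERN = re.compile(
    r"^(?P<site>[A-Z]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<desc>[a-z0-9-]+)\.(?P<ext>[a-z0-9]+)$"
)


def follows_file_plan(filename: str) -> bool:
    """Return True if a file name matches the example naming pattern."""
    return NAME_PATTERN.match(filename) is not None


def find_misnamed_files(root: str) -> list[str]:
    """List files under root (recursively) whose names break the pattern.

    Meant as a to-do list for the annual clean-up; nothing is renamed.
    """
    return sorted(
        path.relative_to(root).as_posix()
        for path in Path(root).rglob("*")
        if path.is_file() and not follows_file_plan(path.name)
    )
```

Running `find_misnamed_files` on the shared drive before the scheduled clean-up day would give staff a concrete list to work through instead of a manual search.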

Storage and Geographic Location

  1. The starting point for this section is an inventory of all files and their locations. This is a labor-intensive project, but it is necessary: without an inventory, it will be difficult to determine what needs to be migrated, what the main office already has on its computers, and what it may need from the individual sites. At minimum, the inventory will record the file name, the folder location, the file size, and a category drawn from a controlled vocabulary (e.g., legal documents, donor records). The categories can then be grouped by risk (low, medium, and high), which will determine which materials must be preserved in other steps and sections of the implementation plan.

  2. Files that are not in the correct location need to be moved to the correct location and have that location updated within the inventory.

  3. All files determined to have a high level of need will be copied to an external hard-drive. Any time a static, high-risk document is added to the collection, it will also be added to the external hard-drive. High-priority files that are still being actively edited will not be added to the external hard-drive until they become static. A second copy in a separate location will prevent data loss in case of server damage. Storing files on an external drive also takes less time than uploading documents and adding them to the individual collection spaces in Re:Discovery Proficio.

  4. Documents on personal drives that do not have significant institutional value will not be included in this inventory. These documents should be added to the inventory as they gain institutional relevance, much as retention schedules work for paper documents. Individual employees can maintain their own inventories for personal use.
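A minimal sketch of the inventory in step 1, assuming Python and a CSV file that opens as a spreadsheet. It records the minimum fields listed above. The category-to-risk table and the `categorize` function are hypothetical placeholders: in practice, staff would assign categories from the controlled vocabulary by hand.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path
from typing import Callable

# Hypothetical controlled vocabulary mapped to risk levels. The first two
# categories are the plan's examples; the rest, and every risk assignment,
# would be decided by headquarters and the sites.
CATEGORY_RISK = {
    "legal documents": "high",
    "donor records": "high",
    "photographs": "medium",
    "general correspondence": "low",
}


@dataclass
class InventoryRow:
    file_name: str
    folder: str
    size_bytes: int
    category: str
    risk: str


def build_inventory(root: str, categorize: Callable[[Path], str]) -> list[InventoryRow]:
    """Record name, folder, size, category, and risk for every file under root."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        category = categorize(path)  # usually a staff decision, not code
        rows.append(InventoryRow(
            file_name=path.name,
            folder=path.parent.relative_to(root).as_posix(),
            size_bytes=path.stat().st_size,
            category=category,
            risk=CATEGORY_RISK.get(category, "unassigned"),
        ))
    return rows


def write_inventory(rows: list[InventoryRow], csv_path: str) -> None:
    """Save the inventory as a CSV file that opens in any spreadsheet program."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[field.name for field in fields(InventoryRow)])
        writer.writeheader()
        writer.writerows(asdict(row) for row in rows)
```

Filtering the rows with `row.risk == "high"` would then give the candidate list for the external hard-drive in step 3, leaving out any files that are still being actively edited.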

Fixity and Data Integrity

The basic level of file fixity will be accomplished through the inventory described above. File fixity is whether a file has stayed the same at the bit level, i.e., whether it is identical to an earlier version and none of its data has been corrupted or deleted. The inventory will standardize where files are kept and give the organization a picture of the files in its system. This prevents files from being lost because no one knew they existed, and it is the first step in monitoring files.
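Because the inventory doubles as the first fixity record, comparing this year's inventory with last year's shows which files have gone missing and which have appeared without being catalogued. This sketch assumes inventories saved as CSV files with `folder` and `file_name` columns; the function names are hypothetical.

```python
import csv


def load_inventory(csv_path: str) -> set[str]:
    """Return the set of 'folder/file_name' entries in one inventory CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {f"{row['folder']}/{row['file_name']}" for row in csv.DictReader(f)}


def compare_inventories(last_year: set[str], this_year: set[str]) -> dict[str, list[str]]:
    """List files that disappeared and files that are new since the last inventory."""
    return {
        "missing": sorted(last_year - this_year),  # investigate: lost or moved?
        "new": sorted(this_year - last_year),      # catalogue and categorize
    }
```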

Information Security

The minimum suggestion for this section is to create a document that lists the editing permissions for the different categories. Site collections managers should have control over their own files and should share permission with the relevant staff at headquarters. This will show headquarters and site staff what files they should have control over and what is not meant to be shared.

File Formats

Only accept and create files in .pdf (for documents) and .png (for images). A single standard will help the larger organization track format obsolescence and migrate file types when necessary. These file types are common in standard practice and will not require much migration. Documents in the inventory and on the external hard-drive will be static and rarely edited, so the stability of .pdf is desirable. Most migration will involve converting files from .jpeg or .doc, formats that are less stable and harder to preserve.
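A small sketch of how the format rule could be applied to an inventory, sorting each file into keep, migrate, or review. The conversion targets (.doc to .pdf, .jpeg to .png) are assumptions based on the document/image split above, .jpg is treated as another spelling of .jpeg, and the conversion itself would be done with separate software.

```python
from pathlib import Path

# Minimal Plan formats: .pdf for documents, .png for images.
ACCEPTED = {".pdf", ".png"}
# Assumed migration targets for the formats the plan names.
MIGRATE_TO = {".doc": ".pdf", ".jpeg": ".png", ".jpg": ".png"}


def format_action(filename: str) -> str:
    """Classify a file as 'keep', 'migrate to <ext>', or 'review'."""
    ext = Path(filename).suffix.lower()
    if ext in ACCEPTED:
        return "keep"
    if ext in MIGRATE_TO:
        return f"migrate to {MIGRATE_TO[ext]}"
    return "review"  # any other format needs a human decision
```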

Moderate Plan

Goal: These steps build on the steps in the minimal plan to preserve the files at a higher level.

Audience: These steps should be implemented at Headquarters and are recommended for the individual sites.

File Management

  1. Files that have not been used within a pre-determined amount of time and have been designated as not institutionally relevant for long-term use will be deleted, as with de-accessioning processes for paper documents. This will ensure that unnecessary files are not taking up valuable server space.

  2. File management upkeep will happen twice a year for each staff member to catch files that may not have been integrated into the file management system mentioned in the minimal plan.

  3. Historically relevant photographs should be downloaded from the shared asset-management system twice a year to ensure that these images are preserved. The images should be downloaded at the highest available resolution as .tiff files and classified as medium-risk files.
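Step 1 combines two conditions: a file has not been used within the pre-determined period, and it has been designated as not institutionally relevant. The sketch below produces a review list, assuming the designations are kept as a set of relative paths and using last-modified time as a stand-in for "last used" (last-access times are often not updated reliably). The function name is hypothetical, and nothing is deleted automatically.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def deletion_candidates(root: str, not_relevant: set[str], max_idle_days: int,
                        now: Optional[float] = None) -> list[str]:
    """List files that are both idle and designated as not institutionally relevant.

    Uses last-modified time (st_mtime) as "last used". Returns a list for
    staff review; the actual deletion follows the de-accessioning process.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    candidates = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel in not_relevant and path.stat().st_mtime < cutoff:
            candidates.append(rel)
    return sorted(candidates)
```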

Storage and Geographic Location

  1. The next level is to migrate the documents with the highest preservation priority to Re:Discovery Proficio, attaching them to each collection's individual record. This gives the documents a third location and a third copy.

  2. Static documents with a medium level of need will be added to the external hard-drive and to the list of files that are checked for fixity.

File Fixity and Data Integrity

  1. To make the documents outside of Re:Discovery more secure, yearly checks of file sizes will help determine whether a file has become corrupted. If a file's size has changed unexpectedly, the file should be replaced with a copy from either Re:Discovery or the external hard-drive.

  2. The editing history for each file (the date, who edited it, and the new file size) should be recorded in a spreadsheet so that intentional edits are not mistakenly reported as data issues. This editing history applies only to documents with a high level of risk. It tracks file changes to confirm that the files are the same files that were originally saved.
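These two steps work together: the editing history supplies the expected size of each file, and the yearly check flags any file whose size no longer matches. A sketch, assuming the history spreadsheet is exported as CSV with hypothetical columns `path`, `date` (YYYY-MM-DD), `editor`, and `new_size`, and that the starting sizes come from the inventory.

```python
import csv
from pathlib import Path


def expected_sizes(inventory: dict[str, int], history_csv: str) -> dict[str, int]:
    """Start from inventory sizes and apply logged edits in date order."""
    expected = dict(inventory)
    with open(history_csv, newline="", encoding="utf-8") as f:
        # ISO dates sort correctly as strings; same-day edits keep file order.
        edits = sorted(csv.DictReader(f), key=lambda row: row["date"])
    for edit in edits:
        expected[edit["path"]] = int(edit["new_size"])
    return expected


def yearly_size_check(root: str, expected: dict[str, int]) -> list[str]:
    """Return files whose size differs from the expected size, or that vanished.

    Flagged files are candidates for replacement from Re:Discovery or the
    external hard-drive after a person has reviewed them.
    """
    flagged = []
    for rel, size in sorted(expected.items()):
        path = Path(root) / rel
        if not path.is_file() or path.stat().st_size != size:
            flagged.append(rel)
    return flagged
```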

Information Security

The sites and national headquarters should create password-protected folders within the shared drive for documents with editing restrictions, based on the editing permissions defined in the Minimal Plan. This will protect each site's files from employees at other sites who should not have access to them. Folders for individual sites should be accessible only to that site's managers and headquarters staff. When staff leave, these access rights need to be transferred to another staff member before the departing person leaves, and then transferred again when a replacement is hired. These steps should be included in the workflows for bringing in new employees and for employee departures.

File Formats

The sites and the national headquarters should create and accept files in .tiff (still images) and PDF/A (documents), and migrate existing .png and .pdf files to these standards. These formats are widely used and are more stable than .png or .pdf. This migration ensures that multiple file types do not coexist on the drives, which would make preservation more difficult because there would be no single standard.

Aggressive Plan

Goal: The goal of this plan is to preserve the files at a much higher technical level and to standardize legacy file names.

Audience: Suggested for Headquarters, advised for the sites. The File Management and Storage suggestions are of the highest priority.

File Management

  1. Legacy files (files created before the file management system was established) will be renamed according to the file management plan. This will make the files easier to organize and recognize within the shared and personal drives.

  2. File management audits will happen once a quarter so that a large backlog of incorrectly managed files does not build up.
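For the legacy renaming in step 1, a dry run that proposes new names before anything changes lets staff review the result first. This sketch reuses the hypothetical SITE_YYYY-MM-DD_short-description.ext convention, takes the date from each file's last-modified time (an assumption; a document date recorded in the inventory may be better), skips files that already conform, and adds a numeric suffix when two proposals clash.

```python
import re
from datetime import date
from pathlib import Path

# Same hypothetical convention: SITE_YYYY-MM-DD_short-description.ext
PATTERN = re.compile(r"^[A-Z]+_\d{4}-\d{2}-\d{2}_[a-z0-9-]+\.[a-z0-9]+$")


def slugify(text: str) -> str:
    """Lower-case text and collapse runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "untitled"


def legacy_name(site: str, original: str, modified: date) -> str:
    """Propose a conforming name for one legacy file."""
    p = Path(original)
    return f"{site.upper()}_{modified.isoformat()}_{slugify(p.stem)}{p.suffix.lower()}"


def plan_renames(root: str, site: str) -> dict[str, str]:
    """Dry run: map legacy file paths to proposed names in the same folder."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    taken = {p for p in files if PATTERN.match(p.name)}  # already conforming
    plan = {}
    for path in files:
        if path in taken:
            continue
        modified = date.fromtimestamp(path.stat().st_mtime)
        candidate = path.with_name(legacy_name(site, path.name, modified))
        base, n = candidate, 2
        while candidate in taken:  # avoid two files receiving the same name
            candidate = base.with_name(f"{base.stem}-{n}{base.suffix}")
            n += 1
        taken.add(candidate)
        plan[path.relative_to(root).as_posix()] = candidate.relative_to(root).as_posix()
    return plan
```

Saving the reviewed mapping alongside the inventory before renaming would keep the old names traceable.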

Storage and Geographic Location

  1. A fourth copy of the high-priority material will be saved to a second external hard-drive and swapped with another site. Sites should swap their hard-drives yearly, and Headquarters should swap its hard-drive with a site. Keeping a copy of high-risk files in another geographic location provides a much higher level of storage preservation in case of a natural disaster. An easy way to accomplish this for the individual sites is to pair them during the Education and Training Phase and have the two sites swap hard-drives, so that each has its data in a separate geographic location.

  2. Documents with a medium level of need will be added to Re:Discovery Proficio. Proficio gives the files another location and another copy, while also providing data integrity with little staff time needed.

  3. Static low-risk documents will be added to the external hard-drive. These files may have less long-term value than the high-risk documents, but they may still be institutionally relevant and should be preserved.
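The plan does not specify how copies reach the swap drive. One option is to copy each file and immediately confirm that the copy is bit-for-bit identical by comparing SHA-256 checksums. The sketch below uses Python's standard library; the folder layout and function names are assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(files: list[str], source_root: str, drive_root: str) -> list[str]:
    """Copy files to the drive, keeping folder structure, and verify each copy.

    Returns the relative paths whose copy does not match the source, so they
    can be recopied before the drive is sent to the partner site.
    """
    failures = []
    for rel in files:
        src = Path(source_root) / rel
        dst = Path(drive_root) / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves modification times
        if sha256_of(src) != sha256_of(dst):
            failures.append(rel)
    return failures
```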

Fixity and Data Integrity

  1. Fixity of high-priority files will be monitored using AVP's Fixity tool, which monitors data integrity at a much higher level than file size alone. A file can become corrupted without its size changing, in which case staff would not realize that anything was wrong. Fixity generates checksums for each file and compares them over time, so it can detect whether any part of the original file has changed and therefore whether any data has been compromised.

  2. Fixity of medium-priority documents will be monitored through their file sizes. Monitoring file size will help staff catch data that has been accidentally deleted or changed.
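AVP's Fixity is a standalone application and is not reproduced here. To illustrate the checksum approach it relies on, this standard-library sketch records a baseline SHA-256 checksum for each high-priority file in a JSON manifest, then later reports files that are missing or whose checksum has changed, including changes that leave the file size the same. The manifest format and function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: str, files: list[str], manifest_path: str) -> None:
    """Record a baseline checksum for each high-priority file."""
    manifest = {rel: sha256_of(Path(root) / rel) for rel in files}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")


def verify_manifest(root: str, manifest_path: str) -> dict[str, list[str]]:
    """Compare current checksums with the baseline manifest."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    report = {"changed": [], "missing": []}
    for rel, expected in sorted(manifest.items()):
        path = Path(root) / rel
        if not path.is_file():
            report["missing"].append(rel)
        elif sha256_of(path) != expected:  # catches same-size corruption too
            report["changed"].append(rel)
    return report
```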

Information Security

Create a detailed log of the changes made to each file in the file fixity spreadsheet. This will help distinguish intentional changes from changes caused by data loss. For example, if a paragraph is deleted because a restriction is no longer relevant, the resulting change in file size could lead staff to believe that data was accidentally deleted.

Among these recommendations, those under File Management and under Storage and Geographic Location provide much more file security and should be implemented before the suggestions for Fixity and Information Security.

One Reply to “National Trust for Historic Preservation: Next Steps in Digital Preservation”

  1. Great work with a rather complicated organizational structure and set up!

    Starting with further exploration of user needs between the central office and the partner sites is a great idea. The complexity of this context and the various roles and people involved really warrants further exploration to make sure that the Trust knows what its related organizations want and would use.

    I think the way you have set all of this up nicely covers both the work they would like to do for the partner sites and the work they need to do to get the central office content better organized. It may also be worth noting that they could approach the internal work on their files in parallel with exploring the potential services they might offer the partner sites.

    The overall approach to the minimal, moderate, and aggressive plans for ensuring long-term access to digital content is great. Each of these plans has some really good detail in it too. It is helpful that you have these spelled out in such a step-by-step way. With that noted, I could see how this could be potentially overwhelming to some readers, so it will be good to confirm with your org that they follow the recommendations and understand the trade-offs between each of these sets of suggestions.

    Again, overall great work. This is a particularly complex case and you’ve put together a well thought out set of recommendations to address the range of issues you identified in your survey.
