In a previous post, I gave an overview of the contents of the College Park Aviation Museum’s (CPAM) digital collections and the staff’s current practices for managing and maintaining those collections. Based on the information gathered from my interview with the Curator of Collections, Laura Baker, this blog post will offer recommendations for the next steps the museum might take to improve their digital preservation strategies. These recommendations are based on the National Digital Stewardship Alliance (NDSA) Levels of Digital Preservation, which provides succinct, clearly stated guidelines across five areas of concern for digital collections: storage and geographic location, file fixity and data integrity, information security, metadata, and file formats.
For each category, the NDSA Levels provide four levels of progressively more advanced suggestions for digital preservation. This is helpful for an institution like CPAM, which is in the early stages of developing a digital preservation plan, because it allows for incremental change and takes into account the fact that many institutions have only limited staff and resources with which to begin addressing the needs of their digital collections. Dr. Owens, our professor and one of the authors of the NDSA Levels, emphasized that just getting an institution to level one is a significant accomplishment, and so we shouldn’t think of an institution as only succeeding in digital preservation if they are at level three or four. With this in mind, I am recommending that CPAM try to achieve level one in all five categories, while also offering suggestions for moving further along the levels should they decide now or at some point in the future to adopt a more ambitious plan.
Storage and Geographic Location
One of the basic principles of digital curation is to keep multiple copies of all digital files to protect against bit rot, system failures, or regional threats (like natural or man-made disasters). Anyone who has ever lost files after a computer or hard drive has failed will understand why this step is so important. CPAM relies on their common drive to store a large portion of their digital files. As a first step, CPAM should consult with their parent organization, the Maryland-National Capital Park and Planning Commission’s (MNCPPC) Office of the Chief Information Officer (OCIO), to determine what measures are already in place to back up the contents of their common drive and how often these backups are performed. To satisfy level one of the NDSA Levels, CPAM should ensure that there are at least two complete copies of the contents of the common drive and that these copies are not located in the same place.
CPAM should also aim to get the images, videos, or other data that now exists in various formats into their common drive. This is a crucial step because formats like CDs, DVDs, USB drives, and VHS tapes are not made to last forever and can become damaged or lost. Migrating the media on these formats to a storage system will enable them to be stored and backed up in a way that will improve their longevity. It will also help to unify the collection in one place to simplify its care and management. Laura expressed concerns during our interview about running out of space on the common drive, so it may be necessary to ask MNCPPC for more space. If necessary, the museum could continue to use the common drive for administrative files but use another storage system like Dropbox (which starts at $12.50 a month) for media files like photographs, videos, or scans.
The two steps listed above would meet the minimum threshold for NDSA level one, but by putting more of its collections online, which is already a stated goal of CPAM, the institution could reach level two or three. The most economical way to do this is through a partnership with Digital Maryland, which will collect materials from the museum, digitize them if they are not already digital, and host the scanned images on their site. CPAM should ensure that they get copies of all digital files so that Digital Maryland and CPAM can serve as safeguards for one another in case a backup of a file is needed. The partnership with Digital Maryland can thus serve as a digital preservation strategy while also producing more online content.
If the museum wants to explore other options, it might also look into uploading some of its digital content to the Internet Archive or Wikimedia Commons, both of which would host the material for free. There would be more labor involved with uploading the files (and digitizing them if they are not already digitized), but these sites have the advantage of reaching a wider audience. CPAM might look to the Internet Archive’s American libraries collections or Wikimedia Commons collections to see examples of how their material might be presented on these sites. The caveat to putting more content online is that these files need to be in the public domain or the museum needs to clear any potential copyright issues.
Additional steps needed to complete levels two and three would be to document the storage system(s) and media formats and what is needed to use them (level two), and to establish a process for monitoring the obsolescence of the storage system(s) and media formats (level three).
File Fixity and Data Integrity
The concepts of “file fixity” and “data integrity” are not particularly well-known outside of the IT sector, but what they mean in layman’s terms is making sure that a digital file has not been altered or corrupted. In other words, is the institution preserving the file they intended to preserve? For a small institution like CPAM, which does not have a digital preservation specialist, it may initially seem challenging to know how to get started, especially when jargony terms like “checksums,” “cryptographic hash function values,” or “digital signatures” are bandied about. Fortunately, there are free tools for automated monitoring and reporting on data integrity.
One service that Dr. Owens recommended is AVP’s Fixity. According to the website, “Fixity scans a folder or directory and creates a manifest of the files, including their paths and their checksums, against which a regular comparative analysis can be run. Fixity monitors file integrity through the generation and validation of checksums, and file attendance through monitoring and reporting on new, missing, moved, and renamed files.” The user can run the tool as needed or choose to schedule these tasks daily, weekly, or monthly, setting the specific day and time at which they automatically occur.
Since CPAM is new to this type of preservation work, I would recommend downloading Fixity, exploring the settings and experimenting with how it works, and then adopting a plan that seems realistic given their goals and priorities. To reach level one, CPAM would need to check file fixity when new content is added to their storage system, if fixity information was provided with that content, and create the fixity information if it doesn’t already exist. To reach level two, CPAM would need to check fixity on all ingests, use write-blockers on original media, and virus-check high-risk content. Level three requires fixity checks at regular intervals, maintaining logs of fixity information, detecting corrupt data, and virus-checking all content. If these steps sound too demanding, Dr. Owens has said that doing a fixity check once a year is better than doing nothing at all.
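For readers curious about what a fixity check actually does, here is a minimal Python sketch of the general idea: record a checksum for every file in a folder, then compare a later scan against that record. This is not how AVP’s Fixity is implemented; the function names (`sha256_of`, `build_manifest`, `compare_manifests`) and the choice of SHA-256 are my own assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of one file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(old, new):
    """Report files that disappeared, appeared, or changed between two scans."""
    return {
        "missing": sorted(set(old) - set(new)),
        "new": sorted(set(new) - set(old)),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }
```

In practice, the first manifest would be saved somewhere safe (for example, as a text file stored apart from the collection), and a later scan would be compared against it. Any file listed under "changed" that nobody intentionally edited deserves a closer look. Note that this simple sketch reports a renamed or moved file as one missing file plus one new file.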
Information Security
To reach level one on information security, CPAM should identify who has the ability to read, write, move, and delete individual files, and restrict those authorizations when appropriate. To illustrate why these steps are necessary, consider what might happen if an employee resized an image file to make it fit on the website or in an email newsletter but accidentally overwrote the original file. Or maybe a volunteer saves a file with a generic name like “Oral History” and accidentally overwrites an older file with the same name on the common drive. Limiting the number of people who can perform these types of tasks with the digital collections will reduce the likelihood of accidents like these. From my interview with Laura, I know that there is some hesitation to adopt file restrictions because there were past incidents in which files were restricted and were no longer accessible after an employee left the museum. These potential problems could be largely avoided, however, by documenting access restrictions (level two) and speaking with the MNCPPC’s OCIO to discuss how to override access restrictions associated with former employees.
A more rigorous approach would involve maintaining logs of who performed what actions on files (level three) and performing audits of these logs (level four).
Metadata
One of the challenges encountered in my partnership with CPAM is that there is no master inventory of CPAM’s digital collections. The museum has inventories of material digitized on a project-by-project basis, but because they are not unified in one place, finding the inventories may involve asking the person(s) in charge of the project to locate them, and it seems like some materials may have “fallen through the cracks” and may not be inventoried at all. My recommended first step is to track down as many inventories as can be located and put them into one folder on the common drive. Then create a list of material that still needs to be inventoried and put that task on a to-do list. The next step would be to see if these multiple inventories could become one master list, either by copying and pasting data into an Excel workbook or by logging material into PastPerfect. If this is not practicable, then maybe a “Guide to Digital Collections” could be created that would explain where to find each inventory and summarize what each inventory contains.
Going to NDSA level two would require creating administrative metadata for digital files (such as when and how each file was created, and who can access it) and transformative metadata (logging any changes to the file). Level three requires storing standard technical and descriptive metadata about the digital files.
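If the project inventories can be exported as CSV files (which Excel can do), combining them into a master list does not have to mean copying and pasting by hand. The Python sketch below shows one possible approach, under the assumptions that every inventory sits in a single folder as a `.csv` file with a header row and that the columns may differ from project to project. The function name `merge_inventories` and the added `source_inventory` column are hypothetical, not part of any existing CPAM workflow.

```python
import csv
from pathlib import Path


def merge_inventories(inventory_dir, master_path):
    """Combine every CSV in inventory_dir into one master CSV; return the row count."""
    master_path = Path(master_path)
    rows = []
    fieldnames = ["source_inventory"]  # records which project list each row came from
    for csv_path in sorted(Path(inventory_dir).glob("*.csv")):
        if csv_path.resolve() == master_path.resolve():
            continue  # do not re-read a master list produced by an earlier run
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row.pop(None, None)  # drop stray cells beyond the header columns
                row["source_inventory"] = csv_path.name
                rows.append(row)
                for key in row:
                    if key not in fieldnames:
                        fieldnames.append(key)
    with open(master_path, "w", newline="", encoding="utf-8") as handle:
        # restval="" leaves a blank cell where an inventory lacks a column
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping the name of the source file on every row means the master list can always be traced back to the original project inventory, which helps when different projects used different column names for the same information.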
File Formats
When creating new digital files, the museum seems inclined to use standard, popular formats, which increases the likelihood that the museum will still have access to these files even if the formats become obsolete. If a popular format like PDF becomes obsolete in the future, for example, IT specialists will have to invent tools for accessing and migrating these files because they are so common.
NDSA level one requires that the museum give input whenever possible into the creation of digital files to encourage the use of preferred formats. The Smithsonian has a policy on digital formats that includes a useful table that the museum might want to use as a model. To achieve level two, CPAM should inventory the file formats currently in the museum’s collections, and to reach level three, it should monitor file format obsolescence issues.
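A rough first pass at the level-two format inventory can be automated. The Python sketch below simply counts files by extension under a folder; the function name `inventory_formats` is my own, and a file extension is only an approximation of the true format, since a file can be misnamed. It should be read as a starting point rather than a complete format identification.

```python
from collections import Counter
from pathlib import Path


def inventory_formats(root):
    """Count files under root by lowercase extension (a rough proxy for format)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

Calling `inventory_formats` on the top folder of the common drive and then `.most_common()` on the result would list the most frequent extensions first, which shows at a glance which formats matter most to the collection and which unusual ones might need attention.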
To conclude, I’d like to acknowledge that these suggestions may initially seem overwhelming for a museum with limited staff and resources, but the museum is not expected to implement all these steps at once. While it would be ideal for the museum to reach level one in all five categories, even moving to level one in a few of the five categories would be better than doing nothing at all, and steps corresponding to levels two or three can be viewed as “stretch goals” for the future. Overall, these suggestions should be seen as flexible guidelines, and perhaps also as ideas for future internships or grant applications so that the museum’s current resources do not get overtaxed.
One Reply to “Next Steps in Digital Preservation for the College Park Aviation Museum”
Great work! I appreciate the care you put into helping to clarify the value that can come from moving up the steps on even a few of these points. You’re right to recognize that all of this can be a bit overwhelming and the most important thing is that an organization starts to work to mitigate the most pressing potential risks of loss.
On storage, your suggestion for them to start by connecting with their parent organization is great. There is a good chance that they can get what they need from them, or at least get a better understanding of what the parent organization can offer them for support.
Your suggestions on establishing a master inventory for their digital collections are great. It’s really important for them to develop and maintain that global view of their content.
Again, overall great work. Looking forward to seeing your digital preservation policy draft.