No One Size Fits All, But Some Guiding Principles

Apologies for my late blogging!

The four policies I chose to review are the Dartmouth College Library’s Digital Preservation Policy, the Public Record Office of Northern Ireland (PRONI)’s Digital Preservation Strategy, the Illinois Digital Environment for Access to Learning and Scholarship (IDEALS)’s Digital Preservation Policy, and Rhizome at the New Museum’s Digital Preservation Practices and the Rhizome Artbase. Although these policies come from a range of archives/institutional repositories, libraries, and museums, there were definitely a lot of commonalities.

Digital preservation policies should align with institutional collection policies and mission – identify the types of items most important to the institution. These policies need to establish what types of material the institution will collect and preserve, but also be flexible enough to allow for appropriate on-the-ground decision-making between policy review periods. Stanford’s web archiving policy focuses on at-risk content while also making sure that the policy supports other collection policies and strengths, and prioritizes what is likely to be useful to Stanford’s researcher base.

So, a good digital preservation policy would establish what the institution’s mission or responsibility for digital collections is (such as in PRONI’s policy, which reads an original remit about paper-based collections to also apply to born-digital and digitized material); explain challenges to preservation and/or risks (such as Dartmouth and Rhizome’s policies); define audiences/users for digital materials being selected and preserved; establish collecting and preserving priorities (all five policies I looked at do this); delineate principles behind preservation (Dartmouth’s policy explains life cycle management and lists several resources they have access to, like Portico, LOCKSS, and HathiTrust); and set a regular schedule or deadline for the policy to be reviewed.

There were some interesting differences I noted in these policy documents as well: IDEALS’s policy does not list specific formats, PRONI’s mentions that they have a list of accepted file formats but that is not included in the policy, and Dartmouth’s lists their preferred formats within the policy itself. Rhizome’s policy reads more like a whitepaper than a policy document, and goes into more depth on the multiple different directions future actions could take.

Discussion Questions

    1. Stanford’s web archiving policy is for an institution with high staffing levels and adequate funding. While all of the major points still apply to smaller institutions, how do you scale this type of robust, well-defined collections policy to understaffed or entirely volunteer-run organizations, such as the ones many of us in class are working with?
    2. Rimkus, Padilla, Popp, & Martin’s analysis of file format policies across ARL institutions brought up that repository managers place more trust in file formats that originate from library reformatting programs. Is some of this built-in trust because many librarians come from humanities backgrounds? Could increasing diversity in library staff’s backgrounds (e.g., more people with media production, art, design, or programming backgrounds) change the level of confidence repository managers and policy creators have in other formats?

Next Steps for the Little Compton Historical Society

The Little Compton Historical Society is a small organization dedicated to preserving the history and cultural heritage of Little Compton, Rhode Island. Although they face the common problem of limited resources that many organizations of their size do, the LCHS currently has several systems in place that will help them to reach greater success in preserving their digital holdings. Expanding on those established resources, this report will provide guidance on how LCHS staff can improve and ensure the prolonged safety of their digital collections. The recommendations are based on the National Digital Stewardship Alliance (NDSA) Levels of Digital Preservation, a set of easy-to-use guidelines used to assess an institution’s current preservation status based on 5 areas: storage and geographic location, file fixity and data integrity, information security, metadata, and file formats.

First Steps (Minimum)

1. Copy any digital files held on external media to a stable location.

The first and most important task that the LCHS must address is the storage of digital materials on external media such as laptops, CD-Rs, and USB flash drives. Whether currently in use or not, these files should be copied to a stable storage system as soon as possible to avoid the possibility of data loss. As these types of media age, the files held within them become more difficult to access and preserve. Unused laptops become slow as the programs on them are not updated and – if the files on them are not fully backed up – those files are at considerable risk. USB flash sticks and drives are easily corrupted and nearly impossible to repair, making them best suited for short-term storage. While commercially produced CD-ROMs have been known to have a lifespan of 30 years, CD-Rs that have had material recorded or “burned” onto them have a drastically different longevity, with some experts estimating that they last only 5 years. Further, as technologies rapidly develop there is a chance that CDs will require specialized equipment to access in the future, as we see with floppy disk drives today.

All of the files that are currently stored on these types of media should be transferred to a stable location on the main server. Even files that are not currently being used, such as the oral history collection, must be copied to ensure their long-term preservation. When copying files from external media, be sure to record any important information that is stored along with them so that nothing vital is lost in the transfer.

2. Create a full backup of all digital files on a physical hard drive.

Once all digital files have been moved to a stable location, it is highly recommended to do a complete backup of all digital holdings. By consolidating all files that are currently held on external media into a centralized, stable location, the LCHS will be assured that current and future backups are protecting everything. One way to ensure that vital information is kept safe is to create a backup on an external hard drive, then store it in a secure location either within or outside of the organization. This could mean Marjory storing it in a locked cabinet at the LCHS, or Fred taking it home with him. The external drive should then be updated monthly or quarterly, as resources allow.

3. Create a complete inventory of all digital holdings.

Now that all relevant digital files have been compiled and backed up, the LCHS would benefit greatly from a comprehensive inventory of all digital holdings. Since many of the holdings are already stored in PastPerfect, a great start is to use the tools provided by them. PastPerfect has an optional Inventory Manager upgrade that allows users to “create inventory lists, print barcode labels, track collections electronically, and ensure accurate records.”

The digital files not currently held in PastPerfect also need to be included in the inventory. A simple Excel sheet can get the job done – and it is often useful to model the digital file names and organization on how the physical items are already organized. Compiling all of the digital holdings into one master inventory will help to combat the problem of duplicate files that currently appear in multiple locations. In the future, care should be taken to follow set standards for adding new digital files to the inventory.

An added benefit of an inventory is that having all of the information on digital holdings in one secure place will allow the LCHS to take a hard look at its holdings, reassessing which files are the most vital and whose loss would be the most detrimental to the organization. As historical societies are often stretched for time and resources, the preservation of those files can then be prioritized above others to ensure their continued safety.

Further Steps: Moderate to Aggressive

At this point, the LCHS should have at least 1 full, complete copy of its digital holdings stored on a physical hard drive. To meet the highest NDSA standards for storage, the LCHS could create additional physical copies of backups and do a “buddy swap” with organizations in other states. This can also be accomplished with an offsite backup service, like the one currently used – Backblaze. Dropbox can also be used for a 2nd cloud backup, although this would require additional funding.

4. Run a full backup with Backblaze.

The LCHS currently uses Backblaze to routinely back up all files. There are several benefits to their service, most notably that backups are conducted automatically and don’t require constant oversight, and storage space is unlimited. Since this backup stores everything offsite, using Backblaze also boosts the LCHS to Level 2 on the NDSA levels: having at least one copy in a different geographic location. While not much more oversight is needed, Backblaze recommends checking in once a week to ensure that backups are running as scheduled.

5. Establish the fixity of digital files.

Fixity is “the property of a digital file or object being fixed or unchanged.” In other words, checking fixity means making sure that your files haven’t changed without you meaning them to. While there are technologies and programs that exist to maintain fixity at a higher level (see AVP’s fixity tool for an example), given the limited resources at the LCHS, this can be accomplished much more simply. Once all digital files have been consolidated and organized, have a volunteer record how many files are in each folder and the folder sizes. Once every quarter, delegate someone to do a quick check to make sure that all of the folder sizes and file counts are the same as they were originally. If there are any changes, it is clear that something has been added, deleted, or altered. If this was unintentional, the files can be restored using one of the backups. This is a simple way to quickly check that your digital files have not been tampered with, whether intentionally or not.
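
For organizations that would rather not count files by hand, the quarterly check described above can be sketched in a few lines of Python. This is only a minimal illustration, not a tool the LCHS currently uses; the function names (snapshot, changed_folders) and the idea of saving the baseline as a JSON file are assumptions made for the example.

```python
import json
import os
from pathlib import Path


def snapshot(root):
    """Return {folder: {"files": count, "bytes": total size}} for every folder under root."""
    root = Path(root)
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        folder = Path(dirpath)
        # Sizes of the files directly inside this folder (subfolders get their own entry).
        sizes = [(folder / name).stat().st_size for name in filenames]
        result[folder.relative_to(root).as_posix()] = {
            "files": len(sizes),
            "bytes": sum(sizes),
        }
    return result


def changed_folders(old, new):
    """List folders whose file count or total size differs, including added or removed folders."""
    return sorted(f for f in old.keys() | new.keys() if old.get(f) != new.get(f))


def save_baseline(data, path):
    """Store a snapshot as JSON so it can be compared against next quarter."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)


def load_baseline(path):
    """Read back a snapshot written by save_baseline."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A volunteer would run snapshot once on the collections folder and save the result, then each quarter compare a fresh snapshot with changed_folders(load_baseline(...), snapshot(...)). Like the manual method, this only notices changes in counts and sizes; an edit that leaves a file the same size would go undetected.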

6. Create set standards for file and folder names.

It is vital to the continued organization and maintenance of digital files that the LCHS maintain set language and standards for file and folder names. The LCHS receives many donations of materials that often end up residing where they are originally downloaded, rather than fully incorporated into the collections. Developing a set process for these donated materials – and the particular aspects of what that process will look like – largely depends on the time available to Marjory or whoever is available to make sure the process is completed. When a new donation of files is received, resist the urge to leave them on the desktop. The process can be as simple as having a folder titled “Donated Materials” on the network drive, with files labeled by donor name and the date of the donation. The most important thing is to establish consistency, a system that is easy to maintain but with a structure that is easily understood within the context of the larger collections. Once it is established and written down, the task of actually moving those materials to a permanent location can be delegated to a volunteer or docent.

For files stored in PastPerfect, there are tools available to maintain this naming consistency. If authority files have not yet been set, this is a good place to start. Double-check that the authority files in PastPerfect accurately reflect how the files will be organized and entered in the future. This should also extend to the files that are not currently in PastPerfect but stored on the LCHS server; continuity between systems is key. Separating the core files containing the digital collections from current, ongoing projects will help to ensure that nothing vital is altered and can be further protected in the next step.

7. Further restrict access to computers and digital files.

Discussion of naming standards and collections organization also speaks to the NDSA category regarding Information Security. Since many different people collaborate on projects at the LCHS, it is difficult to fully oversee the access levels of every single file. The LCHS has already begun to address the problem of information security by restricting docents’ access to the servers where important information is stored. LCHS staff is encouraged to continue this trend by putting restrictions on individual folders that do not need to be accessed by volunteers, restricting their ability to accidentally delete or alter files they do not need to work with. As a next step, documenting these distinct levels of access granted to each user will bring the LCHS up to Level 2 on the NDSA chart.

Summary

In measuring the current digital holdings of the LCHS against the NDSA Levels of Digital Preservation, the LCHS is below the threshold for minimum preservation standards in several areas, most notably storage, file fixity, and metadata. However, with a commitment to instituting changes and some time, the status of the digital holdings can be greatly improved. Taking these actions sooner, rather than later, will decrease the risk of catastrophic data loss and help make the digital collections more user-friendly and widely accessible. Below is a quick before-and-after glance at each of the NDSA fields as related to the LCHS:

Storage and Geographic Location

Digital files are currently dispersed throughout various locations, some of which are backed up and others which are not. Materials that are currently stored on external media such as laptops, CD-Rs, and USB flash drives should immediately be moved to the network. Safely housed on the network, they will be properly backed up and no longer subject to data loss caused by environmental factors, physical degradation of media, and data corruption. The multiple levels of backups that are already in place will thus be more complete and can be expanded to include the recommendations listed in the steps above regarding physical hard drives, Backblaze, and Dropbox.

Volunteer Collection Manager Fred Bridge has a solid system in place for integrating digital files into PastPerfect that can serve as a model for how other digital files are accessioned. The key to continuity of the current systems (and any new ones that are put into place) is to document these processes. Any staff or volunteer who currently handles digital files is encouraged to record their personal organization methods – even just by writing them out in a Word document – to prevent future confusion when and if they are not present to access files. Doing so will, at the very least, shed light on where and how items are currently stored, even if resources and time don’t allow for a full-scale reorganization and standardization of all collections.

File Fixity and Data Integrity

There are no current systems in place to monitor the fixity of digital files, but this is easily corrected. As explained in Step 5 above, fixity means making sure that files have not changed without the organization meaning them to. Creating a document that keeps track of how many files are supposed to be in each folder and the size of them is a simple first step towards advancing in the NDSA levels. With limited staff and resources, it is not expected that the LCHS will be able to invest in highly technical fixity tools or immediately advance to a high NDSA level in this regard, but making it a priority to monitor file and folder counts on a quarterly basis is a big move in the right direction.

Information Security

Recent changes to access made at the LCHS have greatly improved their status in this area. As a next step, the LCHS should further restrict individual files and folders from non-staff or project leads. As new volunteers or docents are granted computer access, make sure that they are only granted access to the materials they need; it is always better to grant access to smaller sets of files than to give widespread access to files that can be accidentally altered or erased. Creating and maintaining a Word document that lists who has been granted access to certain files will bring the LCHS up to Level 2 in this area.

Metadata

There are many reasons that a complete inventory of digital files will benefit the LCHS. For collections management purposes, an inventory gives staff a better handle on what they have, where there are gaps, and what they would like to collect in the future. The problem of duplicate files existing in several locations is much easier to tackle when there is a master inventory – users will know where to find certain files without having to re-download and save them in a separate location. This inventory can be started using PastPerfect’s built-in tools and expanded as needed to include all of the digital files held by the LCHS. Once the master inventory is completed, make sure that it is safely stored in multiple locations, both physically and digitally. As mentioned above, it may be helpful to separate permanent collections from current projects so that there is no conflation between the two.
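
While building the master inventory, one way to surface the duplicate files mentioned above is to group files with identical contents. The following is a rough sketch under my own assumptions: it compares SHA-256 checksums (after first grouping by size to save time), and the function names are invented for this example rather than taken from PastPerfect or any other tool named in this report.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return a list of groups; each group is a sorted list of paths with identical contents."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

The output is only a list of candidates. Someone familiar with the collections should still decide which copy is the one to keep and record that decision in the inventory before anything is deleted.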

File Formats

The LCHS is currently on the right track towards basic preservation when it comes to file formats. Nearly all of the digital files are in commonly used formats such as JPEG, TIFF, and PDF, making the work of preserving them all the easier. If the opportunity arises to accept files in other formats, the LCHS should strongly encourage donors and partner organizations to continue to use only commonly used formats that run little risk of obsolescence. If the resources arise to delegate the task to a volunteer or docent, it is also helpful to create a list of all file formats currently in the collections. This will help to identify and monitor the lifespan of the digital files since certain formats require updates, while others fade from use. Maintaining an inventory will raise the LCHS to Level 2 of the NDSA levels on file formats.
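
The list of file formats suggested above does not have to be compiled by hand. As a minimal sketch, the Python below tallies files by extension and flags anything outside a short list of common formats. The COMMON_FORMATS set is my own assumption based on the formats named in this section, and a file extension is only a rough stand-in for the true format of a file.

```python
import os
from collections import Counter
from pathlib import Path

# Assumption for this example: the extensions the LCHS treats as "commonly used".
COMMON_FORMATS = {"jpg", "jpeg", "tif", "tiff", "pdf"}


def count_formats(root):
    """Count files under root by lower-cased extension; files without one are '(none)'."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[Path(name).suffix.lower().lstrip(".") or "(none)"] += 1
    return counts


def uncommon_formats(counts, common=COMMON_FORMATS):
    """Return the extensions found in the collection that are not on the common list."""
    return sorted(ext for ext in counts if ext not in common)
```

Running count_formats on the collections folder once or twice a year, and saving the result alongside the inventory, gives a simple record of which formats are in use and whether any unfamiliar ones have crept in.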

CONCLUSION

The ultimate goal of digital preservation is to ensure the long-term access of materials for users, both now and in the future. The current tools that the LCHS uses, like PastPerfect, include options for expanding access that should be fully utilized when resources allow. For example, PastPerfect includes the option to make your records appear in Google searches, if desired. While it is understandable that the LCHS would not want all of its materials to be freely accessible to the public on the internet, one way to increase traffic and use of digital collections is to make some of the materials available on a wider platform, like Wikimedia Commons or Flickr. Staff can choose which images to share, while also keeping some of them available for sale or use only with permission.

All of the steps detailed in this report are informed suggestions based on the current state of the LCHS collections and its goals for the future. While this is by no means a comprehensive preservation plan, following these guidelines will allow for greater information security and less risk for their valuable digital holdings, whether currently in use or not. The ultimate goal is to maintain these collections for the foreseeable future to the best of our knowledge and capability.

Next Steps for the Geneva Historical Society

Introduction

The digital holdings of the Geneva Historical Society, located in New York, are spread out over various websites and institutions. The Society has partnerships with the Hobart and William Smith Colleges, NewYorkHeritage.org, and the NYS Historic Newspapers website. The Society’s digital holdings can be found on these websites, their own Synology NAS storage system, and on Google Photos and Dropbox. These holdings include materials such as photographs, post cards, and microfilmed newspapers.

One of the largest problems the Society faces with their digital items stems from a lack of organization. When an employee digitizes an item, that item is saved and stored according to that employee’s own personal naming and filing system. Staff regularly do not know what other employees have scanned, leading to much duplication and difficulty in finding digitized materials.

Before writing a preservation policy for the Society, I will first review the Society’s current standing in the NDSA Levels of Digital Preservation (LoDP) and then recommend next steps that the Society can take to improve their standing. The NDSA levels can be found here.

Metadata

Although Metadata is ranked as the fourth priority on the NDSA LoDP, it is a critical concern for the Society. To address their duplication and disorganization issues, it is important for the Society to create an inventory of their digital holdings. This inventory can be created on a Microsoft Excel spreadsheet and should have headers for information such as directory location, file format, and file size. Resolution would also be a beneficial header so that, in the future, staff members can know if the digitized image fits the proper resolution that they are looking for. There should be a backup of this inventory in case the original file is accidentally deleted or corrupted. Simple ways to back up the inventory include using Dropbox or Google Sheets.
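
A first draft of such an inventory can be generated automatically and then opened in Excel for cleanup. The sketch below is one possible approach, not a description of the Society's current workflow: the function name and column names are assumptions, and it leaves out resolution because reading that requires an image library beyond Python's standard tools.

```python
import csv
import os
from pathlib import Path


def build_inventory(root, output_csv):
    """Write one CSV row per file under root and return the number of files listed."""
    root = Path(root)
    rows = 0
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["directory", "file_name", "file_format", "size_bytes"])
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subfolders in a predictable order
            for name in sorted(filenames):
                path = Path(dirpath) / name
                writer.writerow([
                    Path(dirpath).relative_to(root).as_posix(),  # location within the collection
                    name,
                    path.suffix.lower().lstrip(".") or "(none)",  # format guessed from extension
                    path.stat().st_size,
                ])
                rows += 1
    return rows
```

The CSV should be written somewhere outside the folder being scanned so that the inventory does not list itself. Columns such as resolution or a description can then be added by hand for the items that need them.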

Other than just creating the inventory, the Society would also benefit from standardizing the documentation of digital materials. This standardization should include a file naming policy so that items are always named in predictable and understandable ways, increasing findability. While creating the inventory, files should be renamed in order to fit the standard. Another way to standardize the documentation of the digital materials is to restructure the directories so that everyone may know the locations of items, instead of them being restricted to a single employee’s desktop. One way to structure the directories can be to base them on the physical locations of the materials. This way, if an employee knows where to find the physical object, they can use that same process to find the digitized version. Thus, the inventorying process would also involve relocating digital objects to fit into the new directory structures.

So that every employee will know how to name and locate items, the Society should create a “file plan” document. The purpose of this document will be to explain the standards for naming and storing items and why the rules exist. For future digitization, staff members should record file information in the inventory right after scanning.

Creating the inventory will be a high resource requirement. Due to the amount of time that creating the inventory will take, the Society can use community volunteers and dedicate staff hours to the project. The Society can also have a “file clean-up” day or time every so often so that every staff member can devote time to inventorying and cleaning up the collection.

Storage and Geographic Location

The Society currently uses a Synology NAS storage system. To satisfy the storage requirements of the NDSA LoDP, an organization needs to have multiple copies of their collection in different geographic locations. To meet the level 1 requirement of LoDP, the Society needs to create a copy of their digital collection. The Society could use an application such as Preservica to store their digital materials. To reach the second level, the Society will need to document the storage systems they are using and what they need to use them (such as access information). The Society will also need to create and find a storage location for a third copy of their digital collection. This third copy can potentially be given to one of the Society’s partners.

Fulfilling these steps will demand a medium resource requirement from the Society. Improving the Society’s standing in this area could cost both time and money (should the Society decide to use an application like Preservica).

File Fixity and Data Integrity

File fixity refers to the digital object remaining as intended through time and transfer, meaning that details such as content and file size do not change. The purpose of fixity checks is to ensure the digital objects have not changed. There are no fixity checks currently being done at the Society. To reach level 1 of LoDP, the Society needs to create fixity information upon creating and acquiring a new digital object. To do this, I recommend that the Society record the file size and file count. This information can be recorded in the inventory. To reach level 2 of LoDP, the Society needs to check file fixity for all acquired digital materials. The Society can do this using AVP’s free application Fixity. I would also recommend that the Society practice annual fixity checks to make sure none of their content has become corrupted.

For materials already in the Society’s collection, the Society can record their file size and count while creating the inventory for future reference. Working on file fixity will be a low resource requirement for the Society. Documenting fixity information for existing files can be done while working on the inventory and the procedure for adding fixity information for new digital objects can be added to the file plan. Checking file fixity can be done using free applications, so there does not need to be a financial cost to the Society.
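
File size and count will catch missing or added files, but not a file whose contents changed while its size stayed the same. A common, stronger form of fixity information is a checksum for each file. The sketch below shows the general idea using Python's built-in hashlib module; it is a generic illustration with function names I made up, not a description of how AVP's Fixity application works internally.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Return {relative file path: checksum} for every file under root."""
    root = Path(root)
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            manifest[path.relative_to(root).as_posix()] = sha256_of(path)
    return manifest


def compare_manifests(old, new):
    """Report files that went missing, were added, or changed between two manifests."""
    return {
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

The manifest could be stored as an extra column in the inventory. At the annual check, a new manifest is built and compared with the stored one; anything listed as missing or changed can be restored from a backup copy.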

Information Security

Some information security is implemented at the Society in the Photo Archive. Although the public may use the computer, they cannot access all the files that staff can. These access restrictions can be further implemented to protect the digital collection. To reach level 1 of LoDP’s information security section, the Society needs to identify who has read, write, move, and delete authorization to individual files. This should be done after the inventory has been created and existing materials have been renamed and relocated. The benefit of having these restrictions is that unauthorized people will not be able to tamper with or misplace files, maintaining the order and efficiency of the organization. The Society should also document who has the authorizations and access restrictions for the content.

Implementing this level of information security will be a medium resource requirement for the Society. The Society will have to decide and document who will be able to access, alter, move, and delete which collection materials, which may require a lot of time.

File Formats

Although the employees at the Society already know that they use formats such as pdf, tiff, and jpg, it is still important to document all of the file formats in use. This can be done once the inventory is created. After documenting the formats for each item in the collection, the Society will be able to see each of the formats being used. The Society should limit their formats to “known open formats,” meaning formats that are widely used and non-proprietary. This lessens the risk of the format becoming obsolete. To reach levels 3 and 4 of LoDP, the Society will need to monitor these file formats to see if they are becoming obsolete and migrate at-risk formats when needed.

Working on the file format section of LoDP will be a low resource requirement from the Society, as it should be relatively easy to do once the inventory has been created.

Conclusion

The most pressing need for the Society is the creation of an inventory. Once the inventory is created, the Society will be able to advance in the other areas of LoDP with lower resource requirements.

Greenbelt Museum’s next steps

Greenbelt Museum and its digital content
The Greenbelt Museum is a community museum that focuses on the “New Deal history and living legacy of Greenbelt, Maryland” (“Greenbelt,” n.d.). The Museum provides tours of an original Greenbelt home, walking tours of the community, rotating exhibits, and educational programming. Their collection scope includes items that were made and/or used in the town; associated with a resident, location, or event in Greenbelt; and that originated or were used from 1936-1952 (“Collections,” n.d.). In total there are approximately 2,000 artifacts in the Museum’s collection, with an estimated 50% already digitized. This total does not include the Museum’s archives, comprised of textual records, maps, and photos.

Their digital holdings consist primarily of image files in TIFF format, but also include recordings and transcriptions of oral histories. More recent oral histories are in MP3 format; however, older recordings are stored on cassette tapes and have not been digitized. The Museum’s primary collection of oral histories, taken in 1987, was transcribed and scanned into PDF format. These scans were done a decade ago, and the Museum’s Director/Curator, Megan Searing Young, has indicated that they likely need to be rescanned. Finally, there are not corresponding analog copies of every digital object in the Museum’s possession.

Next steps for digital preservation
The following recommendations were crafted using the National Digital Stewardship Alliance’s (NDSA) Levels of Digital Preservation as a guideline. These levels cover five areas crucial to digital preservation practices: storage and geographic location, file fixity and data integrity, information security, metadata, and file formats (Bailey, J. et al., n.d.).

Storage and geographic location
As described in the survey report about Greenbelt Museum, there are two copies of the Museum’s files in two different locations: one on Searing Young’s work computer; the second with Greenbelt city’s IT department. Increased communication between IT employees and the Museum is necessary so Searing Young and her staff will know where the second copy is kept and how it is stored. Greenbelt’s IT likely already has a data management plan in place that the Museum could benefit from.

Eventually the Museum should transfer their files to a cloud storage service in case something were to happen to the hard storage. Luckily, there are a variety of affordable options available. Cloud services like Box start at $15 per month for Business accounts, and G Suite through Google starts at $5 per month for their Basic accounts. For content that is not protected by copyright and that the Museum is willing to share freely, Wikimedia Commons allows anyone to upload and store files.

As a more secure measure, the Museum could consider swapping a backup copy of their collection with another community museum or historical society in a different geographic region. This would ensure that one copy of their files is kept in a location with different disaster threats than Greenbelt, MD.

While digitization is not a primary focus of digital preservation practices, the Museum would benefit from converting their oral histories stored on cassette tapes to .MP3 format. Since they already have a community page on the StoryCorps Archive website, newly digitized stories could be added to this collection. If the Museum does not have the resources available to digitize the cassettes in-house, there are a myriad of companies that provide this service at reasonable prices.

File fixity and data integrity
Fixity is a somewhat jargon-y term for stability. When an organization checks for “file fixity,” they are making sure that their files have not changed over a period of time or during a transfer (De Stefano, P. et al., 2014). There are different ways to do this, and the following steps will start small.

The Museum should first make sure that the two already existing copies are exact. This includes not just exact file count but that the types of files (.TIFF, .PDF, etc.) and the number of each file format on both copies are the same (more about this in the Metadata section). This should be checked on a regular basis and, ideally, whenever the Museum acquires new digital content. At a minimum, fixity should be monitored on an annual basis by Searing Young in cooperation with Greenbelt IT staff. The same would need to be done for cloud storage if the Museum adopted that practice. A spreadsheet could be used to keep track of file counts, with one copy kept on each backup.
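
The comparison of the two copies could be scripted rather than counted by hand. The sketch below assumes both copies can be reached as folders from one computer (for example, with the IT department's copy on a mounted drive), which may not match the Museum's actual setup; the function names are invented for the example.

```python
import os
from collections import Counter
from pathlib import Path


def format_counts(root):
    """Count the files under root by lower-cased extension ('(none)' if there is no extension)."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[Path(name).suffix.lower().lstrip(".") or "(none)"] += 1
    return counts


def format_differences(copy_a, copy_b):
    """Return {extension: (count in copy_a, count in copy_b)} for every format whose counts differ."""
    a = format_counts(copy_a)
    b = format_counts(copy_b)
    return {ext: (a[ext], b[ext]) for ext in sorted(a.keys() | b.keys()) if a[ext] != b[ext]}
```

An empty result means the two copies hold the same number of files in each format. It does not prove that the files themselves are identical, so it works best as a quick first check before a fuller comparison with a tool like Exactly.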

There is also Exactly, a free and open-source fixity tool produced by AVP, a software development firm. It allows for the secure transfer of digital content from sender to recipient so organizations can authenticate the integrity of their files. The AVP website also offers user guides for those new to using Exactly. This tool could be quite useful for the Museum, especially when receiving born-digital content, like oral histories, from donors or volunteers.

Metadata
Since metadata is not consistent across file folders, the Museum should first decide on a metadata standard that will be applied evenly to each file. An inventory will need to be taken of both the copy on Searing Young’s computer and the copy held by Greenbelt IT. Metadata can either be documented in the file names (which the Museum currently does) or on spreadsheets for each folder.

Greenbelt Museum does not use PastPerfect to manage their digital archival files, but the organization would benefit from a management system for this collection. Not only would it be easier to apply metadata consistently, but it would allow for more secure data storage and workflow. This could be done either by incorporating the archival files into the PastPerfect catalog, or using different software, such as Preservica or Arkivum.

Information security and file formats
Currently, Searing Young is the only employee with access to the Museum’s digital archival files, so she is presumably the only person who is authorized to edit, move, or delete records. If digital preservation becomes a more central focus of the Museum’s work, however, they will need to identify who has permission to access these files and maintain a log of all changes made to the repository.

Additionally, the Museum should provide guidance to staff, volunteers, and donors on what file formats they prefer (e.g., .TIFF instead of .JPG images) when accepting digital content. An inventory should also be kept of the file formats in use so the organization can easily assess whether they are using an outdated format.

Conclusion
Finally, the Museum has a Collections Policy that has not been updated since 2006, before Searing Young joined the organization. While the policy is quite comprehensive and could provide guidance, it should be revised to meet the current collection goals of the Museum and to ensure that day-to-day activities align with policy.

Much can be done to improve the digital preservation strategies of Greenbelt Museum, and it will be best achieved in small, incremental steps. Once inventory is done of the organization’s digital holdings, staff can determine where they should direct their focus; what areas might need more attention than others; and which files they value most.

References
Bailey, J. et al. (n.d.). Levels of digital preservation. Retrieved from https://ndsa.org/activities/levels-of-digital-preservation/

“Collections management policy and manual.” (n.d.). Unpublished internal document, Greenbelt Museum.

De Stefano, P. et al. (2014). Checking your digital content. Retrieved from http://hdl.loc.gov/loc.gdc/lcpub.2013655117.1

“Greenbelt Museum mission statement.” (n.d.). Unpublished internal document, Greenbelt Museum.

Baltimore Community Museum-Next Steps

Introduction

The Baltimore Community Museum documents the history of the small town of Baltimore, Ohio and the surrounding areas. The museum’s collections include documents from the township, papers of prominent citizens, photographs, and historical artifacts. The museum’s collections are extensive for a town with a small population and staff are currently working to gain better control over the museum’s holdings through an inventory project. The director of the museum and her interns are scanning noteworthy items as they come across them while working on this inventory. The Baltimore Community Museum is primarily scanning documents and photos that are damaged and items that are of great importance to the history of the community. Staff estimate that they have created about 600 files thus far. Issues of a local newspaper, the Twin City News, have also been previously digitized. Staff would like to ensure that they don’t lose access to this valuable content and hope to be able to make the scanned items available on the museum’s website in the future. The museum is particularly interested in developing resources for genealogists.

Current Practices

The National Digital Stewardship Alliance Levels of Digital Preservation provide tiered recommendations for organizations interested in preserving digital content. The NDSA divides its recommendations into five categories and suggests four levels of action for each category (Phillips, Bailey, Goethals, & Owens, 2013). By analyzing the Baltimore Community Museum’s current practices and comparing them to the suggested practices in the NDSA levels, we can identify opportunities for future growth. The Baltimore Community Museum is approaching Level 1 for the Storage and Geographic Location category. The museum uses Google Cloud for photographs and Dropbox for documents, but the materials do not overlap. Staff are not sure how to conduct fixity checks on the files they are creating, so the museum does not currently meet the requirements for Level 1 of the File Fixity and Data Integrity category. The museum director and her interns are the only people who can read, modify, and delete files. While members of the community are able to come in and view the files on the staff laptop, this does not happen frequently. Staff supervise visitors who are using the laptop, so it is unlikely that visitors would accidentally delete or change files. Therefore, the Baltimore Community Museum meets the requirements for Level 1 of the Information Security category. While staff generally know what has been scanned thus far, there is no formal inventory of the digital content, so the museum is not quite at Level 1 for the Metadata category. The museum typically creates JPEG files and has also used PDFs in the past. Because this is a limited set of formats, the museum has reached Level 1 in the File Formats category. Fortunately, there are some simple steps that the museum can follow to achieve higher NDSA levels. In the following sections of the plan, I will describe a range of actions that staff could take to better manage the Baltimore Community Museum’s digital content.

Beginner Plan

The National Digital Stewardship Alliance Levels of Digital Preservation stress that organizations should keep multiple copies of their content to prevent loss due to bit rot or storage system failure. Files should also be stored in multiple geographic locations to protect against a natural or man-made disaster in a particular region. Google Cloud and Dropbox are both solid options for the Baltimore Community Museum for the time being. These cloud storage providers can sync files to the museum’s staff laptop. It is also likely that these systems are storing files in a location outside of the Ohio region, though this process is not entirely transparent. Currently, the museum saves some files in Google Cloud and others in Dropbox because the free version of Dropbox has limited storage. One option might be to combine all of the files in Google Cloud in order to simplify tasks like fixity checks and completing inventories.

It will be essential for the Baltimore Community Museum to begin checking the fixity of its files. Fixity refers to “the property of a digital file or object being fixed or unchanged” (National Digital Stewardship Alliance, 2014, p. 1). Fixity can also be thought of as a “digital fingerprint” that serves as evidence that the museum’s files are the same as they were before (Owens, 2018, p. 60). The simplest way to monitor fixity is to keep track of the number of files being created and their expected file size. If the file size or file count unexpectedly changes, this can be a sign that there is a problem. In general, it is a good idea to check fixity information when the content is first created and before and after it is transferred to a different storage system (National Digital Stewardship Alliance, 2014, pp. 3-5). If museum staff do not have time to produce a full inventory of the digital content with detailed metadata, keeping track of the file size and file count numbers will still be better than nothing. One way to simplify this process would be to schedule a regular time each month to update the inventory and check fixity information. This would be a quick way to meet the Level 1 requirements in the File Fixity and Data Integrity category.
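
To make the file count and file size check a little more concrete, here is a minimal Python sketch of what a monthly check might look like. It assumes the files have been synced to a folder on the staff laptop; the function names are my own, and the expected numbers would come from whatever staff wrote down at the previous check.

```python
import os


def summarize(folder):
    """Return (file count, total size in bytes) for everything under folder."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return file_count, total_bytes


def check_against_record(folder, expected_count, expected_bytes):
    """Compare the folder's current totals with the numbers recorded last time."""
    count, size = summarize(folder)
    problems = []
    if count != expected_count:
        problems.append(f"file count is {count}, expected {expected_count}")
    if size != expected_bytes:
        problems.append(f"total size is {size} bytes, expected {expected_bytes}")
    return problems
```

An empty list from check_against_record means the totals match the record. When staff have added new scans since the last check, the numbers will of course differ, so the recorded totals should be updated each time new content is added. A file that is altered without changing its size would slip past this check, which is one reason to treat it as a starting point.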

Currently, access to the Baltimore Community Museum’s digital content is restricted to the museum’s director and interns. The museum is also using a limited set of common file formats. Because these are stronger areas for the organization, staff may not need to make as many changes to improve their practices. The museum should consider creating formal documentation that describes the current access restrictions to reach Level 2 in the Information Security category of the NDSA levels. When interns leave the organization, any shared passwords should be changed. It would also be useful to maintain a log that employees update whenever they delete or move files. This would help the museum to meet the Level 3 requirements for Information Security. To reach Level 4 in this area, the museum’s director can perform audits of the security logs.
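
A log like this could simply be a shared spreadsheet, but for illustration, here is a small Python sketch that appends one row per change to a CSV file. The column names and the record_change function are assumptions I made up for this example, not part of the NDSA levels or any particular tool.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout for the log; adjust to suit the museum's needs.
LOG_COLUMNS = ["timestamp_utc", "staff_member", "action", "file_path", "note"]


def record_change(log_path, staff_member, action, file_path, note=""):
    """Append one row describing a move or deletion to the CSV log."""
    log_file = Path(log_path)
    is_new = not log_file.exists()
    with log_file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header row the first time the log is created.
            writer.writerow(LOG_COLUMNS)
        writer.writerow([
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
            staff_member,
            action,
            file_path,
            note,
        ])
```

Because the file is opened in append mode, earlier entries are never overwritten, and the resulting CSV opens in any spreadsheet program when the director wants to review it.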

The museum’s director should also keep track of the file formats that the museum uses and encourage new staff members to continue creating JPEGs when scanning items from the collections. Normally, an organization would need to create an inventory of all the file formats that are in use to reach Level 2 in the File Formats area. Right now, the museum mainly uses JPEGs and PDFs, but they could create an inventory in the future if they start using a larger set of formats. To arrive at Levels 3 and 4, organizations need to monitor file formats for obsolescence and should be prepared to migrate or emulate files. However, since JPEGs and PDFs are so commonly used, these additional steps might not be necessary for the Baltimore Community Museum. The staff should use their limited time to strengthen practices in other areas.

Intermediate Plan

Some National Digital Stewardship Alliance members have expressed concern that cloud storage systems do not allow organizations to maintain full control over their content (Altman et al., 2013). In addition to storing files in a service like Google Cloud, the Baltimore Community Museum may want to create an additional copy of its files that is not housed in a third-party system. The museum’s files will likely fit on a USB drive or external hard drive. This additional copy could be kept in a place that is only accessible to the museum’s director for further security. The museum could also consider finding a “backup buddy” in another part of the country. This would involve trading external drives with another institution to minimize the risk of losing all of the museum’s copies in a regional disaster. No matter how the organization chooses to proceed, it will be important to document the storage system and to ensure that staff members know how to access all of the various copies. If the museum is able to maintain access to three complete copies in different geographic locations, they will satisfy the Level 2 requirement for the Storage and Geographic Location category.

In addition to keeping track of the expected file size and file count, the Baltimore Community Museum can also use cryptographic hashes to monitor file fixity. Cryptographic hash functions like MD5, SHA-1, and SHA-256 are algorithms that “[take] a given set of data (like a file) and computes a sequence of characters that then serves as a fingerprint for that data” (Owens, 2018, p. 109). This may sound daunting, but it is possible to automate this process. AVP’s Fixity is a free tool that can scan folders or directories and check for fixity issues. The museum can ask Fixity to monitor files on a monthly basis and send email reports when it detects changes to the files (AVP, 2018).
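
For readers curious about what a hash-based check involves, the general idea can be sketched in a few lines of Python using the standard hashlib module. This is not how AVP’s Fixity is implemented; it is only a minimal illustration, and the function names here are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(old, new):
    """Report files that were changed, removed, or added between two checks."""
    return {
        "changed": sorted(p for p in old if p in new and old[p] != new[p]),
        "missing": sorted(p for p in old if p not in new),
        "added": sorted(p for p in new if p not in old),
    }
```

The manifest from one month would be saved and compared with the manifest built the next month. Unlike a file size check, a hash comparison will flag a file whose contents changed even if its size stayed the same.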

Advanced Plan

In addition to maintaining multiple copies of its files, the Baltimore Community Museum could also consider uploading items to the Internet Archive, which offers free storage and some support for digital preservation. This option would be appropriate for content that is in the public domain and would require museum staff to add some metadata to the items it uploads (Schumacher et al., 2014). The Internet Archive could facilitate public access until the museum is able to add more collections to its website. While more advanced software solutions like Preservica would provide greater functionality, they would also be more expensive. Because the museum is not creating a massive number of files, free options like Google Cloud and Dropbox should serve the organization’s needs for the time being. An option like Preservica may be worth considering if the museum greatly expands its digitization program.

One of the Baltimore Community Museum’s biggest challenges is figuring out how to organize and keep track of its digital content. Establishing regular file naming conventions could help to solve this problem. In order to improve practices in the Metadata category of the NDSA levels, organizations are supposed to store administrative, transformative, technical, descriptive, and preservation metadata. However, many of these types of metadata do not need to be created manually. For now, the museum could focus on creating an inventory of its files with the file location, fixity information, and descriptive metadata. Maintaining a log of fixity information is also one of the requirements for Level 3 of the File Fixity and Data Integrity category. If staff have time, they could begin adding individual scanned items to PastPerfect to gain better intellectual control over the museum’s digital content.

Conclusion

Some of these options are more labor-intensive than others. Even if staff only have enough time to pursue the simplest recommendations, this would still be an important step towards actively managing the Baltimore Community Museum’s digital content. Once the museum establishes basic procedures for making copies and checking file fixity, it will likely become easier to implement some of the other suggestions.

References

Altman, M., Bailey, J., Cariani, K., Gallinger, M., Mandelbaum, J., & Owens, T. (2013). NDSA Storage Report: Reflections on National Digital Stewardship Alliance Member Approaches to Preservation Storage Technologies. D-Lib Magazine, 19(5/6). http://doi.org/10.1045/may2013-altman

AVP. (2018). Fixity User Guide Version 1.2. Retrieved from https://www.weareavp.com/wp-content/uploads/2018/07/Fixity_v1.2_UserGuide.pdf

National Digital Stewardship Alliance. (2014). What is Fixity, and When Should I be Checking It? Washington, D.C. Retrieved from http://www.digitalpreservation.gov/documents/NDSA-Fixity-GuidanceReport-final100214.pdf

Owens, T. (2018). The Theory and Craft of Digital Preservation. Baltimore: Johns Hopkins University Press.

Phillips, M., Bailey, J., Goethals, A., & Owens, T. (2013). The NDSA Levels of Digital Preservation: An Explanation and Uses. IS&T Archiving, Washington, USA. Retrieved from http://www.digitalpreservation.gov/documents/NDSA_Levels_Archiving_2013.pdf

Schumacher, J., Thomas, L. M., VandeCreek, D., Erdman, S., Hancks, J., Haykal, A., … Spalenka, D. (2014). From Theory to Action: Good Enough Digital Preservation for Under-Resourced Cultural Heritage Institutions (Working Paper). Retrieved from http://commons.lib.niu.edu/handle/10843/13610