Introduction:
For clarity, the recommendations provided in this plan are based on the National Digital Stewardship Alliance’s Levels of Digital Preservation (NDSA LoDP). This plan attempts to improve WYPR’s “level” of digital preservation practices in areas of Storage and Geographic Location; File Fixity and Data Integrity; Information Security; Metadata; and File Formats. These 5 elements are represented in a table with 4 progressive levels of quality of practice. The goal of this project is to improve WYPR’s practices from Level 0/1 to Level 3 or 4.
However, the recommendations provided in this plan were made in recognition that WYPR is not a cultural heritage repository (which the NDSA Levels of Digital Preservation are intended for), and as such its needs are different. The intention of this plan is to improve current management practices to a reasonable level without overburdening producers and WYPR staff.
Structure of the Plan:
This plan breaks down suggestions for next steps into the aforementioned 5 categories, each of which is further divided into Short Term, Mid Term, and Long Term goals based upon the level of effort required to implement the action or its urgency. At the end of each recommendation, the action is rated according to the NDSA’s levels, e.g. “(Level 1)” through “(Level 4)”.
While Long Term actions may require additional effort or take longer to complete, these are still necessary to ensure sustained improvement of practice for managing digital content at WYPR.
Executive Summary of Recommendations:
In brief, the plan identifies two specific steps that would significantly improve WYPR’s digital preservation practices and effectiveness without a significant increase in workload or staff time.
First, the plan recommends investment in a third-party cloud storage platform, such as Carbonite or Dropbox, to begin storing a third copy of all digital files, diversify geographic storage of files, mitigate disaster risk, and introduce basic fixity checking of all files. These services require little effort to maintain, and their pricing is offered on a sliding scale.
Second, the plan recommends developing a comprehensive inventory of all files (script packs, audio recordings) in an Excel spreadsheet. This inventory should include documentation of expected file size and count, descriptive information about files (file name or content), administrative information (storage location, date created, file format, date transferred or copied), and who implemented those changes.
This is a multi-use tool that provides a clear list of all production files in one easily accessible place, allows basic fixity checking, introduces documentation of how files are managed, and prevents confusion or loss of information that may occur when staff retire or leave the station. While the spreadsheet will require regular upkeep, this task may be delegated to an intern, with WYPR staff periodically checking the spreadsheet for accuracy.
Additional recommendations include:
- Updating the Producer’s Manual to provide up-to-date explanations of the process of recording and storing digital audio from broadcasts. This will increase efficiency and reduce staff time devoted to verbally training interns or new staff.
- Including guidance in the manual for file naming conventions, descriptive metadata, acceptable file formats, and the process of transferring files.
- Locking or otherwise restricting access to files stored on the external hard drive for security purposes.
Finally, it is recommended that WYPR staff discuss the state of the WJHU open reel magnetic tapes stored in the first floor closet. These tapes are in poor condition and fall outside the scope of WYPR’s work; they should be donated to Johns Hopkins, the American Archive of Public Broadcasting, or a similar repository for archival material.
Storage and Geographic Location:
Currently, WYPR staff maintain at least two copies of their digital files. Script packs are kept both on the station’s internal server and on an external hard drive. Digital audio of the shows is kept on a series of CDs and transferred annually to an external hard drive, and some recordings of older shows are kept on the station server. This adheres to the NDSA’s Level 1 recommendations for storage, but could be improved to as high as Level 3.
In order to achieve a higher quality of care for digital content, it is recommended that WYPR begin to store at least 3 copies of all files on different types of storage media, which are then kept in different geographic locations that face different disaster threats. Additionally, a written log should be kept of all storage solutions utilized and how they can be accessed, for standardization and efficiency purposes. Specific recommendations are provided below:
Short Term:
- Create a list of all storage media currently utilized, and provide directions on how to access digital content stored on that device. (Level 2).
- Get at least one copy of all files stored in a different geographic location, either at someone’s home, in the cloud, or at another NPR station. (Level 2).
- Create a third copy of all script packs and digital audio content and store it on a third type of storage media. (Level 2).
- Discuss possible options for diversifying geographic risk (storing files offsite), including potential partnerships with local NPR stations for storing each other’s files, and consider the costs of each option.
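As one possible way to confirm that the three-copy goal above has actually been met, the following Python sketch counts how many storage locations contain each file. It is only an illustration: the function names are hypothetical, it assumes every location uses the same folder structure, and it compares file names only, not file contents.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def under_copied(locations, required=3):
    """Return {relative path: copies found} for files in fewer than `required` locations.

    `locations` is a list of folders, e.g. the server share, the mounted
    external hard drive, and a synced cloud folder (all hypothetical here).
    """
    counts = {}
    for root in locations:
        for relative in relative_files(root):
            counts[relative] = counts.get(relative, 0) + 1
    return {rel: n for rel, n in sorted(counts.items()) if n < required}
```

An empty result would suggest that every file found in any location is present in at least three of them.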
Mid Term:
- Assemble and arrange all files pertaining to Midday, including those created by past hosts and producers, in a hierarchical series of folders on the shared drive to ensure that files relating to the show are easily accessible. Mimic this structure for files stored on other storage media (e.g. the external hard drive, or cloud storage.)
- To diversify geographic storage and disaster risk, invest in a cloud storage service (Carbonite, Dropbox, Google Drive) or an additional external hard drive. Copies of all files should be transferred to these locations periodically at a designated time (monthly, quarterly, annually). (Level 3).
- Cloud storage options are automatically geographically dispersed, have built-in access controls for security of data, and have unlimited storage, but they typically charge monthly and offer slightly less control over data. Charges typically range from 20 to 50 dollars a month depending on the services selected. This would be an efficient means of “killing four birds with one stone”, as cloud storage services address concerns of storage space, disaster risk, security, and, to an extent, file fixity.
Long Term:
- Address issues with inconsistent or out-of-date directions for digital file management in the Producer’s Manual. Develop a written procedure for how to store digital audio and script packs, including a requirement to maintain 3 copies on 3 different storage media, and outline when files need to be copied or transferred to different storage media. Outline what is and what is not acceptable as a storage device (e.g. no floppy disks). This step is crucial to ensure consistency and adherence to standardized procedures, which ultimately will streamline the management of and access to WYPR’s files. (Level 2-4).
- Creating this procedure will be part of the next step of the author’s project.
- Begin to consider alternatives to relying upon CDs as a storage method for digital audio, as these have a high failure rate beyond 7 years. External hard drives, server storage, or cloud storage are all more reliable alternatives.
- Monitor storage media for degradation or obsolescence. (Level 3).
File Fixity and Data Integrity:
WYPR does not currently perform any fixity checking on its digital content. Fixity checking supports the long term preservation and integrity of files by identifying problems that arise when files are transferred or copied. Additionally, “If checks against fixity information for a set of objects begin [to] fail at high rates, it can be an indication of media failure.” (What is Fixity, and When Should I be Checking it?, 2014, 2). This is especially important for the audio recordings stored on CDs, a highly failure-prone storage medium, to make sure that files are not lost to bit rot.
A simple response to this problem would be to create an inventory of all script packs and digital audio recordings, which includes expected file count and size, file format, when a file was created or transferred, and who took that action. This will get WYPR to at least Level 1 of the LoDP, and is less time consuming than performing individual fixity checking on each file. However, this inventory will need to be kept up-to-date, and an audit of all files should be performed at least annually to ensure that “everything’s where it’s supposed to be”.
Short Term:
- Perform a basic fixity check by creating an Excel spreadsheet inventory of all files, including script packs and digital audio recordings, recording the total file size of each nested folder and the number of files in each folder. Additional information documenting when files are transferred to different storage media can be kept in this spreadsheet. (Level 1).
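The folder counts and sizes for such an inventory do not have to be collected by hand. The following Python sketch, offered as one possible approach and not as a required tool, walks a folder and writes one row per nested folder to a CSV file that can be opened in Excel. The function names and column headings are illustrative assumptions.

```python
import csv
import os


def folder_inventory(root):
    """Return one row per folder under root: (folder, file count, total bytes).

    Only the files directly inside each folder are counted, not those in
    its subfolders.
    """
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in alphabetical order
        total_bytes = sum(
            os.path.getsize(os.path.join(dirpath, name)) for name in filenames
        )
        rows.append((os.path.relpath(dirpath, root), len(filenames), total_bytes))
    return rows


def write_inventory(root, csv_path):
    """Write the folder inventory to a CSV file that Excel can open."""
    rows = folder_inventory(root)  # gather rows before creating the CSV
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["folder", "file_count", "total_bytes"])
        writer.writerows(rows)
```

Running the same function again later and comparing the two files gives a quick indication of whether any folder has gained, lost, or changed files.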
Mid Term:
- Begin using a system such as BagIt to package all files being transferred to the external hard drive or cloud storage service. BagIt automatically generates checksums (fixity information) for all of the files it contains and stores them in a simple .TXT file. (Level 1-2).
- Utilize a third-party cloud storage service, such as Carbonite, to run fixity checks of stored data. Most cloud storage services offer this service, though the frequency and detail of these checks vary from service to service. Ultimately, this can save time for production staff at WYPR. (Level 3).
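To illustrate what a checksum manifest looks like, the following Python sketch writes one line per file containing its SHA-256 checksum and its relative path. This is a simplified stand-in using only the Python standard library, not an implementation of BagIt itself; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Write '<checksum>  <relative path>' for every file under root.

    Returns the number of files listed. Keep the manifest file outside
    of root so that it does not list itself.
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            lines.append(f"{sha256_of(full_path)}  {relative}")
    with open(manifest_path, "w", encoding="utf-8") as handle:
        handle.write("".join(line + "\n" for line in lines))
    return len(lines)
```

Reading files in chunks keeps memory use low, which matters for large audio recordings.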
Long Term:
- Check fixity of all files on an annual basis, either by referencing the file inventory to confirm that file sizes and counts still match, or by comparing the original fixity information (checksums) to newly generated checksums. Basically, this is just to check that everything you think is there is actually there. (Level 3).
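The checksum comparison described above could be sketched in Python as follows. The example assumes that the original fixity information was saved as a text file with one "<checksum>  <relative path>" line per file and that SHA-256 was used; both the file layout and the function names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(1024 * 1024):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(manifest_path):
    """Parse '<checksum>  <relative path>' lines into {path: checksum}."""
    expected = {}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, relative = line.split(maxsplit=1)
            expected[relative] = checksum
    return expected


def audit(root, expected):
    """Compare the files now under root with the expected checksums."""
    root = Path(root)
    found = {
        p.relative_to(root).as_posix(): file_checksum(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    return {
        "missing": sorted(set(expected) - set(found)),
        "changed": sorted(
            rel for rel in expected if rel in found and found[rel] != expected[rel]
        ),
        "unexpected": sorted(set(found) - set(expected)),
    }
```

If all three lists in the result are empty, everything recorded in the manifest is still present and unchanged.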
Information Security:
Current security practices do not meet any of the requirements for the NDSA Levels of Digital Preservation, as files are stored on a station-wide server, on an external hard drive which is kept on a desk, or on CDs which sit on a shelf in an open office. These files are easily accessible to anyone in the station, and there are no formal instructions on who has read, write, move, and delete authorization, nor logs of what actions have been taken with files.
The most immediately actionable response is also the simplest: lock up external hard drives or offices when they are not in use. Additionally, the host and producers of Midday should discuss access restrictions for their content, making this known to other staff at the WYPR station. Doing both of these things will easily bring information security practices to Level 2.
Short Term:
- Lock the external hard drive in a drawer or safe when not in use to prevent tampering or theft of files, or lock the office when not at the station. (Level 1).
- Document access restrictions for content. Create a written document that states who can read, who can write, and who can delete or move files. Make this known to other staff at the station. (Level 1/2).
Mid Term:
- Maintain an Excel sheet of who copied, edited, or deleted files and when, particularly during the transfer of files from one storage medium to another. This record of files can also serve as a useful inventory of all digital content and can be used to perform basic fixity checking. This dual-use tool is especially important. (Level 3).
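If staff would prefer not to type each entry by hand, a small script could append entries to the log instead. The Python sketch below is one possible form of this, writing to a CSV file that opens in Excel; the column names, the function name, and the example values are all hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path

# Illustrative column headings; adjust to match the station's own log.
FIELDS = ["timestamp", "person", "action", "file", "source", "destination"]


def log_action(log_path, person, action, file_name, source="", destination="", when=None):
    """Append one row describing a file action to a CSV log."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    when = when or datetime.now()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the headings only once
        writer.writerow({
            "timestamp": when.isoformat(timespec="seconds"),
            "person": person,
            "action": action,
            "file": file_name,
            "source": source,
            "destination": destination,
        })
```

For example, `log_action("file_log.csv", "Intern A", "copied", "show.wav", "server", "external drive")` would add a single dated row to the log.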
Long Term:
- Check logs annually to ensure that files have not been altered. (Level 4).
Metadata:
Metadata for current files is limited, consisting primarily of administrative data that documents when the files were created. Much administrative metadata, such as file type, date, and the program used to create a file, is automatically generated and kept by the operating system. The focus will be to improve descriptive metadata practices to make locating and identifying content of past shows easier for producers, particularly when identifying programs for rebroadcast at the end of the year.
Short Term:
- Create an Excel spreadsheet inventory of all script packs and audio recordings created for Midday to establish a sense of WYPR’s holdings. (Level 1).
- Create standardized file naming conventions that clearly describe the content of the script pack or audio recording. Consider the important elements of what is being described: the date of the show, subjects or topics, and guests. (Level 3).
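Once a naming convention has been agreed upon, it can be applied and checked consistently with a short script. The Python sketch below uses a purely hypothetical convention of date, show name, and topic (for example, 2019-03-04_midday_city-budget.wav); WYPR’s actual convention may differ, and the function names are illustrative.

```python
import re
from datetime import date

# Hypothetical convention: YYYY-MM-DD_show_topic.ext, all lowercase,
# with hyphens separating words inside the show and topic parts.
NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"
    r"\.[a-z0-9]+$"
)


def slugify(text):
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_name(air_date, show, topic, extension):
    """Build a file name from the show's air date, name, and topic."""
    extension = extension.lower().lstrip(".")
    return f"{air_date.isoformat()}_{slugify(show)}_{slugify(topic)}.{extension}"


def follows_convention(file_name):
    """Return True if the file name matches the hypothetical convention."""
    return NAME_PATTERN.match(file_name) is not None
```

Placing the date first in year-month-day order means that files sort chronologically when listed by name.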
Mid Term:
- Document administrative metadata in the Excel spreadsheet inventory, including date created, who created the file, file format, and documentation of file transfers. (Level 2).
Long Term:
- Maintain file inventory and adhere to established file naming conventions.
File Formats:
There are currently several different types of files utilized by staff at WYPR, including .DOC, .WAV, and .MP3 files, and the Producer’s Manual provides some guidance on the required formats for recording digital audio from broadcasts. These are fairly common file formats and do not face significant risk of obsolescence. This currently meets Level 1 standards, but additional standardization and documentation of acceptable file formats is necessary.
Short Term:
- Create a list of all file formats currently in use at WYPR. (Level 2).
- Create a limited, standardized list of acceptable file formats (e.g. do not use .TXT files for script packs or .WMA files for audio recordings). (Level 1).
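Both of these lists can be started from what is already on disk. The following Python sketch, offered as a rough illustration, counts the file extensions found under a folder and flags files whose extension is not on an accepted list. The accepted list shown here is an assumption based on the formats mentioned above, not a decision WYPR has made, and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumed accepted formats, for illustration only.
ACCEPTED = {".doc", ".wav", ".mp3"}


def formats_in_use(root):
    """Count the files under root by lowercase file extension."""
    return Counter(
        p.suffix.lower() for p in Path(root).rglob("*") if p.is_file()
    )


def unaccepted_files(root, accepted=ACCEPTED):
    """List files under root whose extension is not in the accepted set."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in accepted
    )
```

Extensions are compared in lowercase so that .WAV and .wav are treated as the same format.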
Long Term:
- Include documentation of acceptable file formats in the updated iteration of the Producer’s Manual. (Level 1).
Great work. Your upfront observation that WYPR is not a repository is well taken. It’s important for them to think about how to keep their content alive for the near term, and it is likely relevant to their mission to identify potential partnerships with a library or an archives that can help ensure longer-term access to all or part of their content.
With that said, it may be worth noting that under long term as well. If they don’t already have a relationship with an archives, it would be great if they could seek out one that might be interested in being a repository for their material once it starts to move out of active use. My sense is that UMD, MICA, or JHU might be good candidates.
The formatting comes through a little funky in a few spots (extra line breaks, indentation, etc.), but that is likely something that will convey better in the final version of all of this that you put together into the PDF.
All of your individual recommendations are great. It’s clear that you have developed a nuanced understanding of both their content and their context and I think your suggestions for how they should improve their work are sound.
Great work!