Next Steps Preservation Plan for the Archeology Program Office

OVERVIEW

The Archeology Program Office of the Prince George’s County Department of Parks and Recreation was established in 1988 to excavate, preserve, and protect archeological sites in county parks. It is part of the Maryland-National Capital Park and Planning Commission. As part of its mission, the program curates millions of artifacts, and over the years related documentation has been created in various formats and on disparate media. Documentation is primarily in the form of digital and/or physical copies of reports, catalogs, slides, print photographs, negatives, drawings, maps, and videos. All artifacts and documentation are stored in the same facility. The program’s goal is to have all digital content centralized on one shared network drive, which is currently in development.

This report outlines recommended next steps to preserve the program’s digital content following a framework designed by the National Digital Stewardship Alliance (NDSA). NDSA’s Levels of Digital Preservation presents a series of recommendations based on five areas of concern for digital preservation – storage and geographic location, file fixity and data integrity, information security, metadata, and file formats. Level 1 recommendations relate to the most urgent activities needed for preservation and serve as a prerequisite to the higher levels.

STORAGE AND GEOGRAPHIC LOCATION

Establish a central storage system for all digital content and maintain at least one complete copy stored in another location.

Digital content is currently stored locally on five desktop computers, two laptops, a back-up drive of files from former staff, approximately 300 compact discs and about 700 3.5” floppy disks. Staff have been working on a shared drive to organize content in one place. While staff are backing up their own drives locally, there is no complete copy of all digital content.

Following NDSA’s Level 1 recommendation, staff should move all files off disparate media and into a storage system, and create a complete copy of the files to be stored in a different location. This serves a number of purposes:

– Transferring to a new storage system will guard against potential loss of data on old media.

– Minimizing the number of storage locations will facilitate data integrity monitoring.

– Creating a complete copy will provide a back-up in case files are corrupted or lost.

– Storing a copy in a different location will guard against loss specific to one location such as damage to equipment as a result of severe weather or a catastrophic event.

– This process is also a necessary first step toward the program’s goal of having all files organized and accessible without extensive searching.

1. Create a complete copy of what is accessible right now.

The shared network being developed in coordination with the program’s IT department will likely serve as the established storage system. A portion of files are currently inaccessible on floppy disks; however, the bulk of digital content is stored locally on various hard drives. Any content that is currently accessible on hard drives should be copied and stored in another location to safeguard against loss. See “File Fixity and Data Integrity” below for checking data integrity and creating a file manifest. In the short term, the copy can be kept on an external hard drive stored securely in another location. Alternatively, staff could use a cloud storage service such as Dropbox, which would have the added benefit of storing files offsite. Once files are integrated into the shared drive, a copy can be created from that set of content. Staff can work with the IT department to see whether a back-up storage system already in place for other departments can be used.

2. Establish a file structure in the shared directory

Staff can begin integrating the current files that they work with into the networked drive in order to develop a workable structure that will be practical for finding files. Historical documents can then be organized into that structure. Once a structure and naming system have been established, staff should document and follow them. This can be done concurrently with the following step.

3. Copy files from compact discs and 3.5” floppy disks using an unnetworked workstation

It will likely take a while to sift through decades’ worth of files. Given the urgency of potential loss from old media and the risk of viruses on disks used by former staff, copy all the files from compact discs and floppy disks onto an unnetworked drive first and run a virus scan before combining them with current files.

a. Drive space needed: Assuming approximately 700 MB of data per CD and 1.44 MB per floppy disk, there is approximately 211 GB of data if all the disks are full.

b. Install anti-virus software on unnetworked workstation: Speak with the IT department about installing antivirus software so that it can run in the background while copying over files. Establish a schedule to run full virus scans depending on how regularly files are being copied over.

c. Install drivers for the 3.5” external floppy drive on the unnetworked workstation: At the time of the survey, staff had an external drive to read 3.5” floppy disks but lacked the software to run the drive. This step is necessary in order to access any of this information.

d. Install fixity software on the unnetworked and networked workstations: See the section “File Fixity and Data Integrity” below.

e. Transferring files from CDs and floppy disks to the unnetworked hard drive: Currently staff use context such as file extensions, file location, file name, and the author name to find content. Therefore, it’s recommended that they preserve these elements, along with any related descriptive information written on the disks, as much as possible until they can sort out all the content. Disks were labeled in a number of ways: the majority of floppy disks were organized in boxes of about 10 disks, with labels on the boxes suggesting groups of related disks, while some disks had vague or no labeling. Related files kept together on disks or boxes of disks should be copied into a directory folder along with the descriptive information from the label. This can either be transcribed into a text file or a picture can be taken of the label and included in the directory (a minimal sketch of this transfer step appears after item f below).

f. Timing: While each floppy disk may hold a relatively small amount of data, the process of copying over hundreds of disks can be labor-intensive, and during this window the work is not being backed up. Therefore, consider either setting a short window of time to copy over all of these files, or setting a schedule in which small batches of files are copied over from disks and scanned for viruses before being copied to another location. That location can be either the shared drive or another external drive stored elsewhere. Once files have been safely transferred, they can be integrated into the file structure on the shared drive.
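To make the transfer step concrete, below is a minimal Python sketch of copying one disk’s contents into a labeled folder on the unnetworked staging drive and saving the label text alongside the files. The drive letters, folder names, and example label are hypothetical placeholders; actual paths would depend on how the workstation is set up.

```python
"""Minimal sketch: copy one disk's contents into a labeled staging folder.

All paths and the example label are hypothetical placeholders.
"""
import shutil
from pathlib import Path

SOURCE_DRIVE = Path("E:/")                     # hypothetical drive letter for the CD/floppy reader
STAGING_ROOT = Path("D:/staging_unnetworked")  # hypothetical folder on the unnetworked drive


def copy_disk(disk_id: str, label_text: str) -> Path:
    """Copy everything on the source drive into a folder named after the disk,
    saving the handwritten label as a small text file alongside the files."""
    destination = STAGING_ROOT / disk_id
    # copy2 preserves modification times, which staff currently use as context
    # for identifying content.
    shutil.copytree(SOURCE_DRIVE, destination, copy_function=shutil.copy2)
    (destination / "_disk_label.txt").write_text(label_text, encoding="utf-8")
    return destination


if __name__ == "__main__":
    copy_disk("box03_disk07", "Example box label text transcribed here")
```

A staff member would run something like this once per disk, changing the disk identifier and label text each time, so that the original grouping and labeling survive the move off the physical media.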

4. Set a schedule of back-ups to maintain a complete copy.

METADATA

The NDSA Level 1 recommendation is to create an inventory of digital content and its storage locations. Like the digital content itself, the inventory should be backed up and stored in another location. As mentioned in the section “File Fixity and Data Integrity” below, AVP’s Fixity software includes a function to generate a manifest of file paths that will assist with the inventory. Staff can maintain this inventory as they continue to develop the file structure on the shared drive.

For file and directory naming, consider creating a controlled vocabulary and syntax to make it easier for staff to find files. This can include specific terms for archeological site names, document type (e.g., site form, report), and a version, year or other modifier (e.g., draft, final) when needed.
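As an illustration, the sketch below checks file names against one possible syntax of the form site_doctype_year_modifier.ext. The pattern and the term lists are hypothetical placeholders for whatever controlled vocabulary staff actually define for site codes, document types, and modifiers.

```python
"""Minimal sketch: check file names against a hypothetical naming convention
of the form  site_doctype_year_modifier.ext  (e.g., sitecode_report_1994_final.pdf).
The pattern and term lists are placeholders for the vocabulary staff would define.
"""
import re

DOC_TYPES = {"report", "siteform", "catalog", "photolog"}  # hypothetical controlled terms
MODIFIERS = {"draft", "final", "v1", "v2"}                 # hypothetical modifiers

NAME_PATTERN = re.compile(
    r"^(?P<site>[A-Za-z0-9]+)_(?P<doctype>[a-z]+)_(?P<year>\d{4})"
    r"(?:_(?P<modifier>[a-z0-9]+))?\.\w+$"
)


def check_name(filename: str) -> list:
    """Return a list of problems with a file name; an empty list means it conforms."""
    match = NAME_PATTERN.match(filename)
    if not match:
        return [f"{filename}: does not match site_doctype_year[_modifier].ext"]
    problems = []
    if match["doctype"] not in DOC_TYPES:
        problems.append(f"{filename}: unknown document type '{match['doctype']}'")
    if match["modifier"] and match["modifier"] not in MODIFIERS:
        problems.append(f"{filename}: unknown modifier '{match['modifier']}'")
    return problems


if __name__ == "__main__":
    for name in ["sitecode_report_1994_final.pdf", "old survey notes.doc"]:
        print(check_name(name) or f"{name}: OK")
```

A quick check like this could be run over the shared drive periodically to catch names that drift away from the documented convention.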

FILE FIXITY AND DATA INTEGRITY

File fixity is a way of ensuring that files have not changed. It is recommended to run fixity checks whenever files are transferred (Owens, p. 110). A fixity check generates an alphanumeric string called a checksum that can be compared before and after the transfer; changing the content of a file, including its format, will change the checksum value. If the IT department does not already use fixity software, Fixity is a free tool from AVP that can be used to generate and compare checksum values and ensure that all files have been transferred. The software also generates a manifest of file paths along with the checksums, which could prove useful in establishing an inventory of digital content.
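If a scripted approach is preferred, the sketch below shows the underlying idea: generate a SHA-256 checksum manifest for a directory and compare manifests made before and after a transfer. The paths and the CSV manifest layout are hypothetical, and AVP’s Fixity tool provides equivalent functions through a graphical interface.

```python
"""Minimal sketch: generate a SHA-256 checksum manifest for a directory and
compare manifests created before and after a transfer. Paths and the manifest
layout (CSV of relative path, checksum) are hypothetical.
"""
import csv
import hashlib
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 checksum of one file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record the relative path and checksum of every file under root."""
    with open(manifest, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for file in sorted(p for p in root.rglob("*") if p.is_file()):
            writer.writerow([file.relative_to(root).as_posix(), checksum(file)])


def compare_manifests(before: Path, after: Path) -> list:
    """Return descriptions of files that are missing or changed after a transfer."""
    def load(path):
        with open(path, newline="", encoding="utf-8") as f:
            return dict(csv.reader(f))
    old, new = load(before), load(after)
    problems = [f"missing after transfer: {p}" for p in sorted(old.keys() - new.keys())]
    problems += [f"changed during transfer: {p}" for p in sorted(old) if p in new and old[p] != new[p]]
    return problems
```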

Level 2 recommends virus-checking high-risk content, while Level 3 recommends virus-checking all content. Virus-checking high-risk content is addressed in step 3b of “Storage and Geographic Location.” Staff should have antivirus software installed at their workstations and run scheduled scans.

Level 3 also recommends checking fixity at fixed intervals to ensure data integrity over time. Consider establishing a yearly schedule for validating fixity. Any corrupt or missing files can be replaced with a copy that passes fixity validation.
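A yearly audit could follow the pattern sketched below, which re-checks files on the shared drive against the stored manifest and flags anything missing or changed. It assumes the checksum() helper from the previous sketch was saved in a module named fixity_tools.py; the drive and manifest paths are hypothetical.

```python
"""Minimal sketch: yearly fixity audit of the shared drive against a stored
manifest. Assumes the checksum() helper from the previous sketch is saved in a
module named fixity_tools.py; the drive and manifest paths are hypothetical.
"""
import csv
from pathlib import Path

from fixity_tools import checksum

SHARED_DRIVE = Path("//server/archeology")           # hypothetical shared drive root
MANIFEST = Path("//server/archeology/manifest.csv")  # hypothetical manifest location


def yearly_audit() -> None:
    with open(MANIFEST, newline="", encoding="utf-8") as f:
        expected = dict(csv.reader(f))
    for rel_path, recorded in expected.items():
        file = SHARED_DRIVE / rel_path
        if not file.exists():
            print(f"MISSING: {rel_path} -- restore from the off-site copy")
        elif checksum(file) != recorded:
            print(f"CORRUPT: {rel_path} -- replace with a copy that passes validation")


if __name__ == "__main__":
    yearly_audit()
```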

INFORMATION SECURITY

This section outlines who has access to the content and what they can do with it, which helps prevent files from being deleted or changed by unauthorized staff. NDSA Level 1 recommends identifying who is authorized to read, write, move, or delete individual files. Relatedly, Level 4 in the section on file fixity recommends that no one person have write access to all copies. This reduces the likelihood that all copies of one or more files could be changed or deleted.

Staff have taken initial steps in this process by creating three different directories with different levels of access for their users: one directory for onsite archeology staff, one directory for the rest of the Prince George’s County Parks Department, and one directory for Dinosaur Park, another program of the Prince George’s County arm of the Maryland-National Capital Park and Planning Commission that shares the same workspace.

However, more levels of access may be necessary if, for example, only one person in the office should have read or write permission for certain files. In addition, staff should clearly delineate working files from historical files that should not change. This will help prevent historical documents from being changed or deleted. It will also help with fixity validation, since working files will likely involve changes in content, which will change the checksum value. This can be accomplished by setting permissions on specific subdirectories or on specific sets of files. Document the access restrictions and store the documentation in a location that all users can access.

FILE FORMAT

File formats can become obsolete. Once a format is obsolete, it may no longer be possible to open the file in other software, or the file may not render in exactly the same way. The purpose of this section is to minimize these problems by using formats that are less likely to become obsolete, or that can be effectively rendered in another format. Widely used formats are generally considered likely to remain accessible because there will be demand either to keep them accessible or to develop a means of migrating them (Owens, 121).

Since some media have not yet been accessed, a current inventory of file formats is not available. Formats currently in use are JPEGs and files generated from different versions of Microsoft Word, Excel, and Access. Mapping files are created using GIS (geographic information systems) technologies. Older files were created using WordPerfect, CAD (computer-aided drafting) software, and the Paradox relational database. Staff are currently having trouble opening Paradox files, since the database is no longer supported and its files cannot be opened using current versions of Access or Excel.

Formats such as JPEG, Microsoft Word, and Excel are commonly used, although the latter two undergo regular updates that may introduce slight changes when a file is opened in a new version. As the files are incorporated into the new directory structure on the shared drive, staff should develop an inventory of the formats they are using, work with the IT department to monitor them for obsolescence, and be prepared to migrate as needed.
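A simple starting point for that inventory could look like the sketch below, which tallies file extensions across the shared drive. The drive path is hypothetical, and an extension count is only a rough proxy for format identification; a dedicated format identification tool such as DROID could refine the picture later.

```python
"""Minimal sketch: tally file extensions across the shared drive as a starting
point for a format inventory. The drive path is hypothetical.
"""
from collections import Counter
from pathlib import Path

SHARED_DRIVE = Path("//server/archeology")  # hypothetical shared drive root


def format_inventory(root: Path) -> Counter:
    """Count file extensions (lower-cased) for every file under root."""
    return Counter(
        p.suffix.lower() or "(no extension)"
        for p in root.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    for extension, count in format_inventory(SHARED_DRIVE).most_common():
        print(f"{extension}\t{count}")
```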

FUTURE STEPS

NDSA Level 2 for storage recommends creating a third copy of the content and Level 4 recommends at least three copies stored in locations with different disaster threats. The Archeology Program Office could combine this recommendation with a means of sharing some of their content. This could be through a subject-specific repository for archeology or something more general like the Internet Archive.

Levels 2-4 address steps to maintain storage media so that files continue to be accessible in the long term. Staff should work with their IT department to document storage used for their shared drive and back-up copies, monitor for obsolescence, and have a plan in place for updating systems.

Staff also expressed an interest in resuming digitization of their physical documentation. Some reports and slides have already been digitized. As a starting point, staff can discuss their experiences with past efforts and lessons learned to establish goals for the program and how this will fit into the file structure that they are creating for current digital content. The Still Image and Audiovisual Working Groups of the Federal Agencies Digital Guidelines Initiative can be a good resource to establish best practices for digitization.

Pluralist, multimodal, derivative access points: preservation for who/what?

As I was reading for class this week, I couldn’t help but see all of the pieces through the lens of the research projects I’m currently engaged in, because these questions of access (Of what? For who? How?) have been so central to all three projects. It might be that my projects are more focused on access, but to my mind, it’s much more likely that access is the reason for any archival endeavor. As Owens points out, “The purpose of preservation is enabling access.”

There were way too many interesting threads in the readings for this week, so I picked a few that particularly hit home for me:

Screen Essentialism

Owens points out that cultural heritage institutions often want to hew more closely to the “boutique” approach to digital access rather than a “wholesale” one, and while a “boutique,” curated approach to access is generally framed as being more user-friendly, it also comes with the risk that access will remain secondary or tertiary. The more user-friendly frame often means that collections/items aren’t made available until the institution has a sophisticated access system in place. “Screen essentialism” in this view of access refers to the fact that there is no one inherent way of accessing digital objects; Owens urges us to “get over the desire to have the kinds of interfaces for everything where one just double clicks on a digital object to have it ‘just work.'”

Padilla and Higgins, too, warn of screen essentialism and “data essentialism”: oversimplifying the nature of data and obscuring the complexities of both the systems used to locate, process, and understand data and the nature of data itself. Christen, on the other hand, describes Mukurtu as a system that does need to “just work” and have difficult computational processing happen below the surface, but in this case it’s not a matter of having a single system that works for every possible user, but of creating and implementing a system that allows for customization on an individual basis, because that is what best serves the collections Christen works with.

Collections as Data, Data as Data

Padilla and Higgins’s piece focuses on defining data, and thinking about digital library collections as pieces of humanities data, especially in how this mindset affects access to digital cultural heritage: “The authors hold that Humanities data are organized difference presented in a form amenable to computation put into the service of Humanistic inquiry.” So, practically thinking, information professionals should be considering how to make collections available and what access points would aid in these collections being “amenable to computation.” Padilla and Higgins’ emphasis on derivative (often DH) projects serving as incredibly useful access interfaces for digital collections, as well as mention of metadata as useful data in its own right, aligns well with the chapter we read this week in Owens’ book.

While I strongly agree with the ideas Padilla and Higgins are putting forth, I do harbor some concerns about how this article, and the larger research project it morphed into (Collections as Data), might be in conflict with archival practice and values. For instance, does the focus on interfaces developed outside of the archive, such as “The Real Face of White Australia” (disclaimer: I’m a Tim Sherratt stan), undercut the importance of contextual relationships between parts of an archival collection? Projects like this aid in access to and understanding of archival material, but they are parts of a wider whole that many users may not be aware of. How can we maintain context (cough, the provenance debate) while also making digital archival collections “amenable to computation”? Are archivists being cut out of this information exchange, and if so, how do we re-insert ourselves?

Discussion Questions

  1. How does thinking through access impact processing workflows? Does MPLP work as an approach to all collections? How does prioritizing access play into undertaking documentation strategy projects?
  2. Do you have experience with ethical and/or privacy issues that might prevent you from batch converting and uploading immediately? What about when legal/copyright issues and ethics are at odds? When can you legally make something available but might not want to?
  3. Owens emphasizes that keeping any sensitive material is a risk that information professionals must seriously consider. But one of the projects I’m working on, Safely Searching Among Sensitive Content (SSaSC), has made me think about sensitivity in so many contexts – reputational harm, for instance, is incredibly broad. How do you know you’re making the right decision? In addition, during the initial work on developing an access system for email collections containing sensitive material for SSaSC, we found that our platform’s search functions work better when the algorithms have access to the sensitive material, even while accounting for the fact that that material won’t normally be shown to the user. How does work like this complicate our thoughts on collecting sensitive material?
  4. How does thinking about access to only metadata change or not change the way you would process and catalog collections? Does this apply to both descriptive metadata and technical metadata?
  5. What multimodal methods of access might work for the small institution you’re partnered with? Which would not (currently, at least)?
  6. In Padilla and Higgins’ piece, they posit that librarians/archivists/info professionals are well-suited to “offering training in the skills, tools, and methods needed to take advantage of Humanities data.” Is this the case, on the ground? Why or why not? What are the major challenges we need to overcome, at an institutional and field level, in order to better serve users in this way? Is training in these skills different from simply providing multimodal access?
  7. Have you seen the feminist HCI values of “plurality, self-disclosure, participation, ecology, advocacy, and embodiment” in practice? How do you anticipate using them?
  8. Christen opens her article by stating that “Archives have long been ambivalent places for Indigenous communities whose cultural materials are held in their storerooms.” (21) In what ways do we, as a profession, reinforce that ambivalence? Question it? Does multimodal access, as delineated by Owens, ameliorate this ambivalence enough?

In thinking through these discussion questions, I was continuously reminded of Miriam Posner’s blog post, “Money and Time.” Every concern about staff resources and ways to implement access seems to align with the sustainability, resource, and burnout concerns that Posner brings up in relation to DH centers and initiatives: “You can optimize, streamline, lifehack, and crowdsource almost everything you do — but good scholarship still takes money and time.” Multimodal, plural, culturally sensitive access to digital objects and collections still takes money and time.

digital objects and determining value

This week’s readings were wide-ranging in their content and at times had me feeling a bit at sea with the detailed descriptions of hard drive technology, digital forensics, file formats, etc. (There’s nothing like reading these kinds of things to remind me that I’m nowhere near as technologically proficient as I’d like to think.) I’m grateful for Prof. Owens’ book, since it describes digital media and their structures in an accessible, understandable way. I’ll briefly recap his three key points laid out in chapter two, since I saw these ideas echoed throughout the other readings.
1. “All digital information is material.”
Such a basic fact, and yet (as the book mentions) I generally think of my personal digital files in abstract terms, like being lost “in the cloud” or behind this mysterious wall, because my technological know-how is limited.
2. The logic of digital media and computational systems is “the logic of database.”
People interact with digital objects much differently than they engage with analog media. Since databases are ordered based on the query asked of them, digital information can and will always be presented in a myriad of arrangements.
3. “Digital systems are platforms layered on top of each other.”
This one took me a little longer to understand, but I take it to mean that every digital object has multiple informational layers, which people are often unaware of. Depending on what someone is studying or looking for, they are going to care about preserving certain layers of the object over others. And these layers are often interdependent.

While reading, I kept thinking of how much we take for granted as we use all of our various devices to function in the world, and the enormous amounts of data and media that will be left behind once we are gone. This quote from the Kirschenbaum article sums up my questions perfectly: “ […] how do these accumulations, these massive drifts of data, interact with irreducible reality of lived experience?” Within the digital preservation field, how do we reconcile that tension between the materiality of our digital footprints and the ephemeral, intangible stuff of life? I’m personally not convinced that you can fully capture someone’s working or personal environment through their digital papers, even with emulation of their computer (thinking of the Salman Rushdie anecdote from the Digital Forensics report). Or even from an ethical standpoint that it’s always advisable. How do we know what digital information is worth saving or recovering, and who deserves access to it?

As the Digital Forensics report points out, it is not immediately clear what digital items are going to have historical or cultural value in the future, making it harder to know what to preserve. And then how can professionals adequately preserve relationships between different items, events, and media (a random aside–the Jackson Citizen Patriot is my hometown’s newspaper. I did a big double take when I read about the photo of the snowmobilers’ accident and its significance.)? This reminded me of our conversation last week about authorial intent, as well. If a creator doesn’t wish for their entire digital footprint to be saved indefinitely (or saved at all), but there is potential cultural value to their information, whose concerns are prioritized? I have a lot of mixed feelings about this. As mentioned before, Trump would love to cover his tracks–he tries daily, either by literally tearing up memos or obfuscating and lying. But the office of the Presidency is bound by laws that prevent this (or try to), and I doubt anyone would argue that these records are not necessary for future generations. Perhaps the question should be, at what point does a person become so culturally or historically influential that their wishes about their data are overridden by other, more pressing concerns?

The Chan & Cope article addresses these questions from an institutional standpoint. As a museum studies student, I was both fascinated by their argument and struggled with it. I definitely like “the stuff” of museums. While I visit museums to be engaged and to relax, oftentimes what draws me to an exhibit (particularly at art museums) is a particular piece or an artist whose works I love. I agree with Chan & Cope that collection strategies should serve a different purpose today; there should be a real intention behind acquisition that goes beyond prestige or a hoarding mentality.

However, I’m not quite convinced that a “post-objects curatorial practice” is the natural solution. And is it really “post-objects” if a museum instead exhibits the contextual documents surrounding a system’s design? The piece that was missing for me was: does a “post-objects” approach reflect the needs of a museum’s community? Collecting a contemporary, provocative item (digital or analog) might generate a lot of buzz, but will it mean something to the average museum-goer beyond taking a selfie with said object? Relevance isn’t necessarily about what’s trending in the moment (btw, The Art of Relevance, a book by Nina Simon, is an excellent read that explores this topic in the museum field in depth).

I realize I have more questions than definitive ideas or opinions in this post. Interested to share more thoughts and discussion with everyone in the week to come!

My Life as a Warning to Others

Hello, everyone. My name is Andy Cleavenger, and I am beginning my fourth year of this two-year program.

My life up to this point has been spent as a photographer and multimedia specialist at a government contractor. I work in their Communications department. My interest in this class stems from my role as the sole caretaker of our department’s image collection. For over 17 years I have been the only one capable of performing image searches, and the only one concerned with the preservation of those images. I’m in the Digital Curation track to learn how to effectively turn my collection into a self-service resource available to all employees. And I’m in this class specifically to make sure I’m doing everything possible to ensure the long-term preservation of our image collection.

I must admit that the first axiom listed in Owens – “A repository is not a piece of software” – just about made me stand up from my chair and shout “see, I told you!” at my former boss. We have always treated the image collection as a problem that can be solved with a magic-bullet purchase of DAM software.

“We bought it… we’re done!”

This is, of course, extremely common. Like most offices, they forget about the systems that will come after the present one, or the unceasing march of technological progress that dictates both the increasing complexity of the images and the expanding diversification of their use. This was nicely summed up in Owens’ last axiom: “Doing digital preservation requires thinking like a futurist.” I fear that they may regret some of the decisions they’ve made, such as stripping all filenames from their videos, throwing everything into a single directory, and then depending on an external proprietary catalog file to save all related metadata.

We are now married to that system… and it’s failing us.

The remaining articles on either side of the digital dark age debate made some equally compelling points. Ultimately, I felt that Lyons and Tansey both came closest to hitting the mark on what form a digital dark age would take, as well as the forces that would drive it. Lyons frames the problem as one of cultural blindness. That is to say that institutions that exist within and serve a particular society tend to have difficulty in recognizing the value in – or even being aware of – the records of other communities. As such, the digital dark age will manifest itself in the silence of these socio-politically disadvantaged communities within the archival record.

This is not an unfamiliar argument, but I tend to think the motivations behind it are less a conspiratorial omission than a sad pragmatism driven by extremely finite resources. This point was reflected well in Tansey, who makes the point that the long trend of cuts to budgets and staff forces institutions to set priorities that obviously leave gaps in the archival record. In other words, even if an institution is aware of fringe communities, and possibly even has a sympathetic collections policy for including their records, the pragmatism of limited resources may still dictate their omission as the institution focuses on its highest priorities.

I have certainly seen this in my position in the Communications department. I’m curious if others in class have seen examples like this in their own workplaces?

Digital Preservation Policy: Web Archiving for the Washingtoniana Collection

Introduction:

In my previous posts on this blog I have surveyed the digital preservation state of the District of Columbia Public Library’s Washingtoniana collection. This survey was performed via an interview with Digital Curation Librarian Lauren Algee, using the NDSA Levels of Digital Preservation as a reference point.

In our survey we found that the DCPL Washingtoniana collection has very effective digital preservation, which, through a combination of knowledgeable practices and the Preservica service (an OAIS-compliant digital preservation service), nearly reaches Level 4 in every category of the NDSA Levels of Digital Preservation. With this in mind, my next-step plan for the archive looks at a number of areas the archive has been interested in expanding and presents some thoughts on where staff could begin taking steps toward preserving those materials.

Of particular interest in this regard is the collecting of website materials. Because websites are dynamic objects in a relatively new medium, collecting them can be fairly complex; it is hard to pin down precisely when a website has been sufficiently collected. Websites may appear differently in different browsers, they may contain many links to other websites, they change rapidly, and they often contain multimedia elements. Outlined below is a policy that discusses these issues and offers a digital preservation plan for websites.

Website Digital Preservation Policy for the Washingtoniana collection

The Washingtoniana collection was founded in 1905 when library director Dr. George F. Bowerman began collecting materials on the local community. The collection stands as one of the foremost archives on the Washington, D.C. area and its community, history, and culture. With the increasing movement of DC social life and culture to online or born-digital platforms, it makes sense for the Washingtoniana collection to consider collecting websites.

Selection

The same criteria used to select other Washingtoniana materials should apply here. Websites should be considered if they pertain to Washington, DC or its surrounding areas; to events that take place in or discuss that area; to prominent Washington, D.C.-related persons or DC-related institutions; or otherwise to Washington, D.C. community, arts, culture, or history.

As with any physical preservation decision, triage is an essential process. Websites that are likely to be at risk should be high priority; in a sense, all web content is at risk. Websites created for a specific purpose or pertaining to a specific event may have a limited operational window. Websites for defunct businesses, political election sites, and even the state of an existing website on a specific day may be vulnerable and thus candidates for capture. In addition, the materials in question should not already be collected elsewhere, and they should be considered in relation to the rest of the collection.

Although automated tools may be used for identification, discretion over selection remains in librarians’ hands. In addition, suggestions from patrons relevant to the collection should be considered, and a system for managing and encouraging such suggestions may be put in place.

Metadata

A metadata standard such as MODS (Metadata Object Description Standard) should be used to describe the website. MODS is a flexible schema expressed in XML, is fairly compatible with library records, and allows more complex metadata than Dublin Core, so it may work well. Metadata should include, but not be limited to, website name, content producers, URL, access dates, and fixity information, as well as technical information that may be generated automatically by web crawlers, such as timestamps, URI, MIME type, and size in bytes. Extraction information, file format, and migration information should also be maintained.
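As a rough illustration of what such a record might contain, the sketch below assembles a bare-bones MODS record with Python’s standard library. The choice of elements is illustrative only and the values are hypothetical; actual element usage should follow the collection’s cataloging practice.

```python
"""Minimal sketch: assemble a bare-bones MODS record for a captured website
with Python's standard library. The element selection is illustrative and the
values are hypothetical; actual usage should follow local cataloging practice.
"""
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def make_mods_record(title: str, url: str, date_captured: str, checksum: str) -> ET.Element:
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    origin = ET.SubElement(mods, f"{{{MODS_NS}}}originInfo")
    ET.SubElement(origin, f"{{{MODS_NS}}}dateCaptured").text = date_captured
    location = ET.SubElement(mods, f"{{{MODS_NS}}}location")
    ET.SubElement(location, f"{{{MODS_NS}}}url").text = url
    physical = ET.SubElement(mods, f"{{{MODS_NS}}}physicalDescription")
    ET.SubElement(physical, f"{{{MODS_NS}}}internetMediaType").text = "application/warc"
    ET.SubElement(mods, f"{{{MODS_NS}}}note").text = f"SHA-256 checksum: {checksum}"
    return mods


if __name__ == "__main__":
    record = make_mods_record(
        "Example DC community website", "http://example.org", "2016-11-17", "abc123...")
    print(ET.tostring(record, encoding="unicode"))
```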

Collection

A variety of collection tools exist for web archiving. The tool selected should be capable of the tasks below, as outlined on the Library of Congress web archiving page:

  • Retrieve all code, images, documents, media, and other files essential to reproducing the website as completely as possible.
  • Capture and preserve technical metadata from both web servers (e.g., HTTP headers) and the crawler (e.g., context of capture, date and time stamp, and crawl conditions). Date/time information is especially important for distinguishing among successive captures of the same resources.
  • Store the content in exactly the same form as it was delivered. HTML and other code are always left intact; dynamic modifications are made on-the-fly during web archive replay.
  • Maintain platform and file system independence. Technical metadata is not recorded via file system-specific mechanisms.

A variety of tools are capable of these tasks; either a web crawler such as Heritrix, the open-source archival web crawler, or a subscription solution such as Archive-It should be used. Both are from the Internet Archive; however, the first is an open-source solution, while the second is a subscription-based service that offers storage on Internet Archive servers.

Upon initial collection, fixity information should be generated using checksums. This can be automated either with a staff-written script or with a tool like BagIt, which generates fixity information automatically. This information should be maintained with the rest of the metadata for the digital object.
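For example, the Library of Congress’s bagit-python library can package a capture directory as a bag and generate checksum manifests automatically; a minimal sketch is below. The directory path and bag-info values are hypothetical.

```python
"""Minimal sketch: package a capture directory as a bag with the Library of
Congress bagit-python library (pip install bagit). The directory path and
bag-info values are hypothetical.
"""
import bagit

# make_bag() reorganizes the directory in place (payload moves into data/)
# and writes manifest-sha256.txt containing fixity information.
bag = bagit.make_bag(
    "captures/example-site-2016-11-17",
    {"Source-Organization": "DCPL Washingtoniana Collection"},
    checksums=["sha256"],
)

# Later, validation re-reads the manifests and confirms nothing has changed.
bag = bagit.Bag("captures/example-site-2016-11-17")
print("valid" if bag.is_valid() else "fixity problem detected")
```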

Websites should be kept in the most stable web archival format available. At the time of this post’s writing, that format is the WARC (Web ARChive) file format. This format allows the combination of multiple digital resources into a single file, which is useful because many web resources are complex and contain many items. Other file formats may be accepted if archived webpages are received from donors.
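For a sense of how WARC capture works at the smallest scale, the sketch below uses the warcio library to record a single page request into a WARC file. This only illustrates the container format; full-site capture would be handled by a crawler such as Heritrix or Archive-It, and the URL and output file name here are placeholders.

```python
"""Minimal sketch: record a single page request into a WARC file with the
warcio library (pip install warcio). A crawler such as Heritrix or Archive-It
would handle full-site capture; the URL and output name are placeholders.
"""
from warcio.capture_http import capture_http
import requests  # per warcio's documentation, import requests after capture_http

with capture_http("example-capture.warc.gz"):
    requests.get("http://example.org/")
```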

Preservation

Upon initial ingestion, items may be kept on internal drives and copied to at least one other location. Before an item is moved into any further storage system, the file should be scanned for viruses, malware, or any other undesirable or damaging content, using safety standards agreed upon with the division of IT services. At this point, fixity information should be generated as described above and entered into the metadata record.

Metadata should be created as soon as possible, at which point the object with attached metadata should be uploaded into the Washingtoniana’s instance of Preservica.

Although Preservica automates much of the preservation process, a copy of the web archive should be kept on external hard drives. On a yearly interval, a selection of the items on the hard drive should be checked against the items in Preservica to ensure that Preservica’s fixity checks and obsolescence monitoring are working as desired.

References

Jack, P. (2014, February 27). Heritrix-Introduction. Retrieved November 14, 2016, from https://webarchive.jira.com/wiki/display/Heritrix/Heritrix#Heritrix-Introduction
Web Archiving-Collection development. (n.d.). Retrieved November 16, 2016, from https://library.stanford.edu/projects/web-archiving/collection-development
The Washingtoniana Collection. (n.d.). Retrieved November 16, 2016, from http://www.dclibrary.org/node/35928
Web Archiving at the Library of Congress. (n.d.). Retrieved November 16, 2016, from https://www.loc.gov/webarchiving/technical.html
Niu, J. (2012). An Overview of Web Archiving. Retrieved November 16, 2016, from http://www.dlib.org/dlib/march12/niu/03niu1.html
AVPreserve » Tools. (n.d.). Retrieved November 17, 2016, from https://www.avpreserve.com/avpsresources/tools/
Kunze, J., Boyko, A., Vargas, A., Littman, B., & Madden, L. (2012, April 2). Draft-kunze-bagit-07 – The BagIt File Packaging Format (V0.97). Retrieved November 17, 2016, from http://www.digitalpreservation.gov/documents/bagitspec.pdf
MODS: Uses and Features. (2016, February 1). Retrieved November 17, 2016, from http://loc.gov/standards/mods/mods-overview.html
About Us. (2014). Retrieved November 17, 2016, from https://archive-it.org/blog/learn-more/