National Archives, The Digital Vaults

The National Archives is probably the best-known archive in the United States. However, most people only ever see its most famous documents on display: the Constitution, the Declaration of Independence, and the Bill of Rights. While these are the foundational documents of our country, the archives house thousands and thousands of other pieces that can tell us a lot about its history. Most of these documents are collecting dust, seen only by a lone researcher every decade or so.

However, recent digitization efforts for the collections at the National Archives are changing this situation. Digitizing records allows greater access for researchers who may otherwise be restricted by travel and financial considerations. Additionally, digitization opens up the archives for the general public. It allows people who would normally only enter the National Archives through the front entrance to digitally march through the researcher entrance and explore what they can find.

The National Archives Digital Vaults allows visitors to browse its collections through tagging. Records are organized by tags, and you can follow one record to another through these connections. In fact, this is the premise of the site's pathways game, which starts with a record and provides clues to help find a related document. It is meant to highlight the different ways that documents can be connected to one another. The site also has the option to create your own collection. Any records of personal or research interest can be dragged into a separate space and saved as a collection.

In addition to tagging, you can filter documents by type or time period. This helps visitors with specific types of documents or subjects in mind find items easily. I also really liked this feature because it preserves some of the experience of physically visiting an archive. Oftentimes researchers stumble upon records at archives that they were not specifically looking for but that are relevant to their research topic. Showing a number of records related to the one someone is looking at prevents this kind of discovery from disappearing entirely.

The site also offers visitors the opportunity to create their own products. For example, one could create their own pathways challenge. Visitors can also create movies or posters using the documents that they have saved in their own collections.

I feel that these types of tools serve multiple purposes. They open archives up to the general public and allow them to explore records they would most likely never see otherwise. In this sense, the design is very accommodating to browsing without a specific topic in mind. However, it also has the features necessary for a more focused search, thereby extending the utility of the digital collections to serious researchers as well.

Digitization, Digital Preservation, and File Formats

So I successfully scared everyone off from blogging about The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. If you get a bit lost in this reading, I suggest reading over mwest24’s post on it from last year.

Along with the post, I would suggest the following vocabulary terms that we should all become familiar with. I will weave these terms together into a bit of a discussion of how computer files and storage work, which will help set us up for Kirschenbaum’s book.

As this is technical info, I am just going to crib most of it from Wikipedia. Don’t get too lost in the details, but please read these over and click the links out to Wikipedia if you have no idea about some of the terms.

Key Terms for File Characteristics 

Dots per inch (DPI) is a measure of spatial printing or video dot density, in particular the number of individual dots that can be placed in a line within the span of 1 inch (2.54 cm). The DPI value tends to correlate with image resolution, but is related only indirectly.
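To make the relationship between DPI and image size a little more concrete, here is a minimal sketch of the arithmetic for a scan. It assumes the scanner's "DPI" setting means pixels sampled per inch, and that the image is stored uncompressed at 24 bits per pixel; the function names are made up for illustration.

```python
def pixels_for_scan(width_in, height_in, dpi):
    """Pixel dimensions of a scan, assuming dpi means pixels sampled per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel=24):
    """Storage needed before any compression (24 bits = 8 each for red, green, blue)."""
    return width_px * height_px * bits_per_pixel // 8


# A US letter page (8.5 x 11 inches) scanned at 300 DPI
width_px, height_px = pixels_for_scan(8.5, 11, 300)
size_bytes = uncompressed_bytes(width_px, height_px)
print(width_px, height_px)  # 2550 3300
print(size_bytes)           # 25245000, roughly 25 MB
```

Doubling the DPI doubles both dimensions, so the uncompressed file gets four times bigger, which is why the choice of resolution matters so much for storage planning.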

A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a bit pattern. In our work, the most useful things to know about are ASCII and Unicode.
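If the idea of pairing characters with bit patterns feels abstract, a small sketch may help. This one uses the word "café" as an arbitrary example and UTF-8 as one common way of storing Unicode text; it is only meant to show why ASCII alone is not enough.

```python
text = "caf\u00e9"  # "café", with the accented e written as a Unicode escape

# Every character has a number (its Unicode code point)
code_points = [ord(ch) for ch in text]
print(code_points)  # [99, 97, 102, 233]

# UTF-8 stores the first three characters in one byte each, the last in two
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 4 5

# ASCII only covers code points 0 through 127, so the accented e will not fit
try:
    text.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)  # False
```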

Data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by identifying marginally important information and removing it.
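Here is a rough sketch of the difference between the two. The lossless half uses Python's standard zlib module on some deliberately repetitive data. The lossy half is a toy of my own (throwing away the low four bits of each sample), not how JPEG or MP3 actually work, but it shows the key property: the original cannot be recovered.

```python
import zlib

# Lossless: repetitive data shrinks and comes back exactly as it was
original = b"ABABABABAB" * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed) < len(original))  # 1000 True
print(restored == original)                            # True

# Lossy (toy version): keep only the high 4 bits of each 8-bit sample
samples = [200, 201, 203, 198, 17, 18]
lossy = [s & 0xF0 for s in samples]
print(lossy)  # [192, 192, 192, 192, 16, 16]
```

After the lossy step, 200, 201, 203, and 198 have all become 192, and nothing in the output says which was which. That is the trade being made whenever we save an access copy as a JPEG or MP3.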

Embedded Metadata: Metadata that is embedded inside a given file instead of existing outside the file in a separate database or catalog. Examples include ID3 tags for MP3 audio files and Exif for image files.
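For the curious, here is a small sketch of what "embedded" means in practice, using the older ID3v1 tag, which is simply the last 128 bytes of an MP3 file. Note the assumptions: this handles only ID3v1 (most current files use ID3v2, which is stored differently), and the "file" below is fake data built in memory with made-up titles, not a real recording.

```python
import struct

# ID3v1 layout: "TAG", title, artist, album, year, comment, genre = 128 bytes
ID3V1_LAYOUT = "<3s30s30s30s4s30sB"


def _text(raw):
    """Strip the zero padding from a fixed-width field and decode it."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(data):
    """Return a dict of ID3v1 fields, or None if the file has no such tag."""
    if len(data) < 128:
        return None
    tag, title, artist, album, year, _comment, _genre = struct.unpack(
        ID3V1_LAYOUT, data[-128:]
    )
    if tag != b"TAG":
        return None
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
    }


def _field(text, size):
    """Helper for building the fake tag: pad a string to a fixed width."""
    return text.encode("latin-1")[:size].ljust(size, b"\x00")


# Fake audio data followed by a hand-built tag (all values hypothetical)
fake_audio = b"\x00" * 1000
fake_tag = (
    b"TAG"
    + _field("Oral History Interview", 30)
    + _field("Hypothetical Narrator", 30)
    + _field("Class Demo", 30)
    + _field("2003", 4)
    + _field("", 30)
    + bytes([255])
)
tags = read_id3v1(fake_audio + fake_tag)
print(tags["title"], tags["year"])  # Oral History Interview 2003
```

The point for preservation is that the description travels with the file: copy the file and the metadata comes along, with no separate database to lose.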

Storage Terms

A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text.

A bit (a contraction of binary digit) is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states. These may be the two stable states of a flip-flop, two positions of an electrical switch, two distinct voltage or current levels allowed by a circuit, two distinct levels of light intensity, two directions of magnetization or polarization, the orientation of reversible double-stranded DNA, etc.

The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer, and for this reason it is the basic addressable element in many computer architectures.
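To tie bits and bytes back to characters, here is a quick sketch that looks at the letter "A" as a single eight-bit byte. The choice of "A" is arbitrary.

```python
value = ord("A")             # the number the letter is stored as
bits = format(value, "08b")  # that number written out as eight bits
print(value, bits)           # 65 01000001

# Eight bits, each on or off, give 2 ** 8 possible values per byte
values_per_byte = 2 ** 8
print(values_per_byte)       # 256

# Reading the bit pattern back as a number returns the same letter
round_trip = chr(int(bits, 2))
print(round_trip)            # A
```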

A disk image is a single file or storage device containing the complete contents and structure representing a data storage medium or device, such as a hard drive, tape drive, floppy disk, optical disc, or USB flash drive. A disk image is usually created by making a complete sector-by-sector copy of the source medium, thereby perfectly replicating the structure and contents of a storage device.
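A sector-by-sector copy can be sketched in a few lines. This is only an illustration under some assumptions: the "disk" is a small in-memory stand-in rather than a real device, the 512-byte sector size is a common value but not universal, and the function name is made up. Real imaging tools also handle read errors and write-blocking, which this does not.

```python
import hashlib
import io

SECTOR_SIZE = 512  # assumed; a common sector size, but not the only one


def image_device(source, dest, sector_size=SECTOR_SIZE):
    """Copy source to dest one sector at a time; return (sectors, sha256 hex)."""
    digest = hashlib.sha256()
    sectors = 0
    while True:
        block = source.read(sector_size)
        if not block:
            break
        dest.write(block)      # every byte is copied, used or not
        digest.update(block)   # checksum lets us verify the copy later
        sectors += 1
    return sectors, digest.hexdigest()


# Stand-in for a storage device: 2048 bytes, i.e. four 512-byte sectors
fake_disk = io.BytesIO(bytes(range(256)) * 8)
image = io.BytesIO()
sectors, checksum = image_device(fake_disk, image)

print(sectors)                                  # 4
print(image.getvalue() == fake_disk.getvalue()) # True
```

Because the copy ignores the file system and simply takes every sector, things like deleted files and empty space come along too, which is a large part of why disk images matter for digital forensics.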

File Types:

Documents: Of particular importance for us are .txt .doc .pdf .xml and .html

Images: Of particular importance for us are .jpg .tiff and JPEG 2000

Audio: Of particular importance for us are .mp3 .wav

Digital Video Encoding: this one is tricky; we will talk about .mov .mpg .swf .mp4 and .avi

http://www.digitalpreservation.gov/formats/fdd/descriptions.shtml

Print Project Proposal: 9/11 and Online Archives

The print project that I am proposing for this course stems from my interest in the role of digital media in the evolution of cultural memory. The central question driving my attention to this area of study lies in discerning whether or not technology is having a significant impact on public discussion and collective understanding surrounding the remembrance of historical events. By exploring more recent history, in particular the destruction and loss of lives in three American cities on September 11, 2001, it becomes possible to explore how the expanding digital humanities movement is changing our understanding of the archive. The proliferation of born-digital content leading up to this national tragedy has resulted in numerous online archives dedicated specifically to this event, as well as the availability of materials in special collections as part of larger projects. These archives espouse vastly different purposes, aggregate varying types of content, and originate from a variety of civic, federal, commercial, and individual sources.

In surveying the websites yielded by appropriate keyword searches, I hope to create a clearer picture of how the archive is being enacted online. For each site, I will attempt to provide a concise summary of what the stated goals for the project are, who funds and maintains its offerings, how content is collected, whether content is limited to a specific subset or leans toward universal collection of relevant artifacts, how content is organized and presented, whether user-generated content is allowed or encouraged, what types of policies are in place regarding access and responsibilities for long-term upkeep of the collections, and whether the site appears to curate its materials with or without bias. It will also be useful to explore whether or not these online archives appear to have an implied audience, either professional or amateur, and whether the low cost of online broadcast opens up the mnemonic discussion to minority voices such as conspiracy theorists and 9/11 deniers.

I am further interested in seeing what role the traditional physical objects associated with historical practice have found in online archives. Not having yet delved into the research, I would assume that these sites are dominated by born-digital content that is easy to upload and manage, while tangible objects languish under the same time and resource constraints that currently slow the digitization of historically offline archives of pre-digital artifacts, since each object must be documented and processed before it can be viewed online. Where relevant in the case of websites that simultaneously offer physical access to collections, as is the case with the National Archives, I will discuss this divide between offline and online at additional length by looking at differences in policies for access to materials and how much of the total content is available online.

Finally, the diversity of online archives presenting content relevant to this particular historical event includes some that allow user comments and/or reviews of specific content, as well as usage or download statistics. Whenever possible, this information will be included and discussed in hopes of sketching out how the content is being used and how those users or site visitors identify themselves in relation to the material.

This paper will work in concert with another paper that I am preparing this semester that will look specifically at policy issues surrounding user-generated content in the online archive. Hopefully, these attempts to create a tentative framework of how online archives currently function will underwrite future research into what effect broader access to these primary materials has on the shape of the public discourse of cultural memory.

Bridget Sullivan Print Project Proposal

In recent years, museums and archives have made a concerted effort to take advantage of digital media in connecting with public audiences. These institutions have undertaken a multitude of projects to make their collections available to a greater audience through digital access. For my print project, I would like to take a closer look at some of these approaches to presenting historic material culture to a public audience and at how digitization efforts have affected the way that the public engages with historical narratives through material culture.

Specifically, I would like to focus on the digital offerings of the National Archives and the Library of Congress. Historically, these are two of the most widely used research facilities for American history. Like most archives, however, they tend to discourage visits from anyone other than serious historical researchers. There is little opportunity to explore the holdings of these types of institutions, and they can even be intimidating for newer researchers.

However, digitization has broken down the barrier between the public and these repositories of American public knowledge. Both have made great strides in making portions of their collections available to all types of researchers through the Internet. Further, these efforts have been targeted at different audiences. The National Archives and the Library of Congress have both made documents and finding aids available through general search features of their websites. However, they have also gone beyond the basics of digitization. Each has created online offerings that are more suited to general exploration of their collections, as opposed to research with a specific focus and mission.

The National Archives offers the Digital Vaults, a way to digitally wander through its collections. Documents are linked by categorical tagging. The site also gives explorers the ability to create their own collections of documents and artifacts that are interesting to them. Similarly, the Library of Congress has created MyLOC. Explorers can register for their own account and create collections of interest to them. These collections can incorporate all aspects of the website, including general information about visiting the Library of Congress as well as online exhibits.

I will compare and contrast these two sites, focusing on the audiences they target and the various pathways these audiences have to interact with the collections of these institutions. Additionally, I will address how the ability to interact with collections online has affected the demographics of those who take an interest in these collections.

Digitization 101

“The National Initiative for a Networked Cultural Heritage (NINCH) is a US-based coalition of some 100 organizations and institutions from across the cultural sector: museums, libraries, archives, scholarly societies, arts groups, IT support units and others. It was founded in 1996 to ensure strong and informed leadership from the cultural community in the evolution of the digital environment. Our task and goal, as a leadership and advocacy organization, is to build a framework within which these different elements can effectively collaborate to build a networked cultural heritage.”

This guide promotes itself as a long-term, collaborative effort among professionals in the business of cultural heritage preservation and the technical support professionals who make it possible to digitize historical materials. This comprehensive survey of and guide to digitization programs can, and probably should, be used as a fundamental reference for any serious effort in digitally preserving cultural history. The six core ‘Good Practices’ put forth by NINCH are:

1) Optimize interoperability of materials

2) Enable broadest use

3) Address the need for preservation of original materials

4) Indicate strategy for life-cycle management of digital resources

5) Investigate and declare intellectual property rights and ownership

6) Articulate intent and declare methodology.

This comprehensive guide is laden with jargon, technical references and anecdotal evidence about digitization projects for professional historians. When your time comes to manage a digitization project, I encourage you to read this guide in full, but for now let’s stick to the basics.

At the beginning of Chapter V, the author lays out some ubiquitous questions and concerns: what format(s) is best, how much detail is necessary, and what user activities should we be supporting when digitizing? We’re told we should also consider the nature of the original materials, the purpose of digitizing something, and the availability of expertise, tech support, and funding to succeed with a certain project.

Different original materials will come in different shapes and sizes. Let’s briefly consider some of the issues, variations, tools, etc. that accompany each format of original material.

Text-based manuscript material:

  • Issues– ‘Proprietary Software’: word processing/imaging platforms like Microsoft Word & Adobe whose licensing and longevity are unreliable
  • Solution– “standards-based methods”: newer encoding languages like ‘Standard Generalized Markup Language’ (SGML) and ‘Extensible Markup Language’ (XML), which “avoid the problems of proprietary software, offering data longevity and the flexibility to move from platform to platform freely.”
  • Variation– Page Image vs Full Text
  • Tools– Scanners. Optical Character Recognition Software. Data capture service.
  • Formats– SGML, XML, TEI, ASCII, HTML, EAD, DTD, METS
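
To give a feel for why marked-up plain text travels well between platforms, here is a minimal sketch that reads a tiny XML transcription with nothing but Python's standard library. The letter and its element names are invented for illustration; they are not TEI or any other real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription; the element names are made up for this example
transcription = """<letter date="1863-07-04">
  <sender>Hypothetical Soldier</sender>
  <recipient>Hypothetical Family</recipient>
  <body>We have marched for three days.</body>
</letter>"""

root = ET.fromstring(transcription)
letter_date = root.get("date")            # attribute on the root element
sender = root.findtext("sender")          # text inside a child element
body_words = root.findtext("body").split()

print(letter_date, sender, len(body_words))  # 1863-07-04 Hypothetical Soldier 6
```

Because the file is just labeled text, any tool that understands XML can open it, and even a plain text editor shows something a person can read.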

 

Images/ 2D art:

  • Issues– Delicacy/irregularity of materials. Quality of digital image. Consistent standards
  • Solution– ‘Intermediaries’, prioritization of researchers’ needs, and investment in quality digitization tools
  • Variation– The needs of different mediums to produce the best digital rendering. For example, digitizing an oil painting has a different set of requirements from digitizing a black and white photograph.
  • Tools– High quality scanners or cameras, adequate storage space, specialized software
  • Formats– TIFF, JPEG, PDF

Audio/Visual materials:

  • Issues– Extinction of recording equipment, transmission of files, time, storage and money constraints
  • Solutions– Deal with it
  • Variation– Many different recording methods over the history of audio material come with their own machines, vices and challenges.
  • Tools– Analog playback devices, analog-to-digital converter, editing software
  • Formats– Audio: WAVE, MP3, RealAudio. Video: MPEG, QuickTime, RealVideo. Metadata: METS, SMIL

The NINCH Guide also discusses issues of Quality Control and Quality Assurance, which are basically the promises made by contributors to digitization projects to their researchers and audiences. These teams are responsible for “the procedures and practices that [are] put in place to ensure the consistency, integrity and reliability of the digitization process.” Progress and quality standards in a digitization project should be built in from the start and vetted regularly.

The primary goal of digitization is to preserve the original materials by taking them out of regular circulation. But much foresight and specificity are required to make a digitization project worth the time and money. The idea is that digitization should only have to happen once and that the file format will remain flexible throughout the evolution of technology.

The Ugly Truth About Preservation

Is Bert Evil?  And, Should We Care?

There once was a website called Bert Is Evil. It no longer exists. Is it important that it no longer exists? Perhaps it becomes important when we realize that it disappeared after September 11. The image of Bert was inserted into an anti-America image, and the creator of Bert Is Evil was threatened with legal action, so he deleted the site. But if you want, it is still possible to see what Bert Is Evil looked like, thanks to the Internet Archive, a private organization that tries to archive the Internet. This is a noble goal, but one organization cannot do such work alone. In his article Scarcity or Abundance? Preserving the Past in a Digital Era, Roy Rosenzweig writes about digital and physical preservation, the problems with each, and the relationship between historians and archivists.

The idea that the Internet should be preserved is catching on, and people are wondering who is going to do it and how it is going to be done. When this article was written, in 2003, the government was not preserving the digital world, or records created digitally. The National Archives does not require that digital records be kept digitally. Rosenzweig then makes the point that even if digital records are preserved, the technology they are preserved with may not be readable five years later. It is not that the technology has deteriorated; it is that it is now outdated. Rosenzweig states: “well before most digital media degrade, they are likely to become unreadable because of changes in hardware…or software.” While I am not always on the digital bandwagon, the Internet has changed everything, and it is time to conquer the problem of preservation.

But what will happen once the Internet is being archived faithfully? It is very possible that once this happens there will be amazing amounts of primary source material available. No longer will historians be able to complain about not having enough information; they will be complaining about having too much. What will the world come to when historians have a plethora of information? My sarcasm aside, Rosenzweig’s point is a good one. What will happen when scholars have too many sources, too much information, and too many places to look? While not having enough information can be frustrating, at least a topic can be narrow and have focus. This possibility could become a reality, but only if society starts preserving all of the digital material being created right now. Just preserving these records, though, is not the end; it is merely the beginning of a process.

The Archivist and the Historian Should be Friends

What I found most interesting in Rosenzweig’s article was his dissection of the relationship between archivists and historians. After reading Nicholson Baker’s book Double Fold, I had some idea of the disagreement between historians and archivists, and yes, I am a bit biased toward the side of keeping everything. After reading Rosenzweig’s article, I think the relationship between the groups is better explained. Historians want to save everything, while archivists have to figure out how to store everything that gets preserved, not the easiest job. What comes through in the last part of the article is the different directions from which historians and archivists approach the topic of preservation. Rosenzweig makes the point that both parties will have to change their attitudes before they have another conversation about preservation, but he is adamant that both groups need to talk about how to preserve the past and how to preserve digital records as well.

Rosenzweig makes the point that in the beginning the historian and the archivist were friends. They were part of the same professional organization, the AHA, when “historians saw themselves as having a responsibility for preserving as well as researching the past.” Now, perhaps this is a bit strong, but the divide between being a historian and being an archivist has become great.

Scarcity or Abundance? is a valuable article for better understanding the complexities of preservation, digital or otherwise. The Internet has changed the way that documents are created, and preservation of physical documents has never been easy. How history is being preserved is important, and it is equally important that historians and archivists understand what needs to be, should be, and is being preserved.