National Archives, The Digital Vaults

The National Archives is probably the best-known archive in the United States. However, most people only ever see its most famous documents on display: the Constitution, the Declaration of Independence, and the Bill of Rights. While these are the foundational documents of our country, the Archives houses thousands and thousands of other pieces that can tell us a lot about the history of our country. Most of these documents are collecting dust, seen only by a lone researcher every decade or so.

However, recent digitization efforts for the collections at the National Archives are changing this situation. Digitizing records allows greater access for researchers who may otherwise be restricted by travel and financial considerations. Additionally, digitization opens up the archives for the general public. It allows those people who would normally only enter the National Archives through the front door to digitally march through the researcher entrance and explore what they can find.

The National Archives Digital Vaults allows visitors to browse its collections through tagging. Records are organized by tags, and you can follow one record to another through these connections. In fact, this is the premise of the site's Pathways game, which starts with a record and provides clues to help you find a related document. It is meant to highlight the different ways that documents can be connected to one another. The site also has the option to create your own collection. Any records of personal or research interest can be dragged into a separate space and saved as a collection.

In addition to tagging, you can filter documents by type or time period. This helps visitors with specific types of documents or subjects in mind to find items easily. I also really liked this feature because it maintains some of the importance of physically visiting an archive. Oftentimes researchers stumble upon records at archives that they may not have been specifically looking for, but that are relevant to their research topic. Showing a number of records related to the one someone is looking at prevents this kind of chance discovery from disappearing entirely.

The site also offers visitors the opportunity to create their own products. For example, one could create their own pathways challenge. Visitors can also create movies or posters using the documents that they have saved in their own collections.

I feel that these types of tools serve multiple purposes. They open archives up to the general public and allow them to explore records they would most likely never see otherwise. In this sense, the design is very accommodating to browsing without a specific topic in mind. However, it also has the features necessary for a more focused search, thereby allowing the utility of the digital collections to be extended to serious researchers as well.

Digitization, Digital Preservation, and File Formats

So I successfully scared everyone off from blogging about The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. If you get a bit lost in this reading, I suggest reading over mwest24’s post on it from last year.

Along with the post, I would suggest the following vocabulary terms, which we should all become familiar with. I will weave these terms together into a short discussion of how computer files and storage work, which will help set us up for Kirschenbaum’s book.

As this is technical info, I am just going to crib most of it from Wikipedia. Don’t get too lost in the details, but please read these over and click the links out to Wikipedia if you have no idea about some of the terms.

Key Terms for File Characteristics 

Dots per inch (DPI) is a measure of spatial printing or video dot density, in particular the number of individual dots that can be placed in a line within the span of 1 inch (2.54 cm). The DPI value tends to correlate with image resolution, but is related only indirectly.
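To make the DPI point a bit more concrete, here is a minimal Python sketch of how a scanning DPI setting translates into pixel dimensions. The function name `pixels_for_scan` and the letter-sized page are my own illustrative assumptions, not anything from the NINCH guide.

```python
def pixels_for_scan(width_in, height_in, dpi):
    """Return the (width, height) in pixels of a scan of a physical page.

    Assumes the scanner samples the page uniformly at `dpi` dots per inch.
    """
    return round(width_in * dpi), round(height_in * dpi)


# A hypothetical US letter page (8.5 x 11 inches) scanned at two settings.
letter_300 = pixels_for_scan(8.5, 11, 300)
letter_600 = pixels_for_scan(8.5, 11, 600)

print(letter_300)  # (2550, 3300)
print(letter_600)  # (5100, 6600)
```

Doubling the DPI doubles each dimension, so the image holds four times as many pixels. The same pixel count printed at a different size would have a different DPI, which is why DPI relates to image resolution only indirectly.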

A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a bit pattern. In our work, the most useful encodings to know about are ASCII and Unicode.
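As a rough illustration of the difference between ASCII and Unicode, the Python sketch below encodes a word containing an accented letter. The sample word is just an assumption chosen for the demo. UTF-8 is one common way of writing Unicode characters out as bytes.

```python
# "caf\u00e9" is the word "café" with a precomposed e-acute (U+00E9).
text = "caf\u00e9"

# UTF-8 can represent the accented letter, using two bytes for it.
utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes))  # [99, 97, 102, 195, 169]

# ASCII only covers 128 characters, so the accented letter cannot be encoded.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)  # False
```

The first three letters have the same byte values in ASCII and UTF-8, which is part of why plain English text survives encoding mix-ups while accented text often does not.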

Data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by identifying marginally important information and removing it.
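Here is a small Python sketch of the lossless versus lossy distinction. It uses the standard library's `zlib` module for the lossless half. The "lossy" half is only a toy stand-in (rounding numbers), not how MP3 or JPEG actually work, and the sample data is made up for the demo.

```python
import zlib

# Lossless: highly repetitive data compresses well and is restored exactly.
original = b"archive " * 1000
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed size is much smaller
print(restored == original)            # True: nothing was lost

# Lossy (toy illustration): rounding discards detail that cannot be recovered.
samples = [0.12, 0.57, 0.93]
quantized = [round(s, 1) for s in samples]
print(quantized == samples)            # False: the fine detail is gone
```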

Embedded Metadata: Metadata that is embedded inside a given file instead of existing outside the file, for example in a separate database. Examples include ID3 tags for MP3 audio files and Exif for image files.
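To show what "inside the file" means, the sketch below reads an ID3v1 tag, the oldest and simplest version of ID3, which lives in the last 128 bytes of an MP3 file. The helper names and the sample song details are hypothetical, and the "file" is just bytes built in memory so the example runs on its own. Newer ID3v2 tags are laid out differently and are not handled here.

```python
def build_id3v1(title, artist, album, year):
    """Build a 128-byte ID3v1 tag (demo helper, not a full implementation)."""
    def pad(value, size):
        return value.encode("latin-1")[:size].ljust(size, b"\x00")

    # Layout: "TAG" + title(30) + artist(30) + album(30) + year(4)
    #         + comment(30) + genre(1) = 128 bytes
    return (b"TAG" + pad(title, 30) + pad(artist, 30) + pad(album, 30)
            + pad(year, 4) + pad("", 30) + b"\xff")


def read_id3v1(data):
    """Return the ID3v1 fields found at the end of `data`, or None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def field(start, size):
        return tag[start:start + size].rstrip(b"\x00 ").decode("latin-1")

    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
        "year": field(93, 4),
    }


# Stand-in for audio data followed by the embedded tag.
fake_mp3 = b"\x00" * 64 + build_id3v1("Oral History 12", "Jane Doe", "Interviews", "1998")
metadata = read_id3v1(fake_mp3)
print(metadata)
```

The metadata travels with the file wherever it is copied, which is the main practical difference from a record kept in an outside database.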

Storage Terms

A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text.

A bit (a contraction of binary digit) is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states. These may be the two stable states of a flip-flop, two positions of an electrical switch, two distinct voltage or current levels allowed by a circuit, two distinct levels of light intensity, two directions of magnetization or polarization, the orientation of reversible double stranded DNA, etc.

The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer, and for this reason it is the basic addressable element in many computer architectures.
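A quick Python sketch of how bits, bytes, and characters relate. The letter "A" is just a convenient example.

```python
# The character "A" is stored as the number 65 in ASCII.
value = ord("A")
print(value)  # 65

# Written out as the eight bits of a single byte.
bits = format(value, "08b")
print(bits)  # 01000001

# Eight bits, each in one of two states, give 2**8 possible byte values.
print(2 ** 8)  # 256

# Reading the bits back as a binary number recovers the character.
print(chr(int(bits, 2)))  # A
```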

A disk image is a single file or storage device containing the complete contents and structure representing a data storage medium or device, such as a hard drive, tape drive, floppy disk, optical disc, or USB flash drive. A disk image is usually created by making a complete sector-by-sector copy of the source medium, thereby perfectly replicating the structure and contents of a storage device.
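The sector-by-sector idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "device" is an in-memory buffer rather than a real drive, the 512-byte sector size is a common convention rather than a universal rule, and `image_device` is a hypothetical helper, not a real imaging tool.

```python
import hashlib
import io

SECTOR_SIZE = 512  # assumed sector size for this demo


def image_device(source, target, sector_size=SECTOR_SIZE):
    """Copy `source` to `target` one sector at a time; return the sector count."""
    count = 0
    while True:
        sector = source.read(sector_size)
        if not sector:
            break
        target.write(sector)
        count += 1
    return count


# Stand-in for a storage device: 2048 bytes, i.e. four 512-byte sectors.
source = io.BytesIO(bytes(range(256)) * 8)
image = io.BytesIO()

sectors = image_device(source, image)
print(sectors)  # 4

# Matching checksums show the image is a bit-for-bit copy of the source.
source_hash = hashlib.sha256(source.getvalue()).hexdigest()
image_hash = hashlib.sha256(image.getvalue()).hexdigest()
print(source_hash == image_hash)  # True
```

Because the copy ignores what the bytes mean, it captures everything on the medium, including parts that no file currently points to.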

File Types:

Documents: Of particular importance for us are .txt, .doc, .pdf, .xml, and .html

Images: Of particular importance for us are .jpg, .tiff, and JPEG 2000 (.jp2)

Audio: Of particular importance for us are .mp3 and .wav

Digital Video Encoding: This one is tricky. We will talk about .mov, .mpg, .swf, .mp4, and .avi

http://www.digitalpreservation.gov/formats/fdd/descriptions.shtml

Print Project Proposal: 9/11 and Online Archives

The print project that I am proposing for this course stems from my interest in the role of digital media in the evolution of cultural memory. The central question driving my attention to this area of study lies in discerning whether or not technology is having a significant impact on public discussion and collective understanding surrounding the remembrance of historical events. By exploring more recent history, in particular the destruction and loss of lives in three American cities on September 11, 2001, it becomes possible to explore how the expanding digital humanities movement is changing our understanding of the archive. The proliferation of born-digital content leading up to this national tragedy has resulted in numerous online archives dedicated specifically to this event, as well as the availability of materials in special collections as part of larger projects. These archives espouse vastly different purposes, aggregate varying types of content, and originate from a variety of civic, federal, commercial, and individual sources.

In surveying the websites yielded by appropriate keyword searches, I hope to create a clearer picture of how the archive is being enacted online. For each site, I will attempt to provide a concise summary of what the stated goals for the project are, who funds and maintains its offerings, how content is collected, whether content is limited to a specific subset or leans toward universal collection of relevant artifacts, how content is organized and presented, whether user-generated content is allowed or encouraged, what types of policies are in place regarding access and responsibilities for long-term upkeep of the collections, and whether the site appears to curate its materials with or without bias. It will also be useful to explore whether or not these online archives appear to have an implied audience, either professional or amateur, and whether the low cost of online broadcast opens up the mnemonic discussion to minority voices such as conspiracy theorists and 9/11 deniers.

I am further interested in seeing what role the traditional physical objects associated with historical practice have found in online archives. Not having yet delved into the research, I would assume that these sites are dominated by born-digital content that is easy to upload and manage, while tangible objects languish under the same time and resource constraints that currently slow the digitization of historically offline archives of pre-digital artifacts, limiting how quickly they can be documented and processed for viewing online. Where relevant, in the case of websites that simultaneously offer physical access to collections, as the National Archives does, I will discuss this divide between offline and online at additional length by looking at differences in policies for access to materials and how much of the total content is available online.

Finally, the diversity of online archives presenting content relevant to this particular historical event includes some that allow user comments and/or reviews of specific content, as well as usage or download statistics. Whenever possible, this information will be included and discussed in hopes of sketching out how the content is being used and how those users or site visitors identify themselves in relation to the material.

This paper will work in concert with another paper that I am preparing this semester that will look specifically at policy issues surrounding user-generated content in the online archive. Hopefully, these attempts to create a tentative framework of how online archives currently function will underwrite future research into what effect broader access to these primary materials has on the shape of the public discourse of cultural memory.

Bridget Sullivan Print Project Proposal

In recent years, museums and archives have made a concerted effort to take advantage of digital media in connecting with public audiences. These institutions have undertaken a multitude of projects to make their collections available to a greater audience through digital access. For my print project, I would like to take a closer look at some of these approaches to presenting historic material culture to a public audience and how digitization efforts have affected the way that the public engages with historical narratives through material culture.

Specifically, I would like to focus on the digital offerings of the National Archives and the Library of Congress. Historically, these are two of the most widely used research facilities for American history. As such, they have followed the pattern of most archives, which tend to discourage visitation by anyone other than serious historical researchers. There is little opportunity to explore the holdings of these types of institutions, and they can even be intimidating for newer researchers.

However, digitization has broken down the barrier between the public and these repositories of American public knowledge. Both have made great strides in making portions of their collections available to all types of researchers through the Internet. Further, these efforts have been targeted at different audiences. The National Archives and the Library of Congress have both made documents and finding aids available through general search features of their websites. However, they have also gone beyond the basics of digitization. Each has created online offerings that are more suited to general exploration of their collections, as opposed to research with a specific focus and mission.

The National Archives offers the Digital Vaults, a way to digitally wander through its collections. Documents are linked by categorical tagging. It also gives explorers the ability to create their own collections of documents and artifacts that are interesting to them. Similarly, the Library of Congress has created MyLOC. Explorers can register for their own account and create collections of interest to them. These collections can incorporate all aspects of the website, including general information about visiting the Library of Congress as well as online exhibits.

I will compare and contrast these two sites, focusing on the audiences they target and the various pathways these audiences have to interact with the collections of these institutions. Additionally, I will address how the ability to interact with collections online has affected the demographics of those who take an interest in these collections.

Digitization 101

“The National Initiative for a Networked Cultural Heritage (NINCH) is a US-based coalition of some 100 organizations and institutions from across the cultural sector: museums, libraries, archives, scholarly societies, arts groups, IT support units and others. It was founded in 1996 to ensure strong and informed leadership from the cultural community in the evolution of the digital environment. Our task and goal, as a leadership and advocacy organization, is to build a framework within which these different elements can effectively collaborate to build a networked cultural heritage.”

This guide promotes itself as a long-term, collaborative effort among professionals in the business of cultural heritage preservation and the technical support professionals who make it possible to digitize historical materials. This comprehensive survey of and guide to digitization programs can, and probably should, be used as a fundamental reference for any serious effort in digitally preserving cultural history. The six core ‘Good Practices’ put forth by NINCH are:

1) Optimize interoperability of materials

2) Enable broadest use

3) Address the need for preservation of original materials

4) Indicate strategy for life-cycle management of digital resources

5) Investigate and declare intellectual property rights and ownership

6) Articulate intent and declare methodology.

This comprehensive guide is laden with jargon, technical references and anecdotal evidence about digitization projects for professional historians. When your time comes to manage a digitization project, I encourage you to read this guide in full, but for now let’s stick to the basics.

At the beginning of Chapter V, the author lays out some common questions and concerns: which format or formats are best, how much detail is necessary, and which user activities should we be supporting when digitizing? We’re told we should also consider the nature of the original materials, the purpose of digitizing something, and the availability of expertise, tech support, and funding to succeed with a certain project.

Different original materials will come in different shapes and sizes. Let’s briefly consider some of the issues, variations, tools, etc. that accompany each format of original material.

Text-based manuscript material:

  • Issues– ‘Proprietary software’: word processing and imaging platforms like Microsoft Word and Adobe, whose licensing and longevity are unreliable
  • Solution– “Standards-based methods”: encoding languages like Standard Generalized Markup Language (SGML) and Extensible Markup Language (XML), which “avoid the problems of proprietary software, offering data longevity and the flexibility to move from platform to platform freely.”
  • Variation– Page Image vs Full Text
  • Tools– Scanners. Optical Character Recognition software. Data capture service.
  • Formats– SGML, XML, TEI, ASCII, HTML, EAD, DTD, METS
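
To give a feel for what a standards-based text encoding looks like, here is a minimal Python sketch that writes a short manuscript record as XML and reads it back using the standard library. The element names (`letter`, `author`, `date`, `body`) and the sample content are made up for the demo. A real project would more likely follow an established schema such as TEI.

```python
import xml.etree.ElementTree as ET

# Build a tiny record. The element names here are illustrative only.
letter = ET.Element("letter")
ET.SubElement(letter, "author").text = "Jane Doe"
ET.SubElement(letter, "date").text = "1863-07-04"
ET.SubElement(letter, "body").text = "We arrived safely this morning."

# Serialize to plain text that any XML-aware tool can read.
xml_text = ET.tostring(letter, encoding="unicode")
print(xml_text)

# Parse it back without needing the software that created it.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("author"))
```

The record is ordinary text with labeled parts, so it does not depend on any one vendor's word processor to stay readable.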

 

Images/ 2D art:

  • Issues– Delicacy/irregularity of materials. Quality of digital image. Consistent standards
  • Solution– “Intermediaries”, prioritization of researchers’ needs, and investment in quality digitization tools
  • Variation– Different media have different needs to produce the best digital rendering. For example, digitizing an oil painting has a different set of requirements from digitizing a black and white photograph.
  • Tools– High quality scanners or cameras, adequate storage space, specialized software
  • Formats– TIFF, JPEG, PDF

Audio/Visual materials:

  • Issues– Extinction of recording equipment, transmission of files, time, storage and money constraints
  • Solutions– Deal with it
  • Variation– Many different recording methods over the history of audio material come with their own machines, vices and challenges.
  • Tools– Analog playback devices, analog-to-digital converter, editing software
  • Formats– Audio: WAVE, MP3, RealAudio; Video: MPEG, QuickTime, RealVideo; Metadata: METS, SMIL

The NINCH Guide also discusses issues of Quality Control and Quality Assurance, which are basically the promises that contributors to digitization projects make to their researchers and audiences. Project teams are responsible for “the procedures and practices that [are] put in place to ensure the consistency, integrity and reliability of the digitization process.” Progress and quality standards in a digitization project should be built in from the start and vetted regularly.

The primary goal of digitization is to preserve the original materials by taking them out of regular circulation. But much foresight and specificity are required to make a digitization project worth the time and money. The idea is that digitization should only have to happen once, and that the file format will remain flexible throughout the evolution of technology.