
Digital History Methods

HIST 477/677 American University


Tag: AIP

Posted on April 25, 2016

the digital preservation moth

Among the many challenges involved in identifying @mothgenerator-related material of significance and hatching a plan to preserve it, the greatest by far has been metadata.

In assembling an Archival Information Package for material documenting @mothgenerator, I had high hopes of being able to put together a METS file with descriptive, administrative, technical, and structural metadata for the entire AIP. I had been looking at Archivematica’s documentation for METS implementation in AIPs as a potential model but quickly realized that, between the variety of content and material types in the AIP and my merely beginner-level understanding of METS elements and attributes, I would have a nearly impossible time trying to piece together by hand what a full-service digital preservation system could produce relatively quickly.

Another compromise came in deciding where to assign item-level descriptive metadata and where not to. This would be relatively easy for some digital objects (a folder full of JPEGs, for example) while less efficient for others, such as the CSS and JavaScript files accompanying a downloaded Twitter archive. I have to confess that time became a factor here, as I hadn’t previously researched exactly what each of the Twitter archive files was for, and am still working through what WARC files are made of. “May contain: Data of various types.” Consider the solutions here to be in the spirit of extensible processing: one pass at baseline description, with plans to take a closer look in the future.

What should constitute baseline description, anyway? Each of the material groupings described in the statement of preservation intent needed its own folder; ultimately six total:

  • 1_Drawing_Program. Files associated with the drawing algorithm behind Moth Generator.
  • 2_Twitter_WARC. Captures of the @mothgenerator Twitter feed recorded with WebRecorder as WARC files.
  • 3_Twitter_Archive. Tweet archive downloaded from the @mothgenerator account. Includes tweet data as both JSON and CSV.
  • 4_Digital_Images. 4,000 digital images previously generated by the drawing algorithm and published by @mothgenerator, captured and stored as JPEGs.
  • 5_Process_Docs. A collection of tools, texts, and images created in the process of building Moth Generator.
  • 6_Artist_Interviews. Captures of online interviews, news coverage, and essays related to Moth Generator, the artists’ bodies of work, and Twitter bots in general.
An overview of folders in the Moth Generator AIP.

Carrying out a preservation plan in full would require a lot of artist participation, which I happen not to have for various reasons. As a result many of these folders include dummy files standing in for one or several thousand like them. 1_Drawing_Program contains a text file representing the JavaScript files and libraries needed for a complex text processing and drawing algorithm. I don’t have actual tweet data for the @mothgenerator account in 3_Twitter_Archive. And 4_Digital_Images includes 25 JPEGs out of the more than 4,000 published to date on the @mothgenerator Twitter feed. Because this AIP isn’t really where it needs to be to substantively document Moth Generator, I’ve made it available to download here rather than at the Internet Archive, which would have implied a certain readiness for public consumption.

One of several ways in which FITS and I didn’t get along.

Adventures in metadata creation continued as I experimented with different tools for generating file inventories and checksums. Having had repeated bad luck running FITS on its own, I next tried DataAccessioner, a GUI tool developed at the Duke University Libraries. To use DataAccessioner, one identifies a source and target directory, excludes material not for accessioning, enters Dublin Core description at the collection, folder, or file level, and clicks Migrate. DataAccessioner will then move the selected material from the source to the target location and output an XML file with technical and administrative metadata such as file formats, file sizes, and checksums.
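For readers who want to see what a bare-bones file inventory with checksums involves, here is a minimal Python sketch. It is not how FITS or DataAccessioner work internally: it does no format identification at all, and the function names and the CSV column names are assumptions made up for illustration.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """List every file under root with its relative path, size, and checksum."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return rows


def write_inventory(rows, out_csv):
    """Write the inventory rows to a CSV file (hypothetical column names)."""
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_inventory(build_inventory("path/to/aip"), "inventory.csv")` would produce one row per file. Re-running it later and comparing the checksum column is a simple fixity check, though a dedicated tool adds the format identification and PREMIS structure that this sketch leaves out.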

Entering metadata (left) and looking over file structure (right) in DataAccessioner.

Here’s a look at the file I ended up with, after some false starts. It includes all description assigned at folder level, plus PREMIS data for each folder and file in the collection:

Collection-level metadata (DC) created with DataAccessioner.
File-level technical and administrative metadata (PREMIS) created with DataAccessioner.

This file is the key outcome, for me, of using DataAccessioner in the first place. It uses FITS for file identification, and the ability to add description is helpful, if super slow by hand. The XML output can be transformed with XSLT or with this handy-sounding GUI (I have not yet had a chance to try it). There are other ways to transfer files, and if this were 100% archival material I might have tried creating a bag with BagIt or Exactly, but this XML output is really helpful.
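For comparison, the bag route not taken can be sketched in a few lines of Python. This is a hand-rolled illustration of the basic BagIt layout (a `data/` payload folder, a `bagit.txt` declaration, and a checksum manifest), not a substitute for an established BagIt tool. The function name `make_minimal_bag` is hypothetical, and optional pieces such as `bag-info.txt` and tag manifests are left out.

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data, then write bagit.txt and a SHA-256 manifest."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # raises an error if bag_dir/data already exists

    # The bag declaration: two required lines.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One manifest line per payload file: checksum, whitespace, path relative to the bag.
    lines = []
    for path in sorted(payload.rglob("*")):
        if path.is_file():
            checksum = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{checksum}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir
```

A bag records fixity and structure but carries no description of its own, which is part of why the DataAccessioner XML, with Dublin Core and PREMIS in one place, is the more useful output here.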

Although I had started “cataloging” AIP contents at item-level, the prospect of re-entering it all in DataAccessioner or writing XML more or less by hand did not fill me with joy. A heavy-duty repository system would have let me ingest metadata from a spreadsheet (don’t hate, appreciate). I settled on assigning Dublin Core (15 elements) to the collection as a whole and to each of the six high-level folders. For the last folder, 6_Artist_Interviews, I went one level further down to distinguish between the rights situations of articles saved as HTML pages versus those recorded as WARC files. I used the Getty Art & Architecture Thesaurus for subject terms, in part to see how far Moth Generator could stretch it.
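The spreadsheet-to-metadata step that a repository system would handle can be approximated with a short script. The sketch below assumes a CSV with a `folder` column plus columns named after the 15 Dublin Core elements, and it assumes repeated values are separated by semicolons. The `records` and `record` wrapper elements are hypothetical and do not match DataAccessioner's output schema, and the sample row is invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def rows_to_dublin_core(rows):
    """Build a <records> tree with one <record> per spreadsheet row."""
    root = ET.Element("records")  # hypothetical wrapper element
    for row in rows:
        record = ET.SubElement(root, "record", {"folder": row.get("folder", "")})
        for name in DC_ELEMENTS:
            value = (row.get(name) or "").strip()
            # Assumed convention: semicolons separate repeated values.
            for part in value.split(";"):
                if part.strip():
                    ET.SubElement(record, f"{{{DC_NS}}}{name}").text = part.strip()
    return root


# Invented sample row, standing in for a real spreadsheet export.
sample = io.StringIO(
    "folder,title,subject\n"
    "4_Digital_Images,Example title,term one; term two\n"
)
tree = rows_to_dublin_core(csv.DictReader(sample))
print(ET.tostring(tree, encoding="unicode"))
```

Empty cells are skipped, so the same script serves sparse folder-level description and fuller item-level description alike.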

Overall, this stage of the project has renewed my appreciation for the batshit crazy world of metadata creation and reconciliation in digital preservation, much of which is now accomplished by cleverly designed tools and therefore taken for granted by the blissfully ignorant rest of us. I subscribe to the prevailing (?) wisdom that sometimes it’s best to let the bits describe themselves, but also need the occasional reminder that blood / sweat / tears makes this possible.

I had also been feeling pretty smug about how good @mothgenerator was looking in WebRecorder, and thought I had things wrapped up. But the Digital Preservation Moth politely begged to differ.

<a href="https://twitter.com/mothgenerator/status/720541444860198912">The Digital Preservation Moth via @mothgenerator</a>
