Among the many challenges involved in identifying @mothgenerator-related material of significance and hatching a plan to preserve it, the greatest by far has been metadata.
In assembling an Archival Information Package for material documenting @mothgenerator, I had high hopes of putting together a METS file with descriptive, administrative, technical, and structural metadata for the entire AIP. I had been looking at Archivematica’s documentation for METS implementation in AIPs as a potential model, but quickly realized that, between the variety of content and material types in the AIP and my beginner-level understanding of METS elements and attributes, I would have a nearly impossible time piecing together on my own what a full-service digital preservation system could produce relatively quickly.
Another compromise came in assigning — and not assigning — item-level descriptive metadata. This would be relatively easy for some digital objects — a folder full of JPEGs, for example — and less efficient for others, such as the CSS and JavaScript files accompanying a downloaded Twitter archive. I have to confess that time became a factor here, as I hadn’t previously researched exactly what each of the Twitter archive files was for, and am still working through what WARC files are made of. “May contain: Data of various types.” Consider the solutions here to be in the spirit of extensible processing: one pass at baseline description, with plans to take a closer look in the future.
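For my own future reference: a WARC file is, structurally, just a sequence of records, each with a version line, named headers, a blank line, and a payload. Here is a toy stdlib-only sketch of that anatomy, using a made-up in-memory record rather than a real WebRecorder capture (a real record would also carry mandatory headers like WARC-Record-ID and WARC-Date):

```python
# Toy illustration of WARC record structure. Hypothetical inline data,
# not an actual capture; real records carry more mandatory headers.
RAW = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: https://twitter.com/mothgenerator\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"          # blank line separates headers from payload
    b"hello moths"
    b"\r\n\r\n"      # two CRLFs close the record
)

def parse_record(raw):
    """Split one WARC record into its version line, headers, and payload."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = dict(line.split(": ", 1) for line in lines[1:])
    payload = rest[: int(headers["Content-Length"])]
    return version, headers, payload

version, headers, payload = parse_record(RAW)
print(version)                   # WARC/1.0
print(headers["WARC-Type"])      # response
print(payload.decode("utf-8"))   # hello moths
```

In practice a library like warcio does this (plus gzip handling and validation), but the point is that the bits are inspectable with nothing fancier than a text editor.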
What should constitute baseline description, anyway? Each of the material groupings described in the statement of preservation intent needed its own folder — ultimately six in total:
- 1_Drawing_Program. Files associated with the drawing algorithm behind Moth Generator.
- 2_Twitter_WARC. Captures of the @mothgenerator Twitter feed recorded with WebRecorder as WARC files.
- 3_Twitter_Archive. Tweet archive downloaded from the @mothgenerator account. Includes tweet data as both JSON and CSV.
- 4_Digital_Images. 4,000 digital images previously generated by the drawing algorithm and published by @mothgenerator, captured and stored as JPEGs.
- 5_Process_Docs. A collection of tools, texts, and images created in the process of building Moth Generator.
- 6_Artist_Interviews. Captures of online interviews, news coverage, and essays related to Moth Generator, the artists’ bodies of work, and Twitter bots in general.

Carrying out a preservation plan in full would require a lot of artist participation, which I happen not to have for various reasons. As a result, many of these folders include dummy files standing in for one or several thousand like them. 1_Drawing_Program contains a text file representing the JavaScript files and libraries needed for a complex text processing and drawing algorithm. I don’t have actual tweet data for the @mothgenerator account in 3_Twitter_Archive. And 4_Digital_Images includes just 25 of the more than 4,000 JPEGs published to date on the @mothgenerator Twitter feed. Because this AIP isn’t really where it needs to be to substantively document Moth Generator, I’ve made it available to download here rather than at the Internet Archive — which would have implied a certain readiness for public consumption.

Adventures in metadata creation continued as I experimented with different tools for generating file inventories and checksums. Having had repeated bad luck running FITS on its own, I next tried DataAccessioner, a GUI tool developed at the Duke University Libraries. To use DataAccessioner, one identifies a source and a target directory, excludes material not meant for accessioning, enters Dublin Core description at the collection, folder, or file level, and clicks Migrate. DataAccessioner then moves the selected material from the source to the target directory and outputs an XML file with technical and administrative metadata like file formats, file sizes, and checksums.
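The core of that inventory — path, size, checksum per file — is simple enough to sketch with the standard library alone. This is not what DataAccessioner actually runs (its output carries FITS-derived format identification on top), just a minimal picture of the bookkeeping involved:

```python
import hashlib
import os

def inventory(root):
    """Walk a directory tree and record relative path, size, and MD5
    for each file -- roughly the technical metadata a tool like
    DataAccessioner captures, minus format identification."""
    records = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            records.append({
                "path": os.path.relpath(path, root),
                "size": os.path.getsize(path),
                "md5": digest,
            })
    return records
```

Run over an AIP folder, this yields one record per file; the checksums are what let you verify later that nothing has silently rotted.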

Here’s a look at the file I ended up with, after some false starts. It includes all description assigned at folder level, plus PREMIS data for each folder and file in the collection:


This file is, for me, the key outcome of using DataAccessioner in the first place. It uses FITS for file identification, and the ability to add description is helpful, if super slow by hand. The XML output can be transformed with XSLT or with this handy-sounding GUI, which I have not yet had a chance to try. There are other ways to transfer files — if this were 100% archival material I might have tried creating a bag with BagIt or Exactly — but the XML output tipped the balance.
Although I had started “cataloging” AIP contents at the item level, the prospect of re-entering it all in DataAccessioner or writing XML more or less by hand did not fill me with joy. A heavy-duty repository system would have let me ingest metadata from a spreadsheet (don’t hate, appreciate). I settled on assigning Dublin Core (15 elements) to the collection as a whole and to each of the six high-level folders. For the last folder — 6_Artist_Interviews — I went one level further down to distinguish between the rights situations of articles saved as HTML pages versus those recorded as WARC files. I used the Getty Art & Architecture Thesaurus for subject terms, in part to see how far Moth Generator could stretch it.
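Folder-level description of this kind is also easy to serialize without a repository system. A sketch of one folder's record using stdlib ElementTree, with the standard Dublin Core element namespace — the element values below are illustrative placeholders, not my actual catalog entries:

```python
import xml.etree.ElementTree as ET

# Standard namespace for the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dc_record(fields):
    """Serialize a dict of Dublin Core element -> value pairs
    as a simple XML record."""
    root = ET.Element("record")
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

# Illustrative values only -- not the description actually in the AIP.
print(dc_record({
    "title": "6_Artist_Interviews",
    "description": "Captures of online interviews, news coverage, "
                   "and essays related to Moth Generator.",
    "subject": "robots (mechanisms)",  # hypothetical AAT-style term
}))
```

Ingesting a spreadsheet row per folder into a function like this would have spared me the re-keying, which is more or less what the heavy-duty systems automate.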
Overall, this stage of the project has renewed my appreciation for the batshit crazy world of metadata creation and reconciliation in digital preservation, much of which is now accomplished by cleverly designed tools and therefore taken for granted by the blissfully ignorant rest of us. I subscribe to the prevailing (?) wisdom that sometimes it’s best to let the bits describe themselves, but also need the occasional reminder that blood / sweat / tears makes this possible.
I had also been feeling pretty smug about how good @mothgenerator was looking in WebRecorder, and thought I had things wrapped up. But the Digital Preservation Moth politely begged to differ.
