At some point during the semester, it started to seem strange to me that digital art curation didn’t also mean a trail of audio / visual / moving-image process documents. I used GIFs and video as shorthand on the blog, trying to illustrate or punctuate a point here and there, but didn’t synthesize anything through visual production.
Backing slowly away from Plan A — a “lessons learned” post composed entirely of Hamilton GIFs and lyrics — I’m instead taking some space here to speculate: What are some alternative ways to represent the products and processes involved in digital art curation? Here’s a look at some of the drawings I made while grappling with @mothgenerator.
I used this diagram as a working tool while writing my statement of significance for Moth Generator. It was a useful way to start looking ahead to the kinds of experiences and characteristics different stakeholder groups might value and expect. Because the grid was designed to express authenticity and access from the users’ (not creators’) perspective(s), mapping my project to it was a natural fit for an overall shift towards a more user-centered preservation strategy. It’s also a sneaky example of how techniques for visualization can shape the content, purpose, and management of information. So, nice work, Dragan — your nefarious grid convinced me!
2_Distributions of significance
As a way to trace the evolution of this project from conceptual (identifying significance) to somewhat-less conceptual (declaring preservation intent, assembling a dummy AIP), I mocked up this rainbow circle mapping Moth Generator’s components, stakeholders, and significant characteristics to the contents of the eventual AIP. It’s interesting to see how conceptual elements converge around certain parts of the AIP, but I wouldn’t draw any conclusions from that about priorities or complexity. It’s not as though more connecting lines means more value (maybe mo’ problems). I mocked this up without much of an agenda beyond, “Let’s draw some lines and see what happens,” and am at least pleased with how it represents the project’s trajectory.
This grid diagrams the range of tools considered, tested, and ultimately used to capture, describe, and package material into an AIP. I tried to represent the overlapping functions of many of these tools, where they address the digital art curation lifecycle, and the degree of success I had with each. In choosing shades and ordering the vertical axis, I tried to avoid designating things as “failures” or “bad tools,” since success or failure in digital curation is often a matter of mismatch (right tool / wrong purpose, or vice versa) rather than of quality. One early idea for the vertical axis was to sort tools on a spectrum from ideal to contingency to NO, but in the end I chose to list them by earliest point of intersection with the lifecycle. Interestingly, making this diagram called my attention to how actually useful DataAccessioner (one of the “contingency” tools) really was.
There is so much digital preservation software out there — check COPTR or the POWRR tool grid if you doubt it. With this little drawing, I mostly want to convey the value of diversifying and experimenting. The grid has been a useful way for me to track what I’ve done and what to try next. Rhetorically, it says, “Keep trying!”
UPDATE (5/5/16): Images now give the actual course number. Sleep-deprived regrets.
In assembling an Archival Information Package for material documenting @mothgenerator, I had high hopes of being able to put together a METS file with descriptive, administrative, technical, and structural metadata for the entire AIP. I had been looking at Archivematica’s documentation for METS implementation in AIPs as a potential model but quickly realized that, between the variety of content and material types in the AIP and only beginner-level understanding of METS elements and attributes, I would have a nearly impossible time trying to piece together on my own what a full-service digital preservation system could produce relatively quickly.
What should constitute baseline description, anyway? Each of the material groupings described in the statement of preservation intent needed its own folder; ultimately six total:
1_Drawing_Program. Files associated with the drawing algorithm behind Moth Generator.
2_Twitter_WARC. Captures of the @mothgenerator Twitter feed recorded with WebRecorder as WARC files.
3_Twitter_Archive. Tweet archive downloaded from the @mothgenerator account. Includes tweet data as both JSON and CSV.
4_Digital Images. 4,000 digital images previously generated by the drawing algorithm and published by @mothgenerator, captured and stored as JPEGs.
5_Process_Docs. A collection of tools, texts, and images created in the process of building Moth Generator.
6_Artist_Interviews. Captures of online interviews, news coverage, and essays related to Moth Generator, the artists’ bodies of work, and Twitter bots in general.
Adventures in metadata creation continued as I experimented with different tools for generating file inventories and checksums. Having had repeat bad luck running FITS on its own, I next tried DataAccessioner, a GUI tool developed at the Duke University Libraries. To use DataAccessioner, one identifies a source and target directory, excludes material not for accessioning, enters Dublin Core description at the collection, folder, or file level, and clicks Migrate. DataAccessioner will move the selected material from the source to the target location and output an XML file with technical and administrative metadata such as file formats, file sizes, and checksums.
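For anyone who wants to see what a bare-bones file inventory with checksums involves, here is a minimal sketch in Python using only the standard library. It is not how DataAccessioner or FITS work internally; the function names (file_checksum, build_inventory, write_inventory) and the CSV column names are made up for illustration, and it records only relative path, size, and checksum, with no format identification.

```python
import csv
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root, algorithm="md5"):
    """List every file under root with its relative path, size in bytes, and checksum."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
                "checksum": file_checksum(path, algorithm),
            })
    return rows


def write_inventory(rows, csv_path):
    """Save the inventory rows as a CSV file (hypothetical column names)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "size_bytes", "checksum"])
        writer.writeheader()
        writer.writerows(rows)
```

Calling build_inventory on the top-level AIP folder and passing the result to write_inventory would give a simple manifest to check files against later. Save the CSV outside the folder being inventoried so it does not end up listing itself on a re-run.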
Here’s a look at the file I ended up with, after some false starts. It includes all description assigned at folder level, plus PREMIS data for each folder and file in the collection:
This file is the key outcome, for me, of using DataAccessioner in the first place. It uses FITS for file identification, and the ability to add description is helpful, if super slow by hand. The XML output can be transformed with XSLT or with this handy-sounding GUI (I have not yet had a chance to try it). There are other ways to transfer files, and if this were 100% archival material I might have tried creating a bag with BagIt or Exactly, but this XML output is really helpful.
Although I had started “cataloging” AIP contents at item-level, the prospect of re-entering it all in DataAccessioner or writing XML more or less by hand did not fill me with joy. A heavy-duty repository system would have let me ingest metadata from a spreadsheet (don’t hate, appreciate). I settled on assigning Dublin Core (15 elements) to the collection as a whole and to each of the six high-level folders. For the last folder, 6_Artist_Interviews, I went one level further down to distinguish between the rights situations of articles saved as HTML pages versus recorded as WARC files. I used the Getty Art & Architecture Thesaurus for subject terms, in part to see how far Moth Generator could stretch it.
Overall, this stage of the project has renewed my appreciation for the batshit crazy world of metadata creation and reconciliation in digital preservation, much of which is now accomplished by cleverly designed tools and therefore taken for granted by the blissfully ignorant rest of us. I subscribe to the prevailing (?) wisdom that sometimes it’s best to let the bits describe themselves, but also need the occasional reminder that blood / sweat / tears makes this possible.
I had also been feeling pretty smug about how good @mothgenerator was looking in WebRecorder, and thought I had things wrapped up. But the Digital Preservation Moth politely begged to differ.
The purpose of this collection is to document the Twitter bot Moth Generator by capturing its potential as a machine for moth creation; its interactivity in social media context; and its outputs as a collection of digital images.
In previous writing about Moth Generator, I had framed this project as a way to preserve the potential for new moths and the bot’s interactivity because the artists valued these qualities most. But perhaps it’s possible, or even preferable, to reframe these qualities as aspects to capture on behalf of users.
A comment on the statement of significance pointed to the role of scarcity in the significance and preservation worthiness of digital material. In some ways, the artists’ ability to experience Moth Generator is less “fragile” than that of users. The artists own their code; they control the means of moth production, including the ability to generate images at a higher quality than the bot publishes. Users, on the other hand, have zero control over the moth-drawing algorithm, limited insight about how it works (and then at conceptual level only), and no real sense of when a moth will come again. These elements characterize the delight and anxiety of Moth Generator from an audience perspective, and their longevity is less sure. After considering possible ways to capture moths as people encounter them, I’ve chosen instead to focus on collecting the elements of Moth Generator that will make it possible, in future, to recreate the sense of serendipity, beauty, and strangeness that users found compelling about the Twitter bot.
The process began with sketching a quick take on the authenticity- and access-related qualities of Moth Generator, following the grid created here for Geocities. It was quickly apparent that no solution might exist or emerge to cover the gamut of levels of authenticity, much less provide easy access. Given the tools and resources that exist today, it appears that collection and access strategies will both need to rely on triangulation. I’ve looked to Henry Lowood’s assertion that “digital repositories should consider the Authentic Experience as more of a reference-point than a deliverable, as a research problem rather than a repository problem,” as support for a triangulation approach in documenting user experiences.
Preserving Moth Generator’s potential means that the collection will include the source code for the drawing program behind the generator. Preserving interactivity means crawling or recording the Twitter feed as it looks and behaves today, as well as acquiring tweet datasets reflecting how others interact with the bot. Preserving Moth Generator as a collection of digital images means extracting all images published to the Twitter feed, maybe even versions of Moth Generator outputs published elsewhere (perhaps as GIFs). Finally, material such as artist interviews and screenshots of programming errors (collected by the artists) provides creative context for the bot and its output.
Four groups of material comprise the contents of this collection. The following sections walk through one possible plan for collecting and preparing these materials for long-term preservation and reuse, including a number of challenges and decision points.
As described in the statement of significance for Moth Generator, the artists see this project as, at heart, a drawing program that creates images from text input according to certain rules. Preserving the source code for this algorithm, along with any artist-produced documentation, is necessary to maintain the potential for creating new moths — a key to the experiences of both artists and audience. Managing the source code also allows for repurposing or recontextualizing all or part of the drawing program in future. Whether embedding moth generation in a different publishing program or game, or remixing the program and rules to create new kinds of drawings, viable and well-documented source code can help extend the reach of the Twitter bot.
A version-control system like Git could be extremely useful for maintaining both code and documentation. GitHub is a popular Git repository hosting service for making code available, and publishing the drawing program there would make it easier for others to update, fork, and reuse. However, since the artists have thus far kept the code behind Moth Generator mostly hidden, and plan to reuse it themselves in future projects, it’s unclear if an open, public repository would be acceptable. Moreover, as has been frequently noted by archivists, putting something on GitHub is not digital preservation; it can’t be the only way to keep source code for the long term.
Preserving interactivity means capturing the Twitter feed as it looks and behaves today, as well as gathering evidence of how Moth Generator and its audience interacted. While it may not be possible at this time to preserve in full what it’s like to encounter and interact with Moth Generator as a Twitter user, it’s possible to triangulate the experience by adopting several complementary strategies.
First, capturing a WARC (web archive) file of the @mothgenerator Twitter feed using the WebRecorder application produces a file equivalent to a recording that can be played back in a browser. While not all websites render correctly in WebRecorder, Twitter feeds usually benefit from its ability to capture and render dynamic content. WebRecorder’s autoscroll feature allows one to capture an entire Twitter feed by automating the process of scrolling to the end of the page and prompting Twitter to load more data. WebRecorder helpfully records individual tweets as well as the overall feed, which means that comments, retweets, and “favorites” (formerly a star, now a heart icon) are also documented. A sense of shared appreciation with other Twitter users is important to Twitter bots’ interactive appeal, whether or not users actually connect with one another beyond favoriting or retweeting the same thing. Recording the @mothgenerator Twitter feed grabs one view, from the users’ perspective, of how Moth Generator might have engaged its audience.
Collecting and preserving tweet data is another way to document Moth Generator’s interactivity. Think of it as collecting the evidence of ongoing interactivity. A large number of tools exist to help capture social media data for preservation and research. For this project, twarc — a versatile command-line tool for searching and filtering tweets by making calls to Twitter’s Search API — suggested itself as a good fit. Tweet data is returned as JSON, a structured format from which it’s possible to extract different kinds and combinations of tweet content and metadata. Here’s how to submit a call for tweets mentioning “mothgenerator”:
twarc.py --search mothgenerator > tweets.json
Twitter users submit text to be transformed into moths, retweet moths, and mention @mothgenerator in tweets. Each type of interaction could potentially be captured in a tweet data set.
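To make those interaction types a bit more concrete, here is a rough sketch of how a tweet data set might be sorted into them. It assumes the line-oriented JSON that twarc writes (one tweet per line) and the standard tweet fields user.screen_name, retweeted_status, in_reply_to_screen_name, and entities.user_mentions. The category labels and function names are my own invention, and the rules are deliberately simple: a tweet gets the first label that fits.

```python
import json
from collections import Counter

BOT_NAME = "mothgenerator"


def classify(tweet, bot_name=BOT_NAME):
    """Label one tweet by how it relates to the bot (labels are illustrative only)."""
    bot = bot_name.lower()
    author = tweet.get("user", {}).get("screen_name", "").lower()
    if author == bot:
        return "published_by_bot"
    if "retweeted_status" in tweet:
        return "retweet"
    if (tweet.get("in_reply_to_screen_name") or "").lower() == bot:
        return "reply_to_bot"
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    if any(m.get("screen_name", "").lower() == bot for m in mentions):
        return "mention"
    return "other"


def count_interactions(lines, bot_name=BOT_NAME):
    """Count interaction types across line-oriented JSON, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[classify(json.loads(line), bot_name)] += 1
    return counts
```

Opening tweets.json and passing the file object to count_interactions would return a tally per category, which could help decide whether a given capture holds enough evidence of each kind of interaction.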
It’s also possible to use twarc to acquire data for tweets published by the @mothgenerator account. The stream-from-user option, which follows an account by its numeric user ID and collects tweets as they are published, can produce a data set with the potential to support derivative works:
twarc.py --follow "3277928935" > tweets.json
Unfortunately, data collection tools that rely on Twitter’s API are not a viable option for capturing tweet data from the lifetime of the @mothgenerator account. Twitter’s Search API “searches against a sampling of recent Tweets published in the past 7 days.” Two key items to note are the retrospective time limit — no tweets older than 7 days can be retrieved — and the word “sampling.” As Ed Summers (the developer behind twarc) points out in a blog post, this troublingly opaque statement lets us know that limited tweets are available, but not how what’s available is selected.
If given permission from the artists to access the @mothgenerator account and download an archive of tweets past, it will at least be possible to fill in outbound tweet data created by the bot. Twitter archives provide both JSON and CSV formats. twarc can be used to capture tweets to and from @mothgenerator going forward, but past interactivity may be mostly limited to what can be obtained indirectly through WebRecorder. Twitter offers access to a Full-Archive Search API as a paid service, investing in which may not make sense unless a larger social media preservation program were in play.
Looking ahead to access to tweet data, Twitter’s Developer Agreement & Policy places limits on the quantities of complete tweet data that can be made publicly available. Up to 50,000 public tweets can be provided for download in a spreadsheet, PDF, etc. When it comes to JSON data, the policy reads, “If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs.” Users of this data would need to use these IDs to retrieve additional data from the API. (See the Be a Good Partner to Twitter section for details.) While the number of tweets involved in this project may not even come close to 50,000, providing data in both forms offers the best combination of manipulability and completeness under the circumstances.
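Reducing a JSON data set to tweet IDs for sharing is a small job. The sketch below assumes line-oriented JSON (one tweet per line, as twarc produces) and the standard id_str and id fields; the function names and file paths are placeholders of mine, not part of any tool.

```python
import json


def tweet_ids(lines):
    """Yield the ID of each tweet in line-oriented JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Prefer id_str: some tools lose precision on very large integer IDs.
        yield tweet.get("id_str") or str(tweet["id"])


def write_ids(in_path, out_path):
    """Write one tweet ID per line from a tweet JSON file (paths are hypothetical)."""
    with open(in_path, encoding="utf-8") as source, \
            open(out_path, "w", encoding="utf-8") as target:
        for tweet_id in tweet_ids(source):
            target.write(tweet_id + "\n")
```

Something like write_ids("tweets.json", "tweet_ids.txt") would produce the shareable list, while the full JSON stays in the AIP.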
Creating a digital collection
According to this map of a tweet, images posted to Twitter appear in the text field, their URLs truncated to pic.twitter.com… With access to @mothgenerator’s tweet archive and/or to JSON tweet data going forward, it’s possible to filter data sets down to tweets containing images, and to extract those links from tweet texts. Each image will also be recorded in the WARC file of the @mothgenerator feed, and will appear in the browser replay. But I’m still looking for a reliable and efficient way to acquire nearly 4,000 individual images as JPEGs, either from WARC files or from their URLs.
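As a starting point for the filtering step, here is a tentative sketch that pulls image links out of tweet JSON. It assumes line-oriented JSON (one tweet per line) and looks first at the entities.media array, which in standard tweet JSON carries a direct media_url_https for each photo. If that array is missing, it falls back to matching pic.twitter.com links in the text field. The function names are hypothetical, and nothing here downloads the images; it only gathers the links.

```python
import json
import re

# Matches shortened image links of the form pic.twitter.com/abc123
PIC_LINK = re.compile(r"pic\.twitter\.com/\w+")


def image_links(tweet):
    """Return photo URLs from entities.media, else pic.twitter.com links in the text."""
    media = tweet.get("entities", {}).get("media", [])
    urls = [m["media_url_https"] for m in media
            if m.get("type") == "photo" and "media_url_https" in m]
    if urls:
        return urls
    return PIC_LINK.findall(tweet.get("text", ""))


def tweets_with_images(lines):
    """Yield (tweet ID, image links) for each tweet that has at least one image."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        links = image_links(tweet)
        if links:
            yield tweet.get("id_str"), links
```

The resulting list of links could then feed a batch download, though whether that would be reliable and efficient across nearly 4,000 images is exactly the open question.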
JPEGs are not a highly recommended long-term preservation format for images but, as pointed out in this comment, the choice of Twitter as a publishing platform means that only JPEGs of a certain (low) quality are made available to @mothgenerator’s audience. Batch migrating low-quality JPEGs to TIFF won’t improve their resolution, but it may be necessary for long-term preservation. While individual images garner varying reactions on Twitter, the moths also matter to users in aggregate. So it’s important to ensure that they survive in bulk, as a collection, rather than cherry-picking personally compelling examples.
Committing to preserve the JPEGs as they appear on Twitter may also forestall any concerns the artists may have about ownership of @mothgenerator and its outputs. If restrictions are established on access to and reuse of the drawing program, the artists would maintain the exclusive ability to produce and sell high-quality prints of moths in future. (This is only an example.) Lower-quality images would not then be seen as detracting from future work and its profitability.
Revealing creative context
Finally, this collection intends to contextualize Moth Generator through the acquisition of material documenting the creative process behind it. Interviews with and essays by the artists, many of which were used to research the statement of significance, will be captured via WebRecorder along with any reader comments. Lists of 10,000 Latin moth names and 4,000 English names were collected via web crawler and currently seed the random generation of moth names. The lists and the web crawler (if built from scratch or customized in any way) may also contribute to this collection. Artist Katie Rose Pipkin has referred to a personal trove of programming error screenshots she has collected throughout the bot-building process. These screenshots offer a glimpse behind the curtain at the intellectual and emotional labor that led to Moth Generator.
Among the major preservation strategies raised in Rinehart & Ippolito’s Re-Collection, re-use and reinterpretation are the most tantalizing, and seemingly most radical. The fear of somehow being untrue to the spirit of a work, or to an artist’s intent, makes these approaches look riskier than others. A few writings on digital sound and moving image hint at what it means to de-center the artist in preservation and look to users for cues.
In Jason Eppink’s history of and interview with The Signal about GIFs, we see work consistently distanced from its creators. This is in part because the origins of images aren’t so easy to trace on the internet and in part because, as Eppink says in the interview, “There’s still very little to gain from making GIFs.” He goes on to say, “We expect the image to have an author because of the fundamental relationship of authorship to the economics of producing cultural artifacts. But today images are as cheap and prolific as the air that we utter our words with.” GIFs, to Eppink, manifest the near erasure of authorship by use and reuse. He offers an extreme vision of looking beyond artists for the primary stakeholders in preserving digital art. It makes me wonder if the gain in GIFs might lie in distribution: like how one of the interviewees in this Off Book video about YouTube describes people sharing funny internet things as “wanting them associated with their identity.” And sometimes social capital morphs into something higher-risk, as with feuds over meme-sharing and -stealing on Instagram.
Here’s a bit from Jonathan Sterne’s chapter “Format Theory” that I took as further reason not to focus too narrowly on creators and intention: “Because these kinds of codes [underlying formats] are not publicly discussed or even apparent to end-users, they often take on a sheen of ontology when they are more precisely the product of contingency” (p. 8). In other words, things aren’t necessarily made a certain way out of values- or meaning-based reasons. What contingency kludges together, specific use can improve or infuse with meaning. I was also struck by the contrast Sterne draws between the “ubiquitous,” “banal,” and “pedestrian” presence of MP3s and the passage he quotes from Lisa Gitelman ending, “Specificity is the key.” Pervasive technology might mean shared experiences, but not identical ones. Maybe format theory is best served by comparative studies or format / use genealogies, highlighting divergence as well as trends. There are local and individual variations, and variations on those variations — GIFs on GIFS on GIFs.
The pieces mentioned here intersect, in my mind, with a talk Jarrett Drake gave this week about archival description. He argues that archival practice is due to stop privileging provenance and move towards a new organizing principle (or principles). Provenance is about creatorship; valuing it above everything else results in archival description that centers records creators and the relationships between them. Archivists might propose to “collect more broadly,” but inviting the oppressed and underrepresented to participate in oppressive systems does little to effect change. How, instead, to open participation in rebuilding and reworking archival principles? His answer, deliberately not providing an answer:
“The truly transformative principle that is needed for archival practice and archival description cannot come from one person or from one invite-only forum, but such a principle necessarily must develop organically, slowly, and anti-oppressively with a radical cross-section of academic, disciplinary, racial, ethnic, gender, cultural and class backgrounds represented. In this sense, a new foundational archival principle, should it be worth anything, must be developed beyond the bounds of the archival profession.”
Reading this while thinking about sound and moving image distributed via browsers and apps, I see the collapse of user and creator categories as a major factor that could shape new kinds of archival description. YouTube is celebrated not only as a “wild west” of user-uploaded video, but also as fertile ground for new brands and businesses. How users relate to digital objects, user-creators, YouTube production companies, and each other seems to be the most important aspect to capture. It also seems worth exploring how digital objects relate to one another with or without the intervention of people. None of these phenomena can be adequately reflected in provenance-focused archival description, making digital art curation a more valuable site than ever for experimentation and enrichment.
Moth Generator (@mothgenerator) is an interactive, multi-faceted, collaborative digital artwork by Katie Rose Pipkin and Loren Schmidt. The following statements illustrate its complexity and set the stage for an eventual preservation plan for this work:
Moth Generator is:
A Twitter feed where moths are regularly published and @replies are used as moth-generating text
A collection of computer-generated moth images and names, including looping animations created from generated moths and reused for other purposes
An element of a complex virtual world project
A collaboration between a game designer and an artist whose work deals in large part with code and bots