The Three C’s of Digital Preservation: Contact, Context, Collaboration

Three big themes I will take away from learning about digital preservation: every contact leaves a trace, context is crucial, and collaboration is the key.

“Every Contact Leaves a Trace”

Matt Kirschenbaum and an optical disk cartridge in 2013.

Matt Kirschenbaum’s words (or at least his interpretation of Locard’s words) will stick with me for a long while: when we look at a digital object for preservation, we need to consider what it is we are looking at, and know that what we see is not necessarily all there is.  Behind the screen there is a hard drive, and on that hard drive are physical traces of that digital object.  There is a forensic and a formal materiality to digital objects: what is actually going on in the mechanical/physical sense versus what we see and interpret from those mechanical processes as they are converted to digital outputs.  We cannot fall into the trap of screen essentialism – of focusing only on the digital object as it is shown on our screens, without taking into consideration the hardware, software, code, etc. that run underneath it.

This leads into my next point, about platform studies.  I am really intrigued by the idea that as digital media progresses we are seeing layers upon layers of platforms on top of platforms for any given digital object.  The Google Doc that I wrote this blog draft in was written using Google Drive (a platform), which runs in my Chrome browser (a platform), which runs on Windows 7 (a platform).  These platforms can be essential to running a particular digital object, and yet with platforms constantly becoming obsolete, upgrading, or changing, they cannot be relied upon to preserve all digital objects, especially since most platforms are proprietary and can disappear in an instant.  For example, my Pottermore project was spurred by the fact that the original website (hosted on the Windows Azure platform as well as on PlayStation Home) had vanished and was replaced with a newer version.  If I had more time, I would have liked to develop the project further by exploring the nature of the different platforms used by Pottermore, like Windows Azure and PlayStation Home, and how those platforms influenced the experience of the game.

Context is Crucial

If content is king, context is queen!

There’s no use in saving everything about a digital object if we don’t have any context to go with it.  Future researchers who have access to the Pottermore website files could examine them thoroughly and still have no idea why Pottermore was so important.  For this reason it is important to capture the human experience with digital objects.  Whether we use oral history techniques or dance performance preservation strategies, there need to be records that try to capture the experience of using the digital work.  These can include interviews with the creators, stories from the users, Let’s Play videos, and the annotated “musical score” approach, so that a work can be re-run in a different setting.

This is really what the Pottermore project was about: providing context for a website that is all but lost to us.  If the game ever does reappear, there will now be materials like the Pottermore Wiki and the Let’s Play videos that can explain how it was played.  Furthermore, these materials can help future researchers understand the sense of community among Pottermore users, and why they reacted so negatively when the old website was replaced.

Collaboration is the Key

Pottermore was a collaboration of many different entities, including JKR, Sony, and Microsoft.

There are a number of roles played by different people in digital preservation, and these roles are blurring and overlapping.  The preservationist may be the user who is nostalgic for an old game and so creates an emulation program for it.  The artist may take feedback from users and incorporate it into their next work.  IT folk may need to lend their technological expertise to work out how best to save some works – in what formats, on which storage devices, etc.  Archivists and librarians may be the fans themselves, contributing to the fanfiction community that they are trying to preserve.  With funding only getting tighter and the digital world growing more complex, collaboration is going to become essential for a lot of digital preservation projects.

What next?

Best practices, next exit sign
We’ll get here eventually… right?

Of course, this leaves us with many unanswered questions.  How do we balance the roles of different experts? How do we keep up with the sheer scale of digital works on a limited budget? How much context do we need to give a certain work? In almost all cases the answer is going to be “it depends.” But these are questions that I am excited to figure out as I go on in the field.

Pottermore – the Archival Information Package

I was able to put my Preservation Plan into action by uploading a Pottermore Collection to the Internet Archive in addition to saving the collection on my laptop. Here’s a brief recap of my Preservation Plan:

  • Capture this YouTube video that announced the launch of Pottermore in 2011, using the youtube-dl downloader.
  • Archive the Pottermore Wikia, using their own archiving tools to download the xml files.
  • Download the images from the Pottermore Wikia separately, since the xml files don’t include them.  This would involve a command-line method or, if that didn’t work, curating a selection of images from the collection.
  • Save this Pottermore entry from the Harry Potter Wikia, which details the description and history of the site.
  • Save Let’s Play videos that can be found on YouTube to capture the interactivity of Pottermore, using the youtube-dl downloader.

I’ve officially uploaded what I’ve collected so far to the Internet Archive; check it out here:

Internet Archive Pottermore
What my Internet Archive collection looks like!

The first file I included was a PDF of the Pottermore entry from the Harry Potter Wikia.  This entry gives a full description and history of Pottermore.  I concluded that since it was only one entry, and the text matters more than anything else, a PDF would suffice.  The next folder includes a selection of images from the Pottermore Wikia.  I was especially happy about this one, since the images were a feature that a lot of people enjoyed in the first Pottermore that isn’t as present in the newer version.  Since I couldn’t figure out the command-line method I had written about in my Preservation Intent Statement, which was supposed to capture all of the images from a wiki, I had to go through the Pottermore Wikia image directory and download them one by one.  Because there are 51 pages of images, with each page containing at least 40 images, I will be uploading one page’s worth of images at a time (as of this post, I have uploaded two pages’ worth of images to the Internet Archive).  I saved all of the images in their original formats, either .jpg or .png.  The final folder contains the XML files of the Pottermore Wikia, which I downloaded using the tools provided by the Wikia itself.

What I did not upload to the Internet Archive (due to copyright uncertainties) but have saved to my Pottermore folder on my computer are the videos.  I used the youtube-dl downloader to save the Pottermore launch video from 2011 as well as some Let’s Play videos to capture the experience of playing Pottermore.  All of the videos were saved in .mp4 format.

Below is a screenshot of the collection I have on my computer:

screenshot of my Pottermore collection
Screenshot of my Pottermore collection on my laptop.

I arranged the folders according to the different aspects of Pottermore that were saved.  The first folder contains the history of Pottermore, which includes the Harry Potter Wikia entry.  The second folder contains the Let’s Play videos, which capture the experience of playing Pottermore.  The next folder contains the Pottermore images, which are in either .jpg or .png format.  Some of the images are labeled with descriptions, usually the names of the characters in the images (for example, “Hokey” or “Hooch”).  However, most of the images are named after their location within Pottermore.  For example, B1C11M1 = Book 1 (Harry Potter and the Sorcerer’s/Philosopher’s Stone), Chapter 11 (“Quidditch”), Moment 1 (“Charms Homework”).  This will help orient the viewer as to the order of the images within Pottermore.  The next folder is Pottermore Launch, which includes the 2011 YouTube video that announced the coming of Pottermore.  The final folder contains the Pottermore Wikia pages in xml format.
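Because the location-based names follow such a regular pattern, they can also be decoded mechanically, for example to put a folder of images back into reading order. Here is a minimal Python sketch, assuming every location-style filename looks exactly like “B1C11M1” (optionally followed by an extension) and that descriptive names like “Hooch” never match that pattern; the function names are hypothetical, made up for this illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed naming pattern: "B<book>C<chapter>M<moment>", optionally followed by a
# file extension, e.g. "B1C11M1.jpg". Descriptive names like "Hooch.png" won't match.
MOMENT_NAME = re.compile(r"^B(\d+)C(\d+)M(\d+)(?:\.[A-Za-z0-9]+)?$")


class MomentLocation(NamedTuple):
    book: int
    chapter: int
    moment: int


def parse_moment_name(filename: str) -> Optional[MomentLocation]:
    """Decode a location-style filename; return None for descriptive names."""
    match = MOMENT_NAME.match(filename)
    if match is None:
        return None
    book, chapter, moment = (int(part) for part in match.groups())
    return MomentLocation(book, chapter, moment)


def sort_in_reading_order(filenames: List[str]) -> List[str]:
    """Order location-named images by book, chapter, then moment (numerically,
    so C2 comes before C11); descriptively named images go last, alphabetically."""
    def sort_key(name: str):
        location = parse_moment_name(name)
        if location is not None:
            return (0, location, name)
        return (1, MomentLocation(0, 0, 0), name)
    return sorted(filenames, key=sort_key)


print(parse_moment_name("B1C11M1.jpg"))
# → MomentLocation(book=1, chapter=11, moment=1)
print(sort_in_reading_order(["B1C11M1.jpg", "Hooch.png", "B1C2M3.png", "B1C2M1.jpg"]))
# → ['B1C2M1.jpg', 'B1C2M3.png', 'B1C11M1.jpg', 'Hooch.png']
```

Comparing the numbers rather than the raw strings matters here: a plain alphabetical sort would put “B1C11M1” before “B1C2M1”.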

What this collection really comes down to is trying to capture the essential elements of a website that, for our present purposes, no longer exists.  I hope that the xml files of the wiki, the images that provided the interactive layers, and the Let’s Play videos that show how the game was played have accomplished this goal.

Pottermore: A Statement of Preservation Intent

It’s time to collect all of the horcruxes that remain of the old Pottermore.  Not to destroy them, but to save them.

Not that I’m saying that the old Pottermore was evil or needed to be killed off.  In fact, the situation is rather the opposite of Voldemort’s, in that here the main character (the old website) has been “killed off,” but pieces of it are still left behind.  And these are the pieces I think are worth saving.

In a magical world, I would save the original files of the website, make bitstream copies of them, and save them in different places, including open-source cloud storage and hard drives.  I would interview the original Pottermore team, including JK Rowling, Sony, TH_NK (the UK digital company), and Windows Azure, in order to document the creation of such a unique project.  Also in this magical world I could pull a Fawkes and resurrect the old Pottermore by bringing it back under another URL and hosting it on the same Windows Azure platform (or ideally an open-source platform) so it could coexist with the new version.  Old users could finish their journey through the books, new users could begin theirs, and Pottermore could live on longer than Nicolas Flamel.

Unfortunately, no Alohomora spell is going to unlock the old Pottermore website anytime soon; it seems to be kept under tight lock and key by JKR and her Pottermore team, with very little chance of ever being seen again.  Several snapshots of the website are preserved on the Internet Archive’s Wayback Machine.  However, the functionality and interactivity are stripped out.  So someone can see what the website looked like, though even then the pages sometimes don’t load properly.  Therefore, I have decided to go after the “horcruxes” – the magical traces of Pottermore’s soul left scattered across the internet.  And thus follows a plan…

My ultimate goal is to collect the pieces together in one place, not to destroy them (as Harry did to the horcruxes), but to preserve them, so that in the future, when fanatical Harry Potter historians like myself want to study all things Harry Potter, this will be available to them.  This matters all the more because Pottermore was JK Rowling’s first contribution to the online world of Harry Potter.  What I’m especially trying to capture is the context surrounding Pottermore, including the users’ perspective and their reaction to the disappearance of Pottermore.  This way, in case the old website is ever resurrected, there will be enough materials to show future users/researchers how it was played and experienced.

The first step is to save this video released by JK Rowling announcing the launch of Pottermore in 2011.  The original video released by Pottermore is no longer available (it has been set to “Private”), but this was the highest-quality copy I could find.  This was the first peek into what Pottermore was – a hint about “a reading experience unlike any other” involving the author and the reader.  I have actually already saved this by downloading it with ClipConverter, which allowed me to download it in .mp4 format, and I now have it saved in a folder called “Pottermore” on my desktop.

The next step is to archive the Pottermore Wikia.  This was pretty much a step-by-step guide to everything that could be found on the old Pottermore – you can follow it moment by moment to see all that could be collected and done on the site.  Luckily, the Wikia has its own archiving tools that I can use under a Creative Commons license.  These archiving tools include the current pages and the history of each page.  The wiki is downloaded as a compressed XML file, which I can then decompress with a tool like 7-zip.
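Once 7-zip has unpacked the dump, a quick inventory is a handy sanity check that the download really holds every page along with its history. Below is a minimal Python sketch, assuming the decompressed file follows the standard MediaWiki XML export layout (`<page>` elements, each with a `<title>` and one or more `<revision>` elements); the function names are hypothetical, and the sample data in the usage note is made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix; the export schema version in the
    namespace URI differs between dumps, so match on the bare tag name."""
    return tag.rsplit("}", 1)[-1]


def inventory_dump(source) -> Dict[str, int]:
    """Map each page title in a decompressed MediaWiki XML export to the
    number of revisions stored for it.

    `source` may be a filename or a binary file object. iterparse processes
    pages as they are read instead of loading the whole dump up front.
    """
    revisions: Dict[str, int] = {}
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revisions[title] = revisions.get(title, 0) + 1
        elif name == "page":
            revisions.setdefault(title, 0)  # keep pages even with no revisions
            elem.clear()  # drop the finished page's text to save memory
    return revisions
```

Calling `inventory_dump("pottermore_dump.xml")` (a hypothetical filename) gives a title-to-revision-count map; comparing its length with the page count the wiki reports would show whether anything went missing.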

The end of year feast and the house points from the beta version of Pottermore, taken from the Pottermore Wikia, 2011.

The images from the wiki would need to be archived separately, since the XML files don’t include them.  There are 51 pages of images, which adds up to over 2000 images, so I haven’t decided how to go about doing this.  I have found one wiki page that seems helpful, but in case that doesn’t work out, my plan is to make a selection of the highest-resolution images from a variety of Pottermore moments and save them in JPEG format.
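For reference, one common way to gather every image from a wiki (not necessarily the approach on the page I found) is to ask the wiki’s MediaWiki API for its image list, which includes each file’s direct URL. A minimal Python sketch, assuming the wiki exposes a standard `api.php` endpoint and runs a MediaWiki version recent enough to return a `continue` block for paging; the endpoint address is a placeholder I have not confirmed, and the function names are hypothetical.

```python
import json
from typing import Dict, Iterator, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def allimages_url(api_endpoint: str, cont: Optional[Dict[str, str]] = None) -> str:
    """Build one MediaWiki API request listing images (and their file URLs)."""
    params = {
        "action": "query",
        "list": "allimages",
        "aiprop": "url",   # include each file's direct URL
        "ailimit": "50",   # images per request
        "format": "json",
    }
    if cont:
        params.update(cont)  # continuation values from the previous response
    return f"{api_endpoint}?{urlencode(params)}"


def image_urls(response: dict) -> List[str]:
    """Pull the file URLs out of one decoded API response."""
    return [image["url"] for image in response.get("query", {}).get("allimages", [])]


def all_image_urls(api_endpoint: str) -> Iterator[str]:
    """Page through the whole image list, following the API's 'continue' block.

    Makes network requests. Older MediaWiki versions that answer with
    'query-continue' instead would stop after the first batch here.
    """
    cont = None
    while True:
        with urlopen(allimages_url(api_endpoint, cont)) as resp:
            data = json.load(resp)
        yield from image_urls(data)
        cont = data.get("continue")
        if not cont:
            break
```

Each URL could then be saved with `urllib.request.urlretrieve(url, filename)`, which keeps the file in its original .jpg or .png format.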

I would also like to save this page, an entry on Pottermore from the Harry Potter wiki, which gives a very detailed history of the launch of Pottermore, the revisions and changes made over the years, the full results of all eight House Cups, and the change from the old Pottermore to the new one.  Essentially, it provides all of the background context I need to support the other materials in this collection.  Since there is only one page that I want to save (as opposed to an entire wiki), I have saved it as a PDF and added it to my Pottermore folder.

Next would be to capture the interactivity of Pottermore.  Fortunately, there is a lot of documentation out there that records people’s experience with Pottermore, including Let’s Play videos and subreddit posts.  I will archive Let’s Play YouTube videos like this one in the same manner I used for the Pottermore announcement video, downloading them as .mp4 files and saving them to my Pottermore folder.

There is also an entire subreddit, r/Pottermore, that was full of posts with troubleshooting questions, favorite moments, glitches, etc. that I would like to capture.  I have posted in this subreddit asking everyone what was important and/or special to them about Pottermore, and I will then save the replies to that post, probably as a PDF.

The final step: once I’ve downloaded all that needs to be downloaded and have all of the files saved on my computer (and probably on a USB drive), I will upload them to the Internet Archive as a Pottermore collection.  I probably won’t include the YouTube videos due to copyright issues, but the Wiki pages, images, and the Reddit posts will be saved there.  I’ve just signed up for an account with the Internet Archive, so this week I will try to become more familiar with the platform as I save/download all of the materials for my future collection.  Additionally, I’m working out how to include annotations or metadata to give more context to the materials I’m uploading – descriptions for the images and the videos specifically.  Now if only I had a magic wand that could do all this work for me… 

A one-way ticket to Hogwarts: the old Pottermore


Pottermore logo from the site’s launch in 2011.

Launch of Pottermore

In June of 2011 JK Rowling announced a new, online way to experience Harry Potter – Pottermore.

The idea behind it was to create an interactive eBook whereby new young readers (along with older nostalgic readers) of Harry Potter could follow the books while interacting with the gamified aspects.  JK Rowling initiated a Magical Quill challenge that allowed one million lucky people to gain early access to Pottermore as beta users.  The general site launched in April 2012.  The way it worked was that any user could register for an account and, after taking a quiz, would be sorted into one of the Hogwarts houses and receive their own individual wand.  Then their adventure into the books could begin.  Between April 2012, when the site officially launched, and 2015, all seven books were released on the site.  Each book was broken down into chapters, and each chapter was broken down into “moments,” or illustrated scenes.  Below is a Let’s Play video showing one of the first moments of Book 1, Chapter 1.

Each moment had several zoom layers in which you could click around to collect various items, as well as a summary of that scene from the books and annotated blurbs from JK Rowling providing extra backstory to the characters or settings.  In addition to moving through the moments, users could brew potions and participate in duels.  Pottermore was also the first (and to this day, the only) place one could purchase the official eBook editions of the Harry Potter series.

New Pottermore

In September 2015, the old Pottermore was replaced with a revamped new version, which removed all of the interactive gamified features.  You can still sign into your own account, get sorted, and get your wand, but aside from that you can only look through the BuzzFeed-esque website for JK Rowling’s writings and articles published by the “Pottermore Correspondent.” This MuggleNet article puts it this way: “Basically, they seem to have gotten rid of many of the features that made Pottermore more than just another fansite.”  The pros of the new website are that it focuses more on JK Rowling’s writings (which is what a lot of Pottermore users liked most about the old Pottermore), it keeps us updated on upcoming Harry Potter happenings (like the new play coming out this summer), and there’s a rumor going around that a Patronus quiz will soon be available (still waiting on that).  However, the new website provides a very different experience from the old Pottermore.

Why the old Pottermore is worth saving

I believe the old Pottermore site is worth archiving from several angles.  First, it serves as an important milestone for the history of the cultural phenomenon that is Harry Potter, marking the first time that the Harry Potter books were available in eBook form.  Second, the site serves as a unique instance of an author converting her original printed work into an online experience, so it provides an interesting study of the crossover between literature and online gaming.  

The website would also be interesting to study from the point of view of a historian of cloud computing or software development.  As outlined in this Microsoft article, the initial beta version of Pottermore was built using Windows Server; however, it quickly became evident that a much larger-scale platform would be needed for the anticipated Facebook-level numbers of users.  The team chose Windows Azure because it offered Platform-as-a-Service (PaaS), meaning that Pottermore could be moved from Windows Server to Windows Azure without the team having to manage and maintain virtual machines.  The ideal archiving situation here, of course, would be to get hold of the original Windows Server files as well as the newer Windows Azure files, along with all of the documentation that goes with them.  However, since all versions of the old site have been taken down and are kept under tight lock and key by JK Rowling and the Pottermore team, this won’t be possible.

From an ethnographic/cultural historian’s perspective, it is just as important to capture documentation of the experience of engaging with Pottermore, and this will be a lot easier for me to accomplish.  Luckily, there was a lot left behind.  And I believe it is crucial to collect what I can, because if at some point in the future JK Rowling and/or Sony decide to release the old versions of Pottermore, it would be useful and important to have preserved the supplementary materials that provide more context as to how it was originally used.  Lowood, in his discussion of preserving virtual worlds, asserts that it is important to capture the “subjective level of experience within communities” when it comes to virtual worlds.  Although Pottermore technically isn’t a virtual world, I think this still applies.

In terms of supplementary materials, the main source of information regarding content kept on the website and how it was played was the Pottermore Wiki.  This wiki served as a game guide, created and maintained by dedicated Pottermore users.  It’s organized into chapters, locations, items, and characters.  The content includes JK Rowling’s annotated blurbs, the various objects that could be found in each moment, and images and screenshots from the game.  There is also a page dedicated to Pottermore on the Harry Potter wiki outlining the history of Pottermore from its announcement to present and its features (old and new).

Subreddits such as r/pottermore and r/pottermorewritings were also created, and they would be helpful sources for users’ stories and comments about their experience with Pottermore.  The Pottermore Writings subreddit is especially useful, since it archives posts of JK Rowling’s writings from the old Pottermore in a navigable fashion.  In addition, there are Let’s Play videos, such as the one earlier in this post, showing the interactive aspects of Pottermore, including zooming through the moments, duels, brewing potions, and earning House Cup points.

As one Pottermore fan put it on the Harry Potter subreddit, “the whole point of Pottermore […] was getting to have an experience that was as close as I was ever going to get to going to Hogwarts.” The old Pottermore was a unique experience in allowing Harry Potter fans to walk in Harry’s footsteps, exploring the books in an interactive digital way straight from the author herself.  And because the old site itself is lost to us for now, I believe it is essential to capture the traces of Pottermore left behind.

Hogwarts Express; one of the moments from the old Pottermore.
Ready to begin this new adventure of archiving Pottermore!

Why is who saving what, and how?

It seems that when it comes to preserving born-digital works, certain questions need to be raised.  In fact, a lot of questions need to be raised, since there is no established consensus on which formal framework to use.  There’s the question of “who,” involving the roles different people play in the lifetime of a work.  This includes the artist, the curator, the preservationist, and the consumer/audience.  Next there’s the “why”: what makes this work worth saving, and why did we choose certain components of the work to save? Next comes the “what”: what exactly do these groups decide to save, and what is it that we are actually saving about this work? And finally there’s the “how”: putting a preservation plan into action.

The “who”: Creators, Curators, Conservators, and Consumers

First comes the artist, who creates the work.  The artist makes the initial creative decisions that make his/her work unique, whether intentionally or incidentally. Next comes the curator, who decides that the work is worth collecting and exhibiting and defends the work’s significance.  After that is the preservationist or conservator, who determines what to preserve and how.  Finally there is the audience/consumer and their role in supporting the work.

What makes born-digital works so complex is that the roles of these various groups often bleed into each other: the artist creates an interactive work that allows the consumer to feel a sense of authorship in making unique decisions that affect the work; the conservators are now asking artists for statements of intent to hear their feedback on what’s significant about the work; and fans of a work can prove crucial in providing the emulation software necessary for preserving that work.

Furthermore, as Dappert and Farquhar insist, different stakeholders place their own constraints on a work.  For instance, Chelcie Rowell discusses how Australian artist Norie Neumark used a specific piece of software, Macromedia Director, for her 1997 work Shock in the Ear.  The audience who experienced it originally had to load a CD-ROM into their computer, which could have been a Mac or a Windows machine.  The preservationists chose emulation as the best method to save works like this one, and the emulators were created by nostalgic enthusiasts.  So each of the people involved placed constraints on the original work, in terms of hardware, software, and usage, and these constraints changed between its creation and its preservation.  Dianne Dietrich concludes with this regarding digital preservation:

“As more people get involved in this space, there’s a greater awareness of not only the technical, but social and historical implications for this kind of work. Ultimately, there’s so much potential for synergy here. It’s a really great time to be working in this space.”

For this reason, it is becoming more important than ever to document who is doing what with the work, increasing accountability and responsibility. Which leads to…

The “why”: Preservation Intent Statements

As Webb, Pearson, and Koerbin express, before we make any attempt to preserve a work we need to answer the “why.” Their decision to write Preservation Intent Statements is a means of accomplishing this.  For, as Webb et al. say, “[w]ithout it, we are left floundering between assumptions that every characteristic of every digital item has to be maintained forever.”

And nobody has the time or resources to save every characteristic of every digital item.  At least I don’t.  To try and do this would be impossible and even undesirable for certain works, where the original hardware and software become too costly to maintain.

This leads to a discussion of authenticity.  As Espenshied points out in regard to preserving GeoCities, with increased authenticity comes a lower level of access, but with a low barrier to access comes a low level of authenticity and a higher degree of lossiness.  In the case of GeoCities, Espenshied says,

“While restoration work must be done on the right end of the scale to provide a very authentic re-creation of the web’s past, it is just as important to work on every point of the scale in between to allow the broadest possible audience to experience the most authentic re-enactment of Geocities that is comfortable for consumption on many levels of expertise and interest.”

And that gets at the heart of why we should bother to create Preservation Intent Statements before implementing any actual preservation actions.  We need to establish the “bigger picture,” the long-term vision of a particular work’s value.  Rowell also points out that there are different kinds of authenticity: forensic, archival, and cultural.  Forensic and archival authenticity deal with ensuring the object preserved is what it claims to be (if you’ve read Matt Kirschenbaum’s book Mechanisms, you know that this can be harder to achieve than you think).  Cultural authenticity, however, is a much more complex issue: it asks how to respect the original context of the work while still ensuring a wide level of access.

And once we have decided on the best strategy, we then get into…

The “what” and the “how”: Significant Properties and Characteristics

Now that we’ve established the “bigger picture,” we get into the details of exactly how to capture the work for preservation.  This is where Dappert and Farquhar come back in.  Dappert and Farquhar really get technical about the differences between “significant properties” and “significant characteristics.”  Their definition of significant characteristics goes like this:

“Requirements in a specific context, represented as constraints, expressing a combination of characteristics of preservation objects or environments that must be preserved or attained in order to ensure the continued accessibility, usability, and meaning of preservation objects, and their capacity to be accepted as evidence of what they purport to record.”

Sounds confusing, right? The way I understood it, properties can be thought of like attributes in HTML code.  In coding, properties are simply named slots in a logical system that define certain attributes of the website/game/whatever we are coding.  Similarly, for a digital work, the property itself is abstract, like “fileSize” or “isVirusScanned.” We aren’t trying to preserve those properties on their own; rather, it is the pairing of the property with its value (like “fileSize=1MB”) that we want to capture, and this pairing is what a characteristic of the work is.  You wouldn’t save a property without its value, nor would you save a value without attaching it to a property.  Significant characteristics also go beyond the basic forensic/archival description of the object by capturing the context surrounding the object.  Thus, significant characteristics can evolve and change beyond the original work as the preservation environment changes and as different courses of action are taken.  All of these changes should be documented along the way through the significant characteristics, prioritized and listed in order of importance.
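To make the property/characteristic distinction a little more concrete, here is a minimal Python sketch. It is only my own illustration of the idea, not Dappert and Farquhar’s formal model: a property is just a name (like “fileSize”), a characteristic binds that name to a value, and a list of characteristics can be ordered by importance. The class, the priority scale, and the example values for a Pottermore image are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristic:
    """A property paired with its value, plus how important it is to keep.

    priority: 1 = most important (a hypothetical scale, for illustration only).
    """
    prop: str       # the abstract property, e.g. "fileSize"
    value: object   # the concrete value, e.g. "1MB"
    priority: int


def by_importance(characteristics):
    """List characteristics from most to least important."""
    return sorted(characteristics, key=lambda c: c.priority)


def describe(c: Characteristic) -> str:
    """Render a characteristic as a "property=value" pair."""
    return f"{c.prop}={c.value}"


# Hypothetical characteristics for one Pottermore image in the collection.
image = [
    Characteristic("isVirusScanned", True, 3),
    Characteristic("fileSize", "1MB", 2),
    Characteristic("depictsMoment", "B1C11M1", 1),
]

print([describe(c) for c in by_importance(image)])
# → ['depictsMoment=B1C11M1', 'fileSize=1MB', 'isVirusScanned=True']
```

Note that “depictsMoment” is the kind of contextual characteristic that goes beyond a purely forensic description, which is why it is ranked first here.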

The last question that remains is… is anyone else’s mind boggled by all this?