XKCD Preservation Project Reflection/Review

The project to preserve the web comic XKCD took some interesting turns and produced some interesting results. At the beginning the goal was to preserve the webcomic using the Internet Archive, but that quickly changed. Instead I ended up creating an archival model and an AIP to go with it, and I learned a couple of interesting things along the way.

Review: What is XKCD and why preserve it?

XKCD is a unique web comic created by Randall Munroe, a physicist who worked at NASA before moving to work on XKCD full time. The webcomic first launched in 2005 and has published new comics regularly every week since then. Due to its unique focus on science, mathematics, and other intellectual fields, in addition to relationships and philosophy, XKCD has an avid following among a number of communities. This following and the web comic’s significant characteristics make the comic valuable and worth preserving for the future.

Project Results/Reflection

The result of this project is an effective plan and model for the preservation of XKCD using the Internet Archive’s Wayback Machine. However, in addition to creating this model and the AIP for preserving XKCD, I also learned a few things that I did not expect to. During this project I learned how varied archival information packages (AIPs) can be, how important good discovery is at an archive, and that the archival model can be just as important as, and possibly even more important than, the AIP itself.

Going into this project I assumed that archival information packages were large, complex, and time consuming to make. I figured that an AIP consisted of things like complex metadata, both technical and not, in addition to things like authority, policy, and purpose. What I learned is that an AIP can vary dramatically in complexity depending on what is being preserved and how it is being preserved. The nature of the item can greatly affect how complex the AIP needs to be. For example, authority for XKCD did not need to be included in the AIP because the comic’s creator has already declared the comic available for public use. Another thing that surprised me was that the method of preservation affects the AIP as well. Limitations in the method of preservation can make parts of the AIP completely useless and even a waste of time. Learning this made me realize that you have to know and understand what your archive or chosen institution can do when designing your AIP, or you can end up making something that cannot be used.

The second thing I learned working on the project is that discovery for archives is extremely important. Not only is it the only way for people to find things in an online archive, it is also important from a preservation perspective. When one of your goals is to make sure every entry in the archive is functional, being able to find those entries becomes extremely important, because you cannot replace or repair broken entries if you cannot find them. Unfortunately, when working with the Internet Archive I regularly found myself confused by how the archive organizes its entries and metadata, to the point of second guessing myself. A number of times I had to recheck that something really was missing or broken, and once or twice during the project I had actually made a mistake. This issue has given me a greater appreciation for good discovery features and the archives that have them, because they are important not only to users but also to fellow archivists.

The last thing I learned from the project is that the archival model can be just as important as, if not more important than, the archival information package. While the information package might be what you submit to the archive in order to preserve a record to the desired degree, I came to realize that how you make that AIP and the steps to reaching the project goal matter at least as much. This is because by itself the AIP is nothing but a package of information. You need to know what to do with it, how to make it, and most importantly what to do if it does not work. In effect the AIP is only a small piece of a larger puzzle that the archival model tells you how to solve. I did not expect this going into the project, believing that the AIP was paramount to the project’s success. However, I quickly realized that the AIP alone was not enough to ensure the webcomic would be preserved and that more was necessary to make that happen. That missing piece turned out to be an archival model, which solved the problem nicely. What I learned from this is that the AIP is not the only thing that matters and that it is actually part of the greater process of preserving something.


In conclusion, the goal of the preservation project was to preserve the XKCD webcomic because of its cultural value. This was accomplished using the Internet Archive’s Wayback Machine. While working on this project I learned a number of things. I learned that the AIP can vary due to both the material and the preservation method. I learned that the archival model, or the preservation steps, can be just as important as or even more important than the archival information package itself. Finally, I learned that the discovery services at the archive are extremely important in understanding what they have, what they do not have, and what needs to be replaced. All in all I believe that I learned a lot from this project and that it was worth completing.

Archival Model and Information Package for XKCD

Archival Model

In order to ensure the project’s success in preserving XKCD, the project will follow an archival model. This model helps the project succeed by laying out a set of steps that, when followed, will successfully preserve the web comic. While this model does not cover extreme circumstances or niche issues, it does cover the problems the project is most likely to encounter. Below are the four steps in the archival model that the project will follow. Each step covers an important task that needs to be done in order for the project to preserve the web comic to the required standard.

  1. Collect designated materials from the XKCD webpage: each comic entry’s URL, title, and date. This can be done either by a dedicated crawler, one designed specifically to crawl only the XKCD webpage on a regular schedule, or by a person.

  2. Compile these URLs in a standing list to be used in the preservation process. Then review the XKCD collection at the Internet Archive and note any broken or missing entries.

  3. Create a package of URLs for the missing or broken entries and submit it to the Internet Archive’s Wayback Machine.

  4. If there is any issue in submitting an entry to the Wayback Machine, submit it to Archive It.



Each of the four steps in the project’s archival model has been designed to be the most functional solution to one of the major challenges posed to the project, and each is necessary for the project to succeed. Each one has an important role in the success of the project that is not covered by any other step. If any of the steps were removed or skipped, the project would fail both to meet its standards of quality and to preserve the web comic.

Step one is designed to guarantee successful, regular captures of the web comic. Due to how the Internet Archive works, the only object the project needs to capture for the archival information package is the comic page’s unique URL. This will be accomplished by using a dedicated crawler that will go over the XKCD website regularly and capture any new URLs it encounters. However, crawlers are not perfect and can occasionally miss things, such as an entry. Missing entries must be avoided because they keep the project from accomplishing its goal. Therefore the project will also have a human staff member or volunteer regularly go over the XKCD website and collect new URLs in order to guarantee proper capture. The comic entry’s title and date are captured because they are important metadata for the archive.
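As a rough illustration of what the dedicated crawler in step one might collect, here is a minimal Python sketch. It assumes the per-comic JSON metadata that xkcd.com publishes at `/<number>/info.0.json`; the names `ComicEntry`, `entry_from_metadata`, `fetch_metadata`, and `collect_entries` are my own and hypothetical, and the sketch is not the crawler the project actually ran.

```python
import json
from dataclasses import dataclass
from datetime import date
from urllib.request import urlopen

XKCD_BASE = "https://xkcd.com"


@dataclass(frozen=True)
class ComicEntry:
    """The three things step one collects for each comic entry."""
    number: int
    url: str
    title: str
    published: date


def entry_from_metadata(meta: dict) -> ComicEntry:
    """Build an entry from one comic's JSON metadata (num, title, year, month, day)."""
    number = int(meta["num"])
    return ComicEntry(
        number=number,
        url=f"{XKCD_BASE}/{number}/",
        title=meta["title"],
        published=date(int(meta["year"]), int(meta["month"]), int(meta["day"])),
    )


def fetch_metadata(number: int) -> dict:
    """Download the metadata for one comic (network access required)."""
    with urlopen(f"{XKCD_BASE}/{number}/info.0.json", timeout=30) as reply:
        return json.load(reply)


def collect_entries(numbers, fetch=fetch_metadata):
    """Return a ComicEntry for every comic number in `numbers`."""
    return [entry_from_metadata(fetch(n)) for n in numbers]
```

A human reviewer would still go over the resulting list, as described above, since a crawler like this can miss entries or fail on a network error.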

Step two is done in order to properly create an archival information package to be submitted to the archive. Because the Internet Archive is also working to preserve the comic, it is critical to know what they already have and what they actually need. This increases the project’s productivity by avoiding work that has already been done. It also tells us which entries we need to replace if any are broken. This is important because a broken entry suggests that the entry is incompatible with the Wayback Machine and needs to be submitted to Archive It.
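Part of the review in step two could be automated. The sketch below is a minimal example that assumes the Wayback Machine’s public availability endpoint (`https://archive.org/wayback/available`), which returns JSON describing the closest snapshot of a URL. The function names are hypothetical and mine. It only detects missing entries; deciding whether an existing snapshot is broken still needs a person to look at it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability request for one comic URL."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def has_snapshot(response: dict) -> bool:
    """True if the availability response reports an available snapshot."""
    closest = response.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def check_wayback(url: str) -> bool:
    """Ask the Wayback Machine whether it holds a snapshot (network access required)."""
    with urlopen(availability_query(url), timeout=30) as reply:
        return has_snapshot(json.load(reply))


def find_missing(standing_list, is_archived=check_wayback):
    """Return the URLs from the standing list that have no snapshot."""
    return [url for url in standing_list if not is_archived(url)]
```

Passing the check in as `is_archived` keeps the comparison separate from the network call, so the same function can be run against a saved copy of the results.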

Step three is a direct follow-up to step two. After going over the collection in the Internet Archive, the project has the information necessary to compile an archival information package to submit to the archive. This information package will consist of the unique URLs for each of the comic entries. It excludes the collected title and date metadata because that is built into the URL address. The title and date are collected in case it becomes necessary to submit entries to Archive It, which does make use of that metadata.

Finally, step four only occurs in the case that there is a problem with the Wayback Machine. Despite the quality and overall simplicity of the Wayback Machine’s system, there are things that it cannot do. Things like embedded images and video can be problematic and may not work in the Wayback Machine, depending on how they are implemented. This results in the entry failing to function properly and therefore not being preserved to the project’s standards. To avoid this, problematic entries are submitted to Archive It using their unique URL, with their respective title and date used as metadata to help identify each entry. This method should effectively preserve even the most problematic XKCD comic entries and ensure the web comic’s preservation.
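Steps three and four together amount to a simple routing rule, sketched below in Python. The sketch assumes the Wayback Machine’s “Save Page Now” address form (`https://web.archive.org/save/` followed by the page URL). The names `Entry`, `save_request_url`, and `route_entries` are hypothetical, and the actual submission is passed in as a function rather than implemented here. Submitting to Archive It is not shown, since that happens through its own paid service.

```python
from dataclasses import dataclass

SAVE_PREFIX = "https://web.archive.org/save/"


@dataclass(frozen=True)
class Entry:
    url: str
    title: str  # only needed if the entry falls back to Archive It
    date: str   # only needed if the entry falls back to Archive It


def save_request_url(entry_url: str) -> str:
    """Address that asks the Wayback Machine to capture one page."""
    return SAVE_PREFIX + entry_url


def route_entries(entries, submit_to_wayback):
    """Split entries into those preserved and those needing Archive It.

    `submit_to_wayback` takes a URL and returns True when the capture
    worked and the entry functions properly in the Wayback Machine.
    """
    preserved, fallback = [], []
    for entry in entries:
        if submit_to_wayback(entry.url):
            preserved.append(entry.url)   # the package is just the URL
        else:
            fallback.append(entry)        # keep title and date as metadata
    return preserved, fallback
```

Only the fallback list keeps the title and date, which matches the model: the Wayback Machine package is the URL alone, while Archive It makes use of the extra metadata.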


In conclusion, each of the project’s archival model steps meets an important need of the project and helps accomplish the project’s goal of preserving XKCD. Without each of these steps a critical aspect of the project would be missed entirely and the project would fail to meet its goal. Without step one there would be nothing with which to create an archival information package to submit later on. Without step two there would be no way of knowing which entries the Internet Archive needs to have submitted or fixed, and the process of actually creating the AIP would be more difficult and time consuming than it needs to be. Without step three it would be not just more difficult but impossible to accomplish the project’s goal. Creating the AIP this way makes it easier and more efficient to submit it to the Internet Archive. Additionally, submitting the AIP to the Wayback Machine is the entire point of the project, and not doing that would guarantee failure from the start. Finally, step four is necessary in order to properly preserve entries that do not function properly in the Wayback Machine. Because not every entry can be preserved in the Wayback Machine, an alternative method of preserving them is necessary. Submitting the AIP to Archive It is that alternative. All in all, each of the steps in the project’s archival model is important and necessary for the project to succeed in preserving XKCD for the future.


Archival Information Package/Process

Archival Package: http://xkcd.com/624/   2009-8-14 Oregon

Step 1 Capturing URL



Step 3 Submitting the package


Saving XKCD for the Future: Statement of Preservation and Acquisition

After taking into account the cultural importance of the web comic XKCD, created and authored by Randall Munroe, it has been concluded that an effort should be made to preserve the web comic and related materials. The following statement of preservation and acquisition plan have been created to clarify and guide the preservation process.

Statement of Preservation

The purpose of this preservation project is to preserve as much of the comic XKCD as possible at the best quality we can ensure. It has been concluded that XKCD should be preserved because of its significant cultural value. XKCD’s unique content provides valuable insight into multiple communities in addition to capturing certain facets of internet culture, making it valuable to both future researchers and the community it serves. It has been decided that the best way to preserve the comic is to work with other groups that are already working towards preserving it. The group that the project decided to work with in order to best preserve XKCD is the Internet Archive.

The Internet Archive and XKCD

The Internet Archive is an organization whose goal is to preserve as much of the Internet as possible for cultural reasons, through a variety of means. The method that this project is concerned with is their general web archive, which they call the Wayback Machine. The Wayback Machine is an archive of website pages that records what a site looked like on a certain day. For example, if I wanted to, I could look up what the XKCD webpage looked like on November 1st, 2010 in the archive. This is accomplished by taking the webpage’s URL and making a permanent copy of it. As a result, this system allows the Internet Archive to give users a reasonably authentic experience of the website at that period of time, meeting the quality standards for the project.

In addition to this, the Internet Archive has already archived a significant portion of the XKCD webcomic. This includes over 800 saved pages and counting. However, this collection is not complete and is missing a number of entries. There are numerous comic panels missing from the Internet Archive, the most notable gap being a period of three months in 2009 where no comics were recorded and entered into the archive. For this reason the goal of this project is to fill in any gaps in the XKCD collection at the Internet Archive, ensure that any future missed content is swiftly added to the Archive, and make sure the entries function properly. Doing this would successfully preserve XKCD for the future and fulfill the original intent of the project.

If there are issues in entering the missing XKCD comics into the Internet Archive, the project will preserve those pages using the ‘Archive It’ service. Archive It is a sister program of the Wayback Machine and is also operated by the Internet Archive. It is stronger, more compatible, and more secure than the Wayback Machine; however, it is a paid service. If it becomes necessary to use Archive It, the project will seek the required funds in order to preserve the problematic entries.

Acquisition Plan

The project’s plan for acquiring permission to preserve is rather simple: we operate on the assumption that we already have it. Because the web comic is already in the Internet Archive’s collection and the XKCD homepage lists a permanent URL for each comic, it is safe to assume that Munroe has already decided to permit people to archive the comic. This is doubly so when you consider how the Internet Archive actually acquires things. The Internet Archive acquires webpages in two ways, crawlers and personal submission. The Internet Archive uses crawlers to regularly crawl both the internet at large and the websites selected for preservation. When a crawler encounters a webpage that is not in the Internet Archive, it will submit the page’s URL automatically. Personal submission works just like it sounds: people directly submit a site’s URL to the Internet Archive, which preserves it by making it permanent. For this reason it can be concluded that the project has ethical and moral permission to submit XKCD webpages to the Internet Archive, since literally anyone is able to do so. However, if it becomes necessary to use the Archive It service provided by the Internet Archive, explicit permission from Munroe will be sought.

In regard to how the project will acquire the actual comic, that too is rather straightforward. Because the goal of the project is to ensure the Internet Archive’s collection of XKCD is complete and has no gaps in content, the method of acquisition is the same as the Archive’s but focused solely on the web comic itself. The project would set up a dedicated crawler that will regularly crawl the XKCD webpage and compile a list of new URLs as they occur. Additionally, one or more people chosen by the project will go over both the website and the crawler-generated list in order to make sure no entries were missed. The results will then be compared to the Internet Archive’s collection, and if there are any comics missing we will submit the appropriate copy or copies to the archive. Finally, if any of the entries do not function properly in the Internet Archive, they will be submitted to Archive It. Overall, using this method should guarantee the complete preservation of the web comic.


In conclusion, XKCD is considered worth preserving for the future, and that is best done by working with and assisting preexisting efforts to do so. Not only is the web comic rich with the culture of its community, it also acts as an excellent record of that community’s values and interests, making it very valuable to future researchers. This makes the comic worth preserving, and the best way for the project to accomplish that is to work with the Internet Archive. Not only is the Internet Archive already trying to preserve the web comic, it also has all of the tools, services, and permission to do so. This project can assist in this effort by acting as both a back-up and a form of quality control, catching and submitting any missing entries and ensuring they function properly. Overall, this project fills a niche in the effort to preserve XKCD that needed to be filled.

Photos and Media: The Influence of Visuals and the rise of Photoshop


A great deal has changed in the last two decades, especially in the fields of art and culture. The information revolution brought on by the advent of powerful but affordable computers has had a huge effect on media culture as a whole. More specifically, though, the role of photos and photo editing tools, particularly Photoshop, has dramatically changed and grown.

The Role of Photos in Media: Traditional and Current

…the 20th century was the golden age of analog photography peaking at an amazing 85 billion physical photos in 2000 — an incredible 2,500 photos per second. (Good)

Traditionally photos have had an important but limited role in media. They were primarily relegated to publications, such as magazines and newspapers, other mass produced materials, and photography as art. While photos were used by individuals as a form of communication and expression, this was highly limited due to technological limitations. Photos used to be time consuming to make, copy, and share because the technology to do so rapidly did not exist. Add in the fact that photos, while not expensive, were not cheap, and the role of photos in media was limited.

…it is estimated that 2.5 billion people in the world today have a digital camera. If the average person snaps 150 photos this year that would be a staggering 375 billion photos. (Good)

Information technology changed this significantly by removing the technological limitations on photo use. Cameras are now practically everywhere, easily accessed, and affordable to practically everyone. Additionally, computers and digital technology make copying and sharing photos almost effortless, literally taking only the press of a button. The result of all this is that the use of photos in communication and expression has exploded. According to Jonathan Good, analog photography grew from the invention of the Brownie camera in 1901 to a peak in 2000 of roughly 85 billion physical photos in a single year, around 2,500 photos a second. In comparison, the estimated number of photos that will be taken this year is 375 billion, more than four times the number taken at that analog peak, and Good estimates that we have now taken 3.5 trillion photos in total. These numbers reflect the increasing use of photos in our lives as a means of communication and expression. Photos and images have a much higher information density than text or even audio recordings do; people get more out of seeing an image for a few seconds than out of reading something for the same amount of time. This makes photos and images an incredibly powerful method of communication and expression, since so much can be done with them.
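The comparison above can be checked with a few lines of arithmetic. The short Python sketch below uses only the figures quoted from Good (2.5 billion camera owners, 150 photos each, and the peak of 85 billion analog photos in 2000); the variable names are my own.

```python
# Figures quoted from Good in the passages above.
camera_owners = 2_500_000_000
photos_per_person = 150
analog_peak_2000 = 85_000_000_000

# Estimated digital photos taken in one year.
digital_per_year = camera_owners * photos_per_person

# How many times larger that is than the analog peak year.
ratio = digital_per_year / analog_peak_2000

print(f"{digital_per_year:,} photos per year")
print(f"{ratio:.1f} times the analog peak")
```

The product comes to 375 billion, and the ratio is a little over four, which is where the "more than four times" comparison comes from.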

Photoshop and Media: The Role of Editing Tools and Software

In addition to the increasing role photos have in communication, expression, and general media, the role of photo editing tools and techniques has grown as well. Manipulating and editing photos has been a long-standing practice in media since photos started being used. Originally this was done by painting or coloring the photo’s negatives to the desired effect. This was done for largely the same reasons as it is today: improving and optimizing the final photo. Magazines and other visual media products often used, and still use, photo editing and manipulation to create the desired end product. However, due to the explosion of information technology, the use of photo manipulation and editing in art, communication, and expression has grown tremendously. In particular, Photoshop, a premier photo editing program, has grown to become a cultural and media phenomenon.

Adobe Photoshop, which was created in 1988, is a photo editing and manipulation tool. Due to its versatility, quality, regular support, and ease of use it has become one of the de facto tools, if not the de facto tool, for photo editing and manipulation. With the increasing availability of cameras and the ever increasing use of photos in media, communication, and expression, the role of photo manipulation has expanded. Because photo manipulation allows people to repurpose existing photos and images, add new meaning to them, and even change their meaning entirely, it massively increases what can be done with photos and images. In effect, image manipulation tools like Photoshop remove most of the remaining limitations on photos and images as a medium. The ability to make such manipulations allows people a level of freedom never seen before, and its effect can be seen in social media. The ability to create customized images allows for extremely fast and highly informative communication that spreads quickly. Memes in particular are an excellent example of the influence of Photoshop and other photo manipulation tools. They are extremely expressive and spread extremely quickly, far faster than most other forms of communication.


In conclusion, the future of photos, images, and photo manipulation tools such as Photoshop is bright. In our increasingly technological world, where everything is connected and the creation of images is cheap and accessible, their role in media will only increase. We have gone from making 85 billion photos a year at the peak of analog photography to making over three hundred billion in a single year. The advent of cheap and available imaging technology has spurred the adoption and expansion of image manipulation tools such as Photoshop. This in turn has increased the role photos and images have in media even further. Taking into consideration the increasing importance of information technology, the importance of photos, images, and photo editing tools will only increase going forward.

XKCD: A Smart Web-comic


XKCD is a unique web comic created by Randall Munroe, a physicist who worked at NASA before moving to work on XKCD full time. XKCD launched in 2005 and has published new comics regularly every week since then. Due to its unique focus on science, mathematics, and other intellectual fields, in addition to relationships and philosophy, XKCD has an avid following among a number of communities. This following and the web comic’s significant characteristics make the comic valuable and worth preserving.

Who cares? User Community

Before getting into what makes XKCD valuable and worth preserving, it is important to explain to whom the comic is important. Due to the comic’s nature and accessibility it is difficult to define its user community definitively. Rather, there are a number of ill-defined groups that make up XKCD’s user community. For starters there are the people who follow and read internet comics. This is the broadest group in XKCD’s community and one of the vaguest. Another broad group would be intellectuals, especially those in the sciences, who better understand the more intricate comics. More specific groups would be the people who follow Munroe’s other works, like What If? and his books, as well as community websites, such as the XKCD Reddit page. These people form a significant and dedicated community that values the web comic and would want to see it preserved for the future.

Why is it special? Significant Characteristics

Now that XKCD’s community has been covered, it is time to explain what makes the comic so valuable to its community and worth preserving. There are a lot of significant characteristics that make XKCD valuable and worth preserving, both for its community and for researchers. These characteristics are important because they provide historical content and context valuable to researchers and to the comic’s current community. (Microsoft pdf) They can be divided into two categories, cultural characteristics and technical characteristics.

Cultural Characteristics

The two significant cultural characteristics of XKCD that make it important to preserve are its content and the cultural impact it has had. One of the most important things about the web comic is its content. Unlike most other web comics, XKCD, as stated above, features science, math, and other intellectual fields in its content. It even refers to itself as “A webcomic of romance, sarcasm, math, and language”. This focus normally takes the form of either smart humor related to the field, such as the comic “Universal Install Script”, or something thought provoking, such as the comic “Doomsday Clock”. This kind of content is unique to XKCD. It is one of the core features that makes the comic so popular and is why it has acquired such a dedicated following and community. This is also why the comic should be preserved. Because there is no other comic that has the same content as XKCD, it acts not only as a record of web comic culture but also as a cultural artifact of its user community. If XKCD were not preserved, this information would be permanently lost.

The second significant cultural characteristic of XKCD is the cultural impact the web comic has had. For starters, the web comic has directly led Munroe to create other content, including a number of books and “What If?”, a site where Munroe answers questions like ‘what would a mole of moles be?’ in full detail, however extreme or hilarious that might be. It has also resulted in the creation of an active Reddit community and thread on the web comic itself. Finally, XKCD has spawned a number of cultural ‘homages’ among followers and fans. An example of this is how a number of programs, such as Siri, call back to the XKCD comic “Sandwich” by having that command and response function as they do in the comic. Another is the game called Geohashing, which was invented in the comic of the same name. None of these things would have been made, or happened, without the original XKCD comic and that relationship should be preserved as it marks.

Technical Characteristics

In addition to its significant cultural characteristics, XKCD also possesses numerous significant technical characteristics. Because XKCD is a web comic, it has access to unique tools and techniques only available on the computer, which Munroe uses in making the comic. To begin with, the most common significant technical characteristic XKCD has is mouse-over text, text that appears when you mouse over the comic and gives more information about that particular comic. This characteristic is present in almost every XKCD comic. Another significant technical characteristic is scrolling, such as in the comic “Pixels”, where the comic can be scrolled, sometimes even infinitely. Additionally, there are comics that have the significant characteristic of being expandable, such as “Gravity Wells” and “Lakes and Oceans”. The underlying theme of these significant characteristics is that they allow users to interact with the comic rather than only experiencing it passively. While it is possible to do this with traditional comics, it is both difficult and cumbersome. Web comics, on the other hand, can accomplish this with relative ease due to how their medium, i.e. computers, functions. This allows web comics to explore different methods of engaging their community and increasing interactivity. In this regard XKCD is an excellent example, because practically all of its content is interactive in some form or another. As a result, XKCD encapsulates what makes a web comic different from traditional comics.


In conclusion, XKCD is worth preserving for a variety of reasons. It has a substantial community, made up of a number of different groups, which values and avidly supports it. XKCD’s unique sense of humor and subject matter, combined with the cultural impact it has had, give the comic valuable cultural characteristics. In addition to its significant cultural characteristics, XKCD also possesses significant technical characteristics such as mouse-over text and interactive comics. Together these significant characteristics make XKCD valuable culturally and technically and make it worth preserving for posterity and future research.