Archival Model and Information Package for XKCD

Archival Model

To ensure that the project succeeds in preserving XKCD, it will follow an archival model. The model gives the project a set of steps that, when followed, will preserve the web comic. While it does not cover extreme circumstances or niche issues, it does cover the problems the project is most likely to encounter. Below are the four steps in the archival model. Each step covers a task that must be done for the project to preserve the web comic to the required standard.

  1. Collect the designated materials from the XKCD website: each comic entry’s URL, title, and date. This can be done by a dedicated crawler (one designed to regularly crawl only the XKCD website) or by a person.

  2. Compile these URLs in a standing list to be used in the preservation process. Then review the XKCD collection at the Internet Archive and note any broken or missing entries.

  3. Create a package of URLs for the missing or broken entries and submit it to the Internet Archive Wayback Machine.

  4. If there is any issue in submitting an entry to the Wayback Machine, submit it to Archive-It.



Each of the four steps in the project’s archival model has been designed as the most functional solution to a major challenge facing the project, and each is necessary for the project to succeed. Each has a role that is not covered by any other step. If any step were removed or skipped, the project would neither meet its standards of quality nor successfully preserve the web comic.

Step one is designed to guarantee successful, regular captures of the web comic. Because of how the Internet Archive works, the only object the project needs to capture for the archival information package is each comic page’s unique URL. This will be accomplished with a dedicated crawler that goes over the XKCD website regularly and captures any new URLs it encounters. However, crawlers are not perfect and can occasionally miss things, such as an entry. Missed entries must be avoided because they keep the project from accomplishing its goal. Therefore a human staff member or volunteer will also regularly go over the XKCD website and collect new URLs to guarantee proper capture. Each comic entry’s title and date are captured because they are important metadata for the archive.
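As a rough illustration of the capture in step one, the sketch below turns one comic's metadata into the URL, title, and date the project wants to record. It assumes the crawler reads XKCD's JSON interface (`https://xkcd.com/<number>/info.0.json`), which the project description does not specify. The names `ComicEntry`, `entry_from_info`, and `fetch_info` are made up for this example.

```python
import json
from dataclasses import dataclass
from datetime import date
from urllib.request import urlopen


@dataclass(frozen=True)
class ComicEntry:
    """One line of the standing list: URL plus title and date metadata."""
    url: str
    title: str
    published: date


def entry_from_info(info: dict) -> ComicEntry:
    """Build a ComicEntry from one comic's JSON metadata.

    Assumes the fields num, title, year, month, and day; the date
    parts may arrive as strings, so they are converted with int().
    """
    return ComicEntry(
        url=f"https://xkcd.com/{int(info['num'])}/",
        title=info["title"],
        published=date(int(info["year"]), int(info["month"]), int(info["day"])),
    )


def fetch_info(number: int) -> dict:
    """Download one comic's metadata (assumed endpoint, needs network access)."""
    with urlopen(f"https://xkcd.com/{number}/info.0.json") as response:
        return json.load(response)
```

A crawler run would call `entry_from_info(fetch_info(n))` for each new comic number, and a staff member or volunteer could add any entry the crawler missed to the same list by hand.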

Step two is done in order to properly create an archival information package to submit to the archive. Because the Internet Archive is also working to preserve the comic, it is critical to know what it already has and what it actually needs. This increases the project’s productivity by avoiding work that has already been done. It also tells us which entries need to be replaced if any are broken. This is important because a broken entry suggests that the entry is incompatible with the Wayback Machine and needs to be submitted to Archive-It.
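The review in step two can be pictured as a comparison between the standing list and the reviewer's notes on the Internet Archive collection. The sketch below is only one possible way to record it. The `triage` function and the status labels `"ok"` and `"broken"` are assumptions for this example, and the judgement that an entry is broken still comes from a person looking at the archived page.

```python
from typing import Dict, Iterable, List, Tuple


def triage(standing_list: Iterable[str],
           archive_status: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split the standing list into missing and broken entries.

    archive_status maps a comic URL to the reviewer's note for it:
    "ok" or "broken" (hypothetical labels). A URL that is absent from
    the mapping was not found in the Internet Archive collection.
    """
    missing: List[str] = []
    broken: List[str] = []
    for url in standing_list:
        status = archive_status.get(url)
        if status is None:
            missing.append(url)
        elif status == "broken":
            broken.append(url)
    return missing, broken
```

Entries marked `"ok"` need no further work, which is where the time saving described above comes from.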

Step three is a direct follow-up to step two. After going over the collection in the Internet Archive, the project has the information necessary to compile an archival information package to submit to the archive. This information package will consist of the unique URL for each comic entry. It excludes the collected title and date metadata because that is built into the URL address. The title and date are collected in case it becomes necessary to submit entries to Archive-It, which does make use of that metadata.
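A minimal sketch of step three follows, assuming the package is a plain list of URLs and that each one is submitted through the Wayback Machine's Save Page Now address (`https://web.archive.org/save/` followed by the page URL). The project description does not say which submission route is used, so treat that as an assumption. `build_package` and `save_request_url` are hypothetical helper names.

```python
from typing import Iterable, List

# Save Page Now prefix; the choice of this submission route is an assumption.
SAVE_PREFIX = "https://web.archive.org/save/"


def build_package(urls: Iterable[str]) -> List[str]:
    """Return the URLs to submit, in order, without blanks or duplicates."""
    seen = set()
    package: List[str] = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            package.append(url)
    return package


def save_request_url(url: str) -> str:
    """Address to request so that the Wayback Machine captures the page."""
    return SAVE_PREFIX + url
```

Merging the missing and broken lists through `build_package` keeps an entry that appears in both from being submitted twice.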

Finally, step four occurs only in the case that there is a problem with the Wayback Machine. Despite the quality and overall simplicity of the Wayback Machine’s system, there are things it cannot do. Content such as embedded images and video can be problematic and may not work in the Wayback Machine, depending on how it is implemented. The result is an entry that fails to function properly and therefore is not preserved to the project’s standards. To avoid this, problematic entries are submitted to Archive-It using their unique URL, with their respective title and date used as metadata to help identify each entry. This method should effectively preserve even the most problematic XKCD comic entries and ensure the web comic’s preservation.
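The fallback in step four amounts to "try the Wayback Machine first, then Archive-It." The sketch below shows only that decision. The two submit functions are passed in as placeholders because the project description does not say how either submission is performed, and every name in the code is hypothetical.

```python
from typing import Callable, Dict


def submit_with_fallback(entry: Dict[str, str],
                         submit_wayback: Callable[[str], bool],
                         submit_archive_it: Callable[[Dict[str, str]], bool]) -> str:
    """Submit one entry and report where it ended up.

    entry holds "url", "title", and "date". The Wayback Machine is given
    only the URL; Archive-It is given the whole entry so that the title
    and date can serve as identifying metadata. Each placeholder function
    returns True when its submission worked.
    """
    if submit_wayback(entry["url"]):
        return "wayback"
    if submit_archive_it(entry):
        return "archive-it"
    return "failed"
```

An entry that comes back as `"failed"` would need to be looked at by a person, since neither route preserved it.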


In conclusion, each step of the project’s archival model meets an important need and helps accomplish the project’s goal of preserving XKCD. Without any one of these steps, a critical aspect of the project would be missed entirely and the project would fail to meet its goal. Without step one, there would be nothing from which to create an archival information package (AIP) to submit later on. Without step two, there would be no way of knowing which entries the Internet Archive needs to have submitted or fixed, and creating the AIP would be more difficult and time-consuming than it needs to be. Without step three, accomplishing the project’s goal would be not just more difficult but impossible. Creating the AIP this way makes it easier and more efficient to submit to the Internet Archive. Additionally, submitting the AIP to the Wayback Machine is the entire point of the project; not doing so would guarantee failure from the start. Finally, step four is necessary to preserve entries that do not function properly in the Wayback Machine. Because not every entry can be preserved in the Wayback Machine, an alternative method of preserving them is necessary, and submitting the AIP to Archive-It is that alternative. All in all, each step in the project’s archival model is important and necessary for the project to succeed in preserving XKCD for the future.


Archival Information Package/Process

Archival Package:   2009-8-14 Oregon

Step 1 Capturing URL



Step 3 Submitting the package

