TAGOKOR: migration, data fragments, and archiving for the future

On July 5, 1950, Private Kenneth Shadrick was killed in action, becoming widely reported as the first American casualty of the Korean War and, in time, the first record in a sprawling, 109,975-item archive documenting casualties, both deaths and injuries, in the Korean War.

TAGOKOR, short for The Adjutant General's Office, KORea, began as a simple punch card archive, organized initially by casualty date. From there, it would take a complicated route toward preservation and dissemination.

The punch cards, produced between June 30, 1950, and July 28, 1953, were designed to be as detailed as possible: name, rank, service, hometown, and so forth. These pieces of information gave the Adjutant General's Office the basis on which to catalogue the dead and injured, and also to notify families. The data was crafted to be readable by the contemporary AG's office, office-specific codes included.

As time went on, the number of punch cards, one for each casualty, became a burden. The AG therefore ordered the cards transferred to 556 BPI (bits per inch) magnetic tape and the cards themselves destroyed: encoded in Extended Binary Coded Decimal Interchange Code (EBCDIC), the data began a new life as a digital record. One more transfer, to denser 1600 BPI magnetic tape, followed on January 29, 1970.

In 1989, the National Archives and Records Administration (NARA) acquired a copy of TAGOKOR, and there the problems began.

First, the aforementioned EBCDIC was an IBM-specific encoding that diverged sharply from ASCII, the industry standard. Second, NARA did not have the right equipment to read the data, so it borrowed time on systems like those at the National Institutes of Health to copy the data onto 27,871 BPI magnetic tape and verify the information. Third, in the verification process, bits were dropped because there was no original record to check against. Finally, NARA itself was an agency in flux, moving multiple times between 1968 and 1988.
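
To make the encoding gap concrete, here is a minimal sketch in Python. It assumes the built-in cp037 codec, a common EBCDIC variant, as a stand-in for whatever encoding the Army's tapes actually used; the sample bytes are purely illustrative.

    # The same bytes read very differently under EBCDIC (cp037) and an
    # ASCII-compatible encoding (latin-1 here, for illustration).
    raw = bytes([0xE3, 0xC1, 0xC7, 0xD6, 0xD2, 0xD6, 0xD9])

    print(raw.decode("cp037"))    # "TAGOKOR" under EBCDIC
    print(raw.decode("latin-1"))  # gibberish under an ASCII-style reading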

Ultimately, TAGOKOR was brought into a somewhat readable format, albeit one riddled with errors. In 1999, the Archival Electronic Records Inspection Control (AERIC) system confirmed these errors, annotating the record accordingly, and in 2012, the Electronic Records Archives (ERA) finally posted TAGOKOR for public consumption, errors and all.

This route from punch card to internet, and the data artifacts and errors picked up along the way, points to important questions in archival work.

How archives handle incomplete records

When modernizing records, archives often face the prospect of an incomplete, error-riddled data set. Sometimes the incompleteness stems from incomplete data entry, but in cases like TAGOKOR, there is a confluence of problems archivists must tackle. The considerations an archive must weigh come down to a simple equation: how capable the archive is of preserving the record, and how much money it has to do so. For TAGOKOR, NARA faced not only missing bits, which could be anything from codes to data flags, but also a missing source document: the punch cards had been destroyed, and many of the codebooks were deteriorating or unreadable.

When creating a digital record from a pre-digital computer record, archivists should be aware of these circumstances first and foremost, as NARA quickly learned. Because you cannot simply look at the "12-zone" punches in the indexing region of a punch card to verify codes, vital information is lost; in TAGOKOR, stray ampersands and empty fields abound. Even something as simple as a punch card's physical header provides a wealth of information that was never considered at disposition, once the records had been put on magnetic tape.
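
What that residue looks like in practice can be sketched in a few lines of Python. The fixed-width layout below is entirely hypothetical, not TAGOKOR's actual record structure; the point is only how unverifiable punches surface as ampersands and blanks.

    # Hypothetical fixed-width layout: (field name, start, end) offsets.
    # Names and positions are illustrative, not TAGOKOR's real layout.
    LAYOUT = [
        ("name", 0, 25),
        ("service_number", 25, 33),
        ("casualty_code", 33, 35),
    ]

    def suspect_fields(record):
        """Yield (field, value) pairs containing only '&' and spaces."""
        for field, start, end in LAYOUT:
            value = record[start:end]
            if value.strip(" &") == "":  # nothing but blanks and ampersands
                yield field, value

    record = "SHADRICK KENNETH R".ljust(25) + "&" * 8 + " &"
    for field, value in suspect_fields(record):
        print(f"unverifiable field: {field!r} -> {value!r}")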

If anything, an archivist should strive to maintain physical copies whenever possible, especially associated ephemera like entry guides and memos. Without these little pieces of information, an archive can become a big problem.

Planning for the future

The most vital job of an archive is saving data, objects, and other relevant items for the future. It is important, then, that archivists stay aware of the state of technology, of coming changes, and of the likelihood that their methods will become obsolete. Original entrants are bound not by these considerations but by their own entry guidelines; even so, the eventual need to preserve records means that even a data entry clerk should be aware of how those records will be viewed in the future.

As a best practice, an archive should always err on the side of providing data in its simplest form from the word "go". Digitally, there are international data standards that will remain readable for decades to come. Furthermore, producing stable physical copies of digital archives when possible creates a secondary preservation tool, one that can potentially be used for migration in the case of an archival failure.
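
As one concrete reading of "simplest form", a minimal sketch in Python: write the records as UTF-8 CSV, an openly documented format, and keep a SHA-256 checksum beside the file so any future migration can be verified. The file names and fields here are illustrative, not TAGOKOR's schema.

    import csv
    import hashlib

    # Illustrative records, not TAGOKOR's actual schema.
    records = [
        {"name": "SHADRICK KENNETH R", "service": "ARMY", "state": "WV"},
    ]

    with open("tagokor_sample.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "service", "state"])
        writer.writeheader()
        writer.writerows(records)

    # A sidecar checksum gives future custodians a fixity check.
    with open("tagokor_sample.csv", "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with open("tagokor_sample.csv.sha256", "w", encoding="utf-8") as f:
        f.write(f"{digest}  tagokor_sample.csv\n")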

Recognizing that some formats, like EBCDIC, are non-standard and more or less inadequate for archiving should be the first hint of where an archival project is headed. If an archive chooses proprietary over accessible, it does not gain data safety; it simply loses access in the future. Anyone over the age of 30 knows this when they come across references to RealMedia or RealPlayer.

Furthermore, starting your archive in a more universal format also provides an opportunity to tailor the data for a multitude of internal uses. With a wide variety of data tools, archivists can often create a basis for future verification, manipulation, and other acts of finesse upon the data, simply by working in a format that is standardized and widely used.
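
For instance, once the data sits in a standard format like the CSV sketched above, basic verification takes only a few lines; this continues the same illustrative file and field names.

    import csv

    with open("tagokor_sample.csv", newline="", encoding="utf-8") as f:
        for lineno, row in enumerate(csv.DictReader(f), start=2):
            # Treat blanks and leftover ampersands as gaps to investigate.
            gaps = [k for k, v in row.items() if not (v or "").strip(" &")]
            if gaps:
                print(f"row {lineno}: unverifiable fields: {', '.join(gaps)}")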

What does this mean for archivists?

For archivists, having easily accessible data, especially born-digital or first-generation digitized data, opens the archive to outside interpretation and interaction. For TAGOKOR, this included the efforts of Whitey Reese, whose attempt at decoding the records and building a database where veterans and family members could look up casualties began in 2000 and persisted for a few years thereafter. Even TAGOKOR, as difficult as its encoding was, remained somewhat readable to Reese, who benefited from the clear and accessible format provided to him.

Archivists should look at examples like TAGOKOR and ask themselves: what good is their work if it is going to be unreadable?

Archivists should also be ready to admit that a large physical record is sometimes vital, especially for heavily technological material. Punch cards, diskettes, magnetic tapes, and the like should all be preserved until the absolute last bit is verified.

Another consideration for archivists is providing ample explanation of the data. For TAGOKOR, the codes used in some bits were lost to time because they came from unpreserved memos and the like. Standardizing is one thing; ensuring that the standard is interpretable by future readers is another.
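
One lightweight way to do that is to keep a machine-readable codebook alongside the data itself. A minimal sketch, using invented codes rather than TAGOKOR's actual values:

    import json

    # Pair every coded field with a plain-language meaning. The codes
    # below are invented for illustration, not TAGOKOR's real values.
    codebook = {
        "casualty_code": {
            "description": "Type of casualty reported to the AG's office",
            "values": {"01": "killed in action", "02": "wounded in action"},
        },
    }

    with open("tagokor_codebook.json", "w", encoding="utf-8") as f:
        json.dump(codebook, f, indent=2)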

Finally, archivists should be fully cognizant that although not every archive they receive for digitizing will be as incomplete as TAGOKOR, they should treat each one as though it were. Instead of assuming that due diligence was the originator's duty, archivists should trust their instincts when a seemingly complete dataset shows gaps. The "why" of those gaps is as much a part of the history of the record as the data.

Conclusion

Archiving a new digital collection that may require interpretation is a difficult task. Archivists are not code breakers by trade, but by necessity they often become such. Looking to an archive like TAGOKOR as an example of both best practices and the resilience of data shows how a record can be an unreadable salad of bits and nonetheless be of extreme importance. Archivists should be prepared not only to work with this data intensively and completely but also to let gaps remain until they can be filled. As long as an archivist follows good preservation practices, the data will remain intact until someone else can come along and make the fix.

6 Replies to “TAGOKOR: migration, data fragments, and archiving for the future”

  1. AJ, I really enjoyed your thoughts here!
    After reading “Paper Knowledge: Toward a Media History of Documents” by Lisa Gitelman my thoughts on this are varied.

    First- I am a historian with a background in archaeology and artifacts, and have a passion for material culture. Therefore, learning that these punch cards had been destroyed not only saddened me, but angered me due to the lack of respect for these important documents. I realize that the pure quantity of them was surely a burden. However, I wish that they would have stopped and thought about the future for just a moment and processed how crucial those cards could have been to the historical record (doesn’t every historian dream about going back in time and changing history?).
    Yes, they did transfer the cards' information onto a digital record. And yes, we should be thankful that we at least have that. However, modern-day historians could have made new interpretations and drawn new conclusions from the cards if they were still intact. They could have studied their usage, consumption, and creation, as well as the behaviors, norms, and rituals that the objects created or took part in. A great deal of information was lost when those cards were destroyed.

    Secondly- There is a great deal of pressure put on archivists to find all the answers related to their archives. So I ask, where is the line drawn? How and when do they decide to give up looking for the “missing bits and missing source documents”? How much time should they allot to finding and correcting errors? Like you mentioned, archivists are not by trade code breakers. Therefore, when is the appropriate time for them to “give up” and move on to the next project? When is it okay to leave a gap in the archive incomplete?

    1. I think your second point is important! There certainly is a bandwidth for everyone and every org, so having an understanding of when it’s time to give up is key. That said, I think this is where an archivist’s documentation is vital! It can be a good signpost for future historians who might be interested in deciphering these gaps.

  2. Thanks for the great post AJ! Olivia, I totally understand your second point. It can be really hard to decide when an archivist should "give up" and move on. I think it definitely has to be case by case, as each repository decides what its priorities are and how archivists' time is best spent on each collection. I'm also right there with you, AJ. I think that whenever an archivist does stop trying to decode a collection undergoing digitization, it's critical they leave detailed notes. Historians and others interested in these collections might be able to pick up where they left off and offer new insights.

    1. I think this second question about where the line is drawn in archives really gets down to the purpose of an archive. How much time archivists should spend "decoding" a collection, and how much physical storage an archive should devote to a single collection, depends on what the archive sees as its mission: is it to hold copies of as much historical material as possible, to preserve as much original historical material as possible, or to make as much historical material as possible accessible? Each of those would result in a very different archive, and no one of them is inherently better than the others. Perhaps it would be useful for these purposes to be explicitly spelled out for the public and researchers (assuming archives internally already make these decisions).

      For me, I want an archivist to tamper with a collection as little as possible. There is no doubt that archivists put their own historical interpretation into the archive, but the best case scenario is for the materials in a collection to be left in their original format. This article shows that that becomes harder after a collection's format has already been changed. I think the main question in this example is how much archivists should "mess" with a collection after it has already been messed with. The TAGOKOR example shows the grey area between interpreting historical material and making it as accessible as possible. I do think archivists did great work making TAGOKOR more comprehensible and accessible, but each iteration of work moved the material further from its original format, so it is always a give and take.

  3. These are all really important points! The TAGOKOR data is a really interesting case because it was born digital, in a way; the punch cards were an early means of processing data. However, this technology (and the technology it was transferred onto, the 7- and 9-track magnetic tapes) soon became outdated or obsolete. As we turn more often to digital records, careful thought will have to be given to preserving not only original files but also copies that can be made accessible to the public and readable using up-to-date technology.

    This brings me to my second point. I was really struck throughout the article by how archivists were continually working to make TAGOKOR accessible. Even from the beginning, individuals were considering how others might want to use the records for research, ordering them in ways that seemed logical. When the files passed into NARA's hands, archivists worked to create print and digital copies along with supplementary reference materials to help researchers decode the records. Bailey talks about how research by war veterans drove the availability of full records, and how the creation of the web and NARA's Access to Archival Databases (AAD) system allowed more people to use TAGOKOR in their research. Stable records readable through up-to-date technology are important because without them the public gets no access at all.

  4. Your discussion of how nonstandard code and reliance on obsolete technology significantly reduced how useful and accessible the TAGOKOR archive’s data was reminded me of Kara Van Malssen’s article. Van Malssen’s tips for deciding on file formats (including advice about, as you mention in your post, the pitfalls of using proprietary media formats) warn aspiring oral historians away from decisions that might lead them to difficulties similar to what NARA has faced with TAGOKOR, which can basically serve as a case study of what not to do with your data. The destruction of the original punch cards would be akin to destroying your preservation master file, which could end up fine if one of your edited files was saved in a format that remained popular and usable forever, but which has quite a lot of potential to cause problems down the line.

    Emily and Katie, you both touch on how archivists transformed the TAGOKOR data over time in their (surely noble) attempts to make the data more accessible—this is a pretty great argument for maintaining a “preservation master” even for non-digital, non-video materials. If the original punch cards or the original data that was encoded in the original punch cards had been preserved in a format chosen for its reliability, and the data only copied, not migrated or reformatted entirely, the use of obsolescent technology and abnormal code wouldn’t be as big an issue. Van Malssen’s advice for digital video preservation can inform our preservation decisions for other types of media as well.
