On July 5, 1950, Kenneth Shadrick became the first reported American casualty of the Korean War, and with him the first record in a sprawling, 109,975-item archive documenting the war's casualties, both deaths and injuries.
TAGOKOR, short for The Adjutant General's Office, KORea, began as a simple punch card archive, organized initially by casualty date. From there, it would take a complicated route toward archival preservation and dissemination.
The punch cards, produced between June 30, 1950, and July 28, 1953, were designed to be as detailed as possible: name, rank, service, hometown, and so forth. These pieces of information gave the Adjutant General's office the basis for cataloguing the dead and injured, and also for notifying families. At the time, the data was crafted to be readable to the contemporary AG's office, including office-specific codes.
As time went on, the number of punch cards, one for each casualty, became a burden. The AG therefore ordered the cards transferred to 556 BPI (bits per inch) magnetic tape and the cards themselves destroyed: encoded in Extended Binary Coded Decimal Interchange Code (EBCDIC), the data began a new life as a digital record. One more transfer, to denser 1600 BPI magnetic tape, followed on January 29, 1970.
In 1989, the National Archives and Records Administration (NARA) acquired a copy of TAGOKOR, and there the problems began.
First, the aforementioned EBCDIC was non-standard for archival purposes, an IBM encoding diverging sharply from the industry-standard ASCII. Second, NARA did not have the equipment to read the data, and had to borrow time on systems like those at the National Institutes of Health to copy the data to denser 6,250 BPI magnetic tape and verify the information. Third, the verification process revealed that bits had been dropped, with no way to check them against the original record. Finally, NARA itself had been an agency in flux, moving multiple times between 1968 and 1988.
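To make the conversion problem concrete, here is a minimal sketch of an EBCDIC-to-Unicode migration step. The sample bytes are invented for illustration, and cp037, Python's codec for a common US EBCDIC code page, may or may not match the exact variant TAGOKOR used.

```python
# A minimal sketch of an EBCDIC-to-Unicode migration step.
# The sample bytes are invented for illustration; cp037 is Python's
# codec for a common US EBCDIC code page and may not be the exact
# variant used for TAGOKOR.

record = bytes([0xE2, 0xC8, 0xC1, 0xC4, 0xD9, 0xC9, 0xC3, 0xD2])
text = record.decode("cp037")
print(text)  # -> SHADRICK

# Every byte decodes to *something* in a full EBCDIC code page, so a
# migration should flag suspicious output rather than trust it: here,
# anything outside the plain printable ASCII range is reported.
def flag_suspect_characters(raw: bytes) -> list[tuple[int, str]]:
    decoded = raw.decode("cp037")
    return [(i, ch) for i, ch in enumerate(decoded)
            if not (ch.isascii() and ch.isprintable())]

damaged = bytes([0xE2, 0x0E, 0xC1])  # 0x0E is a control code, not text
print(flag_suspect_characters(damaged))  # -> [(1, '\x0e')]
```

Because a full EBCDIC code page assigns a character to every byte value, damaged data never fails loudly on its own; a migration has to go looking for it.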
Ultimately, TAGOKOR was brought into a somewhat readable format, albeit with abundant errors. In 1999, the Archival Electronic Records Inspection and Control (AERIC) system confirmed these errors, annotating the record accordingly, and in 2012, the Electronic Records Archives finally posted TAGOKOR for public consumption, errors and all.
This route from punch card to internet and the data artifacts and errors within point to important questions in archival work.
How archives handle incomplete records
When modernizing records, archives often face the prospect of an incomplete, error-riddled data set. Sometimes the incompleteness stems from incomplete data entry at the source, but in cases like TAGOKOR, archivists must tackle a confluence of problems at once. The considerations come down to a balance between how capable the archive is of preserving the record and how much money it has to do so. For TAGOKOR, NARA faced not only missing bits, which could be anything from codes to data flags, but also a missing source document: the punch cards had been destroyed, and many of the codebooks were deteriorating or unreadable.
When creating a digital record from a pre-digital computer record, archivists should be aware of these circumstances first and foremost, something NARA was quickly immersed in. When you can no longer inspect the "12-zone" punches in the indexing region of a punch card to verify codes, vital information is simply gone; in TAGOKOR, the result is an abundance of stray ampersands and empty fields. Even something as simple as a punch card's physical header carries a wealth of information, none of which was considered at disposition once the records were on magnetic tape.
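There is a concrete reason for those ampersands: on a punch card, the ampersand was encoded as a lone 12-zone punch, so a stray or truncated 12-zone punch surfaces as "&" after conversion. Below is a minimal sketch of an audit for such artifacts; the field layout and sample record are hypothetical, since TAGOKOR's real structure lived in its partially lost codebooks.

```python
# Sketch: auditing fixed-width records for the ampersand and blank-field
# artifacts described above. The field layout and sample record are
# hypothetical; TAGOKOR's real structure was defined in codebooks that
# only partially survive.

FIELDS = {            # name -> (start, end) offsets in the record
    "service_number": (0, 8),
    "name": (8, 30),
    "casualty_code": (30, 32),
}

def audit_record(line: str) -> dict[str, str]:
    """Return fields whose content looks like a transfer artifact."""
    problems = {}
    for field, (start, end) in FIELDS.items():
        value = line[start:end]
        if "&" in value:                 # a lone 12-zone punch reads as '&'
            problems[field] = f"ampersand artifact: {value.strip()!r}"
        elif not value.strip():          # field dropped or never punched
            problems[field] = "empty"
    return problems

sample = "RA123456SHADRICK KENNETH R    & "
print(audit_record(sample))  # flags the casualty_code field
```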
If anything, an archivist should strive to maintain physical copies as often as possible, especially associated ephemera like entry guides and memos. Without these little pieces of information, an archive can become a big problem.
Planning for the future
The most vital job of an archive is saving data, objects, and other relevant items for the future. It is important, then, that archivists stay aware of the state of technology, of coming changes, and of the likelihood that their methods will become obsolete. The original creators of a record are bound not by these considerations but by their own entry guidelines; still, the eventual need to preserve records means that even a data entry clerk should be aware of how those records will be viewed in the future.
A best practice for an archive is to err on the side of providing data in its simplest form from the word "go". Digitally, there exist international data standards, such as Unicode text and ISO 8601 dates, that will remain readable for decades to come. Furthermore, providing stable physical copies of digital archives when possible creates a secondary preservation tool, one that can potentially be used for migration if the digital archive fails.
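As one sketch of what "simplest form" might look like in practice, the snippet below lands records in UTF-8 CSV with a self-describing header row and ISO 8601 dates. The field names and sample record are invented for illustration.

```python
# Sketch: landing a migrated dataset in about the simplest durable form
# available: UTF-8 CSV with a self-describing header row and ISO 8601
# dates. The field names and sample record are invented for illustration.
import csv

records = [
    {"service_number": "RA123456", "name": "SHADRICK KENNETH R",
     "casualty_date": "1950-07-05"},
]

with open("casualties.csv", "w", encoding="utf-8", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(records[0]))
    writer.writeheader()          # the header travels with the data
    writer.writerows(records)
```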
Understanding that some formats, like EBCDIC, are non-standard and more or less inadequate for archiving should be the first hint of where an archival project is headed. If an archive chooses proprietary formats over access, it does not gain more data safety; it simply gets less access in the future: anyone over the age of 30 knows this upon coming across references to RealMedia or RealPlayer.
Furthermore, beginning an archive in a more universal format also provides an opportunity to tailor the data to a multitude of internal uses. With a wide variety of data tools, archivists can build a basis for future verification, manipulation, and other refinements of the data, simply by using a format that is standardized and widely used.
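For instance, once the data sits in a standard format, verification against a surviving codebook reduces to a short script. The codes and records below are hypothetical.

```python
# Sketch: verifying coded fields against a surviving codebook. Both the
# codebook entries and the records are hypothetical.

CASUALTY_CODES = {"KIA": "killed in action", "WIA": "wounded in action",
                  "MIA": "missing in action"}

records = [
    {"service_number": "RA123456", "casualty_code": "KIA"},
    {"service_number": "RA654321", "casualty_code": "&7"},  # artifact
]

for rec in records:
    code = rec["casualty_code"]
    if code not in CASUALTY_CODES:
        # Flag for review rather than guessing: the gap is part of
        # the record's history too.
        print(f"{rec['service_number']}: unknown code {code!r}")
```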
What does this mean for archivists?
For archivists, having data easily accessible, especially born-digital or first-generation digital data, opens the archive up to outside interpretation and interaction. For TAGOKOR, this included the efforts of Whitey Reese, whose attempt at decoding the records and creating a database where veterans and family members could look up casualties began in 2000 and persisted for several years. Even TAGOKOR, as difficult as its encoding made it, was still somewhat readable to Reese, who benefited from the comparatively clear and accessible format provided to him.
Archivists should look at examples like TAGOKOR and ask themselves: what good is their work if it is going to be unreadable?
Archivists should also be ready to admit that sometimes a large physical record is vital, especially for records of a heavily technological nature. Punch cards, diskettes, magnetic tapes, and the like should all be preserved until the absolute last bit is verified.
Another consideration for archivists is providing ample explanation of the data. In TAGOKOR, the meanings of some coded bits were lost to time because they came from unpreserved memos and the like. Standardizing is one thing; ensuring that the standard is interpretable by future readers is another.
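One simple way to do this is to ship a machine-readable data dictionary alongside the dataset itself, so the explanation cannot be separated from the data. The entries below are invented for illustration.

```python
# Sketch: a machine-readable data dictionary stored alongside the
# dataset, so the meaning of each code survives with the bits. The
# entries, including the cited memo, are invented for illustration.
import json

data_dictionary = {
    "casualty_code": {
        "description": "Disposition of the casualty",
        "source": "AG memo (hypothetical citation)",
        "values": {"KIA": "killed in action", "WIA": "wounded in action"},
    },
}

with open("casualties.dictionary.json", "w", encoding="utf-8") as fh:
    json.dump(data_dictionary, fh, indent=2)
```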
Finally, archivists should be fully cognizant that although not every archive they receive for digitizing will be as incomplete as TAGOKOR, they should treat each one as though it were. Rather than assuming that due diligence was the originator's duty, archivists should trust their instincts when a seemingly complete dataset shows gaps. The "why" of those gaps is as much a part of the history of the record as the data itself.
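A quick completeness profile is one way to turn that instinct into evidence: it reports how often each field is empty before the dataset is accepted as complete. The records below are hypothetical.

```python
# Sketch: profiling a dataset for gaps before accepting it as complete.
# The records are hypothetical.
from collections import Counter

records = [
    {"name": "SHADRICK KENNETH R", "hometown": "SKIN FORK WV"},
    {"name": "DOE JOHN A", "hometown": ""},
    {"name": "", "hometown": ""},
]

missing = Counter()
for rec in records:
    for field, value in rec.items():
        if not value.strip():
            missing[field] += 1

for field, count in missing.most_common():
    print(f"{field}: {count}/{len(records)} records empty")
```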
Archiving a new digital collection that may require interpretation is a difficult task. Archivists are not code breakers by trade, but by necessity they often become such. Looking to an archive like TAGOKOR as an example of both best practices and the resilience of data shows how a record can be an unreadable salad of bits and nonetheless be of extreme importance. Archivists should be prepared not only to work with such data intensively and completely; they should also be ready to let gaps remain until they can be filled. As long as an archivist follows good preservation practices, the data will remain intact until someone else can come along and make the fix.