1. This is a really great post, Jeff. I shared it on Twitter and got a really positive reaction from a bunch of folks in the international digital preservation community. Format woes have been a key part of a lot of the policy thinking for a while, but I think you are likely right here: the fear is largely unwarranted. If a PDF renders in a few different tools today, there is good reason to believe that it will in the future. Sure, if at some point we had to reverse engineer PDF rendering capability from the specifications, it would be tricky to work with a bunch of poorly constructed PDFs. But if we ever reach the point where we can't render a PDF at all, we will likely have far worse problems.

  2. I heard somewhere that the PDF has been likened to a post-apocalyptic cockroach…

    The open source community has really altered the landscape of format obsolescence. I mean, with the right flavor of Linux, you can even resuscitate ancient hardware. VLC, from VideoLAN, is an example of a tool that ignores format obsolescence, and I don't think I have come across anything that wouldn't play, provided it wasn't corrupted or otherwise broken. I'm sure there are plenty of things that VLC won't open natively, but my guess is that someone somewhere has written a plugin for them (and at the very least, you could write one yourself…).

    I think your point about just ensuring that the bitstream is intact and letting the niche communities do what they do is valid. But this means that part of digital preservation must be actively participating in those communities and staying aware of what is going on. Knowing who, or where, to ask for help and advice is more practical than launching a campaign to migrate every weird file in existence and create yet another standard (relevant xkcd: https://xkcd.com/927/).

  3. I've occasionally thought about formats going obsolete, and there are surely examples from the early days of computing, when people were compiling their own software and so on. I am not sure how much handling that is the mission of archives; it seems more like work for historical researchers. Yale is preserving the Voynich Manuscript even though no one can read it. I think extremely obscure digital formats can be treated similarly.

    Rather than format issues, we are more likely to face documentation issues. What is this folder full of text files, each containing a single word? Is it a meaningless artifact, or something worth saving? But again, I think the solution is the same: we don't have to understand it. Just save it.

