To Preserve, You Must Understand

This week’s readings solidified my view of what a present-day collector (that is, any librarian or archivist) needs: an understanding of digital objects and the confidence to handle them.

Understanding the objects

Understanding how a digital object functions is imperative for the current and future relevance of the information field. As modern information professionals, we will all work with digital objects, ranging from the catalog record to more complex digital files such as images, articles, and books. The Owens chapter clearly illustrated the minute mechanics of a digital object. (It reminded me of how large internet companies track your internet browsing and searching. It also made me think that this practice may not be as scary as the news has made it out to be.) He also explained the importance of understanding these details and using them to become better digital curators and managers of digital content. Basically, if we know what makes up a digital object, and how we can optimally organize those objects, we can better preserve our collections and provide access to the general public.

When considering physical collections, space is a clear consideration. Physical items take up physical space. Digital items take up space in a similar but more abstract way: they take up virtual space. We, as digital information professionals, need to ensure that our digital objects take up as little space as possible without losing the quality necessary for the object to still be considered the same object.

Practice of preparing, preserving, and using digital objects

The Chan article describes how prestigious design museums have begun cataloging and digitally preserving symbols that have become popularized in digital form (such as the “@” symbol) as claim-able objects. The authors explain the need for a thought process and concept where something like the “@” symbol can be digitally claimed and preserved: “The larger issue facing design museums is that more and more of the products ‘made’ by design practitioners now lack any form at all.” In traditional media, design objects would be fully physical, or at least more easily claim-able by an institution. Now, however, design objects are created and continue to live in a virtual space. Design creators have become unwilling or unable to fully contribute to the long-lasting preservation of their work.

Alternatively, this understanding can also come in handy when digital forensics is needed. Kirschenbaum and team explain this process and the need to be aware of how digital objects could be used for forensics and larger problem solving. In the information professional’s world, this often means accessing records on outdated and no-longer-used systems. They lay out three reasonable options for ensuring continued access to materials: migrate the files and save both the original and the manipulated files; retain or obtain the original systems required for the media; or create or use an emulation to show the material on modern systems as if it were on the original system.

Questions for the class

How do you think having an understanding of digital objects will help you when you go to consult with your small institution?

How might we (future information professionals) go about preserving these kinds of digital objects now that we understand their makeup and how they can be saved?

18 Replies to “To Preserve, You Must Understand”

  1. “How do you think having an understanding of digital objects will help you when you go to consult with your small institution?”

    This is a great question to meditate on as we begin our consultations. My organization wants to sort and organize “decades” of digital content into a shared network. I’m meeting with them for the first time later this week so I don’t yet have an impression of the variety of formats that they are trying to preserve. From the brief description, it sounds like text documents as opposed to audiovisual files but I’ll make no assumptions at this point.

    The possibility that these files were created over a period of decades presents a couple of challenges. The first thing that came to mind was file format. If the files were created using obsolete software then there might be a problem with either opening them, or if they do open, being rendered in their original layout. I had planned on talking to them about their goals and understanding what they perceive as important aspects of the content to save. As Owens and Kirschenbaum both write, there is more to the digital object than what you see on the screen.

    This relates to another aspect – the file naming system. Kirschenbaum had noted that file systems can evolve. My organization had mentioned wanting to sort and organize their files, so it had also occurred to me that this could involve altering their original order and thus altering the context of some of these files. It had not occurred to me that the older files might have been named under a different naming system with different character limitations. Thus simply copying files into a new directory might pose technical problems if the file name isn’t compatible. Additionally, if files refer to each other, as with Owens’s webpage example, then rearranging and renaming files will disrupt that connection.
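The naming concern above can be checked before anything is copied. The following is a minimal sketch, not a complete validator: it flags names that Windows would reject and tests whether a name fits the old 8-character-plus-3-character-extension pattern. The function names and example file names are hypothetical, and the 8.3 check is deliberately simplified (it looks only at lengths, not at which characters were allowed).

```python
# Characters that Windows does not allow in file or directory names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

# Device names that Windows reserves, with or without an extension.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_name_problems(name: str) -> list[str]:
    """Return a list of reasons why `name` would be rejected on Windows."""
    problems = []
    bad = sorted(c for c in set(name) if c in WINDOWS_FORBIDDEN or ord(c) < 32)
    if bad:
        problems.append("forbidden characters: " + " ".join(repr(c) for c in bad))
    if name.split(".")[0].upper() in WINDOWS_RESERVED:
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    return problems


def fits_8_3(name: str) -> bool:
    """Simplified length-only check for the old 8.3 naming pattern."""
    stem, _, ext = name.partition(".")
    return 1 <= len(stem) <= 8 and len(ext) <= 3 and "." not in ext


if __name__ == "__main__":
    for example in ['My "Best" Photos', "report.txt", "annual_report.docx"]:
        print(example, windows_name_problems(example), fits_8_3(example))
```

Running a check like this over a directory listing gives a report of names to review by hand, which avoids silently renaming anything and losing the original order.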

    1. Tina, your concerns about file names also brought to mind the documentation (or lack thereof) that they might/probably have regarding how their digital files have been handled in the past. I can only imagine that many of our institutions may have started by organizing one way, and then stopped or continued using another, but may not have thought to document that process. It gets deep into the concerns that I have about being the one to make some of those decisions (and makes me glad that this project is more like an “educated suggestion”)… I am definitely guilty of erasing what I thought were random files and only keeping the “important” ones over time. I hate to think of all the important stuff I erased over the years!!

      1. I met with my organization yesterday and I think there will be a challenge from a lack of documentation. I hadn’t thought to ask at the time if they have any knowledge management system or if they should consider using one. Where I work, we use both SharePoint and Confluence. They’re both great when the organization commits to using them and stays up to date on the documentation.

  2. “How do you think having an understanding of digital objects will help you when you go to consult with your small institution?”

    My institution has many of the same issues that Tina mentions above. Like her, I’ve discovered that an obsolete file format is going to pose some interesting challenges. To be specific, when our department transitioned from film photography to digital in 1995, virtually everything about those early cameras was a kludge, from the physical form of the device to the files it produced. StratComm (called CorpComm at the time) went with Kodak, and Kodak’s raw files of that era were a special flavor of TIFF that today’s Adobe software doesn’t play nice with. Unfortunately, the photographer from that era left most of his work in this proprietary format. This is one of the big issues I intend to address at my interview with them tomorrow.

    Also like Tina’s institution, I have encountered cross-platform issues where characters and file management practices that were allowed and common on Mac OS 9 are problematic – or not allowed at all – on Windows 10. For example, I’ve encountered a few directories burned to optical media that include quotes in the directory name. Windows won’t let me open or copy these directories. Luckily, there have only been a few of these. However, a much more common issue has been the lack of file extensions on a sizable percentage of those early Mac OS 9 files. To be honest I’m surprised that today’s OS 10 still offers users the option of omitting file extensions, but I digress. I have found that most of the time I can manually add an extension of either .jpg, .png, .psd, or .tif and that will repair the issue. However, there are thousands of files in hundreds of directories that need to have this done. I’m not sure what options there are for batch-guessing on that scale.
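One possible approach to batch-guessing, offered as a sketch rather than a tested workflow: most image formats begin with a fixed byte signature, so a script can read the first few bytes of each extensionless file and propose an extension. The example below covers only the four extensions mentioned above and only plans the renames without performing them. The function names are hypothetical. One caveat: raw formats built on the TIFF container are likely to start with the same bytes as an ordinary TIFF, so a `.tif` guess may still need checking by hand.

```python
from pathlib import Path

# Leading byte signatures for the four formats mentioned above.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),        # JPEG
    (b"\x89PNG\r\n\x1a\n", ".png"),   # PNG
    (b"8BPS", ".psd"),                # Photoshop document
    (b"II*\x00", ".tif"),             # TIFF, little-endian byte order
    (b"MM\x00*", ".tif"),             # TIFF, big-endian byte order
]


def guess_extension(header: bytes):
    """Return a likely extension for the given leading bytes, or None."""
    for magic, ext in SIGNATURES:
        if header.startswith(magic):
            return ext
    return None


def plan_renames(root):
    """List (old_path, new_path) pairs for extensionless files under root.

    Nothing is renamed here; the plan can be reviewed before acting on it.
    """
    plan = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix == "":
            with path.open("rb") as fh:
                ext = guess_extension(fh.read(16))
            if ext:
                plan.append((path, path.with_name(path.name + ext)))
    return plan


if __name__ == "__main__":
    # Prints the proposed renames for the current directory tree.
    for old, new in plan_renames("."):
        print(f"{old} -> {new}")
```

Working on a copy of the files and keeping the printed plan as a record of what changed would also help document the handling of the collection.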

  3. Great questions to get us thinking about how understanding the nature of digital objects will impact our consultation project! Tina and Andy’s responses have got me thinking about how I’ll need to collect more specifics regarding what version of file format my institution is using and how they are organized in order to surmount possible migration problems.

    I keep returning to the principle we discussed last week about letting the purpose behind preserving an object drive how to preserve it. My institution is interested in preserving digital objects related to its own history as well as digitized items. I feel like the purpose behind both these objectives could lead to very different methods. Based on the response to my survey, it feels like the institution is taking an informational approach in that it’s focused on capturing surface-level forms of digital objects (i.e. our top layer from this week’s readings). The institution has collected a number of PDFs related to various forms of their website. I don’t think it’s possible to retroactively capture a more layered view of what existed, but I wonder if it’s appropriate going forward to try to do more? Obviously that’s a call the organization needs to make, but I’m wrestling with what my input should be like. Is it my job to just meet their needs and keep their methods at best practice, or should I try to introduce new approaches that might be more than they can handle?

    1. I’m struggling with this as well.

      On the one hand I know the intended user community and their needs fairly well. Typically they are looking to obtain a relevant image as quickly as possible, throw it into a presentation and move on to the next thing. I honestly can’t think of a single time that any image inquiry required more than this “grab-n-go” approach. Even StratComm’s designers seldom need the extra exposure data hidden in a TIFF or raw camera file. They too, usually just want to grab and go. So I’m trying to figure out where the balance point is between strictly adhering to known best practices for image archiving, while also not being so rigid with those guidelines that I actually make workflow for image producers nearly impossible.

      And when I mention “impossible workflow” I’m referring specifically to the demand for TIFF as the default archival choice. Speaking as a photographer, TIFF was fine in 2002 when my camera had 6 megapixels and took 20 seconds to save a single image. Those files were less than 18 MB. In 2018 my camera has 36 megapixels and the TIFF files it produces are over 200 MB each. This is an unworkable size when editing batches in the hundreds. As a producer I can’t justify the business case for that workflow. And my management can’t afford to increase their allocation for disk space by a factor of 10.

      So, back in archivist mode… how comfortable can I get with a compressed alternative? Because I know if I hold out for TIFF the photographers are going to give me nothing at all.
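As a rough check on the sizes quoted above, the arithmetic for an uncompressed image is pixels × channels × bytes per channel. The sketch below assumes three color channels, 8 bits per channel for the 2002 camera, and 16 bits per channel for the 2018 camera. Those bit depths are assumptions chosen because they land close to the quoted figures, not details given in the comment, and real TIFF files add a small amount of header and metadata overhead.

```python
def uncompressed_size_mb(megapixels, channels=3, bits_per_channel=8):
    """Approximate uncompressed image size in decimal megabytes."""
    total_bytes = megapixels * 1_000_000 * channels * bits_per_channel / 8
    return total_bytes / 1_000_000


if __name__ == "__main__":
    size_2002 = uncompressed_size_mb(6)                        # assumed 8-bit RGB
    size_2018 = uncompressed_size_mb(36, bits_per_channel=16)  # assumed 16-bit RGB
    print(f"6 MP, 8-bit RGB:   {size_2002:.0f} MB")
    print(f"36 MP, 16-bit RGB: {size_2018:.0f} MB")
    print(f"Growth factor:     {size_2018 / size_2002:.0f}x")
    print(f"300-image batch:   {size_2018 * 300 / 1000:.1f} GB")
```

Under these assumptions the per-image size grows by roughly an order of magnitude, which is in line with the disk-space increase described above.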

      1. Andy, your conundrum reminds me that a lot of times in both digital and physical archival work, it seems like we just have to make do with the best possible solution. Would it be great if we could save all of the original copies of each document? Of course, but most of the time it’s just not practical… for space, funding, manpower, all sorts of reasons. Your comment made me think of the terrible quality of Facebook photos. There have been times when I see a picture on FB that I can’t find the original of, so I end up just downloading it and saving it in all its terrible quality. I figure that lesser quality is better than nothing, if that’s all you can get!

    2. Yes! The end goal of the preservation is absolutely relevant to the process and handling of these digital objects. I think that – if you believe your suggestions will help in the future – you should share any and all insights you have on their practices. Especially since we are only working with our institution for a short time period, any extra input would likely be beneficial. And remember, a suggestion is just that, so if you aren’t as realistic as you thought, they can always modify that for themselves or not follow it.

  4. “How do you think having an understanding of digital objects will help you when you go to consult with your small institution?”

    Like Andy and Gwen, I’m also kinda struggling with this.
    I’ve already consulted with my org, but I feel like I still need to know more, maybe to do a follow-up with them. I was particularly interested in Kirschenbaum’s article regarding digital forensics because it addressed issues I’ve had when attempting to access old files on floppy discs. Kirschenbaum discusses the limitations of a computer’s file system, and how this might impact the creator’s choice of how to name and organize their files. Specifically, he mentions that in early file systems, creators were limited to 8 characters to name their files! Additionally, certain file formats require specific software to work or be readable, but there are tons of different file formats and applications that have been developed over the last few decades.

    When going through old floppy discs at the Park Service a few months ago, I noticed that there were a ton of files labeled “FISH.pfg”, “BRIDGE.gny”, and the like, and they all had these different file extensions at the end. None of this made sense to me, and none of the files were accessible! This reading (as well as some research) made me think that the limitations of the file system may have forced previous employees to name their files with short, single-word titles, and apparently WordPerfect (an old word processor) allowed creators to “sign” the document with their initials, hence the strange file extensions.

    I think having this kind of background information and a better understanding of the idiosyncrasies of old file systems, applications, and computers will be of great help for our project.

    My org has an “ancient” laptop with a database trapped on it, and who knows what else is on there. I feel like there will need to be some understanding of old formats for me to be able to recommend a reliable way to extract this database and keep it usable.

    1. Yes, I too am finding so much weirdness in the files I inherited from the late 90’s and early 2000’s.

      One of the big questions I hope to ask the next time we meet in person is if there are any reference points Trevor recommends for solving some of these specific technical mysteries. For example, I remembered him saying last week that it was possible to open an image in a text editor, so I tried it yesterday on one of the images that was saved without an extension. None of the usual guesses worked (.tif, .psd, .jpg), so I wanted to see if comparing the text to a known file type might give me some clues. I made some interesting discoveries looking at this information, but I probably raised more questions than I answered. I eventually figured out what the correct extension was (.ai… why would anyone save an image in this format?!), but I’m still wondering if there are any bona fide practical resources I should bookmark. Google is great, but so many of the solutions offered were posted on snarky forums from usernames that sound Russian telling me to go download their free magic tool that will do all the work for me. And most of those were posted 10 years ago.

      It’s a little hard to know who to trust.

    2. Maggie, that must have been so frustrating to find those funky formats. I think that if I saw them, I would have just assumed there were a bunch of different, less well-known systems back in the day. But having the will to learn about the different systems organizations used should help you as you progress with the project.

  5. I would say that a deepened understanding of digital objects will give me necessary knowledge and an extra dash of confidence for this project.

    I hadn’t really thought of it in quite this way till now, but most of my knowledge has been confined to best practices for handling objects I don’t fully understand, much the way I’d handle a painting without having a clue what materials had been used in its creation or why it’s okay to store it one way and not another. But if we’re really aiming at being digital preservation specialists, it seems we need to understand what it is we’re preserving. Knowing how much understanding is enough is where I struggle. Still, I think we need to know the difference between, say, a TIFF and a JPEG, so we can understand the best practices and, as mentioned above, sometimes question them.

    Where I work, I have to keep an eye on how much of our storage allotment we’re using for all those 200 MB TIFFs we create through our digitization project, when it’s the JPEG access copies we really use. Our options for bucking best practices are limited because we’re operating under a grant that stipulates certain procedures and we’d have to argue for changing them. But, knowing what we know now about formats, storage and our usage, we could write a different kind of proposal at some future point and better tailor those best practices/levels to our situation.

    I think that what I’m learning now about the objects themselves will help me be a more informed consultant as my project client tries to understand their needs.

  6. When first considering the consultation project, I wasn’t coming at it from the same perspective I am now. Now I have more of an understanding of the different forms of digital preservation rather than thinking it was a monolith. I don’t think we’ll ever really understand all of the forms of digital objects, but at least now we have more of a framework to apply.

    When working with the answers from my survey, I realize now that it will be more important to figure out how the organization wants to use these files now and in the short-term future, rather than paying particular attention to a very specific standard of file. My opinion might change once I really start working with them and their material, but right now I’m focusing more on how they understand the use and how that impacts the kind of digital preservation tools they will need to use.

    1. I agree! I think that our understanding of the project will need to evolve as our understanding of the digital needs of the organization continues to evolve. I appreciate your point that we will not be experts in every digital medium (both current and outdated) in the next three months. It’s important to remember that as we work with our organizations, we have to learn about the current state of *their* systems, but also how *they* plan to use them moving forward. For instance, if an organization just wants files to be accessible in case they have a need in the future, that is a different need from an organization that plans to access their older files on a fairly regular basis.

  7. Thanks for chiming in, everybody. It’s nice to know that people are working through similar theoretical issues in their consultation process. I agree, Margaret Rose, that treating the immediate preservation needs in the manner the organization needs to use them is the first priority. I’m hoping that the policy portion of this project will present an opportunity to suggest different ways of thinking about digital preservation to my organization or even different ways of thinking about a digital object. Over-complicating workflows, as Andy points out, has inherent dangers of its own. Perhaps this is where it’s useful to return to the Chudnov article from a few weeks ago and think of this more as an iterative process than a “one and done,” even if our contact with these organizations doesn’t last past this semester.

  8. My organization doesn’t have as much digital content as some of the other organizations mentioned here because they are still in the process of figuring out what they have in their collections and scanning documents and photos. Because the content hasn’t been sitting around for decades, I’m hoping that I’ll be able to take what I’ve learned about digital objects and proactively address issues while the organization is still in the early stages of their projects. I think it’s also great that we can learn from each other’s projects as we encounter different challenges with our clients. For me, it’s really helpful to read these comments about file formats now because it’s helping me to identify some pitfalls that my organization needs to avoid as they move forward.

  9. I just wanted to say thank you to everyone for your comments on my question, as it is something that I am also struggling with at this stage of the project. You have all given me a lot to think about.

    1. Thank you, Leigh. My earlier comment seems to have been lost in the ether of moderation but thanks for posing this question as you did. It got me thinking about this project and my current job.
