So what is a digital archive?
Physical archives came before digital archives, and both exist today. As we’ve seen, there are huge differences, but also similarities, between traditional and digital archives. Traditional archives and the transition towards digital archives are best explained in Owens, Theimer and Bailey’s articles.
Traditional archives are defined by Kate Theimer as inclusive of provenance, a unified collection (aggregate), and being kept in their original context. Examples of this include records management, the papers of well-known figures, e.g. George Washington, and tape archives (see Owens). Theimer pushes back against calling online collections “archives” because they do not keep to the same values as traditional physical archives. However, the concept of what counts as an archive becomes more complicated once you introduce born-digital items, or records created and stored digitally, creating the need for digital archives. Just as digital history is not simply history on the computer, neither are digital archives just archival records online.
So then what is a digital archive?
In Bailey’s article he discusses how archival practices have changed to suit the needs, both political and practical, of archivists, and pushes for a change in how archives are conceived of in the digital world. Many of the practices that archivists use for physical archives are unnecessary for digital ones. Original order and provenance, as Theimer would agree, are cornerstones of archival management, but this information is stored in the data records of files online, making the organization of archival information in this order unnecessary. Through digital archives, access to each item is not dependent on its original collection or provenance, making physical archival practices impractical for a digital archive.
It is important to note the difference between a web archive and a digital archive. A website never starts out as an archive, but becomes an archive over time by preserving its data. The practicums being looked at this week showcase the diversity of the term ‘digital archive,’ as defined by Owens. The September 11th archive is a crowdsourced collection of materials related to the incidents of 9/11. The Bracero archive is a digitized collection of oral history interviews. The Shelley-Godwin archive is a digitized collection of primary source materials. The diversity of these sites (as you will read about in our cohort’s blog posts) highlights the strengths and differences of digital archives.
In Jerome McGann’s “The Rationale of HyperText,” he discusses the process of digitizing books. Would this be considered a digital archive? Why or why not?
What processes of digital history do you see used in digital archives, as explored by these authors? (Close reading, as discussed in Meg Phillips’ “Close Reading, Distant Reading: Should Archival Appraisal Adjust?”, is a great place to start.)
How do physical and digital archives differ? Can physical archives adapt to become digital archives? Will physical archives ever become obsolete?
20 Replies to “Digital vs. Physical: the Ultimate Showdown”
I found Meg Phillips’ ideas fascinating when discussing different archive practices. It’s often up to the individual archivist to determine what material they deem worth saving, but a misinterpretation or too close of a reading could lead to large amounts of data being lost. It would be fascinating to track disease trends through “sick day” emails from employees to their bosses, but without this context in mind who would deem a “sick day” email worthy of saving? This is a problem I am sure happened with regular archival processes, however, as not everyone kept every letter they ever wrote or received. This new way of distant reading can help us as researchers find larger trends in data sets, and through these trends we can then better choose which artifacts (digital or not) to engage with at a “closer reading” level.
Like Abigail, I was also really intrigued by the idea of applying macroanalysis to something like sick day emails, or quick notes about setting up a meeting, and how that complicates the process of deciding what gets kept and what doesn’t.
I was surprised, though, that Phillips (and others we read for today) didn’t discuss the issue of selection and deciding what’s important on the other end of the process, before material reaches an archivist. Is someone working in an office more or less likely to keep emails about sick days and meetings than they would be to keep physical notes or memos about the same things? Personally I tend to clear those things from my inbox pretty quickly but I’ve worked with processing archival collections that had all sorts of minor notes and memos tucked away. But obviously my personal experience does not a pattern make, so I’m curious what others think.
I think you bring up a really good point about the selection of material that is even sent to an archivist and what is deemed significant enough to be preserved in an archive. I think it would have been interesting to hear the thoughts of the authors that we read.
I think Phillips’ argument about distant reading is connected with the question that Callie and Abigail both brought up: what information should be saved? One limitation of physical archives is space. Not everything can be saved because not everything has room. On the other hand, digital archives are able to save everything since there is a seemingly unlimited amount of space on hard drives. Eventually, I think everything will become digitally preserved in some way so documents or objects will be remembered as they were. 3D modeling and other digital programs can also act as restoration tools to reimagine papers/photos/objects to the time they were first created. That being said, I don’t think physical archives will ever become obsolete since the physical copies will still exist and need to be maintained. The question then becomes how will these digital and physical archives work together to interact with audiences? How will these collections be integrated to act as a unitary archive? Thoughts??
I want to resist a little bit the idea that digital archives can save everything because they have unlimited space – a small hard drive can hold a whole lot more than paper or microfilm, but it isn’t infinite.
As a case in point, someone else brought up tweets elsewhere in these comments. The Library of Congress announced in 2010 that it was going to archive every tweet, but then rolled that back last year to only a small selection of important tweets: https://www.npr.org/sections/thetwo-way/2017/12/26/573609499/library-of-congress-will-no-longer-archive-every-tweet
Space becomes an issue even for digital archiving at that scale, as does processing. Even if an algorithm can sort through large volumes of digital material much faster than an individual archivist can go through papers, there is still work involved in getting material ready for researchers.
I agree, I know I tend to think of space as unlimited, even though I know that it’s not. Maybe this means that physical archives might become a repository for documents stored digitally on external hard drives. How does this change the field of archival studies? What might some of our authors have to say or feel about those changes?
I agree with that point, I was just thinking more along the lines of the Edson article from week 3. The constraint of hard drive size is a problem that we have now because we haven’t figured out how to make them any bigger without making the technology huge physically. I felt like the NPR article focused more on the human constraints of why the Library of Congress decided to no longer archive every Tweet. However, I think one day technology will advance so that archiving all Tweets would be feasible.
This is a really interesting concept of using people’s correspondence to make more macro-level analysis. I wonder, too, how much would depend on the individuals who make up the sample. It would depend not only on the archivists to preserve sick day emails, but if most email users are like me, they just swipe most of their emails into the trash without a second thought–and I’m a more history-minded person than most. My point is that while it’s interesting to look at one’s virtual footprint to gain some understanding about a person, and although there’s a lot of space to store things on a hard drive and on internet storage, there’s a lot of variability that comes down to how people use and store their information. Depending on the range there, it would change the accuracy of the data and the required sample size.
“Through digital archives, access to each item is not dependent on its original collection or provenance, making physical archival practices impractical for a digital archive.”
This made me think of how difficult and interesting it would be to actually trace provenance for digitally archived materials. The catch would be that many digital things can be in multiple locations at once, so it would be a web of provenance instead of linear. But wouldn’t it be cool if we could see the exact path an image followed across the internet? Or the geographic and temporal spread of a significant tweet?
I agree! To add, wouldn’t it be interesting to see if there is a certain kind of Tweet that Trump produces at a certain time (e.g. Hillary Clinton keeps him up specifically at 3AM)? There are ongoing jokes that Trump is on Twitter at all hours of the night, so maybe one day there will be a computing program that “macroanalyses” these Tweets.
I agree that this would be really fascinating to trace! I was recently listening to a podcast called Reply All where they tracked down a single bitcoin transaction and it took them all over the internet and required a lot of finesse and expertise. I bet the process for finding the path of an image or tweet would be unique to each image and nearly impossible to trace. I wonder if there are ways to track this that are either being developed or used. Such a cool thing to consider!
I think that the question of what is significant enough to be saved, and the possibility of using digital storage to archive functionally infinite amounts of material does lead to a bigger question about how we process information. Obviously, physical constraints limit the materials which can be stored by a physical archive, but there is also the risk of overwhelming a researcher with a deluge of information which is only semi-relevant or which otherwise adds nothing to their research. Unlike physical storage concerns, this remains an issue for digital archives, though one which might be solved with digital tools, such as improved search engines which can more effectively analyze the researcher’s intent or evaluate the relationships between documents.
This class makes me think about the potential of AI quite often. People are bringing up physical storage but there has been talk recently about quantum computing, which has the potential to speed up processing power exponentially.
AI itself could be a useful assistant to a researcher, particularly toward the point that Christian makes about researchers being overwhelmed by data in archives. An AI could rapidly gather relevant data, like a human archivist can with physical archives.
Besides the question of space, I think it would also be interesting to think about the longevity of these digital materials and archives compared to a physical archive. How long can digital archives or materials last before the technology becomes obsolete?
Right now it seems we are working at such a fast turnover rate for technology. I have worked in archives that had only just finished transferring their tapes and video cassettes to CDs. Now they are starting the process of doing this all again with digital files. There are also documents still stuck on floppy discs which are being transferred as well. A major issue I see with digital archives is that many archivists want there to be a CD or thumb drive or floppy disc with the physical archives and most lack larger databases that actually contain this digital information. With computers themselves having such a fast turnover rate this problem is widespread with archives as the only copy of much of this information is with the physical material (especially for small operations). I almost feel like in order to get digital archives to a space of longevity there would have to be some sort of plateau in technology development.
And going off of that, how will new technologies be created to be compatible with old systems as well? What information will be lost if companies/programmers decide to make closed systems?
While the discussion of space is key to the discussion of digital archiving, something that we haven’t really touched on is the funding required. Yes, in theory we could have limitless resources and space digitally, but this requires funding, whether it is paying for servers or hard drives. Also, these archives will require upkeep over time, so someone will need to be paid to maintain a digital archive well after it is created. This is why grants and endowments are so key to this work: they allow the digital world to be created and maintained properly.
I want to touch on Josh’s point about AI, as this is something I also think about in this class. I love the idea of using a faster method to sort through data and decide what belongs in an archive, but we just discussed the possible dangers of using computers to do the work of humans earlier in class. Computers can’t pick up on human cues, and in missing the human aspect how would a computer be able to tell what is and isn’t relevant? Even if it could, how successful would it be, and how successful would we need it to be to deem it to be a success? I’m not downplaying this angle at all, I’m truly curious to see what others think such thresholds might be.
Kyle, you are absolutely right about funding! There are certainly not enough paying positions for archives of the physical variety, let alone the digital. All of these ideas require money!… or at least a massive restructuring of how archives are processed.
I wouldn’t be surprised if the next wave of archival work is digital and also automated. I don’t know how it works but I’m assuming it wouldn’t be too hard for someone to create a program that automatically pulls and organizes based on text or data. This would certainly cost some money for space and to initially build a program, but it would certainly save money over paying archivists. In the traditional format, I think archival jobs are probably safe but for digital, it would be surprisingly easy to minimize the people involved in the process for money-saving (and efficiency?) purposes.
With regards to your last question about digital archives replacing physical archives, it seems to be an inevitability. Documents, records, articles, and more are increasingly born-digital or produced physically with digital copies stored away somewhere. Court records for example exist as physical documents but are available online. The ease of use of digital archives will soon outweigh limited access as more archival material is scanned and stored digitally.
There are obstacles on this path, such as the time and resources required to translate current archives into a digital format but through a variety of techniques such as crowd-sourcing it is only a matter of time and more advanced technology until researchers can rely entirely on digital archives, more often than not, to complete their projects.