6 Replies to “I wonder how many extra terabytes we could have if we just emulated the cloud…”

  1. Hi Dave,

    I really like the point you made about how, while open source software is free, the costs come when you need someone on staff who actually has the technical skills to administer it. Some of us are taking Curation in Cultural Institutions this semester, and our guest speaker today from UMD’s special collections discussed some of the difficulties they faced when implementing ArchivesSpace. When migrating their collections from the old Access database (nicknamed the Beast) to the new ArchivesSpace platform, they faced a steep learning curve in figuring out how the software worked, while also dealing with the glitches that can arise with open source software. She did mention how much they appreciate the listserv of other ArchivesSpace members, where they can answer each other’s questions and bounce around ideas for what might work best for their specific institutions. It seems that the Special Collections department staff is managing to learn the technical aspects of ArchivesSpace and get the service up and running, but I wonder what the costs were in staff hours devoted to this project, compared to how much it would have cost to use a more out-of-the-box product instead of ArchivesSpace.

    This summer, I interned at the White House Historical Association, which launched its Digital Library earlier this year; it serves as both a public-facing website for users and as their digital asset management system. I was really lucky that my supervisor, the Digital Librarian, thought it would be a great learning experience for me to see how they had gone about picking the software. She explained that over two years ago, WHHA had an outside organization audit what digital assets they had, what they would like to display publicly, and how they would like that interface to work. The recommendations leaned toward using some open source components to build out their digital library, but the Digital Librarian decided she would rather use part of her budget to go with a more established outside vendor to build a system that would work for the library and not require a lot of maintenance by WHHA staff. When I interned there, they were about six months into working with the outside vendor and were pretty pleased with what the software was able to do for user needs, and with its functionality behind the scenes for the staff. I asked my supervisor if she would make the same decision if she were going through the process again, and she said she would definitely still go with an outside vendor, as it freed up the staff’s time to focus on processing more images to upload into the library. They were lucky to have grant money they could put toward building the digital library, which I realize a lot of other institutions don’t have. However, like I said above, I think it’s definitely worth it for institutions to do a cost-benefit analysis before embarking on whatever path they choose, to see what will work best for them.

    1. I’ve set up ArchivesSpace in a test environment before, and getting it up and running is by far the easiest part. SCUA’s challenges with implementation are, in my opinion, mostly data problems with some sprinkles of software trouble. I say this mostly because of her description of the messy data. I think they would have had the same problems with any other platform, even a paid service (they just wouldn’t see the messy data; they’d pay to have it fixed).
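      On the messy-data point, a lot of that cleanup can be scoped out before a migration even starts. As a minimal sketch of the idea (in Python, with made-up column names and a made-up file name rather than anything from SCUA’s actual data), you could scan a CSV export from the legacy Access database for records missing fields the new platform will require:

      ```python
      # Hypothetical sketch: scan a CSV export from a legacy Access database
      # for rows missing fields that the target platform requires. The column
      # names below are assumptions for illustration, not real SCUA fields.
      import csv

      REQUIRED_FIELDS = ["title", "date", "identifier"]  # assumed column names

      def find_problem_rows(csv_path):
          """Return (row_number, missing_fields) pairs for incomplete records."""
          problems = []
          with open(csv_path, newline="", encoding="utf-8") as f:
              for i, row in enumerate(csv.DictReader(f), start=2):  # row 1 is the header
                  missing = [name for name in REQUIRED_FIELDS if not (row.get(name) or "").strip()]
                  if missing:
                      problems.append((i, missing))
          return problems

      for row_num, missing in find_problem_rows("the_beast_export.csv"):
          print(f"row {row_num}: missing {', '.join(missing)}")
      ```

      A report like that at least turns “messy data” into a concrete to-do list before any platform, paid or open source, gets involved.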

      Open source listservs and forums will help a lot, but you have to sift through writing that “reads like stereo instructions” (https://www.youtube.com/watch?v=RZBaSUhrwBY). But they are extremely helpful, especially once you are able to articulate precise questions about your software and project details.

      This conversation harkens back to that idea from earlier in the semester: it’s a balancing act.

      1. It’s great that you guys are getting into the weeds on some of the cost questions here. It’s also great that you are starting to pick apart the differences between various kinds of free. (This interview with the folks behind Archivematica from a few years back may be of use in digging into that further: http://blogs.loc.gov/thesignal/2012/10/archivematica-and-the-open-source-mindset-for-digital-preservation-systems/ )

        In my mind, this is a great point to return to some of the points that Chudnov brought up in “The Emperor’s New Repository.” Ultimately, you have some set of content that you want to ensure access to now and in the future, and we need to be thinking through how we can make sure we can get our stuff in and out of various systems, or ecosystems of systems, into the future. (On that point, I think some of Mark’s comments in this interview about Islandora are very useful, https://blogs.loc.gov/thesignal/2013/03/islandoras-open-source-ecosystem-and-digital-preservation-an-interview-with-mark-leggott/ , specifically about the “crown jewels” of the content vs. the functionality of a current access system.)

  2. I like the title of your piece. Maybe all of us in the archiving world should start a service that allows users to pay $99/year for “Archive Prime” and free streaming of all archived content?

    My client for this course already has access to Amazon Glacier for backup storage. I wasn’t familiar with Glacier, but it came up (somewhat negatively) in the readings. It is a cheaper counterpart to Amazon’s S3 cloud, intended primarily for long-term storage; it doesn’t allow for instant retrieval. Rumor has it that this data is stored on tape (which is not crazy). I found a quick overview of Glacier’s suitability for the NDSA levels here: http://www.avpreserve.com/wp-content/uploads/2014/04/AVPreserve_Glacier_profile.pdf

    That study has the following key quote about fixity: “Amazon claims the service performs regular data integrity checks on all objects in storage, but the fixity checking method and outcomes of these checks are not available to clients.”
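    Since those integrity checks are opaque to the depositor, one thing a client can do is keep its own fixity record. Here is a minimal sketch, in Python with only the standard library, of that idea: record SHA-256 checksums in a local manifest before upload, then re-verify files against it after any retrieval. The manifest name and directory layout are placeholders I made up, not part of any Amazon tooling.

    ```python
    # Minimal client-side fixity sketch: checksum files before deposit,
    # verify them after retrieval. Paths here are illustrative placeholders.
    import hashlib
    import json
    from pathlib import Path

    def sha256_of(path, chunk_size=1024 * 1024):
        """Hash a file in chunks so large archival files don't fill memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def write_manifest(directory, manifest_path="manifest.json"):
        """Record a checksum for every file in a directory before deposit."""
        manifest = {str(p): sha256_of(p) for p in Path(directory).rglob("*") if p.is_file()}
        Path(manifest_path).write_text(json.dumps(manifest, indent=2))

    def verify(manifest_path="manifest.json"):
        """After retrieval, return the paths whose checksums no longer match."""
        manifest = json.loads(Path(manifest_path).read_text())
        return [path for path, recorded in manifest.items() if sha256_of(path) != recorded]
    ```

    It doesn’t tell you anything about what happens inside Glacier, but it does put the outcome of a fixity check in the client’s hands.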

    I think, though, that a lot of organizations are going to be in the situation where they simply won’t have the resources to handle storage. This seems like an area where a large institution like the Internet Archive, or perhaps a large consortium of research institutions, could do a real service to the broader archival community by creating a low-cost service similar to Amazon Glacier, but one that allows clients to do those critical fixity checks.

    1. You’re right about Glacier, and from what I’ve read (no sources on hand at the moment), Glacier is nearline storage, most likely robotic tape libraries or non-spinning disk cold storage. This is also why it’s so much cheaper than other AWS storage.

      I think that organizations are already in the position where they don’t have the resources for storage; a large majority of digital preservation consortia and storage services contract through Amazon. An organization like the Internet Archive did a fantastic job of avoiding “farming out” its server farms and storage, but in reality, it is simply terrifying that the majority of the cultural heritage sector is investing so heavily in one cloud storage provider. Then again, Google is a little bit more expensive, and Microsoft Azure is just, well, Microsoft.

      There are organizations like LOCKSS and the Digital Preservation Network that use membership fees to fund their networks of ‘nodes,’ but these memberships aren’t exactly affordable for the broader archival community, specifically because of the technology commitments and prerequisites.

      But you’re right on the money: cloud storage is problematic for fixity.

      1. Glacier is really interesting. My take is that it could be a useful resource for, say, a third or fourth copy. A kind of “in case of emergency, break glass” sort of thing. But you don’t really want it running as part of your primary infrastructure. Some folks aren’t comfortable with having to trust that Amazon is doing its own checking and repair of your content. I’m less concerned about this. I think the main issue with the service is how one weaves it into the various workflows and practices you set up for managing content.

        Ultimately, Glacier is not marketed to cultural heritage institutions; it is an “archive” in the sense of replacing a tape archive. That is, it’s a place to stick stuff an organization doesn’t really think it is going to need but wants to keep around just in case, with the expectation that you would pull back a little bit here and there when you do need it. In contrast, most storage systems for cultural heritage institutions are the actual infrastructure that both serves up content and is regularly added to and revised over time.
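        To make the weaving-in concrete, here is a hedged sketch of depositing that “break glass” copy: it uploads a file through S3’s GLACIER storage class via boto3 and records our own SHA-256 checksum as object metadata, so fixity can be checked on our side whenever the copy is eventually pulled back. The bucket name and paths are placeholders, and it assumes boto3 is installed and AWS credentials are already configured.

        ```python
        # Hypothetical sketch: push a third/fourth "emergency" copy into the
        # GLACIER storage class, recording our own checksum as metadata.
        # The bucket name is a made-up placeholder.
        import hashlib
        import boto3

        def deposit_cold_copy(local_path, bucket="example-preservation-bucket", key=None):
            """Upload one file to cold storage with a SHA-256 fixity record."""
            key = key or local_path
            with open(local_path, "rb") as f:
                body = f.read()  # fine for a sketch; stream large files in practice
            checksum = hashlib.sha256(body).hexdigest()
            boto3.client("s3").put_object(
                Bucket=bucket,
                Key=key,
                Body=body,
                StorageClass="GLACIER",          # cold storage; retrieval is slow
                Metadata={"sha256": checksum},   # our fixity record, not Amazon's
            )
            return checksum
        ```

        The point of the metadata line is exactly the workflow question above: the checksum travels with the copy, but the check itself stays something your own processes run, not something you take on faith.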
