So I successfully scared everyone off from blogging about The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. If you get a bit lost in this reading, I suggest reading over mwest24’s post on it from last year.
Along with the post, I would suggest the following vocabulary terms that we should all become familiar with. I will weave these terms together into a bit of a discussion of how computer files and storage work, which will help set us up for Kirschenbaum’s book.
As this is technical info, I am just going to crib most of it from Wikipedia. Don’t get too lost in the details, but please read these over and click the links out to Wikipedia if you have no idea about some of the terms.
Key Terms for File Characteristics
Dots per inch (DPI) is a measure of spatial printing or video dot density, in particular the number of individual dots that can be placed in a line within the span of 1 inch (2.54 cm). The DPI value tends to correlate with image resolution, but is related only indirectly.
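To make the DPI arithmetic concrete, here is a minimal sketch in Python. It assumes one pixel per dot, which is how scanner settings usually use the term; the function name and the 8 x 10 inch photo are made up for illustration.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in pixels for a scan at the given DPI.

    Assumes one pixel per dot, which is a simplification.
    """
    return round(width_in * dpi), round(height_in * dpi)


# A hypothetical 8 x 10 inch photograph scanned at two settings.
print(pixel_dimensions(8, 10, 300))  # (2400, 3000)
print(pixel_dimensions(8, 10, 600))  # (4800, 6000)
```

Doubling the DPI doubles each dimension, so the scan has four times as many pixels.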
A character encoding system consists of a code that pairs each character from a given repertoire with something else—such as a bit pattern. In our work, the most useful things to know about are ASCII and Unicode.
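Here is a small Python sketch of that pairing between characters and bit patterns. The helper name `bit_pattern` is my own; the encodings themselves are the standard ones built into Python.

```python
def bit_pattern(char, encoding):
    """Show the bits a character becomes under a given encoding."""
    return " ".join(f"{byte:08b}" for byte in char.encode(encoding))


print(bit_pattern("A", "ascii"))       # 01000001
print(bit_pattern("A", "utf-8"))       # 01000001
print(bit_pattern("\u00e9", "utf-8"))  # 11000011 10101001  (the letter e with an acute accent)

# ASCII has no code for the accented letter, so encoding it fails.
try:
    "\u00e9".encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
```

Notice that plain "A" gets the same bits in ASCII and in UTF-8, while the accented letter needs two bytes in UTF-8 and cannot be written in ASCII at all.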
Data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by identifying marginally important information and removing it.
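The difference between the two kinds of compression can be sketched in a few lines of Python. The lossless half uses the standard `zlib` module. The lossy half is only a toy I made up for illustration (it rounds numbers down to multiples of 16); real lossy formats such as JPEG or MP3 are far more sophisticated.

```python
import zlib

# Lossless: repetitive text squeezes down a lot and comes back exactly.
original = b"to be or not to be, " * 50
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original))               # 1000
print(len(compressed) < len(original))  # True
print(restored == original)        # True: nothing was lost

# Toy "lossy" step: keep only the coarse value of each sample.
samples = [12, 13, 200, 203]
quantized = [s // 16 * 16 for s in samples]
print(quantized)                   # [0, 0, 192, 192]
# The original 12 vs. 13 and 200 vs. 203 distinctions cannot be recovered.
```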
Embedded Metadata: Metadata that is embedded inside a given file instead of existing outside the file, for example in a separate database. Examples include ID3 tags for MP3 audio files and Exif for image files.
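As a rough illustration of what "embedded" means, here is a Python sketch that reads an ID3v1 tag, the older and simpler flavor of ID3 that sits in the last 128 bytes of an MP3 file. Most current files use the more complex ID3v2, which this does not handle. The function name and the fake file contents are made up for the example.

```python
def read_id3v1(data):
    """Return a dict of ID3v1 fields, or None if no tag is present."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def field(start, length):
        raw = tag[start:start + length]
        return raw.rstrip(b"\x00 ").decode("latin-1")

    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
        "year": field(93, 4),
    }


# Build a stand-in "MP3": some placeholder audio bytes plus a 128-byte tag.
fake_audio = b"\xff\xfb" + b"\x00" * 100
fake_tag = (
    b"TAG"
    + b"Demo Song".ljust(30, b"\x00")
    + b"Demo Artist".ljust(30, b"\x00")
    + b"Demo Album".ljust(30, b"\x00")
    + b"2012"
    + b"".ljust(30, b"\x00")   # comment field, left empty
    + b"\xff"                  # genre byte
)
info = read_id3v1(fake_audio + fake_tag)
print(info["title"], "/", info["artist"], "/", info["year"])
# Demo Song / Demo Artist / 2012
```

The point is that the title and artist travel with the file itself, so they survive when the file is copied away from whatever catalog once described it.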
A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text.
A bit (a contraction of binary digit) is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states. These may be the two stable states of a flip-flop, two positions of an electrical switch, two distinct voltage or current levels allowed by a circuit, two distinct levels of light intensity, two directions of magnetization or polarization, the orientation of reversible double-stranded DNA, etc.
The byte (/ˈbaɪt/) is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer architectures.
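A quick Python sketch ties bits, bytes, and characters together. The variable names are just for illustration.

```python
letter_bits = "01000001"           # eight bits = one byte
value = int(letter_bits, 2)        # read the bits as a base-2 number
print(value)                       # 65
print(chr(value))                  # A

# Eight two-state bits give 2**8 possible values per byte.
print(2 ** 8)                      # 256

# A short ASCII string is stored as one byte per character.
encoded = "Hi".encode("ascii")
print(list(encoded))               # [72, 105]
print(len(encoded))                # 2
```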
A disk image is a single file or storage device containing the complete contents and structure representing a data storage medium or device, such as a hard drive, tape drive, floppy disk, optical disc, or USB flash drive. A disk image is usually created by creating a complete sector-by-sector copy of the source medium and thereby perfectly replicating the structure and contents of a storage device.
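The sector-by-sector idea can be sketched in Python. This is only an illustration: an in-memory `io.BytesIO` object stands in for the real device, the 512-byte sector size is an assumption (it is common but not universal), and the function name is my own. Real imaging tools also deal with read errors and write-blocking, which this ignores.

```python
import io

SECTOR_SIZE = 512  # assumed sector size, in bytes


def image_device(source, dest, sector_size=SECTOR_SIZE):
    """Copy source to dest one sector at a time; return sectors copied."""
    sectors = 0
    while True:
        chunk = source.read(sector_size)
        if not chunk:          # nothing left to read
            break
        dest.write(chunk)
        sectors += 1
    return sectors


# A pretend 2048-byte "floppy" (4 sectors) and an empty image to fill.
fake_floppy = io.BytesIO(bytes(range(256)) * 8)
image = io.BytesIO()

print(image_device(fake_floppy, image))             # 4
print(image.getvalue() == fake_floppy.getvalue())   # True: identical copy
```

Because the copy works on raw sectors and not on files, it carries along everything on the medium, including the file system structure itself.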
Documents: Of particular importance for us are .txt .doc .pdf .xml and .html
Images: Of particular importance for us are .jpg .tiff and JPEG 2000 (.jp2)
Audio: Of particular importance for us are .mp3 .wav
Digital Video Encoding: this one is tricky; we will talk about .mov .mpg .swf .mp4 and .avi