MATRIX Procedures for Digitizing, Describing, and Preserving American Black Journal

Digitization Procedures

MATRIX's Digital Lab at Michigan State University in East Lansing, Michigan, is equipped to reformat a wide range of audio and video tape formats. The lab's staff specializes in forward migration of analog audio and videotape to digital files and is trained to work with many professional and consumer formats, including 1" Type C video, 3/4" U-matic video, Beta SP video, VHS and S-VHS video, 1/4" audio at multiple speeds, records, and cassettes.

MATRIX adheres to archival community standards, producing MPEG-2 I-frame-only output at 25 Mbps, .dv output at 25-50 Mbps, Digital Beta output, or even standard .avi output from analog sources at 72 GB/hour. For "born digital" formats such as Digital Beta, MATRIX carefully captures all camera-generated metadata. MATRIX technicians use ISO-standard evaluation methods to inspect tape, prioritize digitization, and create the output clients need in multiple formats.
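To put the data rates above in storage terms, the following is a minimal arithmetic sketch. It assumes the 25 and 50 figures are megabits per second of video data, uses decimal units (1 GB = 10^9 bytes), and ignores audio and container overhead, so real files will be somewhat larger. The function names are illustrative only.

```python
def gigabytes_per_hour(megabits_per_second: float) -> float:
    """Convert a data rate in Mbit/s to storage in GB per hour (1 GB = 10**9 bytes)."""
    bits_per_hour = megabits_per_second * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000


def megabits_per_second(gb_per_hour: float) -> float:
    """Convert storage in GB per hour back to an average data rate in Mbit/s."""
    bits_per_hour = gb_per_hour * 1_000_000_000 * 8
    return bits_per_hour / 3600 / 1_000_000


if __name__ == "__main__":
    # Rates quoted in the text; the labels are shorthand, not official format names.
    for label, rate in [("25 Mbit/s output", 25), ("50 Mbit/s output", 50)]:
        print(f"{label}: {gigabytes_per_hour(rate):.2f} GB/hour")
    print(f"72 GB/hour .avi: {megabits_per_second(72):.0f} Mbit/s")
```

Under these assumptions, 25 Mbit/s works out to 11.25 GB per hour, 50 Mbit/s to 22.5 GB per hour, and the 72 GB/hour .avi output corresponds to an average rate of 160 Mbit/s.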

For ABJ digitization, MATRIX's lab is creating access copies in multiple formats. More importantly, it is also creating archival .dv files from analog tape and standard, non-reformatted files from "born digital" tape such as digibeta (Sony's industry-standard Digital Beta format).

For the NEH Preservation and Access phase of this project, MATRIX originally proposed to reformat tapes in the collection to Panasonic DVCPro50 format. However, project staff is currently investigating best practices in alternative preservation formats that are also of immediate use to the show's producers at Detroit Public Television (DPTV), a working broadcast station where American Black Journal is still in production. As the broadcast industry moves away from analog and digital tape (Beta SP, Digital Beta, and even HDV tape), MATRIX, MSU Library Special Collections, and DPTV are working as a team to preserve the original analog and digital tape as well as digital captures of every show. Newer "born digital" formats and straight-to-hard-drive recording present challenges but also offer opportunities to collect more metadata at the time of shooting, and MATRIX is working closely with DPTV to coordinate efforts for long-term preservation. MATRIX and DPTV will continue to work together to maintain the entire ABJ collection in digital formats that allow for both professional editing from copies of offline, archival formats and easy online access through current streaming formats.

Systems Configuration, Backup, and Storage

For storage and streaming, MATRIX currently operates a server farm designed for audio and video, dedicated servers for streaming video and audio, servers for front-end display, and Internet2 connectivity to prevent server-side connection bottlenecks when delivering files. Because video files are quite large, preservation copies of ABJ shows are stored on removable media (currently DLT-S4 data tape on a 5-year migration plan to accommodate changing technologies).

MATRIX servers are kept in climate-controlled, physically secured rooms. All MATRIX servers run the Debian distribution of Linux. Incremental tape backups using Linear Tape technology are performed daily, with a full backup performed weekly. Those tapes are stored at the MSU Computer Center, which provides off-site backup storage in a constantly staffed facility. Backup tapes cycle through the system approximately every 6 weeks and are replaced as needed. In addition to these ongoing backups, a full "permanent" backup is performed every 1-2 months to guard against data loss. Those tapes will be kept in perpetuity as part of MATRIX's long-term preservation strategy. The MATRIX systems administrator keeps a wiki-based log of all tape backups.
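The rotation described above can be sketched as a small scheduling helper. This is an illustration under stated assumptions, not MATRIX's actual backup tooling: the source does not say which day the weekly full backup runs (Sunday is assumed here), and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the weekly full backup runs on Sunday (Monday == 0, Sunday == 6).
FULL_BACKUP_WEEKDAY = 6
# From the text: backup tapes cycle through the system roughly every 6 weeks.
ROTATION = timedelta(weeks=6)


def backup_kind(day: date) -> str:
    """Return 'full' on the weekly full-backup day, otherwise 'incremental'."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def earliest_reuse(day: date) -> date:
    """Return the approximate date a tape written on `day` re-enters the cycle."""
    return day + ROTATION


def week_plan(start: date) -> list[tuple[date, str]]:
    """List the backup kind for each of the seven days beginning at `start`."""
    days = (start + timedelta(days=offset) for offset in range(7))
    return [(day, backup_kind(day)) for day in days]


if __name__ == "__main__":
    for day, kind in week_plan(date(2024, 1, 1)):
        print(day.isoformat(), kind, "reusable from", earliest_reuse(day).isoformat())
```

The "permanent" backups made every 1-2 months are deliberately left out of the rotation, since those tapes are kept in perpetuity rather than reused.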

MATRIX's infrastructure already streams thousands of hours of video and audio and is scalable, with additional hardware, for increased content storage and streaming.

Using PBCore as a Descriptive Metadata Standard

During the 2002-2004 pilot phase of the American Black Journal project, MATRIX elected to use the PBCore public broadcasting metadata standard. With a public television program at the core of the repository's assets, MATRIX was drawn to this public broadcasting implementation of the Dublin Core Metadata Element Set. Using a community-specific implementation of Dublin Core meant that descriptions of ABJ shows would be coded using a well-established, stable international descriptive metadata standard designed for resource discovery. Furthermore, a Dublin Core-based standard would facilitate any future interchange or exchange of digital resources via metadata harvesting technology.

PBCore was developed by a "cross-organizational team of public radio and television producers and managers, archivists and information scientists," according to the PBCore website (www.pbcore.org). As such, PBCore provided the ABJ project with a metadata standard that would be understandable to, and address the needs of, the communities of both stakeholders in this partnership: digital libraries (MATRIX) and public broadcasting (DPTV). Further, with funding from the Corporation for Public Broadcasting, PBCore offered a stable and sustainable community-specific metadata standard.

MATRIX found the PBCore elements, data dictionary definitions, and examples immediately useful and applicable to the project. After extensive consultations between MATRIX's head of digitization and the in-house digital librarian, project managers selected 34 of the 48 original PBCore (version 1.0) elements to use in the ABJ database. Most of the selected elements record technical metadata essential to preserving the ABJ shows. Because the ABJ project focused on one program and was mostly a collection of "finished shows," many of the other PBCore elements (like relation) were not applicable.
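As a rough illustration of how a record built from a selected subset of elements might separate descriptive from technical metadata, consider the sketch below. It is not the ABJ database schema: the element names are a small illustrative subset that should be checked against the PBCore data dictionary, the 34 elements MATRIX actually selected are not listed in this document, and every value is a placeholder.

```python
# Illustrative subset of PBCore-style technical element names (an assumption;
# consult the PBCore data dictionary for the authoritative list and spelling).
TECHNICAL_ELEMENTS = {
    "formatPhysical",
    "formatDigital",
    "formatDuration",
    "formatDataRate",
    "formatFileSize",
}


def split_record(record: dict) -> tuple[dict, dict]:
    """Split a flat record into (descriptive, technical) dictionaries."""
    descriptive = {k: v for k, v in record.items() if k not in TECHNICAL_ELEMENTS}
    technical = {k: v for k, v in record.items() if k in TECHNICAL_ELEMENTS}
    return descriptive, technical


def missing_elements(record: dict, required: set) -> list:
    """Return the required element names that are absent or empty, sorted."""
    return sorted(name for name in required if not record.get(name))


# Hypothetical record; all values are placeholders, not real ABJ metadata.
example_record = {
    "title": "PLACEHOLDER episode title",
    "identifier": "PLACEHOLDER-0001",
    "description": "PLACEHOLDER summary of the episode",
    "formatPhysical": "PLACEHOLDER tape format",
    "formatDigital": "PLACEHOLDER file format",
    "formatDuration": "00:00:00",
}

if __name__ == "__main__":
    descriptive, technical = split_record(example_record)
    print("descriptive:", sorted(descriptive))
    print("technical:", sorted(technical))
    print("missing:", missing_elements(example_record, TECHNICAL_ELEMENTS))
```

A check like `missing_elements` is one simple way to confirm that the technical fields needed for preservation have been filled in before a record is accepted.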

Looking beyond PBCore for Preservation

PBCore is not a preservation standard; it does not address certain important preservation issues such as the integrity of digital assets. Its purpose is to provide a standardized way to describe public broadcasting objects in order to promote resource discovery and retrieval as well as support data sharing or exchange. However, the PBCore website suggests that the standard also could be used "as a guide for the onset of an archival or asset management process at an individual station or institution." The Instantiation container (included in PBCore v. 1.1 and 1.2) allows for recording metadata about original physical resources, such as videotapes, as well as digital versions of the same content used for access. Most of this metadata is technical in nature, although some is related to the provenance of the asset.

MATRIX looked to the Library of Congress's PREMIS (Preservation Metadata: Implementation Strategies) Data Dictionary for Preservation Metadata and to Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC) as guides to frame its preservation practices. The TRAC certification plan was developed by an RLG and National Archives and Records Administration (NARA) joint task force that was charged with developing "criteria to identify digital repositories capable of reliably storing, migrating, and providing access to digital collections." In addition to documenting MATRIX's current preservation activities by applying the TRAC checklist, MATRIX used PREMIS to enhance the preservation functions of KORA, MATRIX's digital repository application. Over the past 12 months, through a rewrite of KORA, MATRIX has added appropriate metadata and preservation functionalities to KORA that bring it in line with emerging best practices in digital preservation.
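One common way to address the integrity gap noted above is to record a checksum (a "fixity" value, in PREMIS terms) for each file and recompute it later to detect corruption. The sketch below shows the general technique using Python's standard library. It is not KORA's implementation, which this document does not describe; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256",
                chunk_size: int = 1024 * 1024) -> str:
    """Compute a hex digest of a file, reading in chunks to handle large video files."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def fixity_ok(path: str, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's current digest matches the recorded one."""
    return file_digest(path, algorithm) == recorded_digest.lower()
```

In practice the digest would be stored alongside the file's other preservation metadata when the file is ingested, and `fixity_ok` would be run on a schedule or after each migration to new media.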