NDIIPP’s Model Digital Video Preservation Repository

The prototype Preservation Repository (PR) at New York University was built as a major component of the Library of Congress-funded project Preserving Digital Public Television.

A partnership among WNET-TV, WGBH-TV, the Public Broadcasting Service, and NYU, the Preserving Digital Public Television project had several goals:

  • Build a model preservation repository for “born-digital” public television programs
  • Examine operating issues related to content selection, costs, and access
  • Promote system-wide support for digital preservation

The prototype Preservation Repository was developed at New York University by the Digital Library Technology Services team between 2006 and 2009. The focus was on submitting a selection of video files, ingesting them into the repository, and then retrieving them successfully.

The repository was built on DSpace and the operations were based on an ISO standard known as the “Reference Model for an Open Archival Information System (OAIS)” and commonly referred to as the “OAIS Reference Model”. The model is generic in terms of content – it describes the general data flow of files in and out of an OAIS and provides a vocabulary for discussing digital preservation repository concepts and a framework for structuring repository functions.

How we’re using PBCore

This framework is being adopted by many media archives for long-term storage. Using this model, files are packaged for input as Submission Information Packages (SIPs), then stored as Archival Information Packages (AIPs). Particular attention was given to determining the most appropriate metadata. PBCore was chosen as the primary descriptive and technical metadata schema for program files being packaged into an Archival Information Package. (Additional metadata was added to the AIP using other schemas, including METS, MODS and PREMIS.)

A sample of over 80 hours of program files was submitted to the repository, including high-resolution (production quality) program masters from WNET and WGBH, and lower-resolution (broadcast quality) distribution versions of the same programs from PBS. These included samples from the national productions Nova, Frontline, Religion & Ethics Newsweekly and other programs. Although not a large collection, this sample required the repository to organize and manage a wide range of program file encoding formats, with different wrappers and metadata.

Each AIP contained multiple files and was required to have:

  • At least one Master Program file
  • A PBCore file containing descriptive and technical metadata
  • A PREMIS file containing creating application and rendering environment information
  • Plus the files specified by the core AIP:
    • A METS file containing structural metadata
    • A MODS file containing descriptive metadata
    • And a METSRights file containing rights information
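The required-file list above lends itself to an automated completeness check. The following is a minimal sketch, not the project's actual code: it assumes each AIP can be described as a mapping from a role name to a list of file names, and the role names and file names used here are hypothetical.

```python
# Hypothetical role names; the text does not describe the repository's
# actual manifest format, so this layout is an assumption.
REQUIRED_ROLES = ("master", "pbcore", "premis", "mets", "mods", "metsrights")


def missing_roles(aip_files):
    """Return the required roles that have no file in the AIP.

    aip_files maps a role name to a list of file names (assumed layout).
    """
    return [role for role in REQUIRED_ROLES if not aip_files.get(role)]


example_aip = {
    "master": ["program_master.mxf"],
    "pbcore": ["pbcore.xml"],
    "premis": ["premis.xml"],
    "mets": ["mets.xml"],
    "mods": ["mods.xml"],
    # "metsrights" is left out on purpose to show a failed check
}

print(missing_roles(example_aip))
```

An empty result would mean the package carries at least one file for every required role; a check like this says nothing about whether the files themselves are valid.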

OAIS Repository Functions

SIP = Submission Information Package – configuration of files going into the repository
AIP = Archival Information Package – configuration of files stored in the repository
DIP = Dissemination Information Package – configuration of files when they leave the repository to be used
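As a rough illustration of how the three package types relate, the sketch below models ingest (SIP to AIP) and retrieval (AIP to DIP) as two small functions. The `Package` class, the function names, and the file names are all hypothetical; they are not taken from the OAIS standard or from the NYU repository.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    kind: str                                  # "SIP", "AIP", or "DIP"
    files: dict = field(default_factory=dict)  # file name -> description


def ingest(sip, repository_metadata):
    """Turn a SIP into an AIP by adding repository-generated metadata files."""
    if sip.kind != "SIP":
        raise ValueError("ingest expects a SIP")
    return Package("AIP", {**sip.files, **repository_metadata})


def disseminate(aip, wanted):
    """Build a DIP holding only the requested files from an AIP."""
    if aip.kind != "AIP":
        raise ValueError("disseminate expects an AIP")
    return Package("DIP", {name: aip.files[name] for name in wanted})


sip = Package("SIP", {"program.mov": "master program file"})
aip = ingest(sip, {"mets.xml": "structural metadata"})
dip = disseminate(aip, ["program.mov"])
print(aip.kind, sorted(aip.files))
print(dip.kind, sorted(dip.files))
```

The point of the sketch is only that the stored package is a superset of what was submitted, and that what leaves the repository can be a subset of what is stored.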

Steps to adopting PBCore

All AIPs had to contain a PBCore document. However, because program information is not collected systematically or uniformly by public television entities, the PBCore data had to be collected from multiple sources on a program-by-program basis. Also, because it was not standardized, the quality of the incoming metadata was idiosyncratic and inconsistent. The PBCore records were drawn from:

  • WNET: XML exports from the InMagic database and PBCore exports from ProTrack
  • WGBH: XML exports from the TEAMS database
  • PBS: PODS data submitted as PBCore

To generate the final PBCore files, the collected metadata had to be analyzed and mapped against the PBCore schema. Software was then developed to extract the metadata from the various files and insert it into a final PBCore document. Detailed descriptions of the OAIS framework and the selected metadata schemas, including XML, can be found at:


The mapping charts can be found in Appendices 6–8.
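The extract-and-map step described above can be sketched as follows. This is an illustrative assumption, not the software the project developed: the source field names (`ProgramTitle`, `Synopsis`, `HouseNumber`) are invented, and the output only loosely follows PBCore element naming. It omits the namespace, attributes, and required elements, so it would not validate against any PBCore schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: source export field -> PBCore-style element name.
FIELD_MAP = {
    "ProgramTitle": "pbcoreTitle",
    "Synopsis": "pbcoreDescription",
    "HouseNumber": "pbcoreIdentifier",
}


def to_pbcore(source_xml):
    """Copy mapped fields from a source record into a PBCore-style document.

    Returns the new document and the list of source fields with no mapping.
    """
    source = ET.fromstring(source_xml)
    doc = ET.Element("pbcoreDescriptionDocument")
    unmapped = []
    for child in source:
        target = FIELD_MAP.get(child.tag)
        if target is None:
            unmapped.append(child.tag)  # set aside for manual review
            continue
        ET.SubElement(doc, target).text = (child.text or "").strip()
    return doc, unmapped


record = """<record>
  <ProgramTitle> Sample Program </ProgramTitle>
  <Synopsis>An example synopsis.</Synopsis>
  <Producer>Example Station</Producer>
</record>"""

doc, unmapped = to_pbcore(record)
print(ET.tostring(doc, encoding="unicode"))
print(unmapped)
```

Returning the unmapped fields separately reflects the situation the text describes: with inconsistent incoming metadata, some fields will not fit the mapping and need a person to decide where they belong.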

The difficulty of collecting, processing and manually entering data into PBCore demonstrated how high a priority uniform metadata and technical standards must be. Without them, automating the extraction and management of metadata, and the verification of file integrity, for large collections will simply not be feasible.

Why we’re using PBCore

At the same time, the repository proved that, once PBCore metadata records are filled in, using PBCore is scalable and feasible for public broadcasters. Used this way, PBCore lets producers easily share data with any third party capable of interpreting it, e.g., repositories, other stations, or a wide range of other users.


Nan Rubin
Project Director – Preserving Digital Public Television
Community Media Services
4700 Broadway #2J
New York, NY 10040
