Webinar Recap: Examples in Context

This is the fourth and final post in a series about the PBCore webinar that the Education Team presented in October. A recording of the webinar can be found here, and we’ll be recapping the event over the next few weeks. Part one of the series is located here, part two is located here, and part three is located here.

The previous blog post, written by Sadie Roosa, described the basic elements and form of a PBCore document. Expanding on that, this post will demonstrate various ways that PBCore can be structured in order to meet specific objectives.

Use Cases

The following use cases will be discussed:

  • Archival Description

  • Asset Management

  • Digital Preservation

  • Sharing and Exchange

  • PBCore in METS

PBCore is very flexible, which allows it to be used in any of these use cases. In fact, it is possible to design PBCore files that apply to several of these use cases at once. However, for the sake of readability, the sample PBCore XML files discussed in this post are kept as simple as possible. All of the sample files are available on the documentation page of the PBCore website. Keep in mind that these examples are not meant to be prescriptive, but are rather meant to inform you as to what PBCore can do for you. Don’t worry if your institution uses PBCore files that look different from these examples! The standard is designed to be flexible in order to accommodate the needs of many different types of institutions, and any deviation from these examples should not be seen as wrong or incorrect.

PBCore in XML

Sadie’s blog post mentioned that PBCore data is the most useful when held in an XML document. XML stands for eXtensible Markup Language. XML is used as a data storage and transmission format across a number of fields and disciplines. It is both human and machine readable, and the data structure of XML allows multi-dimensional data to be nested hierarchically. These characteristics make it particularly useful in the realm of A/V archiving, where we often deal with many versions, instances, or parts of a single intellectual unit.

Take, for example, the process of transferring a single analog video tape to the digital realm. The first step may be to create a single Preservation Master file to represent all of the content on the tape, with a one-to-one relationship. Next, the Preservation Master file may be broken into several Access Copies according to the programmatic content on the tape, creating a one-to-many relationship between the Preservation Master file and the Access Copies. An Excel spreadsheet can do a good job of describing a one-to-one relationship, but its flat rows and columns make it awkward to describe a one-to-many relationship accurately. XML, on the other hand, can describe complex relationships using a tree structure. It is for this reason that PBCore is typically held in XML files.
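To make the tree idea concrete, here is a minimal sketch in Python using the standard library's ElementTree parser. The element names (tape, preservationMaster, accessCopy) and all identifiers are hypothetical and are not PBCore elements; they are only there to show how one parent can hold many children.

```python
import xml.etree.ElementTree as ET

# Hypothetical, non-PBCore element names used only to illustrate nesting.
TAPE_XML = """
<tape id="tape-001">
  <preservationMaster id="pm-001">
    <accessCopy id="ac-001" program="Program A"/>
    <accessCopy id="ac-002" program="Program B"/>
    <accessCopy id="ac-003" program="Program C"/>
  </preservationMaster>
</tape>
"""


def access_copies_by_master(xml_text):
    """Map each preservation master id to the ids of its access copies."""
    root = ET.fromstring(xml_text)
    return {
        master.get("id"): [copy.get("id") for copy in master.findall("accessCopy")]
        for master in root.findall("preservationMaster")
    }


print(access_copies_by_master(TAPE_XML))
```

The one-to-many relationship is carried by the nesting itself: each access copy sits inside the master it came from, so no extra linking column is needed.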

The Simplest PBCore

Sadie’s blog post also discussed which fields are required for a PBCore file to be valid. With these in mind, we’ve created two XML documents that represent the “simplest” PBCore files possible. These documents have just enough information to validate, though they do not actually tell us much about the assets they are describing.

The Simple Instantiation sample describes a single instantiation. It is possible to do this by using <pbcoreInstantiationDocument> as the root element. In this case, there are only two required elements: <instantiationIdentifier> and <instantiationLocation>.

The Simple Description Document sample describes an asset. The three fields required for this to be valid are <pbcoreIdentifier>, <pbcoreTitle>, and <pbcoreDescription>.
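As a rough sketch of what these two documents might look like, the Python snippet below embeds one of each and checks that the required elements named above are present. The identifiers, titles, and paths are placeholders, and the source attribute on the identifier elements is our assumption about what the schema expects. The check only looks for element names; it is not a substitute for validating against the PBCore XSD.

```python
import xml.etree.ElementTree as ET

NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"

# Placeholder values throughout; only the element names come from the post.
SIMPLE_DESCRIPTION = f"""
<pbcoreDescriptionDocument xmlns="{NS}">
  <pbcoreIdentifier source="Example Archive">asset-0001</pbcoreIdentifier>
  <pbcoreTitle>Example Title</pbcoreTitle>
  <pbcoreDescription>Example description.</pbcoreDescription>
</pbcoreDescriptionDocument>
"""

SIMPLE_INSTANTIATION = f"""
<pbcoreInstantiationDocument xmlns="{NS}">
  <instantiationIdentifier source="Example Archive">asset-0001-tape</instantiationIdentifier>
  <instantiationLocation>Shelf 12, Box 3</instantiationLocation>
</pbcoreInstantiationDocument>
"""

# Required children for each root element, as listed in this post.
REQUIRED = {
    "pbcoreDescriptionDocument": ["pbcoreIdentifier", "pbcoreTitle", "pbcoreDescription"],
    "pbcoreInstantiationDocument": ["instantiationIdentifier", "instantiationLocation"],
}


def missing_required(xml_text):
    """Return the required element names that are absent from the document."""
    root = ET.fromstring(xml_text)
    root_name = root.tag.split("}")[1]  # strip the "{namespace}" prefix
    return [
        name for name in REQUIRED[root_name]
        if root.find(f"{{{NS}}}{name}") is None
    ]


print(missing_required(SIMPLE_DESCRIPTION))
print(missing_required(SIMPLE_INSTANTIATION))
```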

By examining these samples, we can see that the bare minimum fields required for validation do not tell us much about the assets they are describing. PBCore documents this bare would almost certainly never appear in the wild. These documents were designed to show what PBCore looks like in its simplest form, in order to give those of you who might be confused about the standard an idea of what is actually going on in these XML files. The rest of the examples discussed will have far more information in them, but are still structured similarly to these two simple documents.

Archival Description

In this use case, the assets PBCore is describing are physical objects. However, before we begin looking at the example, I would like to take a quick aside to discuss some vocabulary.

In the parlance of PBCore, the term “Intellectual Content” is used to refer to what is typically known in the archival community as “Descriptive Metadata”. This includes information that can help to identify an asset, as well as describe the content of the asset. “Intellectual Property” is used to refer to what is typically known in the archival community as “Administrative Metadata”. This is information that concerns the creation, authorship, and ownership of the asset. For the purpose of this blog post, the terms DMD (Descriptive MetaData) and AMD (Administrative MetaData) will be used. Another type of metadata that will be discussed is Technical Metadata (TMD). This type of metadata refers to any attributes that describe the physical or digital properties of an object. The size of a book, the playback speed of a tape, and the frame width of a digital video are all examples.

The example PBCore document for Archival Description can be described with the following model:

[Figure: archival_description model]

At the root level there is a <pbcoreDescriptionDocument> element. Within this element, we have DMD, AMD, and a Physical Instantiation.

The DMD is made up of a number of fields that describe the asset and its content. These fields include <pbcoreIdentifier>, <pbcoreTitle>, <pbcoreDescription>, <pbcoreGenre>, and <pbcoreCoverage>.

The AMD section is made up of a number of fields that describe who was involved with the creation of the asset. In this case, these fields include <pbcoreCreator> and <pbcoreContributor>. You may notice that these elements are repeated many times within this document. This is a perfectly valid way to use these elements, since many A/V and broadcast assets have a number of creators, contributors, and copyright holders.

The physical instantiation exists within the <pbcoreInstantiation> element. All of the information in this section refers to the actual physical instantiation, which in this case is a Master version of a VHS tape, about an hour and a half in duration, that resides in the McHale University Library.

The information contained in these three sections affords a number of interactions with the asset. It can aid users in finding this asset, it can help users determine what the rights associated with the asset may be, and it can help users play back the physical instantiation of the asset, among other things.
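The three sections described above can be sketched as follows. This is an illustrative document we wrote for this post, not the sample file from the PBCore website: the names, title, genre, and coverage values are invented, the child elements are our assumption about a typical layout, and the file has not been validated against the schema. The small summarize function simply pulls out a few values to show how each section can be read back.

```python
import xml.etree.ElementTree as ET

NS = {"pb": "http://www.pbcore.org/PBCore/PBCoreNamespace.html"}

# Illustrative values only; people, title, and coverage are invented.
ARCHIVAL_XML = """
<pbcoreDescriptionDocument xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
  <!-- DMD: identifies the asset and describes its content -->
  <pbcoreIdentifier source="McHale University Library">mchale-0001</pbcoreIdentifier>
  <pbcoreTitle>Hypothetical Program Title</pbcoreTitle>
  <pbcoreDescription>Hypothetical summary of the program.</pbcoreDescription>
  <pbcoreGenre>Documentary</pbcoreGenre>
  <pbcoreCoverage>
    <coverage>Boston, MA</coverage>
    <coverageType>Spatial</coverageType>
  </pbcoreCoverage>
  <!-- AMD: creator and contributor elements may repeat -->
  <pbcoreCreator>
    <creator>Doe, Jane</creator>
    <creatorRole>Producer</creatorRole>
  </pbcoreCreator>
  <pbcoreCreator>
    <creator>Roe, Richard</creator>
    <creatorRole>Director</creatorRole>
  </pbcoreCreator>
  <pbcoreContributor>
    <contributor>Smith, Alex</contributor>
    <contributorRole>Narrator</contributorRole>
  </pbcoreContributor>
  <!-- Physical instantiation: the VHS master tape -->
  <pbcoreInstantiation>
    <instantiationIdentifier source="McHale University Library">mchale-0001-vhs</instantiationIdentifier>
    <instantiationPhysical>VHS</instantiationPhysical>
    <instantiationLocation>McHale University Library</instantiationLocation>
    <instantiationGenerations>Master</instantiationGenerations>
    <instantiationDuration>01:30:00</instantiationDuration>
  </pbcoreInstantiation>
</pbcoreDescriptionDocument>
"""


def summarize(xml_text):
    """Collect a few DMD, AMD, and instantiation values from the document."""
    root = ET.fromstring(xml_text)
    inst = root.find("pb:pbcoreInstantiation", NS)
    return {
        "title": root.findtext("pb:pbcoreTitle", namespaces=NS),
        "creators": [
            c.findtext("pb:creator", namespaces=NS)
            for c in root.findall("pb:pbcoreCreator", NS)
        ],
        "contributors": [
            c.findtext("pb:contributor", namespaces=NS)
            for c in root.findall("pb:pbcoreContributor", NS)
        ],
        "format": inst.findtext("pb:instantiationPhysical", namespaces=NS),
        "location": inst.findtext("pb:instantiationLocation", namespaces=NS),
    }


print(summarize(ARCHIVAL_XML))
```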

Asset Management

PBCore can also be used to aid in asset management. In this case, the PBCore document describes the physical and digital locations of A/V assets and their digital derivatives, as well as the relationships between the tapes and files. The example file can be illustrated with the following model:

[Figure: asset_management model]

In this model we see that, like the last example, the asset is represented by a single <pbcoreDescriptionDocument>. However, unlike the earlier example, this asset is also described by three different <pbcoreInstantiation> elements. This is a many-to-one relationship, where each instantiation is related to the original content, but has a different physical or digital format.

The physical instantiation describes the original object, in this case a reel of 1/4 inch audio tape. The first digital instantiation describes the 96 kHz/24-bit preservation master WAV file, and the second digital instantiation describes the MP3 access copy derived from that master file. In the PBCore file, each instantiation section includes DMD, AMD, and TMD associated with that specific object. This information describes the properties of the object, the content contained on the object, technical details about the object that aid in playback and discovery, and also how the objects relate to one another.

The <instantiationRelation> element is a parent element that contains the <instantiationRelationType> and <instantiationRelationIdentifier> elements. The <instantiationRelationIdentifier> element contains the identifier of the instantiation or object that the instantiation in question is related to, and the <instantiationRelationType> element describes that relationship. For example, the preservation master is “Derived From” the physical object, and the access copy is then “Derived From” the preservation master.

The purpose of this example is to show how a single asset can be described using as many instantiations as necessary. These instantiations contain information used for describing the objects, as well as relating them to one another, which aids greatly with asset management.
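Here is a hedged sketch of how the three instantiations and their “Derived From” relations might be laid out, along with a short function that follows the relations back to the original tape. The identifiers, locations, and format strings are placeholders of our own, and the document is simplified relative to the sample file on the PBCore website.

```python
import xml.etree.ElementTree as ET

NS = {"pb": "http://www.pbcore.org/PBCore/PBCoreNamespace.html"}

# Placeholder identifiers and locations; simplified for illustration.
ASSET_XML = """
<pbcoreDescriptionDocument xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
  <pbcoreIdentifier source="Example Archive">asset-0001</pbcoreIdentifier>
  <pbcoreTitle>Example Radio Program</pbcoreTitle>
  <pbcoreDescription>Example description.</pbcoreDescription>
  <pbcoreInstantiation>
    <instantiationIdentifier source="Example Archive">asset-0001-tape</instantiationIdentifier>
    <instantiationPhysical>1/4 inch audio tape</instantiationPhysical>
    <instantiationLocation>Shelf 12, Box 3</instantiationLocation>
  </pbcoreInstantiation>
  <pbcoreInstantiation>
    <instantiationIdentifier source="Example Archive">asset-0001-pm.wav</instantiationIdentifier>
    <instantiationDigital>audio/x-wav</instantiationDigital>
    <instantiationLocation>/preservation/asset-0001-pm.wav</instantiationLocation>
    <instantiationRelation>
      <instantiationRelationType>Derived From</instantiationRelationType>
      <instantiationRelationIdentifier>asset-0001-tape</instantiationRelationIdentifier>
    </instantiationRelation>
  </pbcoreInstantiation>
  <pbcoreInstantiation>
    <instantiationIdentifier source="Example Archive">asset-0001-ac.mp3</instantiationIdentifier>
    <instantiationDigital>audio/mpeg</instantiationDigital>
    <instantiationLocation>/access/asset-0001-ac.mp3</instantiationLocation>
    <instantiationRelation>
      <instantiationRelationType>Derived From</instantiationRelationType>
      <instantiationRelationIdentifier>asset-0001-pm.wav</instantiationRelationIdentifier>
    </instantiationRelation>
  </pbcoreInstantiation>
</pbcoreDescriptionDocument>
"""


def derivation_chain(xml_text, start_id):
    """Follow "Derived From" relations from start_id back to the source object."""
    root = ET.fromstring(xml_text)
    derived_from = {}
    for inst in root.findall("pb:pbcoreInstantiation", NS):
        inst_id = inst.findtext("pb:instantiationIdentifier", namespaces=NS)
        for rel in inst.findall("pb:instantiationRelation", NS):
            rel_type = rel.findtext("pb:instantiationRelationType", namespaces=NS)
            if rel_type == "Derived From":
                derived_from[inst_id] = rel.findtext(
                    "pb:instantiationRelationIdentifier", namespaces=NS
                )
    chain = [start_id]
    # Stop at the original object, or if a relation would loop back on itself.
    while chain[-1] in derived_from and derived_from[chain[-1]] not in chain:
        chain.append(derived_from[chain[-1]])
    return chain


print(derivation_chain(ASSET_XML, "asset-0001-ac.mp3"))
```

Starting from the access copy, the function walks to the preservation master and then to the tape, which is the kind of lookup an asset management system might perform.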

Digital Preservation

Another powerful aspect of PBCore is that it allows the inclusion of fields from other existing metadata standards. In the example for the Digital Preservation use case, PREMIS data is embedded in the PBCore file in order to combine the descriptive power of PBCore with an existing standard for preservation metadata. The following model illustrates how PREMIS fits into PBCore conceptually.

[Figure: digital_preservation model]

The idea here is that each instantiation contains a PREMIS event, and that the PREMIS events that concern reformatting, transferring, or transcoding one instantiation to another link those two instantiations together.

Within the PBCore, this all occurs in the <pbcoreExtension> or the <instantiationExtension> field, depending on whether the PREMIS event is at the Description Document level or the Instantiation level. In a general sense, these fields can contain any number of fields from another existing metadata standard. The example XML files provided on the website demonstrate how to use both the <pbcoreExtension> and <instantiationExtension>. Which element you should use will depend on what kind of information you are gathering, and how you want your information structured. <instantiationExtension> should be used for metadata concerning the specific instantiations, and the <pbcoreExtension> element should be used for information that concerns the asset across all instantiations.
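As one possible illustration, the sketch below places a PREMIS event inside an <instantiationExtension>. It assumes the PBCore 2.1-style <extensionEmbedded> wrapper and the PREMIS version 3 namespace; the sample files on the PBCore website may structure this differently. The event values and identifiers are invented, and the document has not been validated against either schema.

```python
import xml.etree.ElementTree as ET

NS = {
    "pb": "http://www.pbcore.org/PBCore/PBCoreNamespace.html",
    "premis": "http://www.loc.gov/premis/v3",  # assumption: PREMIS v3
}

# Invented identifiers and event values; wrapper choice is an assumption.
PRESERVATION_XML = """
<pbcoreDescriptionDocument xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html"
                           xmlns:premis="http://www.loc.gov/premis/v3">
  <pbcoreIdentifier source="Example Archive">asset-0001</pbcoreIdentifier>
  <pbcoreTitle>Example Radio Program</pbcoreTitle>
  <pbcoreDescription>Example description.</pbcoreDescription>
  <pbcoreInstantiation>
    <instantiationIdentifier source="Example Archive">asset-0001-pm.wav</instantiationIdentifier>
    <instantiationLocation>/preservation/asset-0001-pm.wav</instantiationLocation>
    <instantiationExtension>
      <extensionEmbedded>
        <premis:event>
          <premis:eventIdentifier>
            <premis:eventIdentifierType>local</premis:eventIdentifierType>
            <premis:eventIdentifierValue>event-0001</premis:eventIdentifierValue>
          </premis:eventIdentifier>
          <premis:eventType>migration</premis:eventType>
          <premis:eventDateTime>2015-10-01T12:00:00</premis:eventDateTime>
          <premis:linkingObjectIdentifier>
            <premis:linkingObjectIdentifierType>local</premis:linkingObjectIdentifierType>
            <premis:linkingObjectIdentifierValue>asset-0001-tape</premis:linkingObjectIdentifierValue>
          </premis:linkingObjectIdentifier>
        </premis:event>
      </extensionEmbedded>
    </instantiationExtension>
  </pbcoreInstantiation>
</pbcoreDescriptionDocument>
"""


def instantiation_events(xml_text):
    """List (instantiation id, event type, linked object id) for each PREMIS event."""
    root = ET.fromstring(xml_text)
    events = []
    for inst in root.findall("pb:pbcoreInstantiation", NS):
        inst_id = inst.findtext("pb:instantiationIdentifier", namespaces=NS)
        path = "pb:instantiationExtension/pb:extensionEmbedded/premis:event"
        for event in inst.findall(path, NS):
            events.append((
                inst_id,
                event.findtext("premis:eventType", namespaces=NS),
                event.findtext(
                    "premis:linkingObjectIdentifier/premis:linkingObjectIdentifierValue",
                    namespaces=NS,
                ),
            ))
    return events


print(instantiation_events(PRESERVATION_XML))
```

The linking object identifier in the event is what ties the preservation master back to the tape it was migrated from.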

Sharing and Exchanging

PBCore can aid in publishing or transmitting your assets through the use of the <pbcoreCollection> element. An example of an XML file that does this can be found here, and the following model illustrates the overall structure of the XML.

[Figure: sharing_and_exchanging model]

In this model, we see that the <pbcoreCollection> element is at the root level. This element has a number of attributes (collectionTitle, collectionDescription, collectionSource, collectionRef, and collectionDate) which provide DMD and AMD for the overall collection of assets to be held within the document. The assets within the collection are represented by different <pbcoreDescriptionDocument> elements, one for each asset. In the model pictured above, the description documents look similar to the description document discussed in the Archival Description case study; however, these documents can contain any information as long as the required fields are included.

The purpose of using <pbcoreCollection> as the root element is that it allows the user to combine a number of description documents into a single XML file. From there, any sharing or transmission, such as publishing the collected assets as an RSS feed, can be enabled with the XML file.
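Combining description documents into a collection can be sketched in a few lines of Python. The helper names (description_document, build_collection) and all values are hypothetical, and the output is a minimal illustration rather than the sample file from the PBCore website.

```python
import xml.etree.ElementTree as ET

NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
# Serialize PBCore elements without a prefix (xmlns="..." on the root).
ET.register_namespace("", NS)


def description_document(identifier, title, description):
    """Return a minimal description document; values are plain placeholders
    and are not XML-escaped, so keep them free of markup characters."""
    return f"""
<pbcoreDescriptionDocument xmlns="{NS}">
  <pbcoreIdentifier source="Example Archive">{identifier}</pbcoreIdentifier>
  <pbcoreTitle>{title}</pbcoreTitle>
  <pbcoreDescription>{description}</pbcoreDescription>
</pbcoreDescriptionDocument>
"""


def build_collection(documents, **attributes):
    """Wrap description documents in a single pbcoreCollection root."""
    collection = ET.Element(f"{{{NS}}}pbcoreCollection", attributes)
    for doc in documents:
        collection.append(ET.fromstring(doc))
    return ET.tostring(collection, encoding="unicode")


COLLECTION_XML = build_collection(
    [
        description_document("asset-0001", "First Program", "Example description."),
        description_document("asset-0002", "Second Program", "Example description."),
    ],
    collectionTitle="Example Collection",
    collectionDescription="Two sample assets gathered for exchange.",
    collectionSource="Example Archive",
    collectionDate="2015-10-01",
)

print(COLLECTION_XML)
```

The resulting string is one XML file with <pbcoreCollection> at the root and one <pbcoreDescriptionDocument> per asset, ready to hand to whatever system does the publishing or exchange.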

PBCore in METS

In the Digital Preservation example we saw that it is possible to embed other metadata standards in PBCore. In this example we’ll flip that around and look at embedding PBCore in another standard, in this case METS. METS (Metadata Encoding and Transmission Standard) is used by institutions to move and ingest files across content management systems. It was designed to have a malleable structure so that it could be used by a number of different institutions for a number of different uses. The model below shows that a single METS document can have the following sections: Header, Descriptive Metadata Section (dmdSec), File Section (fileSec), Structure Map (structMap), Structure Link (structLink), and Administrative Metadata Section (amdSec). Within the amdSec we typically see Technical Metadata (techMD) and Source Metadata (sourceMD) sections. Both of these can be seen as Technical Metadata, with the techMD referring to information about digital files, and sourceMD containing technical metadata about the physical source objects.

[Figure: pbcore_in_mets model]

The METS guidelines suggest that these sections be filled with existing metadata standards, and PBCore is a natural fit since it provides fields for the types of description used in the METS amdSec.

The XML files must be structured very specifically, and the example on the PBCore website can be referenced if you wish to see what this would actually look like in the XML. The example is based on METS XML delivered to Columbia University Libraries by George Blood Audio/Video/Film for a large video reformatting project.
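For a rough idea of the shape, the sketch below wraps two PBCore instantiation documents in a METS amdSec, one under techMD for the digital file and one under sourceMD for the physical tape. This is our own simplified illustration and not the Columbia/George Blood example: the identifiers and paths are invented, the MDTYPE="OTHER" with OTHERMDTYPE="PBCORE" labelling is an assumption about how the wrapper might be typed, and the document has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "pb": "http://www.pbcore.org/PBCore/PBCoreNamespace.html",
}

# Simplified illustration; identifiers, paths, and MDTYPE labelling are assumptions.
METS_XML = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:pb="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
  <mets:amdSec ID="amd-001">
    <mets:techMD ID="tech-001">
      <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="PBCORE">
        <mets:xmlData>
          <pb:pbcoreInstantiationDocument>
            <pb:instantiationIdentifier source="Example Archive">asset-0001-pm.mov</pb:instantiationIdentifier>
            <pb:instantiationLocation>/preservation/asset-0001-pm.mov</pb:instantiationLocation>
          </pb:pbcoreInstantiationDocument>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
    <mets:sourceMD ID="source-001">
      <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="PBCORE">
        <mets:xmlData>
          <pb:pbcoreInstantiationDocument>
            <pb:instantiationIdentifier source="Example Archive">asset-0001-tape</pb:instantiationIdentifier>
            <pb:instantiationLocation>Shelf 12, Box 3</pb:instantiationLocation>
          </pb:pbcoreInstantiationDocument>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:sourceMD>
  </mets:amdSec>
  <mets:structMap>
    <mets:div/>
  </mets:structMap>
</mets:mets>
"""


def embedded_pbcore(xml_text):
    """Map each amdSec subsection to the PBCore instantiation ids it wraps."""
    root = ET.fromstring(xml_text)
    found = {}
    for section in ("techMD", "sourceMD"):
        path = (
            f"mets:amdSec/mets:{section}/mets:mdWrap/"
            "mets:xmlData/pb:pbcoreInstantiationDocument"
        )
        found[section] = [
            doc.findtext("pb:instantiationIdentifier", namespaces=NS)
            for doc in root.findall(path, NS)
        ]
    return found


print(embedded_pbcore(METS_XML))
```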

What’s Next?

We hope that this blog post has helped you to understand some of the many ways that PBCore can be used to empower your assets. If you were on the fence about using PBCore at your institution, we hope that this blog post answered any questions you may have had about doing so. However, we want to stress that you should make sure that you have the IT and content management systems necessary to support PBCore before you embark upon a PBCore initiative. The examples discussed in this blog post are meant to be informative and not prescriptive, so please keep in mind that you may have to add or remove elements from these examples so that they meet the needs of your institution.
