Report of the PBCore Ontology Hackathon

The following post was re-blogged from the American Archive of Public Broadcasting and was written by Karen Cariani, Director of the WGBH Media Library and Archives and Project Director for the American Archive of Public Broadcasting.

This past weekend a group of dedicated PBCore enthusiasts met prior to the Code4Lib conference in a suburban Portland, Oregon house for two days.  It was a healthy mix of developers, archivists, and managers. The goal was to discuss how to move PBCore towards development of an RDF ontology.  With the desire to fully utilize repositories like Fedora 4 and the desire to store data as RDF streams, users of PBCore were beginning to talk about building a PBCore ontology.

Before I continue, I want to sincerely thank everyone else who participated in the hackathon:

  • Jack Brighton, Illinois Public Media
  • Glenn Clatworthy, PBS
  • Laurence Cook, MetaCirque
  • Casey E. Davis, WGBH
  • Jean-Pierre Evain, EBU
  • Rebecca Fraimow, WGBH
  • Peggy Griesinger, Museum of Modern Art (MOMA)
  • Rebecca Guenther, New York University
  • Julie Louise Hardesty, Indiana University
  • Cliff Ingham, City of Bloomington
  • Andrew Myers, WGBH
  • Adam Wead, Penn State

PBCore is a metadata schema for audiovisual materials. Its original development in 2004 was funded by the Corporation for Public Broadcasting, with a goal of creating a metadata standard for public broadcasters to share information about their video and audio assets within and among public media stations. Since its conception, PBCore has been adopted by a growing number of audiovisual archives and organizations that needed a way to describe their archival audiovisual collections. The schema has been reviewed multiple times and is currently in further development via the American Archive of Public Broadcasting and the Association of Moving Image Archivists (AMIA) PBCore Advisory Subcommittee.

A number of PBCore users contribute to and are part of the Project Hydra community, a collaborative, open source effort to build digital repository software solutions at archival institutions. Hydra is built on a framework that uses Fedora Commons as the repository for storing metadata. Many users are seeking to update their Fedora repositories to the latest version (Fedora 4), which provides a great opportunity to develop an RDF data structure. If PBCore had an RDF ontology, it would be easier for PBCore users to take full advantage of Fedora 4 capabilities in managing data, which would in turn encourage adoption of Fedora 4. In addition, managing data in RDF allows much more flexibility for data relationships and linking data to other repositories.
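For readers less familiar with RDF, here is a minimal sketch of what moving from PBCore XML to RDF-style statements could look like. It is only an illustration: the EBUCore namespace URI and the predicate names in the mapping table are assumptions made for this example, not the results of the hackathon's gap analysis, and the sample record and `http://example.org/asset/` base URI are made up.

```python
import xml.etree.ElementTree as ET

# PBCore XML namespace as used in PBCore 2.x records.
PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
# Assumed EBUCore ontology namespace, used here for illustration only.
EBUCORE = "http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#"

# Hypothetical element-to-predicate mapping; a real mapping would come
# from the gap analysis, not from this table.
ELEMENT_TO_PREDICATE = {
    "pbcoreTitle": EBUCORE + "title",
    "pbcoreDescription": EBUCORE + "description",
}

# Made-up sample record.
SAMPLE_RECORD = f"""<pbcoreDescriptionDocument xmlns="{PBCORE_NS}">
  <pbcoreIdentifier source="example">asset-001</pbcoreIdentifier>
  <pbcoreTitle>Evening News</pbcoreTitle>
  <pbcoreDescription>A sample broadcast.</pbcoreDescription>
</pbcoreDescriptionDocument>"""


def pbcore_to_triples(xml_text, base_uri="http://example.org/asset/"):
    """Turn a PBCore record into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(f"{{{PBCORE_NS}}}pbcoreIdentifier")
    if not identifier:
        raise ValueError("record has no pbcoreIdentifier")
    subject = base_uri + identifier.strip()
    triples = []
    for child in root:
        # Strip the "{namespace}" prefix ElementTree puts on tag names.
        local_name = child.tag.split("}", 1)[-1]
        predicate = ELEMENT_TO_PREDICATE.get(local_name)
        if predicate and child.text:
            triples.append((subject, predicate, child.text.strip()))
    return triples


triples = pbcore_to_triples(SAMPLE_RECORD)
for s, p, o in triples:
    # Print in a simple N-Triples-like form.
    print(f'<{s}> <{p}> "{o}" .')
```

Each element becomes a standalone statement about the asset, which is what makes it straightforward to add relationships or link the same asset to records held in other repositories.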


Knowing how much work building an ontology can be, the hope was to build upon existing work that is already well established. In particular, the EBUCore ontology is quite developed and established. EBUCore grew out of the European broadcasting community's need to express audiovisual materials in common data structures to allow for easier sharing. There seemed no need to develop something that already exists and does much of what we need it to do. In fact, the uses of EBUCore and PBCore are so similar we began to wonder why the two exist separately and why we are not joining forces to develop one standard. Certainly in this day and age of limited resources and time, collaborating is more productive than working at odds with each other on different but similar paths.

We were graced with the presence of Jean-Pierre Evain from the European Broadcasting Union (EBU). He clearly showed us what EBUCore did, how similar it was to PBCore, and how far they had gotten with an RDF ontology. The gap between EBUCore and PBCore turned out to be not so wide. Perhaps bridging that gap would be easier than building a brand new ontology based on PBCore. Within a day, many of the issues had been identified, and the remaining work felt doable in a reasonable time frame with a solid workplan in place.


The group quickly came to the decision to not start from scratch by building a PBCore ontology, but try to build a bridge between PBCore and EBUCore so PBCore adopters could use the EBUCore ontology. We even talked about a new name for this new collaborative schema.

However, it was fully recognized that current PBCore users would need a path for migration, and some would not be interested in using an RDF ontology and therefore migrating. So how do we manage this community of diverse needs? There is certainly more work to do within the PBCore community around communication and education. And the PBCore community should speak up about this idea.

I am always amazed at how productive it is to gather dedicated people together, face to face. If not for setting aside the weekend to focus on this issue, the work and decision would have lagged for months through bi-weekly one-hour phone calls and virtual meetings. The group more or less self-organized and stayed focused with great guidance from Casey Davis. By the end, almost everyone was in GitHub making XSLT mappings from PBCore to EBUCore as we completed a gap analysis (still in progress). We finished the day with a plan to move forward and a group dinner.

The PBCore Schema Team is working on an updated version of PBCore (PBCore 2.1), which will consist of minor tweaks and bug fixes and is expected to be released in March 2015. The group thought that this work should continue until 2.1 is released. At that point, work on the PBCore XML schema should freeze and efforts will go into aligning with EBUCore: making sure elements can map across, that we all understand the mapping, and building tools to help with the mapping. The PBCore community needs to comment about this direction. Does it make sense? What are your concerns? The group that met is by no means the end of the discussion.

In the end, it was worth it. For the cost of some snacks and a homemade pasta dinner, we had 11 people from across the country work on a solution, come to a consensus, and enjoy the camaraderie. I really want to thank everyone who participated and took the time to join us. It was a weekend, after all.

The hackathon notes are documented here: http://wiki.code4lib.org/PBCore_RDF_Hackathon

To view or contribute to the XSLT mapping work, visit the GitHub repo: https://github.com/WGBH/pbucore
