PBCore data exchange with the Pop Up Archive

You may have heard of the Pop Up Archive; if you haven't, you should. From my vantage point, they are solving a key problem for radio producers (and anyone creating digital audio) by providing:

- A simple way to upload and preserve the highest-resolution digital audio files, with the Internet Archive as the back end

- A cataloging tool for creating descriptive metadata for audio assets and collections

- An automatic extraction tool that generates additional metadata, including full speech-to-text transcripts

- An embeddable audio player for each asset so you can add them to your web pages.

Funded by the Knight Foundation and the National Endowment for the Humanities, the Pop Up Archive is carving out an important niche in preserving public radio for local stations and independent producers, like the Kitchen Sisters and Studs Terkel. Partners include KQED, WCRW, StoryCorps, NPR, and my station WILL.

You can find out more about the Pop Up Archive on their main website, and on their Tumblr blog.

I’m excited about the potential for this kind of service, but what makes me really happy is that they are set up to ingest audio archives via PBCore. And here’s the best part: this can be done automatically, with zero extra effort on anyone's part. How does this work?

At WILL, our producers (news reporters, program hosts, and producers) create audio content for broadcast that also gets published to our website. They use a standard workflow to get the content into our website's Content Management System (CMS), which publishes it to the web as pages and RSS feeds. As with most CMSs, content is presented to the front end (i.e., actual people) through a template system that pulls data from a database and builds HTML pages dynamically based on a given page URL. RSS feeds are created the same way, with the XML built dynamically from a given feed URL. Examples:

Web page: http://will.illinois.edu/news

RSS feed: http://will.illinois.edu/news/rss

We took this one step further: using the same method, we built a PBCore template based on the PBCore 2.0 schema. It is an XML template similar to an RSS feed, but the PBCore schema is more complex and carries more detailed metadata. Our PBCore feed uses pbcoreCollection as the root element and by default includes the 10 most recent items.

PBCore feed: http://will.illinois.edu/news/pbcorecollection
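For anyone who wants to try something similar, here is a minimal Python sketch of the kind of XML such a template might produce. It is not WILL's actual template: the `Story` record, the `build_pbcore_collection` function, and the `source` value on `pbcoreIdentifier` are hypothetical. The output uses only a small subset of PBCore 2.0 elements (a real feed would also describe the audio files with `pbcoreInstantiation`) and has not been validated against the schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"


@dataclass
class Story:
    # Hypothetical stand-in for one row pulled from the CMS database.
    story_id: str
    title: str
    description: str
    published: date


def _q(tag):
    # Qualify a tag name with the PBCore namespace.
    return f"{{{PBCORE_NS}}}{tag}"


def build_pbcore_collection(stories, limit=10):
    """Render the newest `limit` stories as a pbcoreCollection XML string."""
    # Serialize PBCore as the default namespace (no prefix on tags).
    ET.register_namespace("", PBCORE_NS)
    root = ET.Element(_q("pbcoreCollection"))
    newest = sorted(stories, key=lambda s: s.published, reverse=True)[:limit]
    for story in newest:
        doc = ET.SubElement(root, _q("pbcoreDescriptionDocument"))
        # Child order follows the PBCore 2.0 sequence: date, identifier, title, description.
        ET.SubElement(doc, _q("pbcoreAssetDate")).text = story.published.isoformat()
        ident = ET.SubElement(doc, _q("pbcoreIdentifier"), source="hypothetical-cms")
        ident.text = story.story_id
        ET.SubElement(doc, _q("pbcoreTitle")).text = story.title
        ET.SubElement(doc, _q("pbcoreDescription")).text = story.description
    return ET.tostring(root, encoding="unicode")
```

As with the RSS feed, the CMS would run a function like this whenever the feed URL is requested, so the feed always reflects the current contents of the database.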

The Pop Up Archive has a PBCore parser that checks this feed every day or so and ingests any content that is new since the last check. Think of it this way: if you use iTunes to subscribe to podcasts, iTunes is doing exactly the same thing with podcast RSS feeds. The difference is that the PBCore feed contains much more metadata. And after ingesting my content, the Pop Up Archive adds a ton of services to transcribe and preserve both the audio and its metadata.
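The ingest side can be sketched the same way. This is a hypothetical illustration, not the Pop Up Archive's actual parser: the function names are made up, and the sketch assumes each item is keyed by its `pbcoreIdentifier` and that the set of already-ingested identifiers is persisted somewhere between daily runs.

```python
import urllib.request
import xml.etree.ElementTree as ET

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
NS = {"pb": PBCORE_NS}


def fetch_feed(url):
    """Download the raw feed bytes (assumes network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def new_items(feed_xml, seen_ids):
    """Return (identifier, title) pairs not ingested before, recording them in seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for doc in root.findall("pb:pbcoreDescriptionDocument", NS):
        ident = doc.findtext("pb:pbcoreIdentifier", default="", namespaces=NS)
        # Skip items with no identifier or ones already ingested on an earlier run.
        if not ident or ident in seen_ids:
            continue
        title = doc.findtext("pb:pbcoreTitle", default="", namespaces=NS)
        fresh.append((ident, title))
        seen_ids.add(ident)
    return fresh
```

A daily job would then call something like `new_items(fetch_feed("http://will.illinois.edu/news/pbcorecollection"), seen_ids)` and hand each new item off for transcription and preservation. Tracking identifiers rather than only timestamps means an item that is edited and republished is not ingested twice.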

Enabling exchange of media content and metadata with services like the Pop Up Archive is exactly what PBCore was designed for. The American Archive ingests media archives and data the same way. As more services adopt PBCore as an ingestion format, and more producers become PBCore data sources, the universe of public media content that is both web-accessible and preserved for the long term will expand.

It’ll be even more interesting to see what we can do with that.
