The descriptor language identifies the primary language of a media item’s audio or text. Alternative audio or text tracks and their associated languages should be identified using the descriptor instantiationAlternativeModes.
XPATH LOCATION:
/pbcoreDescriptionDocument/pbcoreInstantiation/instantiationLanguage
USAGE RULES:
Occurs:
1 time or less
May Contain:
4 or fewer optional attributes, specifically:
- source (text, may be empty)
- ref (text, may be empty)
- version (text, may be empty)
- annotation (text, may be empty)
Contained by:
- /pbcoreDescriptionDocument/pbcoreInstantiation
Contained with:
[Any elements used MUST appear in this relative order]
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationIdentifier
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationDate
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationDimensions
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationPhysical
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationDigital
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationStandard
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationLocation
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationMediaType
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationGenerations
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationFileSize
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationTimeStart
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationDuration
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationDataRate
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationColors
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationTracks
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationChannelConfiguration
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationLanguage
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationAlternativeModes
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationEssenceTrack
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationRelation
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationRights
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationAnnotation
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationPart
- /pbcoreDescriptionDocument/pbcoreInstantiation/instantiationExtension
EXAMPLES:
-
<instantiationLanguage source="ISO 639.2" ref="http://www.loc.gov/standards/iso639-2/php/code_list.php">eng;fre</instantiationLanguage>
-
<instantiationLanguage source="ISO 639.2" ref="http://www.loc.gov/standards/iso639-2/php/code_list.php" annotation="Algonquian languages">alg</instantiationLanguage>
VOCABULARIES:
- PBCore recommends using ISO 639-2 or ISO 639-3 three-letter language codes.
- If the media item has more than one language that is considered part of the same primary audio or text, a combination statement can be crafted, e.g., eng;fre for the presence of both English and French in the primary audio. Separating the three-letter language codes with a semicolon (no additional spaces) is preferred.
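Under the current rules, any tool that reads or writes this field has to split and join the combination statement itself. The following is a minimal Python sketch of that step, assuming the preferred form (three-letter codes separated by ";" with no spaces). The helper names split_language_codes and join_language_codes are hypothetical, and the check only confirms that each code has the right shape; it does not confirm that the code appears in the ISO 639-2 or ISO 639-3 code lists.

```python
import re

# Shape check only: exactly three lowercase ASCII letters.
# This does NOT verify that the code is registered in ISO 639-2 or ISO 639-3.
THREE_LETTER_CODE = re.compile(r"[a-z]{3}")


def split_language_codes(value: str) -> list[str]:
    """Split a combination statement such as 'eng;fre' into individual codes."""
    codes = []
    for part in value.split(";"):
        code = part.strip()  # tolerate stray spaces, even though none are preferred
        if not code:
            continue  # skip empty segments, e.g. from a trailing ';'
        if not THREE_LETTER_CODE.fullmatch(code):
            raise ValueError(f"not a three-letter code: {code!r}")
        codes.append(code)
    return codes


def join_language_codes(codes: list[str]) -> str:
    """Build the preferred form: codes joined by ';' with no additional spaces."""
    return ";".join(codes)


print(split_language_codes("eng;fre"))      # ['eng', 'fre']
print(join_language_codes(["eng", "fre"]))  # eng;fre
```

Skipping empty segments and stripping spaces makes the reader lenient about slightly malformed input, while the writer always emits the preferred compact form.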
- Alternative audio or text tracks and their associated languages should be identified using the descriptor instantiationAlternativeModes.
The semicolon-delimited format of this field makes its values genuinely hard to handle. Since PBCore is expressed in XML, I recommend that instantiationLanguage simply be allowed to repeat (maxOccurs="unbounded"). Packing language expressions into semicolon-delimited strings is inconsistent with the rest of the standard and a hassle to manage when building or parsing a PBCore document.
For example, having:
<instantiationLanguage>eng</instantiationLanguage>
<instantiationLanguage>spa</instantiationLanguage>
would be a lot more convenient than having to parse the string inside:
<instantiationLanguage>eng;spa</instantiationLanguage>
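To illustrate the difference, here is a rough Python sketch using the standard-library xml.etree.ElementTree module that reads both shapes into the same list of codes. The XML fragments and function names are hypothetical: they omit the PBCore namespace a real document would declare, and the repeated form would not validate against a schema that allows instantiationLanguage at most once. With repeated elements the XML parser does all the work; the delimited form needs an extra hand-written split.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments; namespaces omitted for brevity.
REPEATED = """<pbcoreInstantiation>
  <instantiationLanguage>eng</instantiationLanguage>
  <instantiationLanguage>spa</instantiationLanguage>
</pbcoreInstantiation>"""

DELIMITED = """<pbcoreInstantiation>
  <instantiationLanguage>eng;spa</instantiationLanguage>
</pbcoreInstantiation>"""


def languages_from_repeated(xml_text: str) -> list[str]:
    # Proposed form: one element per language, so findall() yields the list directly.
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall("instantiationLanguage") if el.text]


def languages_from_delimited(xml_text: str) -> list[str]:
    # Current form: a single element whose text must be split by hand.
    el = ET.fromstring(xml_text).find("instantiationLanguage")
    if el is None or not el.text:
        return []
    return [code.strip() for code in el.text.split(";") if code.strip()]


print(languages_from_repeated(REPEATED))    # ['eng', 'spa']
print(languages_from_delimited(DELIMITED))  # ['eng', 'spa']
```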
Dave
I second Dave’s motion.
Martina McGinn added this on the PBCore Talk listserv:
“I agree and suggest that instantiationAlternativeModes could also warrant multiple occurrences (e.g. a film dubbed in English and subtitled in Spanish). And it seems logical that if instantiationLanguage allows multiple occurrences then so should essenceTrackLanguage.”
It seems a minor schema tweak would iron out these rough spots. Of course there are others like this one, and we need to follow through on identifying them.
Jack
I found ISO 639-3 largely unsuitable for this: I wanted to be able to represent different varieties of English (e.g. en-US and en-GB). That led me to RFC 5646 (http://www.rfc-editor.org/pdfrfc/rfc5646.txt.pdf), which supports a Language-Region format and which, in my opinion, should be the standard here. Of course, I welcome opinions.
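RFC 5646 (BCP 47) tags such as en-US and en-GB are built from hyphen-separated subtags. The following is a simplified Python sketch that parses only the language[-script][-region] shape, assuming 2-3 letter language subtags. It deliberately ignores extended language subtags, variants, extensions, private-use and grandfathered tags, and it does not check subtags against the IANA Language Subtag Registry. The names parse_simple_tag and LanguageTag are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Simplified subset of the RFC 5646 langtag production:
#   language (2-3 letters) [-script (4 letters)] [-region (2 letters or 3 digits)]
SIMPLE_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_simple_tag(tag: str) -> LanguageTag:
    m = SIMPLE_TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"outside the simplified subset: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    # RFC 5646 tags are case-insensitive; normalise to the conventional casing
    # (lowercase language, title-case script, uppercase region).
    return LanguageTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )


print(parse_simple_tag("en-US"))  # LanguageTag(language='en', script=None, region='US')
print(parse_simple_tag("en-gb"))  # LanguageTag(language='en', script=None, region='GB')
```

Keeping the region as a separate field makes it easy to treat en-US and en-GB as distinct varieties while still grouping them under the same language.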
This discussion was also started on Stack Overflow: http://stackoverflow.com/questions/15965003/what-language-standard-should-i-use-if-i-want-to-at-least-differentiate-between