Data documentation

URL Look-ups for Preview, Object in Context, Services, etc.

For most of the metadata provided to this DLME demonstration, URLs for the digital object in context, the thumbnail, oEmbed or IIIF service representations, and related resources were not consistently provided.

As such, where we were able to discern the logic, we have had to develop DLME-specific Traject macros for individual data providers to generate these URLs. Examples of this work include:

For future data providers, being able to perform URL look-ups or generation based on values in the records (in particular, identifiers) will continue to be tricky and probably require the writing of similar macros.
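As a rough illustration of identifier-based URL generation, here is a minimal sketch in Python (not the project's Traject macros). The provider name, the `example.org` URL patterns, and the `build_urls` helper are all hypothetical; real providers each need their own patterns.

```python
from string import Template

# Hypothetical URL patterns keyed by provider; real providers will differ.
URL_TEMPLATES = {
    "example_provider": {
        "object_in_context": Template("https://example.org/objects/$id"),
        "thumbnail": Template("https://example.org/thumbs/$id.jpg"),
    }
}


def build_urls(provider, identifier):
    """Return generated URLs for an identifier, or {} for an unknown provider."""
    templates = URL_TEMPLATES.get(provider, {})
    return {name: t.substitute(id=identifier) for name, t in templates.items()}


urls = build_urls("example_provider", "abc123")
```

Keeping the patterns in a per-provider table mirrors the point above: each new provider usually means adding another entry rather than reusing an existing one.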

Controlled Vocabularies & Look-ups (Translation Maps)

Right now, the following controlled vocabularies / look-ups are used:

None of these normalizations or look-ups currently employ fuzzy matching; if the provided field value doesn't exactly match a value in the translation map, then a match is not made. See https://github.com/traject/traject#translation-maps
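To make the exact-match behavior concrete, here is a small Python sketch (an illustration only, not Traject's implementation). The three-entry map and the `translate` helper are assumptions for the example; the real translation maps are far larger.

```python
# Tiny illustrative excerpt of a code-to-label map.
LANGUAGE_LABELS = {"ara": "Arabic", "per": "Persian", "tur": "Turkish"}


def translate(value, mapping=LANGUAGE_LABELS):
    """Exact-match lookup: return the mapped label, or None if there is no match."""
    return mapping.get(value)


matched = translate("ara")         # exact key, so a label is returned
wrong_case = translate("Ara")      # differs only by case, so no match
trailing_space = translate("ara ") # differs only by whitespace, so no match
```

Because nothing is trimmed or case-folded before the lookup, values that look nearly correct to a person still fall through unmatched.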

Languages Normalization

Language (cho_language) is a mandatory-if-applicable field for DLME objects. Discovery of objects via clear and consistent language facets is a prioritized discovery path we hope to support in this work. As such, wherever we are able, we normalize the languages mapped to DLME objects to the display labels from iso639-2b (the labels, not the codes).

Here is our existing logic for performing that normalization, including where we pass through the unnormalized values (or not), and where you can find the look-up mappings:

MODS:

  1. If language/languageTerm[@authority="iso639-2b"][@type="text"] exists, use that text term as found.
  2. If 1. doesn't exist, then check if language/languageTerm[@authority="iso639-2b"][@type="code"] exists, and normalize that code using Traject's existing marc_language translation map to generate the normalized label. See our note on iso639-2b / marc languages overlap below. The unnormalized value is not included in the output.
  3. Otherwise, map whatever is in language/languageTerm to iso639-2b labels as best possible. At present, this means normalizing the text value using Traject's existing marc_language translation map, and also passing through the unnormalized value as well.
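The three MODS steps above can be sketched as follows. This is a Python illustration under stated assumptions, not the project's Traject configuration: the `LABELS` excerpt and the `mods_languages` helper are hypothetical, and the sketch assumes records use the standard MODS v3 namespace.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
LABELS = {"ara": "Arabic", "per": "Persian"}  # illustrative excerpt only


def mods_languages(record):
    """Apply the three-step MODS language logic to a <mods> element."""
    terms = record.findall("mods:language/mods:languageTerm", MODS_NS)
    iso = [t for t in terms if t.get("authority") == "iso639-2b"]

    # Step 1: an iso639-2b text term is used as found.
    texts = [(t.text or "").strip() for t in iso if t.get("type") == "text"]
    texts = [v for v in texts if v]
    if texts:
        return texts

    # Step 2: an iso639-2b code is normalized; the raw code is not kept.
    codes = [(t.text or "").strip() for t in iso if t.get("type") == "code"]
    codes = [c for c in codes if c]
    if codes:
        return [LABELS[c] for c in codes if c in LABELS]

    # Step 3: anything else is normalized if possible and also passed through.
    out = []
    for term in terms:
        raw = (term.text or "").strip()
        if not raw:
            continue
        if raw in LABELS:
            out.append(LABELS[raw])
        out.append(raw)
    return out
```

For example, a record carrying only `<languageTerm>ara</languageTerm>` with no authority would yield both the label and the raw value under step 3.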

FGDC:

We are not expecting or mapping language values from FGDC in the current application.

TEI:

  1. If teiHeader/fileDesc/sourceDesc/msDesc/msContents/textLang/@mainLang exists, normalize that code using Traject's existing marc_language translation map to generate the normalized label. See our note on iso639-2b / marc languages overlap below.
  2. Otherwise (if 1 does not exist), map whatever is in teiHeader/fileDesc/sourceDesc/msDesc/msContents/textLang to iso639-2b labels as best possible. At present, this means normalizing the text value using Traject's existing marc_language translation map, and also passing through the unnormalized value as well. Better support for fuzzy lookup is an identified area of work for future cycles.
  3. In addition to 1 & 2 (no matter the outcome), check if teiHeader/fileDesc/sourceDesc/msDesc/msContents/textLang/@otherLangs exists, and if so normalize that code using Traject's existing marc_language translation map to generate the normalized label. See our note on iso639-2b / marc languages overlap below.
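The TEI steps above can be sketched in Python as follows (an illustration, not the project's Traject code). The `LABELS` excerpt and the `tei_languages` helper are hypothetical. The sketch also makes two assumptions the text does not state: that the raw @mainLang code is not passed through, and that @otherLangs may hold several whitespace-separated codes.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
TEXT_LANG = (
    "tei:teiHeader/tei:fileDesc/tei:sourceDesc/"
    "tei:msDesc/tei:msContents/tei:textLang"
)
LABELS = {"ara": "Arabic", "per": "Persian", "tur": "Turkish"}  # excerpt only


def tei_languages(root):
    """Apply the three-step TEI language logic to a <TEI> root element."""
    node = root.find(TEXT_LANG, TEI_NS)
    if node is None:
        return []

    out = []
    main = node.get("mainLang")
    if main:
        # Step 1: normalize the @mainLang code (raw code dropped, by assumption).
        if main in LABELS:
            out.append(LABELS[main])
    else:
        # Step 2: normalize the element text if possible, and pass it through.
        raw = (node.text or "").strip()
        if raw:
            if raw in LABELS:
                out.append(LABELS[raw])
            out.append(raw)

    # Step 3: always check @otherLangs, whatever happened above.
    for code in (node.get("otherLangs") or "").split():
        if code in LABELS:
            out.append(LABELS[code])
    return out
```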

MARC:

We use the existing Traject method marc_languages to generate language labels (MARC Language labels) for these records. The language codes are taken from 008[35-37] and 041, then normalized using the marc_language translation map.
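For readers unfamiliar with these MARC positions, here is a simplified Python sketch of the idea. It is not the Traject method itself: the `marc_language_labels` helper and the `LABELS` excerpt are hypothetical, the 008 field is passed in as a plain string and the 041 subfield values as a list, and the choices to de-duplicate and to drop unmatched codes are assumptions.

```python
LABELS = {"ara": "Arabic", "per": "Persian"}  # illustrative excerpt only


def marc_language_labels(field_008, subfields_041):
    """Collect codes from 008 positions 35-37 and 041 values, then map to labels."""
    codes = []

    # Positions 35-37 of the 008 field hold the primary language code.
    primary = field_008[35:38].strip()
    if primary:
        codes.append(primary)

    # Each 041 subfield value is treated here as a single code.
    for value in subfields_041:
        code = value.strip()
        if code and code not in codes:
            codes.append(code)

    # Exact-match lookup; codes with no entry are dropped in this sketch.
    return [LABELS[c] for c in codes if c in LABELS]


field_008 = " " * 35 + "ara" + "  "  # 40-character field with "ara" at 35-37
labels = marc_language_labels(field_008, ["ara", "per", "xxx"])
```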

American Numismatic Society Local CSV:

We are not expecting or mapping language values from the local ANS CSV schema in the current application.

Yale Local CSV:

If a Language column value exists, we split it on '|' as a delimiter, then use each resulting value as a label as found (the values are close enough to iso639-2b labels, based on preliminary metadata analysis).
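A minimal Python sketch of that split, assuming each CSV row is available as a dictionary keyed by column name. The `yale_languages` helper is hypothetical, and trimming surrounding whitespace is an assumption not stated above.

```python
def yale_languages(row):
    """Split the Language column on '|' and use each value as found."""
    raw = row.get("Language") or ""
    return [part.strip() for part in raw.split("|") if part.strip()]


single = yale_languages({"Language": "Arabic"})
multiple = yale_languages({"Language": "Arabic|Persian"})
missing = yale_languages({})
```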

Met Local CSV:

We are not expecting or mapping language values from the local Metropolitan Museum CSV schema in the current application.

Penn Museum Local CSV (Near Eastern & Egyptian):

We are not expecting or mapping language values from the local University of Pennsylvania Museum CSV schema in the current application.

Note on ISO639-2b / MARC languages overlap

We aim to normalize all language values to the labels mapped to ISO639-2b, a vocabulary of three-character language codes.

MARC language codes are equivalent to ISO 639-2b codes and partially equivalent to ISO 639-5 codes. However, the language name labels can differ between these vocabularies (some are derived from the English name rather than the local name).

A future work cycle could analyze and, if needed, update the mappings for stricter adherence to ISO639-2b labels instead of MARC language labels.