DataCite Metadata Schema

Name	DataCite Metadata Schema
Established	2009
Developer	DataCite
Latest release	4.4.0
License	CC0


The DataCite Metadata Schema is a standardized metadata specification for describing research data, datasets, software, and related scholarly outputs so that they can be discovered, cited, and reused. It provides a structured set of properties for describing resources, integrates with persistent identifier systems, and interoperates with repository, library, and publishing infrastructures. Research institutions, data centers, and scholarly publishers have widely adopted the schema to produce consistent, machine-actionable metadata across repositories.

Overview

The schema defines metadata properties such as title, creators, contributors, publisher, publication year, resource type, descriptions, and rights, which support citation, indexing, and discovery in services such as Crossref, ORCID, Zenodo, Figshare, Dryad, and PANGAEA. The primary identifier of a DataCite record is a Digital Object Identifier (DOI); other persistent identifiers, such as Handles, can be recorded as alternate or related identifiers. These identifier practices are shared with organizations such as the International DOI Foundation, the World Data System of the International Council for Science, and national libraries including the Library of Congress and the British Library. The specification is published as an XML Schema, and records described with it can be exchanged over OAI-PMH, mapped to the Schema.org vocabulary, and harvested by aggregators such as OpenAIRE.
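The mapping to Schema.org can be illustrated with a short Python sketch. It converts a simplified, DataCite-style record (a hypothetical dictionary layout, not DataCite's own JSON format) into a Schema.org Dataset expressed as JSON-LD. The property correspondences chosen here (first title to name, publicationYear to datePublished, DOI to an @id on doi.org) are an illustrative assumption, not DataCite's official crosswalk.

```python
import json


def datacite_to_schemaorg(record: dict) -> dict:
    """Map a simplified, DataCite-style record to a Schema.org Dataset in JSON-LD.

    The input keys are a hypothetical simplification of DataCite
    properties, not the official DataCite JSON format or crosswalk.
    """
    # DOIs are case-insensitive, so lowercase them for a stable @id.
    doi = record["doi"].lower()
    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": f"https://doi.org/{doi}",
        "name": record["titles"][0],
        # Assumes personal creators; organizations would use "@type": "Organization".
        "creator": [{"@type": "Person", "name": name} for name in record["creators"]],
        "publisher": {"@type": "Organization", "name": record["publisher"]},
        "datePublished": str(record["publicationYear"]),
    }
    # Descriptions are optional in DataCite, so map one only if present.
    if record.get("descriptions"):
        dataset["description"] = record["descriptions"][0]
    return dataset


example = {
    "doi": "10.1234/EXAMPLE-DATASET",  # hypothetical DOI
    "titles": ["Example Ocean Temperature Measurements"],
    "creators": ["Doe, Jane"],
    "publisher": "Example Data Repository",
    "publicationYear": 2021,
}
print(json.dumps(datacite_to_schemaorg(example), indent=2))
```

A fuller mapping would also need to distinguish organizational creators and carry over subjects, licenses, and related identifiers.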

History and Development

The schema grew out of efforts to improve data citation that emerged alongside long-running data resources such as GenBank, the Protein Data Bank, and arXiv, and alongside programs funded by agencies such as the National Science Foundation. Its development has been coordinated by DataCite since 2009. Influences include standards bodies and partners such as ISO, CODATA, the Research Data Alliance, Force11, and the World Wide Web Consortium; national infrastructures such as Europeana, the Australian National Data Service, and the German National Library of Science and Technology; and consortia including EUDAT, SPARC, JISC, and CLIR. Revisions have responded to the needs of communities in fields such as genomics, climate research, and astronomy data systems, and to initiatives such as the Human Genome Project and CERN data stewardship projects.

Structure and Elements

The schema organizes its properties into core categories: identifier, creators, titles, publisher, publication year, subjects, contributors, dates, language, resource type, alternate identifiers, related identifiers, sizes, formats, version, rights, descriptions, and geolocations. Six of these (identifier, creator, title, publisher, publication year, and resource type) are mandatory in every record; the others are recommended or optional. Implementations map these properties to metadata models such as Dublin Core, METS, MODS, and PREMIS, and to linked-data models and vocabularies such as RDF, SKOS, and Schema.org. The resourceTypeGeneral controlled vocabulary (for example Dataset, Software, and Text) covers types familiar to communities around ICPSR, NOAA, and USGS and to repositories such as Figshare and Zenodo, which supports interoperability with infrastructures including DataONE and CICERO. Contributor roles come from DataCite's own contributorType vocabulary (for example DataCollector, DataCurator, and ProjectLeader), which overlaps with, but is not the same as, the CRediT taxonomy used by publishers such as Elsevier, Springer Nature, Wiley, and Taylor & Francis.
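To make the mandatory properties concrete, the following Python sketch uses only the standard library's ElementTree to build a record containing just those six properties in the kernel-4 namespace used by version 4.x of the schema. The DOI, names, and titles are hypothetical, and the function name `minimal_record` is an illustrative choice, not part of any DataCite tooling.

```python
import xml.etree.ElementTree as ET

# Namespace of DataCite schema version 4.x ("kernel-4").
NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("", NS)  # serialize kernel-4 as the default namespace


def q(tag: str) -> str:
    """Qualify a local tag name with the kernel-4 namespace."""
    return f"{{{NS}}}{tag}"


def minimal_record(doi, creators, title, publisher, year, type_general, type_text=""):
    """Build a <resource> element holding only the six mandatory properties."""
    resource = ET.Element(q("resource"))
    ET.SubElement(resource, q("identifier"), identifierType="DOI").text = doi
    creators_el = ET.SubElement(resource, q("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, q("creator"))
        ET.SubElement(creator, q("creatorName")).text = name
    titles = ET.SubElement(resource, q("titles"))
    ET.SubElement(titles, q("title")).text = title
    ET.SubElement(resource, q("publisher")).text = publisher
    ET.SubElement(resource, q("publicationYear")).text = str(year)
    # resourceTypeGeneral takes a value from the controlled vocabulary;
    # the element text is a free-text refinement of it.
    ET.SubElement(resource, q("resourceType"),
                  resourceTypeGeneral=type_general).text = type_text
    return resource


record = minimal_record(
    doi="10.1234/example-dataset",  # hypothetical DOI
    creators=["Doe, Jane"],
    title="Example Ocean Temperature Measurements",
    publisher="Example Data Repository",
    year=2021,
    type_general="Dataset",
    type_text="Sensor readings",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record submitted for registration would normally also declare `xsi:schemaLocation` and be validated against the published XML Schema; the standard library has no XSD validator, so that step is not shown.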

Identifier and DOI Integration

DataCite is itself a DOI registration agency operating under the International DOI Foundation, alongside other agencies such as Crossref, so records described with the schema are tied directly to DOI registration, and the resulting DOIs resolve through the Handle System originally developed by CNRI. Metadata fields link resources to grants from funders and programmes such as the National Institutes of Health, the European Research Council, Horizon Europe, and the Wellcome Trust; to researcher identifiers such as ORCID iDs; and to institutional identifiers from registries such as GRID and ROR. Relationships between resources are encoded with a controlled relationType vocabulary (for example IsCitedBy, IsSupplementTo, and IsVersionOf), which services such as DataCite Commons use to connect related works and which supports the linking practices promoted by FAIRsharing and by implementers of the FAIR Principles.
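The following Python sketch builds the kernel-4 XML fragments that carry these links: a creator with an ORCID iD as a nameIdentifier, related identifiers with a relationType, and a funding reference with a funder identifier and award number. The relationType check uses only a small hand-picked subset of the vocabulary, and the helper names, DOI, ROR ID, and award number are hypothetical; the ORCID iD is ORCID's public example record.

```python
import xml.etree.ElementTree as ET

NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("", NS)


def q(tag: str) -> str:
    """Qualify a local tag name with the kernel-4 namespace."""
    return f"{{{NS}}}{tag}"


# A small subset of relationType values, for illustration only; the
# schema's controlled vocabulary is considerably longer.
KNOWN_RELATIONS = {
    "IsCitedBy", "Cites", "IsSupplementTo", "IsSupplementedBy",
    "IsVersionOf", "HasVersion", "IsPartOf", "HasPart",
    "IsDerivedFrom", "IsSourceOf", "References", "IsReferencedBy",
}


def creator_with_orcid(name: str, orcid_url: str) -> ET.Element:
    """A personal creator carrying an ORCID iD as a nameIdentifier."""
    creator = ET.Element(q("creator"))
    ET.SubElement(creator, q("creatorName"), nameType="Personal").text = name
    ET.SubElement(creator, q("nameIdentifier"), nameIdentifierScheme="ORCID",
                  schemeURI="https://orcid.org").text = orcid_url
    return creator


def related_identifiers(links) -> ET.Element:
    """Build <relatedIdentifiers> from (identifier, identifierType, relationType) tuples."""
    container = ET.Element(q("relatedIdentifiers"))
    for identifier, id_type, relation in links:
        if relation not in KNOWN_RELATIONS:
            raise ValueError(f"relationType not in this sketch's subset: {relation}")
        ET.SubElement(container, q("relatedIdentifier"),
                      relatedIdentifierType=id_type,
                      relationType=relation).text = identifier
    return container


def funding_reference(funder_name, funder_id, funder_id_type, award_number) -> ET.Element:
    """Build <fundingReferences> with a single funder and award."""
    container = ET.Element(q("fundingReferences"))
    ref = ET.SubElement(container, q("fundingReference"))
    ET.SubElement(ref, q("funderName")).text = funder_name
    ET.SubElement(ref, q("funderIdentifier"),
                  funderIdentifierType=funder_id_type).text = funder_id
    ET.SubElement(ref, q("awardNumber")).text = award_number
    return container


# ORCID's public example iD; the DOI, ROR ID, and award number are hypothetical.
creator = creator_with_orcid("Carberry, Josiah", "https://orcid.org/0000-0002-1825-0097")
links = related_identifiers([("10.1234/example-article", "DOI", "IsSupplementTo")])
funding = funding_reference("Example Funder", "https://ror.org/00example", "ROR", "ABC-123")
for element in (creator, links, funding):
    print(ET.tostring(element, encoding="unicode"))
```

Rejecting unknown relation types early, rather than relying only on later schema validation, gives depositors a clearer error close to where the link was written.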

Implementation and Usage

Repositories, institutional archives, and publishers implement the schema through repository platforms and services such as DSpace, EPrints, Invenio, Fedora Commons, and Koha, and through research infrastructures hosted on cloud services such as AWS and Google Cloud Platform. Submission workflows integrate it at universities such as Harvard University, the Massachusetts Institute of Technology, the University of Oxford, the University of Cambridge, and Stanford University, and at research organizations such as CERN (the European Organization for Nuclear Research). Indexers and aggregators such as Google Scholar, Microsoft Academic, OpenAIRE, Scopus, Web of Science, and national data catalogs ingest metadata exported as JSON, XML, and RDF to support discovery, citation tracking, and metrics.
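As an example of ingesting exported JSON, the Python sketch below reads a single record and formats a simple citation string. The payload is modeled loosely on the JSON:API layout (data, then attributes) returned by the DataCite REST API, but the record is hypothetical and the field names should be checked against the API documentation. The citation layout (creators, year, title, version, publisher, resource type, DOI link) is an illustrative choice, not a normative format.

```python
import json

# Illustrative payload modeled on the JSON:API shape of the DataCite REST API
# (data -> attributes). The record itself is hypothetical.
SAMPLE = """
{
  "data": {
    "id": "10.1234/example-dataset",
    "type": "dois",
    "attributes": {
      "doi": "10.1234/example-dataset",
      "creators": [{"name": "Doe, Jane"}, {"name": "Roe, Richard"}],
      "titles": [{"title": "Example Ocean Temperature Measurements"}],
      "publisher": "Example Data Repository",
      "publicationYear": 2021,
      "version": "1.0",
      "types": {"resourceTypeGeneral": "Dataset"}
    }
  }
}
"""


def format_citation(payload: str) -> str:
    """Turn one exported record into a simple author-year citation string."""
    attrs = json.loads(payload)["data"]["attributes"]
    creators = "; ".join(c["name"] for c in attrs.get("creators", []))
    parts = [f"{creators} ({attrs['publicationYear']}). {attrs['titles'][0]['title']}."]
    # Version is optional, so include it only when present.
    if attrs.get("version"):
        parts.append(f"Version {attrs['version']}.")
    parts.append(f"{attrs['publisher']}.")
    parts.append(f"{attrs['types']['resourceTypeGeneral']}.")
    parts.append(f"https://doi.org/{attrs['doi']}")
    return " ".join(parts)


print(format_citation(SAMPLE))
```

An aggregator processing many records would typically page through API results and handle missing optional fields more defensively than this sketch does.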

Governance and Versioning

Governance is coordinated through DataCite's technical committees, working groups, and member organizations, which include research institutions, national libraries, and commercial entities such as Elsevier and Clarivate. Each version is released with documentation of schema changes, backward-compatibility considerations, and migration guidance, which projects such as OpenAIRE, EUDAT, and national data services use when upgrading. Community input comes through discussions at conferences such as the International Digital Curation Conference (IDCC), Open Repositories, and RDA Plenaries, and through coordination with standards bodies such as ISO.
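Before applying migration steps, a harvester may need to know which schema version a record claims. DataCite XML records indicate the major schema generation through the namespace of the root element (for example kernel-3 or kernel-4), and may point to a specific minor release in `xsi:schemaLocation`. The Python sketch below is a heuristic built on those two assumptions; the function name `detect_kernel_version` is hypothetical, and records whose schemaLocation uses an unversioned URL simply yield no minor version.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Root element of a DataCite record: {http://datacite.org/schema/kernel-N}resource
ROOT_PATTERN = re.compile(r"\{http://datacite\.org/schema/kernel-(\d+)\}resource")
# Minor version as it may appear in a schemaLocation URL, e.g. .../kernel-4.4/metadata.xsd
LOCATION_PATTERN = re.compile(r"kernel-(\d+\.\d+)/metadata\.xsd")


def detect_kernel_version(xml_text: str) -> Tuple[int, Optional[str]]:
    """Return (major version, minor version string or None) for a DataCite record."""
    root = ET.fromstring(xml_text)
    match = ROOT_PATTERN.fullmatch(root.tag)
    if match is None:
        raise ValueError(f"not a DataCite resource element: {root.tag}")
    # Namespaced attributes are keyed as "{uri}localname" in ElementTree.
    location = root.get(f"{{{XSI}}}schemaLocation", "")
    minor = LOCATION_PATTERN.search(location)
    return int(match.group(1)), (minor.group(1) if minor else None)


v44 = """<resource xmlns="http://datacite.org/schema/kernel-4"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.4/metadata.xsd"/>"""
v3 = '<resource xmlns="http://datacite.org/schema/kernel-3"/>'
print(detect_kernel_version(v44))  # (4, '4.4')
print(detect_kernel_version(v3))   # (3, None)
```

Treating the namespace as authoritative for the major version and the schemaLocation only as a hint reflects the fact that schemaLocation is advisory in XML Schema and may be absent.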

Criticisms and Limitations

Critiques focus on the limited granularity available for describing complex datasets from projects such as the Large Hadron Collider, the Square Kilometre Array, and the Human Cell Atlas; on ambiguities in contributor role semantics that affect journals such as Nature and Science; and on the difficulty of mapping legacy metadata from domain repositories such as GenBank and the PDB. Other limitations include uneven adoption across national infrastructures such as the China National Knowledge Infrastructure, and integration complexity for non-DOI identifier ecosystems in regions served by organizations such as SciELO and Latin American and Caribbean Health Sciences Literature. Ongoing work addresses expressivity, multilingual support, and alignment with emerging vocabularies used by initiatives including the FAIR Data Principles, Linked Open Data, and Schema.org.
