Crossref Event Data

Name	Crossref Event Data
Abbreviation	CED
Type	scholarly infrastructure
Founded	2012
Owner	Crossref
Status	inactive

Contents

Overview
Data sources and coverage
Data model and identifiers
Access methods and APIs
Use cases and applications
Limitations, privacy, and licensing
History and development

Crossref Event Data was a metadata service that captured activity around scholarly works, aggregating mentions, citations, and social signals tied to persistent identifiers. It aimed to complement bibliographic indexing by registering events generated by platforms, repositories, publishers, and aggregators, so that researchers, librarians, funders, and developers could analyze the attention publications received. The service connected digital objects with online discourse across platforms and institutions.

Overview

Crossref Event Data collected event records linking scholarly objects to external activity on platforms such as Twitter, Facebook, Mendeley, YouTube, and GitHub. To normalize references to works, it interoperated with identifier schemes such as the Digital Object Identifier (DOI) and ORCID, and with identifiers used by PubMed, arXiv, and Scopus. The project was operated by the scholarly infrastructure organization Crossref in coordination with stakeholders including DataCite, OpenAIRE, Jisc, and the Association of Research Libraries.

Data sources and coverage

Event Data ingested signals from a mix of social media, aggregators, and institutional sources: platforms such as Reddit, LinkedIn, Zenodo, Figshare, F1000Research, and ResearchGate; reference managers such as Zotero and Mendeley; and citation indexes such as Web of Science and Scopus for cross-referencing. It also recorded mentions from news outlets such as The New York Times and Nature, and from blog platforms including WordPress and Medium. Coverage varied with API access and platform policies: access to large commercial platforms such as Facebook and Twitter fluctuated, while open repositories such as arXiv and PubMed Central offered more stable streams. Institutional partners including Harvard University, MIT, the University of Oxford, the University of Cambridge, and the National Institutes of Health contributed metadata alignments.

Data model and identifiers

The Event Data model represented events as JSON objects linking actors, actions, and targets using persistent identifiers: Digital Object Identifiers (DOIs) for works, ORCID iDs for researchers, International Standard Serial Numbers (ISSNs) for journals, and Uniform Resource Identifiers (URIs) for web resources. Events recorded provenance by referring to Crossref member metadata, publisher feeds from organizations such as Elsevier, Springer Nature, Wiley, and Taylor & Francis, and repository records from Zenodo and Figshare. The schema was informed by standards from Schema.org and Dublin Core, and by initiatives such as Project COUNTER and the OpenAIRE Guidelines. Identifier resolution linked to registry services such as DataCite to disambiguate versions and corrections tied to works indexed in PubMed and Scopus.
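
The actor, action, and target structure described above can be illustrated with a minimal Python sketch. The field names (subj_id, relation_type_id, obj_id, source_id, occurred_at), the helper normalize_doi, and the example values are assumptions chosen for illustration, not a statement of the service's actual schema; the DOI uses the 10.5555 prefix purely as a placeholder.

```python
import json
from dataclasses import asdict, dataclass

# Common ways a DOI is written; all are reduced to one canonical URI form.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return a DOI as a lowercase https://doi.org/ URI (hypothetical helper)."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    if not value.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    # DOIs are case-insensitive, so lowercasing gives a stable key.
    return "https://doi.org/" + value.lower()


@dataclass(frozen=True)
class Event:
    """One event: an actor (subject) performs an action on a target (object)."""

    subj_id: str           # actor, e.g. the URI of a blog post or tweet
    relation_type_id: str  # action, e.g. "discusses" or "references"
    obj_id: str            # target work, as a normalized DOI URI
    source_id: str         # provenance: which platform or feed reported it
    occurred_at: str       # ISO 8601 timestamp of the activity

    def to_json(self) -> str:
        # Sorted keys make the serialized form deterministic.
        return json.dumps(asdict(self), sort_keys=True)


event = Event(
    subj_id="https://example.org/blog/post-1",
    relation_type_id="discusses",
    obj_id=normalize_doi("doi:10.5555/12345678"),
    source_id="example-blog",
    occurred_at="2018-01-01T00:00:00Z",
)
```

Normalizing the target identifier before storing the event means that mentions written as doi:10.5555/12345678 and https://doi.org/10.5555/12345678 end up attached to the same work.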

Access methods and APIs

Users accessed Event Data via RESTful APIs, bulk data dumps, and streaming endpoints compatible with tooling such as Elasticsearch, Kibana, and Apache Kafka. The service offered query parameters modeled on those of the Crossref Metadata API and mirrored patterns from the Europe PMC and ORCID APIs. Client libraries and integrations were built in languages and platforms common to research IT teams at institutions such as Stanford University, the California Institute of Technology, and ETH Zurich, and in ecosystems such as OpenRefine and Jupyter Notebook. Access control and rate limiting followed precedents set by the GitHub and Twitter APIs.
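
A REST query against such a service might be sketched as follows. The base URL, the filter names (obj-id, source), the rows and cursor parameters, and the message / events / next-cursor response layout are all assumptions made for this example and should be checked against the service documentation. The HTTP call is injected as a function so the paging logic can be read and exercised without network access.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

# Assumed endpoint, used here only to make the example concrete.
BASE_URL = "https://api.eventdata.crossref.org/v1/events"


def build_query_url(
    filters: Dict[str, str], cursor: Optional[str] = None, rows: int = 100
) -> str:
    """Compose a query URL from filter parameters and an optional cursor."""
    params = dict(filters)
    params["rows"] = str(rows)
    if cursor:
        params["cursor"] = cursor
    # Sorting keeps the URL stable, which helps with caching and testing.
    return BASE_URL + "?" + urlencode(sorted(params.items()))


def iter_events(
    fetch_page: Callable[[str], dict], filters: Dict[str, str], rows: int = 100
) -> Iterator[dict]:
    """Yield events page by page until the response carries no next cursor."""
    cursor = None
    while True:
        page = fetch_page(build_query_url(filters, cursor, rows))
        message = page.get("message", {})
        yield from message.get("events", [])
        cursor = message.get("next-cursor")
        if not cursor:
            break
```

In practice fetch_page would wrap an HTTP client and decode the JSON body; a client that respects rate limits would also pause between pages.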

Use cases and applications

Event Data supported altmetrics analyses of the kind performed by services such as Altmetric and PlumX and by research assessment teams at the Wellcome Trust, the National Science Foundation, and the European Research Council. Use cases included tracking policy citations in documents from bodies such as United Nations agencies, monitoring media coverage in outlets such as BBC News and The Guardian, and surfacing code reuse from repositories hosted on GitHub and Bitbucket. Libraries and research offices at institutions such as Columbia University, University College London, and Imperial College London used Event Data for reporting, while publishers including PLOS and eLife used it to augment article-level metrics and reader engagement dashboards.
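
Article-level metrics of the kind mentioned above amount to grouping events by target work and counting them per source. A minimal sketch, assuming each event is a dictionary with hypothetical obj_id and source_id keys (the function name and sample data are likewise illustrative):

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def article_level_counts(events: Iterable[dict]) -> Dict[str, Dict[str, int]]:
    """Count events per target work, broken down by reporting source."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event["obj_id"]][event["source_id"]] += 1
    # Convert to plain dicts so the result is easy to serialize.
    return {work: dict(by_source) for work, by_source in counts.items()}


sample_events = [
    {"obj_id": "https://doi.org/10.5555/12345678", "source_id": "twitter"},
    {"obj_id": "https://doi.org/10.5555/12345678", "source_id": "twitter"},
    {"obj_id": "https://doi.org/10.5555/12345678", "source_id": "reddit"},
    {"obj_id": "https://doi.org/10.5555/87654321", "source_id": "wordpress"},
]
summary = article_level_counts(sample_events)
```

Raw counts like these measure attention rather than quality, and they inherit whatever coverage gaps the underlying sources have.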

Limitations, privacy, and licensing

Event Data was limited by platform policy changes (notably at Twitter and Facebook), by data access restrictions from commercial services such as Elsevier and Clarivate, and by a representational bias toward English-language and Western platforms such as Twitter and Reddit. Privacy considerations required compliance with laws such as the General Data Protection Regulation in the European Union and with national statutes in jurisdictions including the United States. Licensing and reuse were governed by terms set by Crossref and the data providers; some sources imposed restrictive licenses that constrained redistribution and commercial reuse, while open repositories such as Zenodo and Figshare allowed broader sharing under permissive licenses.

History and development

The initiative emerged from coordination among metadata stakeholders, including Crossref and DataCite, and funders such as the Wellcome Trust and the Bill & Melinda Gates Foundation, with the goal of capturing non-traditional scholarly engagement. Development milestones aligned with global efforts such as Scholix and with meetings such as the International Conference on Dublin Core and Metadata Applications and Force11 workshops. Major technical contributions came from teams at Digital Science and PLOS and from academic partners at the University of Southampton and the University of Birmingham, with governance informed by committees representing publishers such as Springer Nature and infrastructure organizations such as OpenAIRE.
