OpenCitations — LLMpedia

OpenCitations
Name	OpenCitations
Founded	2010
Founder	David Shotton
Location	International
Focus	Scholarly citations, bibliographic metadata, open science
Products	OpenCitations Corpus, OpenCitations Indexes, COCI, REST API, SPARQL endpoint

Contents

History
Mission and Scope
Data Sources and Models
Services and Tools
Governance and Funding
Impact and Reception
Technical Infrastructure

OpenCitations is an independent scholarly infrastructure initiative that provides openly accessible citation data and tools to support bibliometrics, digital scholarship, and open science. Founded by David Shotton and collaborators, it develops machine-readable citation corpora and services for researchers, librarians, publishers, and funders. Its outputs are designed to interoperate with other projects in scholarly communication, such as Crossref, ORCID, Wikimedia, and the Initiative for Open Citations.

History

OpenCitations emerged from debates about the accessibility of citation data that involved organizations such as Crossref, Elsevier, Wiley, and Springer Nature, advocacy groups such as SPARC, and policy registries such as ROARMAP. David Shotton launched the initiative after contributing to discussions linked to the Wellcome Trust and the University of Oxford, and it drew attention during policy developments at the European Commission, UK Research and Innovation, and the National Institutes of Health. Early work interfaced with datasets maintained by PubMed Central, Scopus, and Web of Science, and involved collaboration with community actors including DataCite, ORCID, and Creative Commons. Over time, OpenCitations established projects and indexes such as COCI that responded to policy statements from bodies such as the Committee on Publication Ethics and the Research Excellence Framework.

Mission and Scope

The stated mission centers on enabling transparent citation networks that support reproducible research and equitable access, in line with principles advocated by UNESCO, OpenAIRE, Plan S, and the Wellcome Trust. Its scope covers bibliographic references, persistent identifiers, and provenance metadata relevant to stakeholders including national libraries (e.g., British Library, Library of Congress), scholarly societies (e.g., American Chemical Society, Institute of Electrical and Electronics Engineers), and research infrastructures such as CERN and ELIXIR. OpenCitations positions itself within a global ecosystem of open infrastructures alongside projects such as Wikidata, Zenodo, Dryad, and Dataverse, with the aim of making citations transparent across the disciplines represented by journals such as Nature, Science, PLOS ONE, and The Lancet.

Data Sources and Models

OpenCitations harvests citation metadata from sources that include cross-publisher deposits and publicly released reference lists, and it works with identifier systems such as DOI, ORCID, ISSN, and ISBN. It converts this input into interoperable models using semantic web standards developed at the World Wide Web Consortium, and it integrates vocabularies such as Dublin Core, Schema.org, and the SPAR Ontologies. Its citation index models allow mapping between bibliographic records drawn from repositories including arXiv, PubMed Central, HAL, and institutional repositories at universities such as Harvard University and the University of Cambridge. The project emphasizes provenance and the rights metadata that publishers and aggregators, for instance Taylor & Francis and Sage Publications, choose to expose.
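As a rough illustration of how a DOI-to-DOI citation can be expressed with a SPAR vocabulary, the sketch below normalizes two DOIs and emits a single N-Triples statement using the CiTO property `cito:cites`. It is a simplification under stated assumptions: the `Citation` class, the `source` field, the use of `https://doi.org/` URLs as resource IRIs, and the DOIs in the usage note are all hypothetical, and the sketch does not reproduce the actual OpenCitations data model.

```python
from dataclasses import dataclass

# CiTO "cites" property from the SPAR Ontologies.
CITO_CITES = "http://purl.org/spar/cito/cites"
# Assumption for this sketch: DOIs are turned into IRIs via the DOI resolver.
DOI_RESOLVER = "https://doi.org/"

_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip a common resolver or scheme prefix and lowercase the DOI.

    DOIs are case-insensitive, so lowercasing gives one canonical form.
    """
    doi = raw.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


@dataclass(frozen=True)
class Citation:
    """Hypothetical minimal record: who cites whom, and where we learned it."""

    citing_doi: str
    cited_doi: str
    source: str  # provenance label, e.g. the name of the harvested source

    def to_ntriple(self) -> str:
        """Serialize the citation as one N-Triples statement.

        Percent-encoding of characters that are illegal in IRIs is omitted.
        """
        subject = DOI_RESOLVER + normalize_doi(self.citing_doi)
        obj = DOI_RESOLVER + normalize_doi(self.cited_doi)
        return f"<{subject}> <{CITO_CITES}> <{obj}> ."
```

For example, `Citation("doi:10.1234/citing", "10.5678/cited", "crossref").to_ntriple()` produces a triple linking the two made-up DOIs. A fuller model would also record the provenance in RDF rather than only in the Python object.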

Services and Tools

OpenCitations provides core services such as the OpenCitations Corpus, COCI (the OpenCitations Index of Crossref open DOI-to-DOI citations), a REST API, and a public SPARQL endpoint for graph queries. Its data can be used with visualization and analysis tools such as VOSviewer, Gephi, and Cytoscape, and in digital humanities initiatives at institutions such as the British Library and the Bibliothèque nationale de France. Collaborations have enabled interoperability with scholarly infrastructure services such as Crossref Event Data, DataCite Commons, and the OpenAIRE Explore portal. The service suite supports export formats and workflows compatible with reference managers such as Zotero and Mendeley, and it serves adopters of the citation indexes including university research offices and funders such as the Gates Foundation.
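A client for the REST API might look roughly like the following sketch, which builds a request URL for the incoming citations of a DOI and extracts the citing DOIs from a JSON response. The base URL, the `citations` operation path, and the `citing` field name are assumptions here and should be checked against the current OpenCitations API documentation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the COCI REST API; verify against the official docs.
COCI_API_BASE = "https://opencitations.net/index/coci/api/v1"


def citations_url(doi: str) -> str:
    """Build the URL that asks for all citations pointing at the given DOI."""
    # Keep "/" unescaped because DOIs contain a slash between prefix and suffix.
    return f"{COCI_API_BASE}/citations/{urllib.parse.quote(doi, safe='/')}"


def citing_dois(payload: str) -> list[str]:
    """Extract the citing identifiers from a JSON array of citation records.

    Assumes each record is an object with a "citing" key.
    """
    return [record["citing"] for record in json.loads(payload)]


def fetch_citations(doi: str, timeout: float = 30.0) -> list[str]:
    """Call the API over HTTP and return the citing DOIs (needs network access)."""
    request = urllib.request.Request(
        citations_url(doi), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return citing_dois(response.read().decode("utf-8"))
```

Splitting URL construction and response parsing from the network call keeps the first two testable offline; only `fetch_citations` depends on the service being reachable.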

Governance and Funding

Governance has involved academic stewardship and advisory input from scholars and data professionals associated with institutions such as the University of Oxford, the University of Bologna, and the United Nations Educational, Scientific and Cultural Organization. Funding has combined philanthropy, project grants, and institutional backing, drawing on sources such as the Wellcome Trust, the Open Society Foundations, and competitive grants from national research councils such as UK Research and Innovation. Operational oversight has followed community governance practices similar to those of consortia such as ORCID and Crossref, while engaging with policy frameworks advanced by the European Research Council and by funder coalitions advocating for open infrastructure.

Impact and Reception

OpenCitations has influenced bibliometric research, policy debates, and infrastructure planning, and it has been referenced in reports by organizations such as ResearchGate, Clarivate, and the International Science Council. Its data have been used in studies appearing in journals such as Journal of Informetrics, Scientometrics, and PLOS Biology, and in analyses by research offices at institutions such as MIT and Stanford University. Reception in the scholarly community has ranged from endorsement by open science advocates including Peter Suber to critical discussion, in forums involving actors such as Elsevier and Clarivate, of its coverage and sustainability. The project has contributed to conversations about transparency alongside efforts by Wikidata and Wikipedia and policy movements such as Plan S.

Technical Infrastructure

The technical stack relies on semantic web technologies, RDF triplestores, and graph databases, implemented with World Wide Web Consortium standards and tools such as Apache Jena and Virtuoso. Data pipelines use code-hosting platforms such as GitHub and Bitbucket and computational facilities at institutions such as CERN and national e-infrastructures including PRACE. The infrastructure supports machine-readable access through APIs and SPARQL, which allows integration with analytical environments including RStudio and Jupyter and with cloud services from providers such as Amazon Web Services and Google Cloud Platform.
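SPARQL access of this kind can be sketched as follows. The example builds a query that looks up the entities citing a given resource, sends it using the standard SPARQL protocol, and reads the result in the W3C SPARQL JSON results format. The endpoint URL, the function names, and the graph shape (citation resources linked through the CiTO properties `cito:hasCitingEntity` and `cito:hasCitedEntity`) are assumptions made for illustration, not a confirmed description of the live service.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; verify against the official documentation.
SPARQL_ENDPOINT = "https://opencitations.net/index/sparql"


def build_query(cited_iri: str, limit: int = 10) -> str:
    """Return a SELECT query for entities that cite the given resource IRI."""
    return f"""PREFIX cito: <http://purl.org/spar/cito/>
SELECT ?citing WHERE {{
  ?citation cito:hasCitedEntity <{cited_iri}> ;
            cito:hasCitingEntity ?citing .
}}
LIMIT {int(limit)}"""


def binding_values(payload: str, variable: str) -> list[str]:
    """Pull one variable's values out of a SPARQL JSON results document."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


def run_query(query: str, timeout: float = 60.0) -> str:
    """Send the query with an HTTP GET and return the raw JSON text.

    Needs network access; the Accept header requests SPARQL JSON results.
    """
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

In a Jupyter notebook, `binding_values(run_query(build_query(iri)), "citing")` would return the citing resources as a plain list, ready to be loaded into a data frame or a graph library.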

Category:Open science