COnnecting REpositories (CORE)

Name	COnnecting REpositories (CORE)
Formation	2011
Headquarters	Not specified
Type	Research infrastructure
Focus	Open access, scholarly communication, repository aggregation

Contents

Overview
History and Development
Architecture and Technology
Content Harvesting and Metadata Standards
Services and Tools
Impact and Usage
Governance and Funding

COnnecting REpositories (CORE) is an open access aggregation service that harvests, indexes, and provides access to scholarly outputs from institutional repositories and open access journals. It serves researchers, libraries, and developers by offering a large corpus for discovery, text mining, and preservation, and it interoperates with international open access initiatives and infrastructures.

Overview

CORE aggregates metadata and full text from thousands of institutional repositories, subject repositories, and open access journals to provide searchable collections and machine-accessible datasets. The service interconnects with initiatives such as the Directory of Open Access Journals, the OpenAIRE project, the European Research Council, and the Wellcome Trust to enhance discoverability for outputs associated with institutions like the University of Oxford, Harvard University, and University College London. CORE's outputs support tools and platforms used by actors including the World Health Organization, the United Nations Educational, Scientific and Cultural Organization, and national research funders such as the National Institutes of Health and UK Research and Innovation.

History and Development

CORE emerged in the early 2010s in response to growing interest in aggregating repository content from stakeholders including Jisc, the British Library, and research-intensive universities like the University of Cambridge and Imperial College London. Early development drew on collaborations with projects such as the Open Knowledge Foundation, European Commission research units, and repositories linked to organizations like Elsevier and Springer Nature through metadata exchange experiments. Over time, CORE expanded its corpus and its technical partnerships with initiatives including Crossref, DataCite, and the Public Library of Science to improve metadata quality and persistent identifier interoperability.

Architecture and Technology

CORE's architecture combines web crawling, OAI-PMH harvesting, and API-driven ingestion with search-engine indexing and machine learning pipelines. The system leverages components similar to technologies such as Apache Lucene and Elasticsearch, and toolkits associated with Stanford University and the Massachusetts Institute of Technology, for scalable indexing and retrieval. Natural language processing and open-source machine learning frameworks developed in communities around Google Research, Facebook AI Research, and the Allen Institute for AI are used to process full text for metadata extraction, deduplication, and content classification. Interoperability with identifier systems such as ORCID, DOI, and the Handle System underpins linking between researchers, institutions, and publications.
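The deduplication step mentioned above can be illustrated with a minimal sketch. It is not CORE's actual pipeline: the record layout (dictionaries with `doi` and `title` keys) and the helper names `normalize_doi`, `title_key`, and `deduplicate` are assumptions made for illustration. Records are grouped by a normalized DOI when one is present, and otherwise by a normalized title.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_doi(doi):
    """Lower-case a DOI and strip common resolver or scheme prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    return doi or None


def title_key(title):
    """Reduce a title to lower-case ASCII-folded alphanumeric words."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def deduplicate(records):
    """Group records sharing a normalized DOI, or else a normalized title.

    Hypothetical record layout: dicts with optional "doi" and "title" keys.
    A record with a DOI is never merged with a DOI-less record here, even
    if their titles match; a fuller system would reconcile those cases.
    """
    groups = defaultdict(list)
    for record in records:
        doi = normalize_doi(record.get("doi"))
        key = ("doi", doi) if doi else ("title", title_key(record.get("title", "")))
        groups[key].append(record)
    return list(groups.values())
```

Exact-key grouping of this kind is cheap and predictable, but it misses near-duplicates such as truncated titles; catching those would require a similarity measure, which is outside the scope of this sketch.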

Content Harvesting and Metadata Standards

CORE harvests content via protocols such as OAI-PMH and metadata schemas such as Dublin Core and Schema.org, drawing on repositories that follow guidelines promoted by SPARC and the Open Archives Initiative. Metadata normalization aligns entries with persistent identifiers from Crossref and DataCite and author identifiers from ORCID. CORE collaborates with institutional repositories at universities including the University of Edinburgh, the University of Toronto, and the University of California, Berkeley, and with national libraries like the National Library of Scotland, to ensure compliance with standards promoted by organizations such as the International Federation of Library Associations and Institutions and Council of Europe cultural heritage programs.
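As an illustration of what an OAI-PMH harvest yields, the following sketch parses a `ListRecords` response carrying unqualified Dublin Core (`oai_dc`) metadata. The XML namespaces are those defined by the OAI-PMH and Dublin Core specifications; the sample response, the repository identifier, and the function name `parse_list_records` are made up for illustration and do not describe CORE's own harvester.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, heavily trimmed response used only as test input.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2020-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
          <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "creators": [c.text for c in dc.findall("dc:creator", NS)],
            "identifiers": [i.text for i in dc.findall("dc:identifier", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, token or None


records, token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would typically repeat the request with the returned resumption token until the repository sends back an empty one, which signals the end of the list.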

Services and Tools

CORE provides APIs, dataset downloads, and a search portal consumed by academic services, discovery layers, and text and data mining platforms. Tools and integrations built on CORE data have been used in projects affiliated with Microsoft Research, IBM Research, and Amazon Web Services for large-scale analytics and AI training. Libraries and consortia including the California Digital Library, the Digital Public Library of America, and research infrastructures like CERN have used CORE-derived services for repository aggregation, research assessment dashboards, and compliance reporting tied to funders such as the Wellcome Trust and the European Research Council.
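To give a sense of how a client might consume a search API of this kind, here is a minimal sketch that only builds an HTTP request and does not send it. The base URL, the path, the query parameter names (`q`, `limit`, `offset`), and the bearer-token header are all placeholders assumed for illustration; they should be checked against the service's published API documentation before use.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder base URL, not a documented endpoint.
BASE_URL = "https://api.example.org/v3"


def build_search_request(query, api_key, limit=10, offset=0):
    """Build (but do not send) a GET request for a hypothetical search endpoint."""
    params = urlencode({"q": query, "limit": limit, "offset": offset})
    return Request(
        f"{BASE_URL}/search/works?{params}",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )


request = build_search_request("open access", "YOUR_API_KEY", limit=5)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; keeping construction separate from sending makes the request easy to inspect and test without network access.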

Impact and Usage

CORE's aggregated corpus has been used in bibliometric studies, systematic reviews, and machine learning research by teams at institutions like the University of Cambridge, the University of Oxford, the Max Planck Society, and the Chinese Academy of Sciences. Policymakers and funders including the European Commission and national research councils have used CORE-derived indicators to evaluate open access uptake and compliance with mandates such as those from the National Institutes of Health. CORE supports discovery for scholars across disciplines, including those represented at conferences of the Association for Computing Machinery and the American Association for the Advancement of Science.

Governance and Funding

CORE's operations and development have been shaped by collaborations among higher education organizations, libraries, and research funders including Jisc, the Open University, and funding programs of the European Commission and charitable trusts like the Wellcome Trust. Governance models reflect partnerships common to research infrastructures such as EUDAT, ELIXIR, and regional aggregators like the Digital Repository of Ireland, aligning sustainability planning with stakeholders including national libraries, universities, and international consortia.
