LLMpedia: The first transparent, open encyclopedia generated by LLMs

CERN Open Data Portal

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
CERN Open Data Portal
Name: CERN Open Data Portal
Type: Open data initiative
Founded: 2014
Location: Geneva, Switzerland
Parent organization: CERN

The CERN Open Data Portal provides access to research data produced by experiments at CERN, including datasets, software, and documentation from the Large Hadron Collider (LHC) experiments ATLAS, CMS, ALICE, and LHCb. It supports reproducible research by making collision data, simulation samples, and analysis tools available to scientists, educators, and the public, including users at CERN and at institutions such as the University of Oxford, the Massachusetts Institute of Technology, Harvard University, and École Polytechnique Fédérale de Lausanne. The Portal links to training and outreach activities such as the Particle Physics Summer School and the CERN Summer Student Programme, and to open-science initiatives such as OpenAIRE and the European Open Science Cloud (EOSC).

Overview

The Portal aggregates datasets, software, and metadata associated with the Large Hadron Collider and its detectors: ATLAS, CMS, ALICE, and LHCb. It exposes data produced by the ATLAS, CMS, ALICE, and LHCb collaborations, along with services managed by CERN (the European Organization for Nuclear Research). Users encounter formats and tools familiar to the high-energy physics community, such as ROOT, Geant4, HepMC, Rivet, and Herwig. The Portal interoperates with platforms like Zenodo, INSPIRE-HEP, GitHub, OpenAIRE, and EOSC to enhance discoverability.

History and development

Initiated in 2014 under policy direction from CERN management, with advisory input from bodies such as the Open Science Grid and the Research Data Alliance, the Portal evolved from internal preservation projects and precedents such as the Data Preservation in High Energy Physics (DPHEP) collaboration. Early milestones involved release campaigns tied to analyses of Large Hadron Collider runs, including datasets associated with results published in journals such as Physical Review Letters and the Journal of High Energy Physics. Development drew on software efforts from laboratories and universities such as Fermilab, SLAC National Accelerator Laboratory, CERN openlab, Université de Genève, and Imperial College London, and on collaborations with initiatives including OpenAIRE and DataCite for metadata and DOI assignment.

Content and datasets

The Portal hosts datasets spanning reconstructed collision events, Monte Carlo simulations, detector calibrations, and derived analysis-level ntuples produced by the ATLAS, CMS, ALICE, and LHCb experiments. Content is classified into preserved production samples, validation samples, and simplified educational datasets derived for outreach programs such as the International Masterclasses. Data releases often carry provenance metadata following DataCite standards and are indexed in bibliographic services such as INSPIRE-HEP. The Portal also includes associated software packages, example analysis code, and simulation configurations from projects such as Geant4, PYTHIA, Herwig, and ROOT.

Access, licensing, and data formats

Datasets are accessible under explicit licenses and data usage conditions shaped by CERN policy frameworks, with licensing drawing on Creative Commons instruments and metadata practices following DataCite. Common distribution formats include ROOT files, HepMC event records, and text-based configurations for Geant4 and PYTHIA. Access mechanisms integrate storage and delivery systems such as EOS and CERNBox, and federated repositories like Zenodo. Each DOI gives a dataset a persistent identifier, which enables citation in venues including Physical Review D and the Journal of High Energy Physics.
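
As a rough illustration of DOI-based dataset citation, the following Python sketch formats a citation string and a resolver URL from a small metadata record. It is a minimal sketch under stated assumptions: the DatasetRecord class, its field names, and the DOI value are hypothetical placeholders loosely inspired by DataCite-style metadata, not real Portal entries or an official schema. Only the https://doi.org/ resolver prefix is an established convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical, DataCite-inspired subset of dataset metadata."""
    creator: str
    year: int
    title: str
    publisher: str
    doi: str  # placeholder value only; not a real Portal DOI


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI using the public doi.org resolver."""
    return "https://doi.org/" + doi.strip()


def format_citation(record: DatasetRecord) -> str:
    """Build a 'Creator (Year). Title. Publisher. DOI-URL' citation string."""
    return (
        f"{record.creator} ({record.year}). {record.title}. "
        f"{record.publisher}. {doi_url(record.doi)}"
    )


# Illustrative record; every field value here is made up for the example.
example = DatasetRecord(
    creator="Example Collaboration",
    year=2014,
    title="Example collision dataset",
    publisher="CERN Open Data Portal",
    doi="10.1234/EXAMPLE.DATASET.0001",
)

print(format_citation(example))
```

Keeping the DOI as a bare identifier and deriving the URL on demand means the same record can feed both a reference list and a machine-readable link.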

Tools, documentation, and education

The Portal bundles documentation and tutorials authored by teams from the ATLAS, CMS, ALICE, and LHCb collaborations and supported by training programs such as the CERN Summer Student Programme and the Particle Physics Summer School. Educational packages include simplified datasets for the International Masterclasses, Jupyter notebooks, and examples posted on GitHub. Analysis toolchains showcased include ROOT, Rivet, and FastJet, together with workflow examples that run on compute infrastructures such as the Open Science Grid and the European Grid Infrastructure.

Impact, reuse, and notable research

Open releases have enabled independent validation, methodology studies, and pedagogical use across institutions such as the University of Cambridge, Princeton University, Caltech, the University of Tokyo, and the University of California, Berkeley. Reuse cases include reinterpretation studies published in Physical Review Letters and the Journal of High Energy Physics, method development involving machine learning groups at Google Research and DeepMind, and cross-experiment meta-analyses that reference datasets via INSPIRE-HEP. The Portal supports citizen science and outreach linked with the International Masterclasses and has informed policy discussions at forums such as the Research Data Alliance and OpenAIRE.

Governance and sustainability

Governance involves coordination among CERN departments, the ATLAS, CMS, ALICE, and LHCb collaborations, and community stakeholders including the Research Data Alliance and DataCite. Long-term preservation strategies rely on infrastructure from CERN IT, storage platforms such as EOS, and partnerships with archival services such as Zenodo and institutional repositories at the University of Oxford and Université de Genève. Sustainability planning addresses software preservation, metadata curation aligned with DataCite schemas, and policy alignment with OpenAIRE and EOSC.

Category:Open data