| DPHEP (Data Preservation in High Energy Physics) | |
|---|---|
| Name | DPHEP (Data Preservation in High Energy Physics) |
| Formation | 2009 |
| Type | Scientific collaboration |
| Region | International |
DPHEP (Data Preservation in High Energy Physics) is an international study group and initiative focused on preserving, curating, and enabling long-term reuse of data from particle physics experiments. It engages with major laboratories, collaborations, and funding agencies to develop policies, technical frameworks, and community practices for preserving collision datasets, detector metadata, software, and documentation. DPHEP bridges experimental collaborations, computing centers, and archives to extend the scientific legacy of projects beyond their operational lifetimes.
DPHEP arose amid growing recognition that datasets from facilities such as CERN, Fermilab, DESY, SLAC National Accelerator Laboratory, and KEK retain scientific value long after collaborations such as ATLAS, CMS, ALICE, LHCb, CDF, and DØ stop taking data. The initiative works with stakeholder organizations including the European Commission, the National Science Foundation, STFC, and national laboratories to align preservation practices with guidance from the Digital Curation Centre and the International Council for Science, and with scholarly infrastructures like InspireHEP and arXiv. In doing so, DPHEP addresses provenance, reproducibility, and open data goals emphasized by entities such as the OECD, UNESCO, and the European Research Council.
DPHEP was formed following workshops and recommendations that involved leaders from experiments, computing projects, and funding bodies, including representatives from ATLAS, CMS, BaBar, Belle, and the HERA experiments. Governance has been coordinated through steering committees and working groups that include members from the CERN Open Data Portal, WLCG, HEPData, and regional computing grids like EGI and Open Science Grid. Milestones include white papers and reports presented at conferences such as ICHEP and CHEP and at meetings hosted at CERN and other major laboratories, with contributions from the LEP experiments, the Tevatron programs, and the B-factory collaborations. DPHEP leadership has also engaged with agencies including the European Commission, the National Institutes of Health (where applicable to data policy discussions), and national research councils.
DPHEP aims to preserve raw and processed datasets, associated metadata, and analysis software so that results can later be verified, re-analyzed, and used in education. Its principles draw on archival standards promoted by ISO committees, scholarly communication norms highlighted by CrossRef and DataCite, and open-science policies advocated by the European Open Science Cloud and OpenAIRE. Emphasis is placed on sustainability, interoperability, documented provenance aligned with models used by CODATA, and persistent identifiers compatible with DOI practices. The initiative promotes tiered access models that respect collaboration governance, both in running experiments like Belle II and in experiments that have moved from active operation to legacy stewardship, such as those at LEP.
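As a minimal sketch of how tiered access, persistent identifiers, and documented provenance might fit together in a catalog, the Python below models a preserved record and two helper functions. Everything in it is an assumption made for illustration: the tier names, the `PreservedRecord` fields, and the `10.0000/...` identifiers are hypothetical placeholders and are not taken from any DPHEP specification.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessTier(IntEnum):
    """Hypothetical tiers, ordered from most open to most restricted."""
    PUBLIC = 0
    COLLABORATION = 1
    STEWARDS = 2


@dataclass(frozen=True)
class PreservedRecord:
    identifier: str                      # persistent identifier, e.g. a DOI string
    title: str
    tier: AccessTier                     # minimum tier required to read the record
    derived_from: tuple[str, ...] = ()   # identifiers of parent records (provenance)


def can_access(user_tier: AccessTier, record: PreservedRecord) -> bool:
    """A user may read a record at or below their own tier."""
    return user_tier >= record.tier


def lineage(identifier: str, catalog: dict[str, PreservedRecord]) -> list[str]:
    """List every ancestor identifier reachable through derived_from links."""
    seen: list[str] = []
    stack = list(catalog[identifier].derived_from)
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.append(parent)
        if parent in catalog:
            stack.extend(catalog[parent].derived_from)
    return seen


# Placeholder identifiers; not real DOIs.
raw = PreservedRecord("10.0000/example.raw", "Raw events", AccessTier.STEWARDS)
reco = PreservedRecord("10.0000/example.reco", "Reconstructed events",
                       AccessTier.COLLABORATION, (raw.identifier,))
ntuple = PreservedRecord("10.0000/example.ntuple", "Simplified ntuples",
                         AccessTier.PUBLIC, (reco.identifier,))
catalog = {r.identifier: r for r in (raw, reco, ntuple)}
```

Here a public user can read the simplified ntuples but not the reconstructed events they were derived from, while `lineage` still lets anyone with catalog access see which parent records a public dataset traces back to.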
DPHEP advocates preservation models that range from basic documentation and metadata archiving to full preservation of software environments and virtualized analysis stacks. Technologies referenced include virtualization and containerization tools such as CernVM, Docker, and Singularity, along with reproducible-build practices based on version-controlled repositories and continuous integration on platforms such as GitHub and GitLab. Data repositories and metadata services include HEPData, Zenodo, and the CERN Document Server, while compute preservation relies on infrastructures like WLCG, national grids, and commercial cloud services used by major laboratories. Formats and standards such as ROOT and HDF5, and provenance frameworks developed with initiatives like RDA, inform DPHEP technical guidance.
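One common building block across this range of models is a fixity manifest: a list of checksums that lets an archive detect silent corruption or loss years later. The sketch below is illustrative only and is not a DPHEP tool. The function names and the `container_image` label are assumptions, and it uses only Python's standard `hashlib` and `pathlib` modules.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, container_image: str) -> dict:
    """Record one checksum per file plus a label for the software environment.

    container_image is a free-text label (hypothetical here) naming the
    container or virtual machine image the data is meant to be read with.
    """
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"container_image": container_image, "files": files}


def verify_manifest(root: Path, manifest: dict) -> list[str]:
    """Return relative paths that are missing or whose content has changed."""
    damaged = []
    for rel, expected in manifest["files"].items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel)
    return damaged
```

An archive would typically store the manifest alongside the data and rerun `verify_manifest` periodically. An empty result means every listed file is still present and unchanged.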
Major preservation efforts associated with DPHEP include coordination with the CERN Open Data Portal on releases from CMS and ALICE, legacy preservation of BaBar and LEP data, and collaborative work with HEPData and InspireHEP on metadata and bibliographic linkage. Partnerships extend to national bodies and projects such as the Fermilab Scientific Computing Division, DESY Data Management, and SLAC National Accelerator Laboratory archiving programs, and to integration with initiatives like the EOSC Portal and regional repositories. Workshops and task forces have produced reports in collaboration with organizations including ICFA and APPEC and with funders like the ERC and DOE.
DPHEP-enabled preservation has facilitated re-analyses, methodological studies, and pedagogical use of legacy datasets from the Tevatron, LEP, and early LHC runs, enabling follow-up work by independent groups, university courses, and interdisciplinary teams from institutions such as MIT, University of Oxford, Princeton University, and University of Tokyo. Outcomes include validation of past results, novel searches using preserved data, and benchmark datasets for machine-learning research involving groups from CERN openlab and industry partners active in high-performance computing. The initiative supports citation and credit mechanisms consistent with DataCite DOIs and scholarly practices promoted by ORCID.
Ongoing challenges include funding models for long-term stewardship, legal and governance questions about who owns and controls data once a collaboration winds down, as seen in disputes during legacy experiment transitions, and technical obsolescence of software and hardware stacks. Future directions emphasize integration with global research infrastructures such as EOSC, adoption of persistent identifiers and FAIR-aligned practices championed by GO FAIR, closer ties with computing developments from EuroHPC and national supercomputing centers, and expanded training through summer schools and curricula at universities and laboratories. Sustained institutional commitments from organizations like CERN, Fermilab, DESY, and SLAC National Accelerator Laboratory, and from funding agencies, will determine DPHEP’s ability to secure high-energy physics data as a durable scientific resource.