LLMpedia: The first transparent, open encyclopedia generated by LLMs

CERN Data Storage

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: CERN Data Storage
Established: 1954
Location: Geneva, Switzerland
Type: Scientific data repository
Operator: European Organization for Nuclear Research

CERN Data Storage supports large-scale scientific experiments at the European Organization for Nuclear Research (CERN), in particular Large Hadron Collider (LHC) operations and the ATLAS, CMS, ALICE, and LHCb experiments. It underpins collaborations with institutions including Fermilab, DESY, SLAC National Accelerator Laboratory, and Brookhaven National Laboratory, and enables data workflows across grids and clouds coordinated with partners such as Open Science Grid and PRACE.

Overview

CERN Data Storage provides resilient, distributed storage and archival services for high-energy physics, chiefly the Large Hadron Collider experiments, and for multi-institution efforts including the Worldwide LHC Computing Grid (WLCG), the European Grid Infrastructure, and Open Science Grid. It interfaces with research infrastructures such as ELIXIR and EOSC, and is anchored in the CERN Data Centre, which hosts the Tier-0 centre at CERN's Meyrin site near Geneva. Operational governance involves the CERN Council, the CERN IT Department, the WLCG management boards, and experiment computing coordinators from the ATLAS and CMS collaborations.

Infrastructure and Architecture

The architecture is layered: on-premises systems at the CERN Data Centre are integrated with remote Tier-1 and Tier-2 sites at institutions including INFN, CC-IN2P3, KIT, RAL, and BNL. Core components include the mass storage systems CASTOR and EOS, together with vendor-supplied tape libraries, serving compute farms built on HTCondor pools and batch systems such as SLURM. Metadata services rely on Oracle Database and MySQL instances, while storage orchestration uses technologies inspired by OpenStack and container platforms such as Kubernetes for microservices. Redundancy and replication strategies follow the practice of experiment data-management systems, including ATLAS Distributed Data Management and the historical CMS system PhEDEx.
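Replication across sites is only useful if the copies can be shown to be identical. The following is a minimal sketch of such a consistency check, assuming replicas are reachable as local files and that Adler-32 is the checksum of choice; the article does not say which checksum is used, and the function names `adler32_hex` and `replicas_consistent` are hypothetical.

```python
import zlib

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files are never loaded whole


def adler32_hex(path, chunk_size=CHUNK_SIZE):
    """Return the Adler-32 checksum of a file as 8 lowercase hex digits."""
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            # Feed the running value back in so the result covers the whole file.
            checksum = zlib.adler32(chunk, checksum)
    return f"{checksum & 0xFFFFFFFF:08x}"


def replicas_consistent(paths):
    """True when every replica in `paths` has the same checksum.

    An empty list of replicas is reported as inconsistent.
    """
    checksums = {adler32_hex(path) for path in paths}
    return len(checksums) == 1
```

A production system would compare each replica against a checksum recorded in a catalogue rather than against its siblings, which also catches the case where every copy is corrupted in the same way.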

Data Acquisition and Management

Data acquisition pipelines collect raw detector output from ATLAS, CMS, ALICE, and LHCb through front-end electronics and data acquisition (DAQ) systems, with trigger designs that draw on experience from the Large Electron–Positron Collider. Event filtering and calibration chains build on software frameworks such as Gaudi and ROOT, alongside Geant4-based simulation output. Data management relies on Rucio catalogues and policies, and preservation metadata follows the FAIR data principles and projects coordinated with EOSC and OpenAIRE.
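To make the idea of a catalogue with replication policies concrete, here is a toy in-memory sketch. It borrows Rucio's convention of naming data with identifiers of the form `scope:name`, but it is not the Rucio client API: `ToyCatalogue`, its methods, and the identifiers and storage element names in the usage example are all hypothetical.

```python
from dataclasses import dataclass, field


def parse_did(did):
    """Split a 'scope:name' data identifier into its two parts."""
    scope, sep, name = did.partition(":")
    if not sep or not scope or not name:
        raise ValueError(f"expected 'scope:name', got {did!r}")
    return scope, name


@dataclass
class ToyCatalogue:
    # Maps (scope, name) -> set of storage element names holding a replica.
    replicas: dict = field(default_factory=dict)

    def add_replica(self, did, storage_element):
        """Record that `storage_element` holds a copy of `did`."""
        self.replicas.setdefault(parse_did(did), set()).add(storage_element)

    def locations(self, did):
        """Return the storage elements holding `did`, sorted by name."""
        return sorted(self.replicas.get(parse_did(did), set()))

    def missing_copies(self, did, required_copies):
        """How many more replicas a 'keep N copies' policy still needs."""
        return max(0, required_copies - len(self.locations(did)))
```

A short usage example with made-up names:

```python
catalogue = ToyCatalogue()
catalogue.add_replica("user.alice:run0001.raw.root", "SITE_A_DISK")
catalogue.add_replica("user.alice:run0001.raw.root", "SITE_B_TAPE")
print(catalogue.locations("user.alice:run0001.raw.root"))
print(catalogue.missing_copies("user.alice:run0001.raw.root", 3))
```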

Storage Technologies and Media

Primary storage combines disk arrays, object storage, and magnetic tape libraries; major vendors have historically included IBM, Hewlett Packard Enterprise, Dell Technologies, and, for tape hardware, Quantum Corporation and IBM. File-system and object layers involve XRootD and Ceph, along with distributed file systems influenced by Lustre deployments at laboratories such as Fermilab and DESY. Tape archives use libraries similar to those at the National Physical Laboratory (UK), and archival practice follows CERN's own preservation policies. Performance tuning draws on cache strategies used by ATLAS DDM and by systems derived from CMS PhEDEx.
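Files served through XRootD are usually addressed with URLs of the form `root://host[:port]//path`, where the port defaults to 1094. The sketch below shows one way to split such a URL using only the Python standard library; it does not use any XRootD client bindings, the helper `parse_xrootd_url` is hypothetical, and the host in the usage example is a placeholder.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

DEFAULT_XROOTD_PORT = 1094  # default port for the root:// protocol


class XRootDLocation(NamedTuple):
    host: str
    port: int
    path: str


def parse_xrootd_url(url):
    """Split a root:// URL into host, port, and file path."""
    parts = urlsplit(url)
    if parts.scheme != "root":
        raise ValueError(f"not a root:// URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")
    # The conventional double slash after the host marks an absolute path;
    # this sketch normalises the path to a single leading slash.
    path = "/" + parts.path.lstrip("/")
    return XRootDLocation(parts.hostname, parts.port or DEFAULT_XROOTD_PORT, path)
```

For example, `parse_xrootd_url("root://xrootd.example.org//store/data/file.root")` yields the host `xrootd.example.org`, the default port, and the path `/store/data/file.root`.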

Data Transfer and Networking

High-bandwidth networking is provided by backbones such as GÉANT and by national research networks including RENATER, SURFnet, DFN, and JANET (UK), which connect to regional partners such as ESnet and NORDUnet. Transfer tools and protocols include GridFTP, the File Transfer Service (FTS), and XRootD, together with newer approaches such as HTTP/2 and Aspera-style accelerated transport; monitoring relies on tools such as the perfSONAR deployments used by WLCG. The network architecture integrates with optical infrastructure such as the LHC Optical Private Network (LHCOPN) and involves collaboration with telecom providers and with programmes such as CERN openlab.
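A back-of-the-envelope calculation shows why link capacity matters at these data volumes. The sketch below estimates an idealised transfer time from file size and link speed; the `efficiency` parameter is an assumed fudge factor for protocol overhead and contention, and the figures in the usage note are illustrative rather than measured values for any CERN link.

```python
def transfer_time_seconds(size_bytes, link_gbps, efficiency=1.0):
    """Estimate how long a transfer takes on a link of `link_gbps` gigabits/s.

    `efficiency` is the fraction of nominal capacity actually achieved
    (0 < efficiency <= 1); 1.0 means a perfectly utilised link.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second
```

Under these assumptions, moving 1 TB (10^12 bytes) over a fully utilised 100 Gbit/s link takes 80 seconds, and halving the efficiency doubles that to 160 seconds.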

Preservation, Curation, and Access

Long-term preservation follows community standards such as the FAIR data principles and is coordinated with repository and metadata efforts such as DataCite and Invenio-based services. Access is credentialed through federated identity systems including CERN Single Sign-On and eduGAIN, and publications are linked through scholarly services such as INSPIRE-HEP and preprint servers such as arXiv. Data policy involves bodies such as the CERN Council and projects such as OpenAIRE, enabling open-data releases such as the public ATLAS and CMS datasets.

Security, Privacy, and Compliance

Security measures follow ENISA guidelines and practices developed with national cybersecurity teams such as CERT-CH and US-CERT-style teams at partner laboratories. Data governance complies with the legal frameworks of the host states, Switzerland and France, where applicable, and with institutional policies set by CERN management. Incident response is coordinated with WLCG security coordination and the CERN Computer Security Team. Operational resilience draws on the auditing, encryption, and backup practices used across research infrastructures, including PRACE and EOSC partners.

Category: European Organization for Nuclear Research
Category: Scientific data repositories
Category: Large Hadron Collider