LLMpedia: The first transparent, open encyclopedia generated by LLMs

EOS (CERN storage)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
EOS (CERN storage)
Name: EOS (CERN storage)
Developer: CERN
Released: 2011
Programming language: C++
Operating system: Linux
Genre: Distributed file system, object storage
License: GPL-3.0

EOS (CERN storage) is a distributed, high-performance storage system developed at CERN to meet the requirements of large-scale scientific projects such as the Large Hadron Collider experiments ATLAS, CMS, ALICE, and LHCb. It provides namespace services, data placement, and replication optimized for the high-throughput, low-latency access patterns common to particle physics and astrophysics, and it underpins distributed computing infrastructures such as the Worldwide LHC Computing Grid (WLCG) and the Open Science Grid. EOS runs on commodity storage hardware and networking fabrics from vendors such as Dell EMC, HPE, Intel, and NVIDIA, and integrates with the middleware used by the LHC collaborations.

Overview

EOS was designed to serve experiments that produce multi-petabyte datasets, supporting workflows from data acquisition at the CERN Meyrin and Prévessin sites to analysis at regional centers such as Fermilab, BNL, and GridKa at KIT. It serves projects coordinated under the European Strategy for Particle Physics and the Worldwide LHC Computing Grid, as well as initiatives like Helix Nebula and national research infrastructures such as STFC and CNRS. EOS complements other storage technologies such as Ceph, GlusterFS, dCache, HDFS, and Swift, builds on the XRootD framework, and interoperates with compute frameworks including HTCondor, the Slurm Workload Manager, Kubernetes, and Apache Spark.

Architecture and components

EOS employs a modular architecture with distinct services: a management server (MGM) that holds the namespace and metadata, file storage servers (FSTs) that hold the data, a message queue (MQ) for internal communication, and clients. The namespace service is designed to scale across clusters at sites such as the CERN Data Centre and WLCG Tier-1 centers, including IN2P3 and the Rutherford Appleton Laboratory. Storage backends integrate with SAN and NAS arrays from vendors such as NetApp and Dell EMC, and with local NVMe, SSD, and HDD pools built on hardware from Seagate and Western Digital. The metadata layer draws on concepts from the Google File System and the Andrew File System, while access protocols cover POSIX-like semantics, XRootD federation, and HTTP/WebDAV.
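Clients reach the namespace through XRootD-style root:// URLs: the name server resolves the logical path and redirects the client to a disk server holding the data. The sketch below shows how such a URL decomposes into host, port, and logical path; the hostname and path are illustrative, not real endpoints, and 1094 is the conventional XRootD default port.

```python
from urllib.parse import urlparse

def split_xrootd_url(url: str) -> tuple[str, int, str]:
    """Split a root:// URL into (host, port, logical namespace path).

    By XRootD convention the logical filename follows a double slash
    after the authority; the name server resolves it and redirects
    the client to a data server.
    """
    parts = urlparse(url)
    if parts.scheme != "root":
        raise ValueError(f"not an XRootD URL: {url}")
    # Collapse the leading "//" of the path back to a single "/".
    path = "/" + parts.path.lstrip("/")
    return parts.hostname, parts.port or 1094, path

# Hypothetical EOS endpoint and file, for illustration only.
host, port, path = split_xrootd_url(
    "root://eos.example.cern.ch:1094//eos/user/a/alice/data.root"
)
```

A client library would open a connection to `host:port` and request `path`, following any redirect the name server returns.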

Data management and features

EOS supports replication, striping, erasure coding, and policy-driven placement for datasets produced by projects such as the ATLAS and CMS experiments, IceCube, the LIGO Scientific Collaboration, and EUROfusion. It provides streaming access for analysis jobs based on ROOT and supports containerized workflows with Docker and Singularity. Features include namespace quotas, checksum verification, automatic healing, and integration with the catalog and transfer services Rucio and FTS used by the WLCG. EOS also serves as the storage backend for data portals and repositories such as INSPIRE-HEP and Zenodo, hosted on CERN infrastructure.
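The striping and automatic-healing ideas above can be illustrated with the simplest erasure scheme, single-parity striping: a file is split across several servers plus one XOR parity stripe, and any one lost stripe can be rebuilt from the survivors. This is only a sketch of the principle; EOS's actual erasure-coded layouts use richer Reed-Solomon-style codes tolerating multiple failures.

```python
def make_stripes(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size stripes, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def xor_parity(stripes: list[bytes]) -> bytes:
    """Byte-wise XOR of all stripes: the parity stripe."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return bytes(parity)

def heal(stripes: list, parity: bytes) -> list:
    """Rebuild at most one missing stripe (marked None) in place."""
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost stripe")
    if missing:
        survivors = [s for s in stripes if s is not None] + [parity]
        # XOR of the k-1 survivors and the parity equals the lost stripe.
        stripes[missing[0]] = xor_parity(survivors)
    return stripes

data = b"collision event records"
stripes = make_stripes(data, 4)
parity = xor_parity(stripes)
stripes[2] = None                       # simulate a failed disk server
healed = heal(stripes, parity)
restored = b"".join(healed)[:len(data)]  # equals the original data
```

A real system performs this reconstruction in the background whenever a periodic scan or a failed read detects a missing or corrupted stripe.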

Performance and scalability

EOS is optimized for throughput and parallel access, serving many concurrent readers and writers during data-taking campaigns such as LHC Run 2 and Run 3. Benchmarks compare EOS against systems deployed by NASA, ESA, and national centers such as NERSC and the Jülich Supercomputing Centre. Performance tuning draws on RDMA, InfiniBand fabrics, and TCP/IP offload techniques common in deployments at Fermilab, the SLAC National Accelerator Laboratory, and TRIUMF. EOS scales toward the exabyte range through sharding, namespace partitioning, and integration with hierarchical storage managers, work pursued in part through CERN openlab.
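Namespace partitioning of the kind mentioned above can be sketched as stable hashing of directory paths onto metadata shards, so each shard serves a disjoint, deterministic slice of the tree. The shard count and paths below are illustrative; this is a generic technique, not EOS's actual partitioning scheme.

```python
import hashlib

def shard_for(path: str, num_shards: int = 8) -> int:
    """Map a namespace entry to a metadata shard with a stable hash.

    Hashing the parent directory keeps all entries of one directory
    on the same shard, so a directory listing touches a single shard.
    """
    parent = path.rsplit("/", 1)[0] or "/"
    digest = hashlib.sha256(parent.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Files in the same directory always land on the same shard.
a = shard_for("/eos/user/a/alice/run1.root")
b = shard_for("/eos/user/a/alice/run2.root")
```

Because the mapping depends only on the path and the shard count, any front-end node can route a metadata request without consulting a central directory.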

Security and access control

EOS integrates authentication and authorization mechanisms compatible with Kerberos, OAuth 2.0, OpenID Connect, X.509 certificates, and federated identity providers such as EUDAT and eduGAIN. It supports access control lists and role-based policies used by CERN IT, INSPIRE, and regional centers like GridPP. Encryption at rest and in transit relies on libraries and standards common across European research infrastructures, and operational security follows practices established by ISO/IEC 27001.
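EOS stores directory ACLs as compact strings of entries such as u:alice:rwx (a user entry) and g:atlas:rx (a group entry). The evaluator below is a simplified sketch of that style of entry list (the real sys.acl attribute supports additional entry types and flags, and richer precedence rules); the user and group names are hypothetical.

```python
def parse_acl(acl: str) -> dict:
    """Parse 'u:alice:rwx,g:atlas:rx' into {(type, name): perms}."""
    entries = {}
    for entry in acl.split(","):
        tag, name, perms = entry.split(":")
        entries[(tag, name)] = perms
    return entries

def allowed(acl: str, user: str, groups: list, perm: str) -> bool:
    """Check one permission flag; a user entry overrides group entries."""
    entries = parse_acl(acl)
    if ("u", user) in entries:
        return perm in entries[("u", user)]
    return any(perm in entries.get(("g", g), "") for g in groups)

acl = "u:alice:rwx,g:atlas:rx"
ok = allowed(acl, "bob", ["atlas"], "w")   # group grants only r and x
rd = allowed(acl, "bob", ["atlas"], "r")   # group read is permitted
```

Giving the user entry precedence mirrors the common convention that a specific grant or denial beats a group-wide one.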

Deployment and use cases

EOS is deployed at CERN's primary data centers and at partner sites across the WLCG federation, serving experiments such as ATLAS, CMS, and ALICE, as well as user communities in astronomy, genomics, and climate science, including collaborations like Euro-Argo, Copernicus, and ELIXIR. Use cases include raw data ingestion, large-scale outputs from Monte Carlo simulation campaigns, event reconstruction, and interactive analysis with tools like Jupyter Notebook, RStudio, and Mathematica. For long-term preservation, EOS is used alongside archival systems such as the CERN Tape Archive (CTA) operated by the CERN IT data centre.

Development history and governance

EOS development is led by teams at CERN IT with contributions from research groups, industry partners from CERN Openlab, and open-source communities associated with projects such as HEP Software Foundation. Its governance follows collaborative models similar to Apache Software Foundation incubations and is coordinated with stakeholders from WLCG, national laboratories like Brookhaven National Laboratory, Lawrence Berkeley National Laboratory, and European research organizations including CNRS and DFG. EOS evolved in response to needs arising from LHC operations and continues under roadmaps discussed at forums like CHEP and International Conference on High Performance Computing, Networking, Storage and Analysis.

Category:Distributed file systems Category:CERN software