| CASTOR (software) | |
|---|---|
| Name | CASTOR |
| Developer | CERN |
| Released | 1999 |
| Status | Legacy (no longer actively developed) |
| Programming language | C, C++ |
| Operating system | Unix-like |
| License | CERN Open Source |
CASTOR (the CERN Advanced STORage manager) is a hierarchical storage management system developed for large-scale data archiving and retrieval. It provides a tape-based storage backend with disk staging, indexing, and access controls tailored to high-energy physics experiments and large research facilities. CASTOR integrates with compute clusters, batch systems, and metadata catalogs to sustain high throughput over petabyte-scale datasets.
CASTOR originated as a storage management solution for experiments at the Large Hadron Collider (LHC), including ATLAS, CMS, and LHCb. It was designed to interface with tape library hardware from vendors such as IBM, Oracle, and Fujitsu while presenting file-level access compatible with POSIX semantics and GridFTP transfers. CASTOR supports archival workflows common to projects run by institutions such as Fermilab, DESY, SLAC, and Brookhaven National Laboratory, and it integrates with middleware from CERN partners and international data grids including the Worldwide LHC Computing Grid (WLCG) and the Open Science Grid.
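As a rough illustration of the file-level access model described above, the Python sketch below maps a logical namespace path onto protocol-specific transfer URLs (a GridFTP-style URL versus an xrootd-style URL). The path prefix, host name, and function are invented for this example and do not correspond to real CASTOR endpoints or APIs.

```python
# Hypothetical illustration of mapping a logical namespace path to
# protocol-specific transfer URLs. The host name and paths are invented
# for this example; they are not real CASTOR endpoints.

def to_transfer_url(logical_path: str, protocol: str,
                    host: str = "storage.example.org") -> str:
    """Build a transfer URL for a file in a hierarchical namespace."""
    if not logical_path.startswith("/"):
        raise ValueError("logical path must be absolute")
    if protocol == "gsiftp":      # GridFTP-style transfer URL
        return f"gsiftp://{host}{logical_path}"
    if protocol == "root":        # xrootd-style URL used by analysis jobs
        return f"root://{host}/{logical_path.lstrip('/')}"
    raise ValueError(f"unsupported protocol: {protocol}")


print(to_transfer_url("/archive/experiment/run1/data.root", "gsiftp"))
# gsiftp://storage.example.org/archive/experiment/run1/data.root
```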
Development began in the late 1990s at CERN to meet the data management needs of experiments preparing for Large Hadron Collider operations. Early milestones involved collaborations with research centers like IN2P3 and technology partners such as IBM and StorageTek (STK). Major development phases corresponded to LHC run cycles and upgrades coordinated with projects like EGEE and the WLCG. Over time, contributions came from national laboratories including CEA and universities such as the University of Oxford and the University of Cambridge. The software evolved alongside initiatives like ROOT and GEANT4 to provide persistent storage for experimental outputs, integrating with job schedulers like HTCondor and TORQUE.
CASTOR implements a multi-layered architecture with components for namespace management, disk cache management, tape access, and monitoring. Its namespace service provides a global directory hierarchy, similar to that exposed to Hadoop Distributed File System (HDFS) clients, and interacts with metadata stores deployed at centers such as the CERN Data Centre and Tier-1 facilities like INFN CNAF. The system includes a tape manager compatible with libraries from IBM and Quantum Corporation and supports staging policies used by collaborations including the ALICE experiment and CLEO. Features include hierarchical storage management comparable to HPSS, integrity checks analogous to mechanisms in ZFS and Ceph, and interoperability with authentication services such as Kerberos and X.509 certificates as used in Globus Toolkit deployments. Monitoring and accounting were implemented with tools similar to Nagios and Ganglia, and data transfer optimization drew on algorithms used in rsync and bbcp.
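The staging behaviour described above can be illustrated with a small, self-contained sketch. The Python example below models the general hierarchical-storage read path (serve a file from the disk cache when possible, otherwise recall it from tape, evicting the least recently used cached file to make room). The class and method names are invented for illustration; this is a simplified model of the technique, not CASTOR's actual implementation or API.

```python
import time
from dataclasses import dataclass, field

# Simplified model of a hierarchical storage manager: the authoritative
# copy of every file lives on "tape", and a finite disk cache stages
# copies in front of it for fast access.

@dataclass
class SimpleHSM:
    cache_capacity: int                               # max files kept on disk
    tape: dict = field(default_factory=dict)          # logical name -> contents
    disk_cache: dict = field(default_factory=dict)    # logical name -> [contents, last access]

    def write(self, name: str, data: bytes) -> None:
        """Archive a file: the authoritative copy goes to tape."""
        self.tape[name] = data

    def read(self, name: str) -> bytes:
        """Read a file, recalling it from tape to the disk cache if needed."""
        if name not in self.disk_cache:
            if name not in self.tape:
                raise FileNotFoundError(name)
            self._evict_if_full()
            # Tape recall: a real system would queue a mount and copy here.
            self.disk_cache[name] = [self.tape[name], time.time()]
        self.disk_cache[name][1] = time.time()   # record last access for LRU
        return self.disk_cache[name][0]

    def _evict_if_full(self) -> None:
        """Evict the least recently used cached file once capacity is reached."""
        if len(self.disk_cache) >= self.cache_capacity:
            lru = min(self.disk_cache, key=lambda n: self.disk_cache[n][1])
            del self.disk_cache[lru]


hsm = SimpleHSM(cache_capacity=2)
hsm.write("/archive/run1/event.dat", b"raw detector data")
print(hsm.read("/archive/run1/event.dat"))   # first read triggers a tape recall
```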
CASTOR has been deployed for primary archival storage at major physics facilities including CERN, Fermilab, and DESY to retain collision datasets from LHC runs and fixed-target experiments. It supported data lifecycles for projects such as ATLAS data reprocessing campaigns and CMS analysis workflows, and it served as backend storage for simulation output from GEANT4 and for reconstruction data consumed by analysis frameworks such as ROOT. Beyond particle physics, CASTOR found use in astronomy data archives at observatories such as the European Southern Observatory and in national research infrastructures coordinated by organizations such as ELIXIR and EUDAT for long-term preservation.
CASTOR was developed under licensing terms managed by CERN and distributed to collaborating institutions and national centers. Deployments were coordinated through agreements with bodies such as the WLCG, national funding agencies represented on the CERN Council, and European Commission programs that preceded Horizon 2020. Installation packages and source code were shared with partners including INFN, the Rutherford Appleton Laboratory, and the National Center for Supercomputing Applications to enable integration with local storage fabrics and compute grids.
CASTOR influenced subsequent storage projects and archival practices at major research centers, including the CERN Data Centre, and inspired architectural decisions in systems such as dCache and EOS. Its role in preserving datasets for the LHC underpinned landmark publications by collaborations such as ATLAS and CMS, and it contributed to data management policies adopted by consortia including the WLCG and OpenAIRE. CASTOR's operational experience informed standards discussions in forums such as IETF working groups and in storage community meetings at conferences such as ACM SIGMOD and USENIX.
Category:Data storage software Category:CERN software