| PhEDEx (historical) | |
|---|---|
| Name | PhEDEx (historical) |
| Developer | CMS collaboration (CERN and partner institutions) |
| Released | 2004 |
| Discontinued | 2019 |
| Latest release version | legacy |
| Programming language | Perl, Python (programming language) |
| Operating system | Linux, Scientific Linux |
| Genre | Data transfer software |
| License | Open source |
PhEDEx (historical) was a distributed data transfer and replication system developed at CERN to manage large-scale file movement for the CMS (detector) experiment at the Large Hadron Collider, operating within the Worldwide LHC Computing Grid. It coordinated heterogeneous storage endpoints, network links and site services across the international CMS collaboration. Designed for petabyte-scale throughput, it interfaced with grid middleware, catalog services and monitoring infrastructures from projects such as EGEE, Open Science Grid, and WLCG.
PhEDEx originated at CERN in the early 2000s to address the data distribution needs of CMS (detector) after design studies for the Large Hadron Collider predicted unprecedented data volumes. The project brought together expertise from the European Organization for Nuclear Research, Fermilab, and major university computing centers to integrate site services, network engineering and storage provisioning. It operated alongside GridFTP and FTS (File Transfer Service), preceded the later-generation Rucio, and followed interoperability work by standards bodies such as the Open Grid Forum to remain compatible with the Globus Toolkit and other middleware.
PhEDEx employed a tiered architecture aligned with the Worldwide LHC Computing Grid model, coordinating Tier-0, Tier-1, Tier-2, and Tier-3 sites at laboratories including CERN, Fermilab, DESY, and KIT. Core components included a central transfer-management database, a transfer orchestration layer, and site agents that interfaced with local storage managers such as dCache, CASTOR, and EOS (filesystem). Data movement relied on GridFTP and SRM (Storage Resource Manager) over TCP/IP networks, while security and authentication relied on X.509 certificates and VOMS proxies. The stack was implemented mainly in Perl and Python (programming language), used an Oracle database for transfer metadata, and exposed a data service API consumed by experiment frameworks including CMSSW and experiment bookkeeping systems.
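The data service API mentioned above can be illustrated with a short sketch. The client below queries block-replica information the way experiment tooling typically did; the base URL, path and parameter names are illustrative assumptions rather than documentation of the retired interface, and the dataset name is hypothetical.

```python
# Minimal sketch of a client for a PhEDEx-style data service.
# The base URL, path and parameter names are illustrative assumptions,
# not the documented interface of the now-retired production service.
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"  # assumed endpoint

def block_replicas(dataset: str, node: Optional[str] = None) -> dict:
    """Ask the data service which storage nodes hold replicas of a dataset's blocks."""
    params = {"dataset": dataset}
    if node:
        params["node"] = node  # e.g. a Tier-1 node name such as "T1_US_FNAL"
    url = f"{BASE_URL}/blockreplicas?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Hypothetical dataset name; real names followed a /Primary/Processed/Tier pattern.
    reply = block_replicas("/ExampleDataset/Run2012A-v1/AOD")
    for block in reply.get("phedex", {}).get("block", []):
        replicas = [r.get("node") for r in block.get("replica", [])]
        print(block.get("name"), "->", replicas)
```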
Operational deployment was coordinated with regional grids such as the Nordic DataGrid Facility (NDGF) and EGI, and with national centers including Rutherford Appleton Laboratory and CNAF. System administration incorporated monitoring with Nagios, Grafana, and custom dashboards integrated into experiment operations centers at CERN and partner labs; a sketch of such a check follows below. Transfer workflows were scheduled and managed in concert with data acquisition (DAQ) systems, archival policies for CASTOR and HPSS, and network provisioning through research backbones such as GÉANT and ESnet. Incident response and escalation involved collaboration between CMS (detector) computing operations and WLCG operations teams to handle outages, replication failures, and congestion across intercontinental links.
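Monitoring integration of the kind described here often took the form of small check scripts feeding Nagios or site dashboards. The sketch below shows one plausible shape for such a check; the backlog-reading function, thresholds and link names are hypothetical, and only the Nagios exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL) is standard.

```python
#!/usr/bin/env python3
# Hypothetical Nagios-style check for the transfer backlog on one link.
# Only the exit-code convention (0=OK, 1=WARNING, 2=CRITICAL) is standard;
# the backlog source, thresholds and node names are illustrative assumptions.
import sys

WARN_TB = 50.0   # assumed warning threshold for queued data, in terabytes
CRIT_TB = 200.0  # assumed critical threshold

def queued_terabytes(src: str, dst: str) -> float:
    """Stub: a real deployment would query the transfer database or data
    service for the volume of data queued on the src -> dst link."""
    return 12.5  # placeholder value for the sketch

def main() -> int:
    src, dst = "T0_CH_CERN", "T1_US_FNAL"  # example link, names illustrative
    backlog = queued_terabytes(src, dst)
    if backlog >= CRIT_TB:
        print(f"CRITICAL - {backlog:.1f} TB queued on {src}->{dst}")
        return 2
    if backlog >= WARN_TB:
        print(f"WARNING - {backlog:.1f} TB queued on {src}->{dst}")
        return 1
    print(f"OK - {backlog:.1f} TB queued on {src}->{dst}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```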
Within CMS (detector), PhEDEx provided automated distribution of reconstructed datasets and analysis samples from the CERN Tier-0 to Tier-1 centers such as Fermilab and CC-IN2P3, and onward to Tier-2 centers at collaborating universities, enabling rapid access for analysts. Sites obtained data by subscribing datasets to their storage, and the central and site agents then fulfilled the subscriptions asynchronously, as sketched below. Integration extended to physics analysis frameworks, the CRAB job submission system, and data bookkeeping services that tracked provenance and dataset lineage across sites such as Fermilab and KIT.
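The subscription workflow can be made concrete with a small sketch. The request structure, field names and submit step below are assumptions chosen for exposition, not the exact schema of the retired data service; the dataset and destination node names are examples only.

```python
# Illustrative sketch of a dataset-subscription request.
# Field names and the submit step are assumptions for exposition,
# not the schema of the retired PhEDEx data service.
from dataclasses import dataclass, asdict

@dataclass
class SubscriptionRequest:
    dataset: str               # dataset to replicate, e.g. "/Primary/Processed/AOD"
    node: str                  # destination storage node, e.g. "T2_UK_London_IC"
    priority: str = "normal"   # low / normal / high
    custodial: bool = False    # True only for archival (tape) copies at Tier-1s
    comments: str = ""

def submit(request: SubscriptionRequest) -> None:
    """Stub: a real client would POST this to the data service, after which
    the central and site agents route transfers until the replica is complete."""
    print("would submit subscription:", asdict(request))

if __name__ == "__main__":
    submit(SubscriptionRequest(
        dataset="/ExampleDataset/Run2012A-v1/AOD",   # hypothetical name
        node="T2_UK_London_IC",                      # example destination
        comments="replica requested for local analysis",
    ))
```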
PhEDEx sustained multi-gigabit transfer rates and moved petabyte-scale datasets during peak LHC runs, operating under traffic conditions shaped by scheduled reprocessing and user-driven analysis campaigns. Performance tuning involved congestion management across transatlantic links, coordination with ESnet and GÉANT for dedicated-wavelength (lambda) provisioning, and the use of parallel GridFTP streams to maximize throughput. Over time, lessons on scalability, metadata federation, and failure modes influenced the development of newer systems; operational experience highlighted challenges with centralization, metadata consistency, and integration with evolving storage solutions such as EOS (filesystem) and object stores.
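The scale implied by these figures is easy to work out. The short calculation below, under the simplifying assumptions of a constant sustained rate and no protocol or retry overhead, shows why multi-gigabit links and parallel streams were necessary for petabyte-scale campaigns.

```python
# Back-of-the-envelope transfer-time estimate under simplifying assumptions:
# a constant sustained rate and no protocol or retry overhead.
def transfer_days(data_petabytes: float, rate_gbit_s: float) -> float:
    data_bits = data_petabytes * 1e15 * 8      # PB -> bits (decimal units)
    seconds = data_bits / (rate_gbit_s * 1e9)  # time at the sustained rate
    return seconds / 86400                     # seconds -> days

if __name__ == "__main__":
    for rate in (1, 10, 40):
        print(f"1 PB at {rate:>2} Gbit/s sustained: "
              f"{transfer_days(1.0, rate):.1f} days")
    # Roughly 93 days at 1 Gbit/s, 9.3 days at 10 Gbit/s, 2.3 days at 40 Gbit/s.
```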
PhEDEx's operational history informed successor projects such as Rucio, a more modular, federated data management system originally developed for ATLAS (detector) and later adopted by CMS as PhEDEx's replacement, and influenced data movement strategies across WLCG and national research infrastructures. Its architecture and runbook practices fed into operational best practices shared with ESnet, GÉANT, and storage projects such as dCache and EOS (filesystem). Archives of PhEDEx code and configuration remain available as reference material for data management training in the community, and its influence persists in the policies and tools used for large-scale data management in high-energy physics and other data-intensive collaborations.
Category:CERN software Category:Grid computing Category:High-energy physics software