| PhEDEx | |
|---|---|
| Name | PhEDEx |
| Developer | European Organization for Nuclear Research, CMS experiment |
| Released | 2006 |
| Latest release | 2014 (major rewrite) |
| Operating system | Scientific Linux, CentOS, Debian |
| Programming language | Perl, Python, C++ |
| Genre | Data transfer, Data management, Workflow |
| License | Custom |
PhEDEx (Physics Experiment Data Export) was a high-throughput data transfer and distribution system developed to manage the bulk scientific datasets produced by the CMS experiment at the Large Hadron Collider, hosted at the European Organization for Nuclear Research (CERN). It coordinated wide-area replication among major computing centers including CERN, Fermilab, Brookhaven National Laboratory, DESY, KIT, INFN, and regional centers affiliated with the Worldwide LHC Computing Grid. Designed for petabyte-scale workflows, it integrated with grid middleware such as the Globus Toolkit, HTCondor, gLite, and ARC.
PhEDEx served as the production-grade transfer service used by the CMS experiment to replicate datasets between the Tier-0, Tier-1, Tier-2, and Tier-3 sites of the Worldwide LHC Computing Grid. It coordinated with catalog and workflow systems such as DBS (CMS), CRAB, and CernVM-FS, and its interfaces informed later interoperation with Rucio and with data-management services of other LHC experiments such as ATLAS and LHCb. The project emphasized automated routing, reliability, and integration with storage services such as dCache, EOS, Castor (CERN), and StoRM deployed at major laboratories including SLAC, TRIUMF, and Lawrence Berkeley National Laboratory.
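The tiered replication described above was driven by data subscriptions: a dataset was subscribed to a destination node, and PhEDEx routed data there until the copy was complete. The sketch below is a minimal, illustrative representation of such a request in Python; the field names, dataset, and node names are assumptions for illustration, not the real PhEDEx schema or API payload.

```python
# Minimal sketch of a PhEDEx-style dataset subscription (illustrative field
# and node names, not the actual database schema or request format).
from dataclasses import dataclass

@dataclass
class Subscription:
    dataset: str              # dataset to replicate
    destination: str          # destination node name
    priority: str = "normal"  # "low" | "normal" | "high"
    custodial: bool = False   # archival (tape-backed) copy at a Tier-1
    move: bool = False        # move rather than replicate

# Example: request a high-priority, non-custodial replica at a Tier-2 site.
sub = Subscription("/SingleMuon/Run2016B-v1/RAW", "T2_DE_DESY", priority="high")
print(sub)
```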
PhEDEx used a distributed, agent-based architecture comprising central services, site agents, and transfer daemons. Core components included a central transfer-management database (TMDB) hosted on Oracle at CERN, a web-based monitoring console comparable to portals such as Grafana and Nagios, and a data-service API layer exposing RESTful interfaces consumed by client libraries such as PhedexApi and by grid middleware. Site-level components interfaced with local storage management systems (for example, dCache or EOS) and with network optimization stacks used in research networks such as ESnet and GÉANT. Operational coordination used ticket and incident systems akin to JIRA and RT (Request Tracker), while software configuration management was performed via Puppet, Chef (software), and Git repositories.
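As an illustration of the REST-style data service mentioned above, the sketch below queries block replicas for a dataset. The endpoint layout follows the published cmsweb data-service convention, but the dataset name is hypothetical and the service itself has since been retired, so this is an assumption-laden sketch rather than a working recipe.

```python
# Sketch: querying the PhEDEx data service (datasvc) for block replicas of a
# dataset. Endpoint layout and reply structure assume the documented datasvc
# conventions; the dataset name is illustrative only.
import json
import urllib.parse
import urllib.request

BASE = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"

def block_replicas(dataset):
    """Return the parsed JSON reply of the 'blockreplicas' call."""
    query = urllib.parse.urlencode({"dataset": dataset})
    with urllib.request.urlopen(f"{BASE}/blockreplicas?{query}") as resp:
        return json.load(resp)

if __name__ == "__main__":
    reply = block_replicas("/SingleMuon/Run2016B-v1/RAW")  # hypothetical dataset
    # Assuming the documented reply layout: phedex -> block -> replica -> node.
    for block in reply["phedex"]["block"]:
        print(block["name"], [r["node"] for r in block["replica"]])
```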
PhEDEx orchestrated transfers using multiple transport backends, including GridFTP, FTS (File Transfer Service), and custom TCP-based movers tuned for high-latency, long-haul paths between CERN and continental facilities such as Fermilab and Brookhaven National Laboratory. It implemented dataset-level replication policies, subscription models, and priority queues driven by analysis and production workflows from CRAB and HTCondor, and negotiated with storage endpoints using the SRM protocol implemented by dCache and StoRM. The system tracked checksums and performed integrity validations compatible with tools built on ROOT (software), enabling recovery strategies similar to those used in tape storage migrations at CERN and FNAL.
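The priority queues and checksum tracking described above can be illustrated with a small toy sketch: transfers are dispatched in priority order (FIFO within a priority class), and completed copies are validated with an adler32 checksum, a checksum type commonly used in CMS data handling. The class and function names, priorities, and paths are illustrative; this is not PhEDEx's actual scheduler.

```python
# Toy sketch of priority-ordered transfer scheduling with adler32 validation.
# Names, priorities, and paths are illustrative, not the PhEDEx implementation.
import heapq
import zlib

PRIORITY = {"high": 0, "normal": 1, "low": 2}

def adler32_of(path, chunk=1 << 20):
    """Streaming adler32 checksum of a local file."""
    value = 1
    with open(path, "rb") as fh:
        while data := fh.read(chunk):
            value = zlib.adler32(data, value)
    return value & 0xFFFFFFFF

class TransferQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: FIFO within the same priority

    def submit(self, source, dest, priority="normal"):
        heapq.heappush(self._heap, (PRIORITY[priority], self._seq, source, dest))
        self._seq += 1

    def pop(self):
        _, _, source, dest = heapq.heappop(self._heap)
        return source, dest

# Example: the high-priority transfer is dispatched before the normal one.
q = TransferQueue()
q.submit("srm://site-a/file1", "srm://site-b/file1", "normal")
q.submit("srm://site-a/file2", "srm://site-b/file2", "high")
print(q.pop())
```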
PhEDEx demonstrated multi-gigabit aggregate throughput in production, scaling to petabytes per month during intensive run periods of the Large Hadron Collider and CMS data campaigns. Performance tuning drew on TCP window management research from networking projects at ESnet, GÉANT, and industry partners such as Cisco Systems and Juniper Networks. Scalability testing exercised topologies resembling the National Research and Education Networks coordinated by TERENA and used monitoring solutions such as PerfSONAR and MonALISA to diagnose bottlenecks at sites including CERN, Fermilab, DESY, and TRIUMF.
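The window tuning mentioned above follows from the bandwidth-delay product: a single TCP flow can only keep a path full if its window covers the bytes in flight. A quick sketch of that arithmetic, with illustrative (not measured) link parameters, is shown below.

```python
# Bandwidth-delay product sketch; the 10 Gb/s rate and 120 ms RTT are
# illustrative values, not measurements of any specific CERN-Fermilab path.
def bdp_bytes(rate_gbps, rtt_ms):
    """Bytes in flight needed to fill a path of the given rate and RTT."""
    return rate_gbps * 1e9 / 8 * (rtt_ms / 1000.0)

# A 10 Gb/s path with ~120 ms round-trip time needs roughly 150 MB of TCP
# window (socket buffer) to be kept full by a single flow.
print(f"{bdp_bytes(10, 120) / 1e6:.0f} MB")
```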
Authentication and authorization for PhEDEx relied on certificate-based X.509 credentials issued by certificate authorities accredited under European and international grid trust policies, such as the TERENA Certificate Service and national CAs. Integration with Virtual Organization management was achieved via VOMS attributes tied to CMS roles, and delegated credentials for transfers were mediated by the grid security stacks used by the Globus Toolkit and gLite. Operational security practices aligned with the CERN Computer Security Team and site security policies at Fermilab and SLAC, and audits referenced controls comparable to those developed by W3C working groups on web security and data integrity.
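For illustration, the sketch below shows how a client might present an X.509 proxy credential (as produced by voms-proxy-init) to an HTTPS data service. The proxy path, URL, node name, and CA directory are assumptions based on common grid deployments, and the PhEDEx services themselves are no longer running.

```python
# Sketch: HTTPS request authenticated with an X.509 grid proxy. The proxy
# path follows the usual /tmp/x509up_u<uid> convention; the URL, node name,
# and CA directory are illustrative assumptions.
import os
import requests

proxy = f"/tmp/x509up_u{os.getuid()}"  # combined certificate + key file

resp = requests.get(
    "https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscriptions",
    params={"node": "T1_US_FNAL_Disk"},         # illustrative node name
    cert=proxy,                                 # client certificate for mutual TLS
    verify="/etc/grid-security/certificates",   # typical grid CA bundle directory
)
resp.raise_for_status()
print(resp.json()["phedex"].keys())
```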
PhEDEx entered production for the CMS experiment in the mid-2000s and evolved through operational campaigns including the 2010–2012 LHC Run 1, the 2015–2016 portion of Run 2, and reprocessing campaigns coordinated with Tier-1 centers at FNAL, KIT, RAL, and PIC. Over its operational lifetime, its engineering work interfaced with successor projects such as Rucio, and its lessons contributed to data management practices for experiments such as ATLAS and LIGO. The system's operational history includes large-scale migrations, incident responses coordinated with CERN, capacity upgrades at Fermilab, and decommissioning phases as the community transitioned to newer tools and standards championed by collaborations across CERN member states and international partners.
Category:Data management systems Category:Scientific computing