LLMpedia
The first transparent, open encyclopedia generated by LLMs

HTCondor-CE

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
HTCondor-CE
Name: HTCondor-CE
Developer: University of Wisconsin–Madison Condor Team
Released: 2010s
Programming language: C++
Operating system: Linux, Unix
Genre: Job scheduling, grid computing


HTCondor-CE is a specialized software component that connects high-throughput computing access points to distributed clusters and grid computing infrastructures. It acts as a gateway between resource providers such as CERN, Fermilab, SLAC National Accelerator Laboratory, and the National Energy Research Scientific Computing Center, and workload managers including HTCondor, PBS Professional, Torque, and Slurm. The project is developed by the University of Wisconsin–Madison Condor Team and is used by collaborations such as the Open Science Grid, European Grid Infrastructure, the Worldwide LHC Computing Grid (WLCG), and XSEDE, and by domain projects in high-energy physics, bioinformatics, and astronomy.

Overview

HTCondor-CE functions as a computing element that brokers jobs between clients and batch systems or cloud computing endpoints. It supports interfaces used by the Globus Toolkit, gLite, HTCondor, ARC (Advanced Resource Connector), and UNICORE-style middleware, while exposing services compatible with the Grid Security Infrastructure and OAuth 2.0-based web portals. Administrators deploy HTCondor-CE to enable interoperation with resource-allocation frameworks such as OpenNebula, OpenStack, and Kubernetes, and with traditional clusters managed by Sun Grid Engine derivatives.
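The brokering path described above can be exercised end to end with the `condor_ce_trace` utility shipped with the HTCondor-CE client tools, which submits a short diagnostic job through a CE and follows it into the local batch system. The hostname below is a hypothetical placeholder:

```shell
# Submit a diagnostic job through the CE at ce01.example.org
# (hypothetical hostname) and report each lifecycle step.
condor_ce_trace ce01.example.org
```

A successful trace confirms that authentication, job routing, and the local batch submission all work before real workloads are directed at the endpoint.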

Architecture and Components

The architecture centers on modular daemons that mediate job control, logging, and data staging. Core components include a job router that interoperates with HTCondor, a submission service implementing protocols similar to DRMAA, a data mover interoperable with GridFTP and Globus Toolkit tools, and an accounting plugin compatible with GOCDB and perfSONAR. The service integrates with authentication modules such as Kerberos, X.509, and the token systems used by CILogon and by federated identity infrastructures like eduGAIN and InCommon. Monitoring and telemetry are exposed to systems such as Prometheus, Grafana, and the ELK Stack for observability. Packaging and builds follow practices from projects like Debian, Red Hat Enterprise Linux, CentOS, and Fedora.
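The job router's behavior is driven by route definitions in the CE's configuration. A minimal sketch in the classic ClassAd route syntax might look like the following; the route name, file path, and queue name are illustrative, and the exact knobs vary between HTCondor-CE releases:

```
# /etc/condor-ce/config.d/02-ce-slurm.conf (illustrative path)
# Route incoming grid jobs to a local Slurm installation.
JOB_ROUTER_ENTRIES @=jre
  [
    name = "Local_Slurm";
    GridResource = "batch slurm";
    TargetUniverse = 9;               # route jobs into the grid universe
    set_default_queue = "general";    # hypothetical Slurm partition
  ]
@jre
```

Each route is a ClassAd: incoming jobs matching the route are transformed (attributes set or remapped) and handed to the named batch-system back end.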

Installation and Configuration

Typical installations use distribution packages tailored for Debian GNU/Linux and Red Hat Enterprise Linux variants, with alternative builds from source using CMake and the GNU Compiler Collection. Administrators configure site endpoints, routing rules, and queue mappings via configuration files that may reference LDAP directories or MySQL and PostgreSQL databases for accounting. Integration often requires coordination with site authorization systems such as XACML, token brokers, and certificate authorities such as Let's Encrypt or organizational CA services. Deployment patterns resemble those of Globus Online and Apache Mesos front ends.
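On a Red Hat-family host, an installation might look like the following sketch; the package names follow the conventions of the OSG and HTCondor yum repositories and should be checked against current release documentation:

```shell
# Install the CE together with its Slurm batch-system plugin
# (package names assume the OSG/HTCondor yum repositories are enabled).
yum install htcondor-ce htcondor-ce-slurm

# Enable and start the CE service under systemd.
systemctl enable --now condor-ce
```

Analogous plugin packages exist for other batch systems, so the second package is chosen to match the site's local scheduler.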

Job Submission and Management

HTCondor-CE accepts jobs from clients using protocols adopted by Condor-G, the Globus Toolkit, and RESTful portals employed by science gateways. Submitted payloads are translated into local batch descriptions compatible with Slurm, PBS Professional, Torque, or LSF. The CE manages job lifecycle events, including queuing, staging, execution, and termination, and reports status to upstream systems such as Open Science Grid managers, WLCG dashboards, and experiment-specific schedulers used by the ATLAS and CMS experiments. Data staging leverages plugins interoperable with GridFTP, SRM (Storage Resource Manager), and Rucio.
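From the client side, a remote submission through a CE can be expressed as an HTCondor grid-universe submit file. In this sketch the CE hostname and the payload script are placeholders; 9619 is the default HTCondor-CE port:

```
# remote_job.sub -- submit through a (hypothetical) HTCondor-CE endpoint
universe          = grid
grid_resource     = condor ce01.example.org ce01.example.org:9619
executable        = analysis.sh
output            = job.out
error             = job.err
log               = job.log
use_x509userproxy = true
queue
```

Running `condor_submit remote_job.sub` forwards the job to the CE, whose job router rewrites it into a description the local batch system understands.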

Security and Authentication

Security is implemented via support for X.509 certificates, the Grid Security Infrastructure, and federated identity through SAML 2.0, OAuth 2.0, and OpenID Connect systems such as CILogon and institutional identity-provider deployments. Integration with Kerberos realms and GSI-based proxies allows sites to enforce access controls consistent with policies of DOE Office of Science facilities and European research infrastructures governed by European Commission funding rules. Auditing and accounting hooks report to services such as Gratia and the accounting tools used by collaborations such as the Open Science Grid and XSEDE.
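The authentication methods a CE accepts are selected through HTCondor's security configuration knobs. A hedged sketch enabling token-based and certificate-based clients might read as follows; the knob names come from the HTCondor security configuration, and the list order expresses preference:

```
# Accept SciTokens (OAuth 2.0-style bearer tokens) first,
# then fall back to SSL/X.509 client certificates.
SEC_DEFAULT_AUTHENTICATION_METHODS = SCITOKENS, SSL
SEC_DEFAULT_AUTHENTICATION = REQUIRED
```

Mapping rules then translate an authenticated token issuer or certificate subject into a local Unix account under which the routed job runs.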

Performance and Scalability

Designed for high-throughput workloads, HTCondor-CE scales horizontally by adding CE instances and vertically by tuning worker mappings to the local batch system. Common scalability patterns derive from deployments at CERN, Fermilab, and national infrastructures such as XSEDE, which inform optimizations for connection pooling, job batching, and failover. Performance monitoring leverages Prometheus, Grafana, and Nagios-style alerting; benchmarking often references synthetic workloads used by HPC centers as well as real-world campaigns from the LHC experiments and bioinformatics pipelines such as those run by Broad Institute groups.
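Before tuning, operational load on a CE is typically inspected with the bundled command-line tools, which mirror the familiar HTCondor client commands but target the CE's own daemons:

```shell
# List jobs currently queued at the CE's scheduler.
condor_ce_q

# Query the CE's collector for daemon and resource state
# (analogous to condor_status for a regular HTCondor pool).
condor_ce_status
```

Watching queue depth and routed-job state under load helps decide whether to add CE instances or adjust route limits.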

Use Cases and Deployments

HTCondor-CE is widely used to present campus clusters and national resources to federated grids, enabling large collaborations, including the ATLAS and CMS experiments, the LIGO Scientific Collaboration, IceCube, and multi-institution consortia in genomics and climate science, to run distributed analyses. Deployments appear in the service catalogs of the Open Science Grid, European Grid Infrastructure, and regional infrastructures coordinated by PRACE and XSEDE. Use cases include workload brokering for data-intensive workflows, integration with scientific gateways like Apache Airavata, and back-end resource exposure for workflow engines such as Pegasus, Nextflow, and Snakemake.

Category:Distributed computing software