LLMpedia: The first transparent, open encyclopedia generated by LLMs

Torque (software)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: XSEDE Hop 4
Expansion Funnel: Raw 66 → Dedup 0 → NER 0 → Enqueued 0
Torque (software)
Name: TORQUE (Terascale Open-source Resource and QUEue Manager)
Developer: Adaptive Computing (formerly Cluster Resources, Inc.), open-source contributors
Initial release: 2003 (as a fork of OpenPBS)
Latest release: 6.1 series
Programming languages: C, C++
Operating systems: Linux, Unix
License: Open-source license derived from the OpenPBS license

Torque (often styled TORQUE, for Terascale Open-source Resource and QUEue Manager) is a distributed resource manager and batch queueing system used to control batch workloads on high-performance computing clusters, grid infrastructures, and cloud environments. It handles job queuing, resource allocation, and execution monitoring across compute nodes and management hosts; scheduling policy is typically supplied by an external scheduler, and the system integrates with common scientific applications, middleware, and cluster management tools.

Overview

Torque operates as middleware between users, job submission clients, and compute resources, providing queuing, scheduling hooks, and accounting capabilities. Its bundled scheduler implements only simple policies, so production sites typically pair it with the Maui Cluster Scheduler or the commercial Moab Workload Manager; comparable batch systems include Slurm, Grid Engine, and HTCondor. Torque has been deployed at high-performance computing centers such as the National Center for Supercomputing Applications, Lawrence Livermore National Laboratory, and Oak Ridge National Laboratory, and administrators often pair it with monitoring stacks such as Ganglia, Nagios, and Prometheus.
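
The queuing model is easiest to see from the submission side. The following Python sketch, an illustrative example rather than anything from the original article, submits a minimal batch job through the standard qsub client; the queue name "batch", the resource limits, and the availability of qsub on the PATH are assumptions about the local site, not Torque defaults.

    import subprocess

    # Minimal Torque job script using standard #PBS directives:
    # job name, target queue, one core on one node, a short walltime,
    # and merged stdout/stderr streams.
    job_script = """#!/bin/bash
    #PBS -N hello_torque
    #PBS -q batch
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=00:05:00
    #PBS -j oe
    echo "Running on $(hostname)"
    """

    # qsub reads the job script from stdin when no file argument is
    # given, and prints the new job identifier on success.
    result = subprocess.run(
        ["qsub"], input=job_script, capture_output=True, text=True, check=True
    )
    print("Submitted job:", result.stdout.strip())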

History

Torque originated in 2003 as a fork of OpenPBS, the Portable Batch System originally developed for NASA, incorporating contributions from national laboratories and computing centers such as Sandia National Laboratories and the National Energy Research Scientific Computing Center. The fork was created and maintained by Cluster Resources, Inc., renamed Adaptive Computing in 2009, which offered commercial distributions and support alongside the open-source releases; the name TORQUE expands to Terascale Open-source Resource and QUEue Manager. Community contributors from universities and supercomputing centers continued open-source development, and major milestones include broad adoption at national laboratories and interoperability work with grid middleware from the Globus Alliance.

Architecture and Components

Torque's architecture separates control-plane and execution-plane components to manage scale on large clusters. The core components are the pbs_server daemon, which manages queue and job state; a scheduler, either the bundled pbs_sched or an external scheduler such as Maui or Moab communicating through the scheduling interface; and the lightweight pbs_mom execution agent deployed on each compute node, which launches and monitors jobs. Torque can be instrumented with prologue, epilogue, and related hooks that interact with authentication and authorization services such as Kerberos, with site accounting backends, and with parallel file systems such as Lustre and BeeGFS that are common at supercomputing centers.
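
A brief sketch of how an operator might inspect these components from Python, assuming only that the Torque client tools (qstat, pbsnodes) are installed and can reach the pbs_server host:

    import subprocess

    def torque_query(cmd):
        # Run a Torque client command and return its stdout; raises if
        # the command fails or the server is unreachable.
        return subprocess.run(
            cmd, capture_output=True, text=True, check=True
        ).stdout

    # pbs_server holds queue and job state; qstat -q prints a summary
    # of each queue it manages.
    print(torque_query(["qstat", "-q"]))

    # pbs_mom agents report node state back to the server; pbsnodes -a
    # lists every execution node the server knows about.
    print(torque_query(["pbsnodes", "-a"]))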

Features and Functionality

Torque provides job submission primitives through command-line clients such as qsub, qstat, and qdel, as well as programmatic interfaces consumed by workflow engines and science gateways. It supports job arrays, inter-job dependencies, advance reservations (typically provided through an external scheduler such as Maui or Moab), and prologue/epilogue scripts that run before and after each job. Accounting and reporting features record per-job resource usage that can feed site-level metrics and performance studies.
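
The array and dependency primitives can be sketched as follows. This is an illustrative example, assuming a Torque installation where qsub accepts the -t job-array option and the -W depend=afterok dependency specification; the two stage scripts are placeholders.

    import subprocess

    def submit(script, *extra_args):
        # Submit a script on stdin via qsub and return the job ID it prints.
        out = subprocess.run(
            ["qsub", *extra_args], input=script,
            capture_output=True, text=True, check=True,
        ).stdout
        return out.strip()

    stage_one = "#!/bin/bash\necho preprocessing\n"
    # Torque exposes the array member index as $PBS_ARRAYID.
    stage_two = "#!/bin/bash\necho array member $PBS_ARRAYID\n"

    # Submit the first job, then a 4-element job array (-t 0-3) that
    # starts only after the first job exits successfully
    # (-W depend=afterok:<job id>).
    first_id = submit(stage_one)
    array_id = submit(stage_two, "-t", "0-3", "-W", f"depend=afterok:{first_id}")
    print("Dependent array job:", array_id)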

Deployment and Integration

Torque is deployed on clusters administered by universities, national laboratories, and research organizations. Integration commonly involves authentication against LDAP directories, site authorization and role-based access policies, and interfacing with external schedulers and vendor cluster management software. Torque-based installations often back science gateways and workflow systems operated by grid consortia such as the European Grid Infrastructure.
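
First-time configuration is usually scripted with the qmgr client. The sketch below is an illustrative example rather than a canonical recipe: it creates and enables a default execution queue on a fresh server, with the queue name and default walltime as placeholder choices.

    import subprocess

    # Each string is a qmgr directive; qmgr -c executes one directive
    # against the local pbs_server (requires manager privileges).
    commands = [
        "create queue batch queue_type=execution",
        "set queue batch resources_default.walltime=01:00:00",
        "set queue batch enabled=true",
        "set queue batch started=true",
        "set server default_queue=batch",
        "set server scheduling=true",
    ]
    for cmd in commands:
        subprocess.run(["qmgr", "-c", cmd], check=True)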

Licensing and Community

The Torque codebase is distributed under a license derived from the original OpenPBS license, with contributions maintained by Adaptive Computing and community maintainers, and related standards work coordinated in forums such as the Open Grid Forum. Development activity has occurred in public repositories with contributions from academic institutions and hardware vendors. Community support channels include mailing lists, user forums, and workshops organized at conferences such as the SC Conference and the International Supercomputing Conference.

Security and Maintenance

Security and maintenance practices for Torque installations follow general guidance from agencies such as NIST and the deployment policies of the laboratories and centers that operate them. Administrators apply patches from the maintainers, restrict which hosts and users may submit and manage jobs, and audit deployments with standard vulnerability scanners. Long-term support has been available commercially from Adaptive Computing and from third-party service providers engaged by academic computing centers.
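
One common hardening step can be sketched with qmgr server attributes, which control who may administer the server and which hosts may interact with it. The hostnames below are placeholders and the exact policy is site-specific; this is an assumption-laden illustration, not a recommended baseline.

    import subprocess

    # Restrict management to a known account on the head node and limit
    # which hosts the server accepts connections from; acl_host_enable
    # turns the host ACL on. Hostnames are illustrative placeholders.
    hardening = [
        "set server managers = root@headnode.example.org",
        "set server acl_hosts = login1.example.org",
        "set server acl_hosts += login2.example.org",
        "set server acl_host_enable = true",
    ]
    for cmd in hardening:
        subprocess.run(["qmgr", "-c", cmd], check=True)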

Category:Batch processing systems