LLMpedia: The first transparent, open encyclopedia generated by LLMs

PBS (computer system)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: GriPhyN Hop 5
Expansion Funnel: Raw 57 → Dedup 0 → NER 0 → Enqueued 0
1. Extracted: 57
2. After dedup: 0
3. After NER: 0
4. Enqueued: 0
PBS (computer system)
Name: PBS
Developer: Veridian Technologies; originally Sandia National Laboratories / NASA Ames Research Center
Released: 1991
Latest release: 2018 (OpenPBS and other forks have continued to evolve)
Operating system: UNIX; Linux; AIX; HP-UX
Platform: High-performance computing clusters; supercomputers
License: Originally proprietary; later open-source forks and commercial variants

PBS (Portable Batch System) is a job scheduling and workload management system for high-performance computing clusters, supercomputers, and grid environments. PBS originated as a research project and evolved through academic, national laboratory, and commercial stewardship, influencing a family of schedulers and orchestration tools used by institutions such as Lawrence Livermore National Laboratory, Argonne National Laboratory, and commercial vendors. The system mediates batch job submission, resource allocation, and job lifecycle control across heterogeneous compute nodes and storage arrays in large-scale installations like Oak Ridge National Laboratory and Los Alamos National Laboratory.
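Users typically interact with the batch-submission side of this lifecycle through a shell script annotated with `#PBS` directives and handed to the `qsub` command. A minimal sketch follows; the job name, queue, and resource syntax are illustrative and vary between PBS variants and sites:

```shell
#!/bin/sh
# Minimal PBS batch script. Directive names follow common PBS
# conventions; exact resource syntax differs across variants and sites.
#PBS -N demo_job               # job name
#PBS -l nodes=1:ppn=4          # one node, four processors (TORQUE-style request)
#PBS -l walltime=00:10:00      # wall-clock limit
#PBS -q batch                  # destination queue (site-specific)
#PBS -j oe                     # merge stdout and stderr into one output file

# The body runs on the allocated compute node; PBS sets PBS_O_WORKDIR
# to the directory qsub was invoked from.
cd "${PBS_O_WORKDIR:-.}"
msg="job started on $(hostname)"
echo "$msg"
```

The script would be submitted with `qsub demo.sh` and monitored with `qstat`; the `#PBS` lines are ordinary comments to the shell, so the same file runs unmodified outside the batch system.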

History

PBS began as the Portable Batch System, developed in the early 1990s at NASA Ames Research Center in collaboration with researchers and engineers from Sandia National Laboratories and Lawrence Livermore National Laboratory (LLNL). Early adoption at facilities including the National Energy Research Scientific Computing Center led to widespread community use and forks. Commercialization efforts by companies such as Altair Engineering and entities tied to Veridian Technologies created proprietary distributions and support models. Open-source offshoots such as OpenPBS and TORQUE emerged from stewardship transitions involving open-source advocates and national lab contributors, while PBS Professional became a commercial distribution maintained by Altair, later also released under an open-source license. Major events shaping PBS include license changes, academic-to-commercial transfers, and integration into grid projects coordinated with organizations like the Open Grid Forum and procurement programs at Department of Energy labs.

Architecture and Components

PBS employs a master–worker architecture with distinct daemons designed for distributed resource management on heterogeneous hardware. Core components include a server daemon (pbs_server) that maintains the job queue and resource database, a scheduler (pbs_sched) that implements policy and matches queued jobs to available resources, and node-level execution daemons (pbs_mom) that launch and monitor tasks on compute nodes such as blades in installations at Lawrence Berkeley National Laboratory or racks installed at Fermilab. Supporting services integrate with storage subsystems from EMC Corporation or NetApp arrays and with network fabrics including InfiniBand and Intel Omni-Path. Administrative tooling centers on command-line utilities such as qsub, qstat, and qdel, interoperable with cluster management suites used at CERN and workflow engines developed at Los Alamos National Laboratory. PBS also supports plugins and extensibility interfaces enabling integration with provisioning systems from Red Hat and authentication services such as MIT Kerberos or Active Directory.
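The division of labor between server, scheduler, and node executors can be sketched as a toy model in Python. All class and function names here are illustrative, not part of any PBS API: a miniature "server" holds the queue and node table, and one scheduler cycle performs first-fit placement.

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    ncpus: int                  # cores requested by the job

@dataclass
class Node:
    name: str
    free: int                   # cores currently unallocated

class ToyServer:
    """Holds the job queue and node table, a pbs_server in miniature."""
    def __init__(self, nodes):
        self.queue = []
        self.nodes = nodes

    def submit(self, job):
        self.queue.append(job)

def schedule_pass(server):
    """One scheduler cycle: first-fit placement of queued jobs onto nodes."""
    placed = []
    for job in list(server.queue):          # copy: we mutate the queue below
        for node in server.nodes:
            if node.free >= job.ncpus:
                node.free -= job.ncpus
                server.queue.remove(job)
                placed.append((job.job_id, node.name))
                break
    return placed

server = ToyServer([Node("n1", 8), Node("n2", 8)])
for jid, cpus in [("1.svr", 8), ("2.svr", 4), ("3.svr", 8)]:
    server.submit(Job(jid, cpus))
placed = schedule_pass(server)
print(placed)                               # job 3.svr stays queued: no node fits
```

In real deployments the "node table" is a live resource database fed by the per-node daemons, and placement policy is far richer, but the control flow (queue held centrally, policy applied per cycle, execution delegated to nodes) is the same separation the daemons above implement.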

Scheduling and Workload Management

The scheduling model in PBS separates policy from mechanism, allowing site administrators at institutions like Princeton Plasma Physics Laboratory or Brookhaven National Laboratory to implement fair-share, priority-based, backfill, and advance-reservation policies. The scheduler interprets resource requests expressed in job directives compatible with batch submission clients used at the University of California, Berkeley and scientific workflows at Argonne National Laboratory. Advanced workload management integrates with meta-schedulers used in grid initiatives spanning European Grid Infrastructure collaborations, and with cloud-bursting implementations for providers such as Amazon Web Services and Google Cloud Platform, where PBS mediates hybrid HPC-cloud deployments. Backfill algorithms and queue hierarchies are comparable to those in other systems developed at Lawrence Livermore National Laboratory and reflect research presented at conferences such as the ACM/IEEE Supercomputing Conference.
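The idea behind backfill is that short jobs may jump ahead of a blocked high-priority job as long as they cannot delay it. A minimal EASY-style sketch in Python (a simplification on a single core pool, not PBS's actual algorithm): the head job receives a start-time reservation, and later jobs start early only if they both fit now and finish before that reservation.

```python
def backfill_pass(queue, free_cores, running, now=0):
    """
    EASY-style backfill sketch on a single pool of cores.
    queue:   list of (job_id, cores, walltime) in priority order
    running: list of (cores, end_time) for jobs already executing
    Returns the job_ids started now.
    """
    started = []
    if not queue:
        return started
    head_id, head_cores, _ = queue[0]
    if head_cores <= free_cores:
        return [head_id]                    # head starts; nothing is blocked
    # Reservation: earliest time enough cores free up for the head job.
    avail, reservation = free_cores, None
    for cores, end_time in sorted(running, key=lambda r: r[1]):
        avail += cores
        if avail >= head_cores:
            reservation = end_time
            break
    # Backfill later jobs that fit now AND end before the reservation.
    for job_id, cores, walltime in queue[1:]:
        fits_now = cores <= free_cores
        ends_in_time = reservation is None or now + walltime <= reservation
        if fits_now and ends_in_time:
            free_cores -= cores
            started.append(job_id)
    return started

# 16-core machine: 12 cores busy until t=10, so 4 are free. Head job A
# (8 cores) must wait; B (4 cores, 5 time units) finishes before t=10
# and backfills without delaying A's reservation.
started = backfill_pass([("A", 8, 20), ("B", 4, 5), ("C", 2, 3)],
                        free_cores=4, running=[(12, 10)])
print(started)
```

The reliance on requested walltime is why sites enforce (and users pad) walltime estimates: an underestimate here would let a backfilled job run past the reservation and delay the head job.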

Performance and Scalability

PBS deployments scale from hundreds to tens of thousands of cores in production environments at facilities such as Oak Ridge National Laboratory and Sandia National Laboratories. Performance tuning involves optimizing server daemon throughput, scheduler decision latency, and node executor startup time to meet the demands of tightly coupled MPI applications developed at Argonne National Laboratory and of embarrassingly parallel workloads common in bioinformatics groups at the Broad Institute. Integration with high-performance interconnects like Cray XC networks and parallel filesystems such as Lustre or GPFS is critical to reducing I/O contention and maintaining job throughput at supercomputing centers such as the National Center for Atmospheric Research. Benchmarks and operational metrics frequently reference standards and case studies presented at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
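Scheduler decision latency is one of the tunables named above; a toy Python measurement (a naive first-fit scan, illustrative only and not how PBS schedules) shows why a per-cycle linear scan of every node for every queued job becomes costly as queue depth grows, motivating the node bucketing and cached resource state that production schedulers use.

```python
import time

def first_fit_pass(jobs, node_free):
    """One naive scheduling pass: scans every node for every queued job."""
    placed = 0
    for cores in jobs:
        for i, free in enumerate(node_free):
            if free >= cores:
                node_free[i] -= cores
                placed += 1
                break
    return placed

# Time one pass at growing queue depths: the scan restarts at node 0 for
# every job, so work grows superlinearly as early nodes fill up.
for depth in (1000, 5000):
    t0 = time.perf_counter()
    first_fit_pass([1] * depth, [16] * (depth // 16 + 1))
    print(f"queue depth {depth}: {(time.perf_counter() - t0) * 1e3:.1f} ms")
```

Absolute timings depend on the machine, but the relative growth between the two depths is the point: decision latency compounds when the scheduler runs a cycle every few seconds against tens of thousands of queued jobs.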

Security and Administration

Administration of PBS in sensitive environments, such as classified programs at Los Alamos National Laboratory or regulated research at Food and Drug Administration labs, requires integration with centralized identity providers like MIT Kerberos and Active Directory, role-based access controls, and node-level hardening along the lines recommended by the National Institute of Standards and Technology. Secure job submission workflows, audit logging, and accounting records feed into institutional reporting systems used by funding agencies such as the National Science Foundation and the Department of Energy. Patch management, configuration orchestration with tools such as Ansible and Puppet, and compliance with policies promulgated by agencies like the Office of Management and Budget are common administrative practices for PBS operators.
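The accounting records that feed such reporting are typically line-oriented, semicolon-delimited entries (timestamp, record type, job id, then space-separated key=value pairs). A small Python parser sketch follows; the sample line and field names mimic common PBS accounting output, but exact keys vary by PBS version and site configuration:

```python
def parse_accounting_line(line):
    """
    Parse one PBS-style accounting record of the form
        <timestamp>;<record type>;<job id>;<key=value key=value ...>
    Record type 'E' conventionally marks a job-end record.
    """
    timestamp, rec_type, job_id, rest = line.strip().split(";", 3)
    fields = {}
    for token in rest.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return {"timestamp": timestamp, "type": rec_type, "job_id": job_id, **fields}

# Illustrative job-end record (hostnames and values are made up).
sample = ("05/12/2024 14:03:22;E;1234.pbsserver;"
          "user=alice group=hpc queue=batch resources_used.walltime=01:02:03")
rec = parse_accounting_line(sample)
print(rec["user"], rec["resources_used.walltime"])
```

A production pipeline would aggregate such records per user, group, and allocation before handing totals to the institutional reporting system; the per-line parse above is the only PBS-specific step.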

Use Cases and Deployments

PBS is used across scientific computing, engineering simulation, and commercial analytics. Notable deployments include national laboratories performing climate modeling with codes from NOAA, astrophysics simulations developed at the University of Chicago, computational chemistry in consortia including the California Institute of Technology, and animation rendering pipelines at studios collaborating with Industrial Light & Magic. Enterprises in oil and gas exploration using software from Schlumberger, as well as financial institutions running risk models, have also adopted PBS-based solutions or its commercial derivatives. Academic clusters at universities such as the Massachusetts Institute of Technology, Stanford University, and the University of Illinois Urbana-Champaign continue to run PBS variants to schedule student and faculty workloads.

Category:Job scheduling systems
Category:High-performance computing software