| HPSS | |
|---|---|
| Name | HPSS |
| Developer | IBM, Lawrence Livermore National Laboratory, Sandia National Laboratories, National Center for Supercomputing Applications |
| Released | 1992 |
| Latest release | (varies by deployment) |
| Written in | C, C++ |
| Operating system | UNIX, Linux, AIX |
| License | Proprietary / Research collaborations |
HPSS
HPSS (High Performance Storage System) is a high-performance, scalable archival storage system designed for large-scale data management in scientific, government, and enterprise environments. It addresses petabyte- to exabyte-scale storage needs by integrating tape libraries, disk caches, and hierarchical storage management with parallel data transfer services and policy-driven data lifecycle controls. HPSS is deployed in research centers, national laboratories, and archives that require long-term retention, high throughput, and coordination with compute resources.
HPSS originated as a collaborative research and production project to provide archival solutions for projects like the Large Hadron Collider, the Human Genome Project, and NASA missions, and for national supercomputing facilities such as Oak Ridge National Laboratory and Argonne National Laboratory. The system combines components influenced by efforts at Lawrence Livermore National Laboratory and Sandia National Laboratories and integrates with service ecosystems built by vendors including IBM and partners from the National Energy Research Scientific Computing Center (NERSC). HPSS supports protocols and tools that interoperate with standards used by the European Organization for Nuclear Research (CERN), the National Aeronautics and Space Administration (NASA), and major research consortia.
The HPSS project began in the early 1990s to meet storage demands from efforts such as U.S. Department of Energy scientific simulations and collaborative programs at the National Institutes of Health. Early contributions came from research groups at Argonne National Laboratory and Lawrence Livermore National Laboratory, which needed hierarchical management comparable to the commercial systems used by National Security Agency archives and large media repositories such as the BBC. Over successive decades, HPSS incorporated technologies from vendors like IBM and from research initiatives tied to Oak Ridge National Laboratory computing centers, adapting to advances in tape robotics from companies such as Spectra Logic and in disk technologies used by NetApp and EMC Corporation.
HPSS employs a modular architecture consisting of an archival manager, metadata servers, data movers, and device drivers that interface with tape libraries and disk cache arrays. Core components were designed to interoperate with storage hardware such as IBM libraries and Oracle-branded systems, and with tape media from vendors such as Fujifilm and Sony. The metadata layer integrates with directory services from Sun Microsystems (historically) and with authentication frameworks like those used in deployments that follow National Institute of Standards and Technology guidance. Data transfer layers support parallel I/O engines that echo approaches from Message Passing Interface-driven HPC workflows and align with protocols endorsed by Internet2 and ESnet.
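The parallel data-mover idea can be illustrated with a minimal sketch. It is not HPSS code and does not use any HPSS API: the function names `plan_stripes` and `parallel_copy`, the round-robin assignment, and the use of threads to stand in for movers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_stripes(file_size, stripe_size, num_movers):
    """Return (mover_index, offset, length) tuples covering the whole file.

    Hypothetical layout: stripes are handed to movers round-robin.
    """
    if stripe_size <= 0 or num_movers <= 0:
        raise ValueError("stripe_size and num_movers must be positive")
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        length = min(stripe_size, file_size - offset)
        plan.append((index % num_movers, offset, length))
        offset += length
        index += 1
    return plan


def parallel_copy(data, stripe_size, num_movers):
    """Copy `data` stripe by stripe, using one worker thread per mover."""
    plan = plan_stripes(len(data), stripe_size, num_movers)
    out = bytearray(len(data))

    def move(stripe):
        _, offset, length = stripe
        # Each stripe writes a disjoint, same-sized slice, so no lock is needed.
        out[offset:offset + length] = data[offset:offset + length]

    with ThreadPoolExecutor(max_workers=num_movers) as pool:
        list(pool.map(move, plan))
    return bytes(out)
```

For example, `plan_stripes(10, 4, 2)` assigns the first and third stripes to mover 0 and the second to mover 1. A real deployment would move stripes between devices over the network rather than between in-memory buffers.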
HPSS provides hierarchical storage management, automated data migration, staged retrieval, and policy-based retention suitable for archival mandates from agencies such as the National Science Foundation and the Department of Defense. It supports parallel streaming, multi-stream striping, and large-file aggregation strategies familiar to users of the XRootD and Globus transfer services. Integration points include workflow schedulers like the Slurm Workload Manager, data catalogs similar to iRODS, and backup orchestration employed by organizations such as Lawrence Berkeley National Laboratory. Administrative tools provide reporting of the kind used by compliance programs such as those at the Centers for Disease Control and Prevention and by long-term preservation projects at institutions such as the Library of Congress.
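Policy-driven migration and staged retrieval can be sketched as follows. This is a simplified model under stated assumptions, not the HPSS policy engine: the `FileRecord` type, the idle-time threshold, the `high_water` fraction, and the two-tier `"disk"`/`"tape"` labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    size_bytes: int
    last_access: float  # seconds since some epoch
    tier: str = "disk"  # "disk" (cache) or "tape" (archive)


def select_for_migration(files, now, max_idle_seconds, disk_capacity, high_water=0.8):
    """Pick disk-resident files to move to tape.

    Hypothetical policy: walking from least to most recently accessed, a file
    is selected if it has been idle longer than max_idle_seconds, or if disk
    usage is still above high_water * disk_capacity at that point.
    """
    on_disk = sorted((f for f in files if f.tier == "disk"),
                     key=lambda f: f.last_access)
    used = sum(f.size_bytes for f in on_disk)
    limit = high_water * disk_capacity
    chosen = []
    for f in on_disk:
        if now - f.last_access > max_idle_seconds or used > limit:
            chosen.append(f)
            used -= f.size_bytes
    return chosen


def migrate(records):
    """Mark the selected records as tape-resident."""
    for f in records:
        f.tier = "tape"


def stage(record):
    """Staged retrieval: bring a tape-resident file back to disk before use."""
    if record.tier == "tape":
        record.tier = "disk"
    return record
```

In this model a read of a migrated file calls `stage` first, which mirrors the idea that archived data is recalled to the disk cache before it is served.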
HPSS is used for archiving experimental data from facilities like the Large Hadron Collider, observational datasets from the National Oceanic and Atmospheric Administration, simulation outputs from the Princeton Plasma Physics Laboratory, and imaging archives at institutions like Memorial Sloan Kettering Cancer Center. Other applications include media vaulting for broadcasters such as the BBC, disaster recovery for financial institutions coordinated with Federal Reserve System standards, and digital preservation for museums and libraries, including the Smithsonian Institution. Collaboration-wide deployments enable data sharing across projects involving the European Space Agency and multinational research consortia.
Designed for petabyte- to exabyte-scale environments, HPSS scales by adding metadata servers, data movers, and tape drives to the distributed topologies used by supercomputing centers like Oak Ridge National Laboratory and Los Alamos National Laboratory. Performance tuning draws on parallelism conventions from Parallel Virtual File System deployments and on network fabric optimizations common in Cray and HPE installations. Benchmarking exercises at facilities such as NERSC and Argonne National Laboratory have emphasized sustained throughput, tape drive concurrency, and cache hit ratios, comparing HPSS capabilities against object storage systems from Amazon Web Services and software-defined solutions developed by the Ceph community.
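The scaling metrics mentioned above (sustained throughput, drive concurrency, cache hit ratio) can be related with some back-of-the-envelope arithmetic. The sketch below is a rough model only: the function names are hypothetical, and it assumes the slowest stage bounds the pipeline and that access time is a simple weighted average of disk and tape times.

```python
def aggregate_throughput(num_drives, drive_mb_s, num_movers, mover_mb_s, network_mb_s):
    """Estimate sustained throughput in MB/s as the slowest pipeline stage.

    Assumes drives and movers each scale linearly with their count.
    """
    return min(num_drives * drive_mb_s, num_movers * mover_mb_s, network_mb_s)


def expected_access_seconds(hit_ratio, disk_seconds, tape_seconds):
    """Expected time to first byte given a disk-cache hit ratio in [0, 1]."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * disk_seconds + (1.0 - hit_ratio) * tape_seconds
```

With illustrative numbers (not measurements), 8 drives at 300 MB/s behind 4 movers at 1000 MB/s on a 5000 MB/s network are drive-bound at 2400 MB/s, so adding drives would help until the movers become the bottleneck at 4000 MB/s.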
HPSS implementations incorporate authentication, authorization, and auditing integrations compatible with standards referenced by National Institute of Standards and Technology guidance and with compliance regimes used by Department of Energy laboratories. Secure transports and access controls interoperate with identity federations like those advocated by InCommon and eduGAIN. Long-term integrity measures include checksum validation, media migration policies, and audit trails similar to those practiced by the National Archives and Records Administration and by preservation programs at major research libraries. Deployments frequently embed encryption, role-based access control comparable to enterprise solutions at IBM, and periodic validation procedures used by archives aligned with International Organization for Standardization (ISO) standards.
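Checksum validation of the kind described here can be sketched with Python's standard `hashlib` module. This is a generic fixity check, not the HPSS implementation: the choice of SHA-256, the chunked reading, and the function names are assumptions for illustration.

```python
import hashlib


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in fixed-size chunks to bound memory use."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk


def sha256_of_chunks(chunks):
    """Return the hex SHA-256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify(chunks, expected_hex):
    """True if the recomputed digest matches the one recorded at ingest."""
    return sha256_of_chunks(chunks) == expected_hex.lower()
```

A periodic audit would store the digest when a file is archived, then call something like `verify(read_chunks(path), stored_digest)` after each recall or media migration and log any mismatch.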
Category:Archival storage systems