| HPSS | |
|---|---|
| Name | HPSS |
| Developer | IBM, Lawrence Livermore National Laboratory, Los Alamos National Laboratory |
| Released | 1994 |
| Operating system | AIX, Linux |
| Genre | Hierarchical storage management |
The High Performance Storage System (HPSS) is a scalable, object-based hierarchical storage management software framework designed for petascale and exascale data environments. Developed through a multi-laboratory collaboration, it provides robust, high-performance data storage and retrieval for some of the world's most demanding scientific computing workloads. Its architecture is optimized for managing massive datasets across diverse storage media, from high-speed disk caches to robotic tape libraries.
HPSS serves as a foundational data management solution for major supercomputing centers and research institutions globally. It is engineered to address the challenges of big data in scientific domains such as climate modeling, weapons science, and high-energy physics. The system provides a unified namespace and transparent data movement, allowing users and applications to interact with a vast, distributed storage repository as a single entity. Key collaborators in its ongoing development include the United States Department of Energy national laboratories, with Oak Ridge National Laboratory playing a central role in its modern evolution.
The architecture of HPSS is highly modular and distributed, built around a client-server model that separates metadata management from data transfer operations. Its principal components include the Metadata Server (MDS), which manages the file system namespace and attributes, and the Core Server, which handles storage resource management and data placement policies. Data movement is performed by dedicated parallel transfer servers (movers), which can stripe files across multiple storage nodes for high-bandwidth transfers. This design allows the system to scale independently in namespace capacity, storage capacity, and I/O performance, and supports integration with technologies such as IBM Spectrum Scale (formerly GPFS) and tape drives from vendors such as IBM and Oracle Corporation.
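The striping scheme can be pictured with a short sketch. The following Python fragment is a conceptual illustration only, assuming a fixed stripe unit and named mover endpoints; the `stripe_blocks` helper, block size, and mover names are hypothetical and are not part of the HPSS client API:

```python
# Conceptual sketch: distribute consecutive blocks of a file across several
# data movers round-robin, the basic idea behind striped parallel transfers.
# BLOCK_SIZE, the mover names, and stripe_blocks are hypothetical, not HPSS APIs.

BLOCK_SIZE = 1 << 20  # 1 MiB stripe unit (illustrative value)

def stripe_blocks(path: str, movers: list[str]):
    """Yield (mover, offset, data) tuples, assigning block i to mover i mod N."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(BLOCK_SIZE)
            if not data:
                break
            yield movers[index % len(movers)], index * BLOCK_SIZE, data
            index += 1

if __name__ == "__main__":
    # Create a small demo file so the example is self-contained.
    with open("dataset.bin", "wb") as f:
        f.write(b"\0" * (4 * BLOCK_SIZE + 123))
    for mover, offset, block in stripe_blocks("dataset.bin", ["mover01", "mover02", "mover03"]):
        print(f"{mover}: {len(block)} bytes at offset {offset}")
```

In an actual deployment the per-mover streams run concurrently over separate network paths, which is what allows aggregate bandwidth to scale with the stripe width.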
HPSS offers a comprehensive set of features for managing data lifecycles in high-performance computing environments. It provides policy-based hierarchical storage management, automatically migrating data between high-performance disk, SATA disk arrays, and magnetic tape based on usage, age, and other criteria. The system supports data integrity checking through checksums and bit rot detection, alongside security features including Access Control Lists (ACLs) and integration with Kerberos authentication. Its POSIX-like interface, together with support for standards such as the Storage Resource Manager (SRM) and Parallel Virtual File System 2 (PVFS2) client interfaces, ensures broad application compatibility.
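The effect of such migration policies can be sketched in a few lines. The Python below is a minimal illustration, assuming hypothetical tier names, thresholds, and a `FileInfo` record; HPSS itself expresses these policies through its own configuration (storage classes and hierarchies), not through application code like this:

```python
# Minimal sketch of a policy-based tiering decision in the spirit of
# hierarchical storage management. Tier names and thresholds are hypothetical.

import time
from dataclasses import dataclass

DAY = 86400  # seconds per day

@dataclass
class FileInfo:
    path: str
    size: int           # bytes
    last_access: float  # POSIX timestamp

def choose_tier(f: FileInfo, now: float) -> str:
    """Pick a storage tier from file age and size, mimicking migration
    from fast disk toward tape as data cools."""
    idle_days = (now - f.last_access) / DAY
    if idle_days < 7:
        return "fast-disk"   # recently used: keep on the disk cache
    if idle_days < 90 and f.size < 1 << 30:
        return "sata-disk"   # warm and small enough for disk arrays
    return "tape"            # cold or very large: archive to tape

if __name__ == "__main__":
    now = time.time()
    f = FileInfo("climate/run42.nc", size=8 << 30, last_access=now - 120 * DAY)
    print(choose_tier(f, now))  # -> tape
```

A real migration engine would also weigh the usage criteria mentioned above and verify checksums after each copy, but the decision logic follows the same shape.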
HPSS is deployed at many leading scientific and governmental institutions that generate and analyze massive datasets. Primary sites include the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, the Argonne Leadership Computing Facility at Argonne National Laboratory, and the Texas Advanced Computing Center (TACC). It provides archival storage for data from Large Hadron Collider experiments such as ALICE and ATLAS, and is used by agencies such as the National Oceanic and Atmospheric Administration (NOAA) for long-term climate data preservation. These deployments routinely manage archives of hundreds of petabytes.
The development of HPSS began in 1993 as a collaborative effort led by IBM Federal Systems Company, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory, under the auspices of the United States Department of Energy's Advanced Simulation and Computing Program. The first production version was released in 1994. Subsequent development expanded through the multi-institutional HPSS Collaboration, which included partners like the National Aeronautics and Space Administration (NASA) and the National Security Agency (NSA). In 2015, stewardship of the software transitioned to a consortium led by Oak Ridge National Laboratory, which now guides its development to meet the needs of next-generation exascale computing systems like the Frontier supercomputer.
Category:Data management Category:Storage software Category:Supercomputing