
HPSS

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: HPSS
Developer: IBM, Lawrence Livermore National Laboratory, Los Alamos National Laboratory
Released: 1994
Operating system: AIX, Linux
Genre: Hierarchical storage management

The High Performance Storage System (HPSS) is a scalable hierarchical storage management software system designed for petascale and exascale data environments. Developed through a multi-laboratory collaboration, it provides high-performance data storage and retrieval for some of the world's most demanding scientific computing workloads. Its architecture is optimized for managing massive datasets across diverse storage media, from high-speed disk caches to robotic tape libraries.

Overview

HPSS is the archival data management system at many major supercomputing centers and research institutions worldwide. It is engineered for the data volumes produced in scientific domains such as climate modeling, weapons science, and high-energy physics. The system provides a unified namespace and transparent data movement between storage tiers, so users and applications see a large, distributed storage repository as a single entity. Its ongoing development is carried out by IBM together with United States Department of Energy national laboratories, including Oak Ridge National Laboratory.

Architecture

The architecture of HPSS is modular and distributed, built around a client-server model that separates metadata and control operations from data transfer. The Core Server manages the file namespace, file attributes, and storage resources, and keeps its metadata in an IBM Db2 relational database; a separate Migration/Purge Server applies data placement policies. Data movement is performed by dedicated Mover processes, which can stripe files across multiple storage devices for high-bandwidth parallel transfers. This design lets namespace capacity, storage capacity, and I/O performance scale independently, and supports integration with IBM Spectrum Scale (formerly GPFS) and tape drive technologies from vendors such as IBM and Oracle Corporation.
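The striping idea described above can be illustrated with a minimal sketch. It assumes a simple round-robin layout in which fixed-size blocks of a file are assigned in turn to a set of transfer servers. The names `Segment`, `stripe_layout`, `bytes_per_server`, `stripe_width`, and `block_size` are hypothetical and chosen for illustration; they are not part of the HPSS API, and the actual HPSS layout logic may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One contiguous piece of a file assigned to a single transfer server."""
    server: int   # index of the (hypothetical) transfer server
    offset: int   # byte offset of this piece within the file
    length: int   # number of bytes in this piece


def stripe_layout(file_size: int, stripe_width: int, block_size: int) -> list[Segment]:
    """Assign consecutive blocks of a file to servers in round-robin order."""
    if file_size < 0 or stripe_width < 1 or block_size < 1:
        raise ValueError("file_size must be >= 0; stripe_width and block_size must be >= 1")
    segments: list[Segment] = []
    offset = 0
    index = 0
    while offset < file_size:
        # The final block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        segments.append(Segment(server=index % stripe_width, offset=offset, length=length))
        offset += length
        index += 1
    return segments


def bytes_per_server(segments: list[Segment], stripe_width: int) -> list[int]:
    """Total the bytes each server would transfer under the given layout."""
    totals = [0] * stripe_width
    for seg in segments:
        totals[seg.server] += seg.length
    return totals


# A 10-byte file, 4-byte blocks, striped across 3 servers.
layout = stripe_layout(file_size=10, stripe_width=3, block_size=4)
load = bytes_per_server(layout, stripe_width=3)
```

Because each server handles a disjoint byte range, the transfers can proceed in parallel, which is the source of the bandwidth gain; the per-server totals show how evenly the work is spread.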

Features and capabilities

HPSS offers features for managing data lifecycles in high-performance computing environments. It provides policy-based hierarchical storage management, automatically migrating data between high-performance disk, lower-cost SATA disk arrays, and magnetic tape based on usage, age, and other criteria. Data integrity is supported through checksums that detect silent corruption (bit rot), and security features include Access Control Lists (ACLs) and integration with Kerberos authentication. Client access is through a POSIX-like client API and interfaces such as parallel FTP (PFTP) and Storage Resource Manager (SRM), which provide broad application compatibility.
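A policy-based migration decision of the kind described above might look like the following sketch. It assumes a two-threshold policy based only on idle time and file size: files idle past one threshold are copied to tape, and files idle past a longer threshold also have their disk copy released. The names `FileInfo`, `Policy`, `plan_action`, and the action strings are hypothetical; real HPSS policies are configured by administrators and take more criteria into account.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds in one day


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_bytes: int
    last_access: float  # seconds since the epoch


@dataclass(frozen=True)
class Policy:
    migrate_after_days: float  # copy to tape once idle this long
    purge_after_days: float    # also release the disk copy once idle this long
    min_size_bytes: int = 0    # files smaller than this stay on disk


def plan_action(f: FileInfo, policy: Policy, now: float) -> str:
    """Return the hypothetical action a policy run would take for one file."""
    if f.size_bytes < policy.min_size_bytes:
        return "keep"
    idle_days = (now - f.last_access) / DAY
    # Check the longer threshold first so it takes precedence.
    if idle_days >= policy.purge_after_days:
        return "migrate_and_purge"
    if idle_days >= policy.migrate_after_days:
        return "migrate"
    return "keep"


policy = Policy(migrate_after_days=7, purge_after_days=30, min_size_bytes=1024)
now = 100 * DAY
files = [
    FileInfo("/archive/run1.dat", 10_000, last_access=99 * DAY),  # idle 1 day
    FileInfo("/archive/run2.dat", 10_000, last_access=90 * DAY),  # idle 10 days
    FileInfo("/archive/run3.dat", 10_000, last_access=0),         # idle 100 days
    FileInfo("/archive/tiny.txt", 100, last_access=0),            # below min size
]
actions = [plan_action(f, policy, now) for f in files]
```

Passing `now` explicitly rather than reading the system clock keeps the decision function deterministic, which makes a policy easy to test before it is applied to real data.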

Deployment and usage

HPSS is deployed at many scientific and governmental institutions that generate and analyze massive datasets. Sites include the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, the Argonne Leadership Computing Facility at Argonne National Laboratory, and the Texas Advanced Computing Center (TACC). It also serves as archival storage at computing centers that hold data from the Large Hadron Collider's ALICE and ATLAS experiments, and is used by agencies such as the National Oceanic and Atmospheric Administration (NOAA) for long-term climate data preservation. The largest of these deployments manage archives of hundreds of petabytes.

Development and history

Development of HPSS began in 1993 as a collaborative effort led by IBM Federal Systems Company, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory, with support from the United States Department of Energy. The first production version was released in 1994. Subsequent development expanded through the multi-institutional HPSS Collaboration, which included partners like the National Aeronautics and Space Administration (NASA) and the National Security Agency (NSA). The collaboration continues to guide development of the software toward the needs of exascale-era computing systems such as the Frontier supercomputer at Oak Ridge National Laboratory.

Category:Data management Category:Storage software Category:Supercomputing
