| HPCToolkit | |
|---|---|
| Name | HPCToolkit |
| Developer | Rice University, University of Michigan, Los Alamos National Laboratory |
| Released | 2002 |
| Programming language | C, C++ |
| Operating system | Linux, macOS, FreeBSD |
| Genre | Performance analysis |
| License | Open-source |
HPCToolkit is a performance analysis toolkit for profiling and tracing scientific and engineering applications on multicore processors, clusters, and supercomputers. It combines statistical sampling, call-path attribution, and measurement of hardware performance counters to attribute execution costs to procedures, loops, and call contexts in large codes. HPCToolkit has been applied to workloads developed at institutions such as Oak Ridge National Laboratory, Argonne National Laboratory, Lawrence Berkeley National Laboratory, Los Alamos National Laboratory, and in collaborations involving researchers from Rice University, University of Michigan, Princeton University, and Sandia National Laboratories.
HPCToolkit provides tools to measure where time and other resources are spent in applications written in languages used by projects from Massachusetts Institute of Technology, Stanford University, California Institute of Technology, University of California, Berkeley, and Carnegie Mellon University. It targets instrumentation and sampling for programs that use parallel programming models pioneered at Argonne National Laboratory and Lawrence Livermore National Laboratory, including variants of MPI, OpenMP, and task-based runtimes associated with teams at Intel Corporation and Cray Research. The toolkit decomposes performance into call paths and contexts to assist developers and performance engineers from organizations such as NVIDIA, IBM, Google, and Microsoft in optimizing compute- and memory-bound kernels derived from benchmarks like those of the SPEC and NAS Parallel Benchmarks communities.
HPCToolkit supports statistical sampling of execution using facilities from processor vendors including Intel Corporation, Advanced Micro Devices, and ARM Limited to gather hardware counter values such as cycles, instructions, cache misses, and branch mispredictions. It provides context-sensitive call-path attribution similar to techniques used in tools developed at Bell Labs and in academic work from the University of Illinois at Urbana–Champaign and the University of Washington. The toolkit integrates call-stack unwinding and binary analysis methods related to those of the GNU Project and LLVM to map samples to source lines and inlined functions. It also supports tracing of asynchronous events and communication patterns, which is useful for analyses inspired by studies from Los Alamos National Laboratory and Sandia National Laboratories on large-scale MPI applications.
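The idea of attributing samples to call contexts can be illustrated with a small calling context tree. The following is a minimal sketch, not HPCToolkit's implementation: the `CCTNode` class, the `build_cct` function, the sample data, and the sampling period are all hypothetical names and values chosen for illustration. Each sample is an unwound call stack; the sample is charged to the innermost frame, and inclusive cost is obtained by summing over the subtree.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One calling context: a frame reached through a specific chain of callers."""
    name: str
    exclusive: int = 0  # samples that landed in this frame itself
    children: dict = field(default_factory=dict)

    def child(self, name):
        """Return the child node for `name`, creating it on first use."""
        if name not in self.children:
            self.children[name] = CCTNode(name)
        return self.children[name]

    def inclusive(self):
        """Samples in this frame plus everything it called in this context."""
        return self.exclusive + sum(c.inclusive() for c in self.children.values())


def build_cct(samples):
    """Insert each sampled call stack (outermost frame first) into the tree."""
    root = CCTNode("<root>")
    for stack in samples:
        node = root
        for frame in stack:
            node = node.child(frame)
        node.exclusive += 1  # charge the sample to the innermost frame
    return root


# Hypothetical samples: each tuple is one unwound call stack.
SAMPLES = [
    ("main", "solve", "dot"),
    ("main", "solve", "dot"),
    ("main", "solve"),
    ("main", "io", "dot"),
]
PERIOD = 1_000_000  # assumed sampling period, in cycles per sample

cct = build_cct(SAMPLES)
solve = cct.children["main"].children["solve"]
# Estimated cycles spent in solve and its callees: samples times period.
print(solve.inclusive() * PERIOD)  # 3000000
```

Note that `dot` appears twice in the tree, once under `solve` and once under `io`. Keeping the two contexts separate is what distinguishes a call-path profile from a flat per-function profile.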
HPCToolkit’s architecture separates measurement, analysis, and presentation components, following software engineering practices common at IBM Research, Huawei Technologies, and Sony Corporation. Measurement components use low-overhead statistical sampling via interfaces provided by the Linux kernel and employ binary inspection techniques that rely on formats standardized by the Free Software Foundation and processor documentation from ARM Limited and Intel Corporation. The analysis back-end aggregates call-path profiles and performs attribution computations akin to algorithms developed at Stanford University and Princeton University, producing hierarchical profiles that can be explored in graphical browsers similar to tools created by Google and the Mozilla Foundation.
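The aggregation step in the analysis back-end can be sketched as follows. This is an illustrative assumption about how per-thread call-path profiles might be combined, not a description of HPCToolkit's actual data structures: the profile representation (a mapping from call path to exclusive sample count), the function names, and the sample data are all hypothetical.

```python
from collections import Counter


def merge_profiles(profiles):
    """Sum per-thread profiles keyed by call path into one aggregate profile."""
    total = Counter()
    for profile in profiles:
        total.update(profile)  # Counter.update adds counts for matching keys
    return total


def inclusive_costs(profile):
    """Charge each path's cost to the path itself and to every caller prefix."""
    inclusive = Counter()
    for path, cost in profile.items():
        for depth in range(1, len(path) + 1):
            inclusive[path[:depth]] += cost
    return inclusive


def render(inclusive):
    """Return an indented top-down listing; sorting puts callers before callees."""
    lines = []
    for path in sorted(inclusive):
        lines.append(f"{'  ' * (len(path) - 1)}{path[-1]}: {inclusive[path]}")
    return "\n".join(lines)


# Hypothetical per-thread profiles: call path -> exclusive sample count.
thread0 = {("main", "solve", "dot"): 5, ("main", "solve"): 1}
thread1 = {("main", "solve", "dot"): 3, ("main", "halo_exchange"): 4}

merged = merge_profiles([thread0, thread1])
report = render(inclusive_costs(merged))
print(report)
```

Keeping measurement output as plain per-thread mappings and deferring all summation to a separate pass mirrors the separation of measurement from analysis described above, although a production tool would use a more compact tree representation.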
A typical workflow involves building the application with debugging information and executing it under sampling regimes comparable to practices used in performance campaigns at the National Center for Supercomputing Applications and the Pittsburgh Supercomputing Center. Source-level instrumentation is not required, because measurement relies on statistical sampling. Users collect samples during runs on systems administered by staff from Oak Ridge National Laboratory and Argonne National Laboratory, then process measurements with HPCToolkit’s analysis tools to produce call-path-centric reports. The resulting profiles are examined alongside source code annotated with line-level metrics, a practice shared by developers at NVIDIA, Intel Corporation, and AMD when tuning kernels for libraries such as BLAS implementations and vendor math libraries. Integration with visualization front ends enables interactive exploration similar to dashboards built by teams at Google and Facebook.
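The measure, analyze, and present stages can be laid out as a sequence of command lines. The sketch below only assembles and prints the commands; it does not run them. The tool names `hpcrun`, `hpcstruct`, `hpcprof`, and `hpcviewer` are those used in the project's documentation, but the article itself does not name them, and the specific options, the `REALTIME` event, the directory names, and the application path are assumptions based on common usage. They may differ between releases and should be checked against the help output of the installed version.

```python
import shlex


def workflow_commands(app, app_args=(), event="REALTIME",
                      measurements="hpctoolkit-measurements",
                      database="hpctoolkit-database"):
    """Return the four stages as argument lists; nothing is executed here.

    All defaults are illustrative assumptions, not values from the article.
    """
    return [
        # 1. Measure: run the application under the sampling profiler.
        ["hpcrun", "-e", event, "-o", measurements, app, *app_args],
        # 2. Recover program structure (loops, inlining) from the binaries.
        ["hpcstruct", measurements],
        # 3. Attribute the measurements to source-level contexts.
        ["hpcprof", "-o", database, measurements],
        # 4. Browse the resulting database interactively.
        ["hpcviewer", database],
    ]


# Hypothetical application and input file.
for cmd in workflow_commands("./app", ["input.dat"]):
    print(shlex.join(cmd))
```

Building the commands as argument lists rather than shell strings avoids quoting problems if the lists are later passed to `subprocess.run`.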
Evaluations of HPCToolkit have compared its overhead and attribution accuracy against tools and frameworks developed at Cray Research, Intel Corporation, AMD, and the GNU Project. Studies from research groups at Rice University, the University of Michigan, and Lawrence Berkeley National Laboratory report low perturbation from sampling, enabling its use in production-scale runs on systems at NERSC and through XSEDE allocations. Case studies profiling codes from projects at Los Alamos National Laboratory and Sandia National Laboratories have demonstrated HPCToolkit’s ability to reveal performance bottlenecks in MPI communication, OpenMP synchronization, and memory hierarchy utilization, facilitating optimizations adopted by teams at Oracle Corporation and Microsoft Research.
HPCToolkit’s development has been led by research groups at Rice University and the University of Michigan, with contributions and collaborations involving engineers from Los Alamos National Laboratory and users at national laboratories including Oak Ridge National Laboratory and Argonne National Laboratory. The project’s community includes performance engineers from NVIDIA and Intel Corporation and academic researchers at the University of California, Berkeley and Carnegie Mellon University. Development practices follow open-source collaboration models used by projects associated with organizations such as the Free Software Foundation and infrastructure patterns seen in repositories hosted on GitHub and GitLab.
HPCToolkit is distributed under an open-source license consistent with licensing models used by software from the Free Software Foundation and research toolchains developed at Lawrence Berkeley National Laboratory and Oak Ridge National Laboratory. Source code and binaries are made available to academic, national laboratory, and industry users following distribution norms of projects on GitHub, archives maintained by Zenodo, and institutional repositories at Rice University and the University of Michigan.
Category:Performance analysis tools