| TAU Performance System | |
|---|---|
| Name | TAU Performance System |
| Developer | University of Oregon Performance Research Lab; ParaTools, Inc.; DOE laboratories |
| Released | 1996 |
| Latest release | 2.30 (example) |
| Programming languages | C, C++, Fortran, Python |
| Operating system | Linux, macOS, Windows |
| License | Open source / Proprietary components |
TAU Performance System
The TAU Performance System is a performance evaluation toolkit designed for profiling, tracing, and analysis of parallel and distributed applications on high-performance computing platforms. It integrates runtime instrumentation, measurement, and visualization to support performance tuning of applications written in languages such as C, C++, and Fortran and using models such as MPI, OpenMP, and CUDA. TAU is positioned within a broader ecosystem of tools and institutions that shape high-performance computing research and practice.
TAU was initiated at the University of Oregon Performance Research Lab and evolved through collaborations with Argonne National Laboratory, Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, Sandia National Laboratories, and private vendors like ParaTools, Inc. and IBM. Its design parallels other tools such as VampirTrace, Score-P, HPCToolkit, gprof, and Intel VTune Amplifier while addressing needs seen in projects from TOP500 centers, NERSC, and ALCF installations. TAU supports instrumentation strategies used by teams developing applications for DOE science programs, NSF-funded projects, and community codes like LAMMPS, GROMACS, FLASH, and OpenFOAM.
TAU's architecture comprises a measurement subsystem, a data aggregation layer, and analysis and display tools. The measurement subsystem interfaces with programming models and runtimes such as MPI, OpenMP, Pthreads, CUDA, and OpenCL, including code running on accelerators from NVIDIA and AMD. Components include the TAU Instrumentor, TAU Runtime, TAU Performance Database, and connectors to visualization systems like ParaView and Jumpshot. The system interoperates with related tools and formats, including OPARI for OpenMP source instrumentation, PAPI for hardware counter access, and the Cube profile format, and it integrates with build systems such as CMake and Autoconf.
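The role of a data aggregation layer can be illustrated with a minimal Python sketch. It assumes each rank's profile has already been reduced to a plain dictionary of event name to seconds; the function name `aggregate` and the `imbalance` metric (maximum over mean) are illustrative choices, not TAU's own API or file format.

```python
from statistics import mean


def aggregate(profiles):
    """Merge per-rank profiles ({event: seconds}) into per-event summary stats."""
    events = sorted({name for prof in profiles for name in prof})
    summary = {}
    for name in events:
        # Assumption: a rank that never executed the event contributes 0 seconds.
        values = [prof.get(name, 0.0) for prof in profiles]
        avg = mean(values)
        summary[name] = {
            "min": min(values),
            "mean": avg,
            "max": max(values),
            # max / mean above 1 suggests uneven work across ranks.
            "imbalance": max(values) / avg if avg > 0 else 1.0,
        }
    return summary
```

Calling `aggregate([{"solve": 2.0}, {"solve": 4.0}])` yields one summary entry per event, which an analysis tool could then sort by `max` or `imbalance` to surface candidate hotspots.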
TAU offers source-code instrumentation, compiler-directed instrumentation, and binary rewriting to capture events, call paths, and metrics. It combines sampling with event-based tracing and reads hardware counters through PAPI and through processor-specific interfaces from Intel and ARM. Its statistical sampling resembles the approach used by gprof, its time-based profiling resembles that of perf, and its fine-grained tracing is comparable to DTrace and SystemTap. These instrumentation strategies are applied in workflows that include continuous integration with tools like Jenkins and performance regression frameworks employed at Cray and HPE centers.
Analysis is conducted using TAU tools and external suites such as ParaView, Vampir, Scalasca, CubeViewer, and R. Workflows integrate with schedulers and resource managers like Slurm, PBS, and LSF, enabling large-scale data collection on systems at the Oak Ridge Leadership Computing Facility, the Argonne Leadership Computing Facility, and NERSC. TAU enables cross-correlation of metrics with application-level events, drawing on statistical methods from research groups at the University of Illinois Urbana-Champaign and the University of Texas, and on visualization techniques advanced by teams at KAUST and the University of Cambridge.
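One simple way to correlate a metric with an application-level event is a Pearson coefficient over per-iteration samples. The sketch below assumes the samples have already been exported as two equal-length lists (for example, cache misses and time per solver iteration); the function name `pearson` and the sample layout are hypothetical and do not describe TAU's analysis tools.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length series of samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A coefficient near 1 for cache misses against iteration time would suggest that memory behavior tracks the slow iterations, although correlation alone does not establish the cause.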
TAU is applied in performance tuning of climate models such as CESM, astrophysics codes like FLASH, computational chemistry packages such as NWChem and Quantum ESPRESSO, and engineering simulations using OpenFOAM and ABAQUS. It supports optimization tasks for machine learning frameworks including TensorFlow and PyTorch when ported to HPC platforms, and for data analytics stacks like Hadoop and Spark in HPC contexts. TAU aids teams at centers like LLNL, LANL, and SNL in code modernization and exascale readiness work tied to initiatives by DOE Office of Science and collaborations with vendors such as NVIDIA and Intel.
Published case studies demonstrate TAU's effectiveness on large-scale runs for projects such as climate modeling campaigns at NCAR, cosmology simulations linked to LSST data pipelines, and materials science investigations using DFT codes like VASP. Benchmarks and evaluations compare TAU against tools including HPCToolkit, Score-P, and related systems in studies involving HPCG and LINPACK-derived workloads on platforms from Cray and IBM Blue Gene. Industry and laboratory reports from Argonne, Oak Ridge, and Lawrence Livermore National Laboratory document TAU-enabled optimizations that reduced runtime hotspots and improved scaling efficiency for mixed-mode MPI+OpenMP applications.
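Scaling efficiency, as mentioned above, is commonly computed from wall-clock times measured at several process counts. The following Python sketch uses the standard strong-scaling definition, with the smallest measured process count as the baseline; the function name and the input layout are assumptions for illustration, and the numbers in the usage note are made up rather than taken from any study.

```python
def strong_scaling_efficiency(timings):
    """Map {process_count: wall_seconds} to {process_count: efficiency}.

    Efficiency is measured relative to the smallest process count provided:
    E(p) = (p0 * T(p0)) / (p * T(p)).
    """
    if not timings:
        raise ValueError("timings must not be empty")
    base_p = min(timings)
    base_work = base_p * timings[base_p]
    return {p: base_work / (p * t) for p, t in sorted(timings.items())}
```

For the invented timings `{1: 100.0, 2: 55.0, 4: 30.0}`, efficiency falls from 1.0 at one process to roughly 0.91 at two and 0.83 at four, the kind of trend a profile would then be used to explain.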
TAU began in the mid-1990s at the University of Oregon and expanded through funding from agencies like the Department of Energy and the National Science Foundation. Key collaborators include ParaTools, Inc. and research groups at the University of Illinois, the University of Tennessee, and Princeton University. Over time, TAU incorporated advances from projects such as Score-P interoperability efforts, adoption of PAPI for hardware counters, and integration with community measurement standards promoted by SPEC and OpenMP consortium activities. Releases evolved alongside HPC hardware trends set by companies like Intel, AMD, and NVIDIA.
TAU is used across academic, national laboratory, and industry sites including the University of Oregon, Argonne National Laboratory, Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, Sandia National Laboratories, Los Alamos National Laboratory, and commercial partners such as ParaTools, Inc. and IBM. The community engages via workshops at conferences like SC (the Supercomputing conference) and ISC High Performance, and at domain-specific meetings hosted by AGU and APS. Integration with repositories and CI systems such as GitHub, GitLab, and Travis CI supports collaborative development and adoption by projects including LAMMPS, GROMACS, NWChem, and Quantum ESPRESSO.
Category:Performance analysis tools