LLMpedia: The first transparent, open encyclopedia generated by LLMs

Kokkos

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Kokkos
Name: Kokkos
Developer: Sandia National Laboratories
Released: 2013
Programming language: C++
Operating system: Linux, macOS, Windows
License: BSD license


Kokkos is a C++ library and programming model developed to enable performance portability across diverse high-performance computing platforms. It provides abstractions for parallel execution, data management, and memory spaces so that scientific applications can target architectures such as multicore CPUs, manycore accelerators, and GPUs without rewriting core algorithms. Kokkos has been adopted in projects at national laboratories and in open-source ecosystems to support scalable simulations in fields like computational fluid dynamics and materials science.

Overview

Kokkos is developed at Sandia National Laboratories as part of U.S. Department of Energy exascale computing efforts and is used alongside projects at Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory, and other national facilities. The project aims to separate algorithmic intent from hardware-specific details so that teams at organizations such as Argonne National Laboratory, Los Alamos National Laboratory, and industry partners such as Cray (now part of HPE) can maintain single-source C++ codes. Kokkos integrates into established simulation stacks alongside tools and ecosystems including CMake, Trilinos, RAJA, HPX, and MPI. The library emphasizes a small, expressive API that maps to backends such as CUDA, HIP, and native threaded CPU implementations.

Design and Architecture

Kokkos is designed around a small set of core abstractions that decouple algorithm expression from the choice of execution resource. The architecture centers on execution policies, memory spaces, and views, which model multidimensional data. These core pieces allow portability across hardware from vendors such as NVIDIA, AMD, and Intel by providing backend implementations that target vendor runtimes like CUDA and ROCm. The library uses modern C++ features, including templates and type traits, to implement low-overhead abstractions compatible with compilers such as GCC, Clang, and the Intel C++ compiler. Integration layers tie Kokkos to build systems like CMake and to performance tools such as TAU and Intel VTune Profiler.

Programming Model and API

Kokkos exposes an API built on three principal concepts: execution spaces, memory spaces, and Views. Execution spaces encapsulate where parallel work runs and map to runtimes like CUDA, OpenMP, and Pthreads; memory spaces model data residency such as device memory on NVIDIA GPUs or host memory on x86 nodes. Views are templated, multi-dimensional array wrappers that carry memory space and layout information, enabling interoperability with libraries like Trilinos and KokkosKernels. The programming model offers parallel patterns including parallel_for, parallel_reduce, and parallel_scan; these patterns are analogous to constructs found in OpenMP and Thrust (library). Kokkos also provides Team-level APIs to express hierarchical parallelism for mapping to hardware features like NVIDIA Ampere SMs or AMD CDNA compute units.
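The patterns above can be illustrated with a minimal sketch using the Kokkos API itself. This assumes the Kokkos library is installed and linked (build setup not shown); the labels "fill" and "sum" are arbitrary names used for profiling, and `KOKKOS_LAMBDA` expands to a lambda that is also callable on device backends.

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int N = 1000;

    // A 1D View of doubles in the default memory space.
    Kokkos::View<double*> x("x", N);

    // parallel_for: fill x(i) = i in the default execution space.
    Kokkos::parallel_for("fill", N, KOKKOS_LAMBDA(const int i) {
      x(i) = 1.0 * i;
    });

    // parallel_reduce: sum the entries into a host-side scalar.
    double sum = 0.0;
    Kokkos::parallel_reduce("sum", N,
        KOKKOS_LAMBDA(const int i, double& lsum) { lsum += x(i); }, sum);
    // sum now holds 0 + 1 + ... + (N-1).
  }  // Views must be destroyed before finalize().
  Kokkos::finalize();
  return 0;
}
```

The same source compiles against any enabled backend (serial, OpenMP, CUDA, HIP); only the build configuration changes.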

Backends and Portability

Kokkos implements multiple backends to target diverse execution environments. Notable backends include CUDA for NVIDIA GPUs, HIP for AMD GPUs, OpenMP for shared-memory CPU execution on processors such as IBM POWER and Intel Xeon, and a serial backend for functional testing. Portability is achieved by selecting an execution space at compile time or via build configuration; this allows the same application source to run on platforms operated by organizations such as the U.S. Department of Energy and on cloud hardware from providers such as AWS and Microsoft Azure. Backends interoperate with device toolchains like the NVIDIA CUDA Toolkit and ROCm and with vendor-specific compilers to leverage architecture-specific optimizations.
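Backend selection at build time can be sketched as a configure step. The `Kokkos_ENABLE_*` and `Kokkos_ARCH_*` names are real Kokkos CMake options; the source and build paths are placeholders, and the chosen architecture flag (here, NVIDIA A100) is only an example.

```shell
# Configure Kokkos with both an OpenMP (CPU) backend and a CUDA (NVIDIA
# GPU) backend; the application source is then compiled once against
# this build. Paths are placeholders.
cmake -B build -S /path/to/kokkos \
      -DKokkos_ENABLE_OPENMP=ON \
      -DKokkos_ENABLE_CUDA=ON \
      -DKokkos_ARCH_AMPERE80=ON
```

An application then consumes the installed package via `find_package(Kokkos REQUIRED)` and links against the `Kokkos::kokkos` CMake target, so no application code mentions the backend directly.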

Performance and Optimization

Kokkos enables performance tuning through layout choices, memory-space selection, and execution-policy parameters. Users optimize data locality by selecting LayoutLeft or LayoutRight for Views and by orchestrating deep copies between memory spaces when staging data for NVIDIA or AMD accelerators. Performance engineers often combine Kokkos with libraries such as KokkosKernels for sparse linear algebra and Tpetra in Trilinos to build scalable solvers on supercomputers like Summit and Perlmutter. Profiling integrations with TAU, LIKWID, and vendor profilers help guide optimizations. The design strives for low overhead compared with native CUDA or OpenMP implementations while preserving abstraction, using inlining and compile-time dispatch techniques common in modern C++ metaprogramming.
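Explicit layout selection and host-device staging can be sketched with the real Kokkos facilities `create_mirror_view` and `deep_copy` (this sketch again assumes the Kokkos library is installed and linked; the dimensions and fill values are arbitrary).

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    // A 2D View in the default (possibly device) memory space.
    // LayoutLeft stores columns contiguously, which typically suits
    // coalesced GPU access; LayoutRight is row-contiguous for CPUs.
    Kokkos::View<double**, Kokkos::LayoutLeft> a("a", 100, 50);

    // A host-accessible mirror of a (a no-op alias if a is already
    // host-accessible, e.g. under the serial or OpenMP backend).
    auto h_a = Kokkos::create_mirror_view(a);

    // Fill on the host, then stage to the device explicitly; Kokkos
    // never performs hidden transfers between memory spaces.
    for (int j = 0; j < 50; ++j)
      for (int i = 0; i < 100; ++i)
        h_a(i, j) = i + 0.01 * j;
    Kokkos::deep_copy(a, h_a);  // host -> device copy
  }
  Kokkos::finalize();
  return 0;
}
```

Keeping the transfer explicit is the mechanism by which users "orchestrate deep copies" when staging data for accelerators.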

Adoption and Use Cases

Kokkos is widely used in simulation codes across domains: multiphysics frameworks such as ALEGRA, climate modeling systems like CESM, computational chemistry packages, and finite element suites including Albany. National laboratories including Sandia National Laboratories, Oak Ridge National Laboratory, and Los Alamos National Laboratory use Kokkos to maintain performance portability for codes that run on leadership-class machines such as Frontier, Summit, and Aurora. The library supports community projects such as Trilinos, KokkosKernels, and domain-specific efforts in plasma physics and materials science, enabling cross-institution collaborations and publications in venues such as the SC conference series and IEEE conferences.

History and Development

Kokkos originated at Sandia National Laboratories in the early 2010s as part of exascale readiness initiatives and collaborations with the U.S. Department of Energy. Development involved contributions from researchers and engineers affiliated with institutions such as Lawrence Livermore National Laboratory, Argonne National Laboratory, and academic groups at the University of California, Berkeley, and the University of Tennessee. Over successive releases the project added backends, refined View semantics, introduced team-level parallelism, and enhanced interoperability with Trilinos and third-party performance tools. The community-developed ecosystem now includes KokkosKernels, bindings in other languages, and continuous integration pipelines that test against compilers like GCC and Clang on hardware platforms from NVIDIA, AMD, and Intel.

Category:Numerical libraries Category:Parallel computing