| BLIS | |
|---|---|
| Name | BLIS |
| Full name | BLAS-like Library Instantiation Software |
| Developer | University of Texas at Austin; consortium contributors |
| Released | 2013 |
| Operating system | Cross-platform |
| License | BSD (3-clause) |
BLIS (BLAS-like Library Instantiation Software) is a portable software framework for rapidly instantiating high-performance Basic Linear Algebra Subprograms (BLAS) functionality on modern processors. It provides both a library and an architecture for implementing dense linear algebra kernels tuned to CPUs from vendors such as Intel, AMD, and Arm. BLIS emphasizes modularity, reproducibility, and performance portability for scientific computing users in academia, national laboratories, and industry.
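Concretely, the workhorse of the Level-3 BLAS functionality that BLIS instantiates is the general matrix multiply (GEMM), which computes C ← αAB + βC. A minimal sketch of those semantics, using NumPy purely as a reference model (this is an illustration of the operation, not BLIS's implementation):

```python
import numpy as np

def gemm_reference(alpha, A, B, beta, C):
    """Reference semantics of BLAS dgemm: returns alpha*A@B + beta*C."""
    return alpha * (A @ B) + beta * C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((4, 5))
out = gemm_reference(2.0, A, B, 0.5, C)   # a 4x5 result
```

An optimized BLAS spends virtually all of its effort making this one operation fast, since the remaining Level-3 routines can be built on top of it.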
BLIS originated from research at the University of Texas at Austin and was motivated by limitations of reference implementations such as the original Netlib BLAS and of optimized distributions like ATLAS. Early work drew inspiration from projects at IBM and Cray Research, while learning from implementations such as OpenBLAS, the Intel Math Kernel Library, and the AMD Core Math Library. Its microkernel design descends directly from that of GotoBLAS, and its algorithmic ideas parallel those of the LAPACK and ScaLAPACK ecosystems. Funding and collaboration have involved the National Science Foundation, DARPA, and research groups at Rice University and the Georgia Institute of Technology.
BLIS adopts a layered design in which Level-3 operations are expressed as nested loops around packing routines and a small microkernel, following the register- and cache-blocking strategy popularized by Kazushige Goto's GotoBLAS. The framework decomposes matrix operations into a small set of familiar building blocks, enabling optimizations that target the cache hierarchies of processors such as Intel Xeon, AMD EPYC, and ARM Cortex-A. BLIS supports selectable threading backends such as OpenMP and POSIX threads and interoperates with MPI-based applications. Its microkernel interface permits vendor-specific tuning: porting to a new architecture requires rewriting essentially only the microkernel and choosing appropriate block sizes.
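The loop structure described above can be sketched in Python. The block sizes below are illustrative placeholders, not BLIS's tuned values, and NumPy stands in for the hand-optimized microkernel:

```python
import numpy as np

MC, KC, NC = 4, 4, 8   # cache block sizes (illustrative, not tuned)
MR, NR = 2, 2          # microkernel tile (register block)

def microkernel(a_panel, b_panel, c_tile):
    """Update an MR x NR tile of C in place: the only piece tuned per CPU."""
    c_tile += a_panel @ b_panel

def blocked_gemm(A, B, C):
    m, k = A.shape
    _, n = B.shape
    for jc in range(0, n, NC):                     # partition B by columns
        for pc in range(0, k, KC):                 # partition along k
            Bp = np.ascontiguousarray(B[pc:pc+KC, jc:jc+NC])      # "pack" B
            for ic in range(0, m, MC):             # partition A by rows
                Ap = np.ascontiguousarray(A[ic:ic+MC, pc:pc+KC])  # "pack" A
                for jr in range(0, Bp.shape[1], NR):      # loop over NR cols
                    for ir in range(0, Ap.shape[0], MR):  # loop over MR rows
                        microkernel(Ap[ir:ir+MR, :], Bp[:, jr:jr+NR],
                                    C[ic+ir:ic+ir+MR, jc+jr:jc+jr+NR])
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 9))    # dimensions deliberately not multiples
B = rng.standard_normal((9, 11))   # of the block sizes
C = blocked_gemm(A, B, np.zeros((7, 11)))
```

Packing copies panels of A and B into contiguous buffers so the innermost loops stream through memory sequentially; in BLIS the microkernel body is vectorized assembly or intrinsics rather than a NumPy call.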
BLIS achieves competitive performance on single- and multi-core systems, often matching or approaching the throughput of the Intel Math Kernel Library and OpenBLAS on dense matrix multiplication benchmarks. Measurements on platforms such as AMD Ryzen, Intel Core i9, and ARM Neoverse show high efficiency for GEMM and other Level-3 BLAS operations, and results have been compared against the libraries used in scientific codes at Los Alamos, Argonne, and Oak Ridge National Laboratories. Performance studies have been reported at venues such as SC (the International Conference for High Performance Computing, Networking, Storage, and Analysis) and the IEEE International Parallel and Distributed Processing Symposium.
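Such comparisons are usually expressed in GFLOP/s, since a dense m×n×k multiply costs roughly 2mnk floating-point operations. A minimal timing harness in that spirit (it measures whichever BLAS backend NumPy links at runtime, which may or may not be BLIS):

```python
import time
import numpy as np

def gemm_gflops(m, n, k, repeats=3):
    """Time C = A @ B and report the best observed rate in GFLOP/s."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, k))
    B = rng.standard_normal((k, n))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        A @ B                      # dispatched to the linked BLAS's dgemm
        best = min(best, time.perf_counter() - t0)
    return 2.0 * m * n * k / best / 1e9   # ~2*m*n*k flops per multiply

rate = gemm_gflops(256, 256, 256)
```

Taking the best of several repeats reduces noise from cold caches and scheduling; serious benchmarks additionally pin threads and sweep over many matrix shapes.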
BLIS provides a portable C implementation and has been integrated into numerical stacks used by projects such as SciPy, NumPy, Julia, and GNU Octave. Vendors have adapted BLIS strategies for processors from Arm, Qualcomm, and AMD (whose AOCL-BLAS library is a BLIS fork), as well as for custom hardware at Google and Amazon Web Services. It runs on operating systems including Linux, Windows, and macOS, and has been used in containerized deployments with Docker and orchestration platforms like Kubernetes. Platforms ranging from embedded boards such as the Raspberry Pi to Top500-class supercomputers have hosted BLIS-optimized builds.
BLIS is used wherever dense linear algebra performance matters: computational fluid dynamics codes developed at NASA Ames Research Center, climate modeling efforts at NOAA, machine learning frameworks such as TensorFlow and PyTorch that rely on efficient GEMM kernels, and finite element analysis packages used by industry partners such as Siemens. Scientific software including MATLAB-compatible toolchains, signal processing stacks descended from Bell Labs software, and econometrics packages used in research at the National Bureau of Economic Research have leveraged BLIS concepts. High-performance database analytics and graph processing engines at companies like Facebook, Microsoft, and Twitter have also benefited from optimized linear algebra primitives.
BLIS development is coordinated by academic contributors and open-source collaborators, including researchers from the University of Texas at Austin, the University of Tennessee, and the University of California, Berkeley, along with independent contributors. Discussion and contributions occur on GitHub, at workshops such as the Workshop on High Performance Embedded Computing, and through collaborations with industrial partners including Intel, AMD, and Arm. Documentation, test harnesses, and continuous integration practices follow standards comparable to those of the LLVM and GNU projects, while outreach and adoption are fostered through tutorials at SIAM meetings and summer schools at institutions such as the Courant Institute.