| ARM Compute Library | |
|---|---|
| Name | ARM Compute Library |
| Developer | Arm Ltd. |
| Released | 2017 |
| Programming language | C++ |
| License | MIT License |
| Website | Arm Developer |
ARM Compute Library
The ARM Compute Library is an open-source software library providing optimized low-level routines for computer vision and machine learning on ARM architecture processors, such as Cortex-A cores with NEON support. It is used by device vendors, researchers, and system integrators to accelerate workloads on ARM-based hardware and to integrate with frameworks such as TensorFlow, PyTorch, and ONNX. The project is maintained primarily by engineers at Arm Ltd., with contributions from industry and academia.
The library originated to close the performance gap between hand-optimized kernels and generic implementations on embedded ARM processors. It targets applications in mobile imaging, augmented reality, and autonomous systems. The codebase emphasizes portability across architecture families such as ARMv7-A and ARMv8-A while exposing primitives that map to ISA extensions such as NEON and SVE; its OpenCL backend builds on specifications maintained by the Khronos Group.
The library organizes functionality into modular components, including a tensor abstraction, convolution primitives, image-processing kernels, and activation functions. These modules sit on top of hardware-specific backends: a CPU backend using NEON (and, on newer cores, SVE) and an OpenCL backend targeting GPUs such as Arm Mali. A scheduler and a memory manager coordinate kernel execution and buffer lifetimes across backends. The source tree also contains validation tests and benchmarks used by contributors.
Supported processors include the ARMv7-A and ARMv8-A CPU families found in SoCs from vendors such as MediaTek, Samsung, and Qualcomm. Acceleration is available through NEON, SVE, and OpenCL implementations conformant with Khronos Group specifications, covering GPUs such as Arm Mali. The build system supports toolchains such as GCC, Clang, and Arm Compiler, and cross-compilation for embedded boards such as the Raspberry Pi.
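Cross-compilation is driven by the library's SCons-based build. A minimal invocation for a 64-bit Linux target might look like the following; the flag values shown are one plausible configuration and should be checked against the project's current build documentation:

```shell
# Cross-compile for a 64-bit Linux target with both the NEON CPU
# backend and the OpenCL GPU backend enabled. Flag values are one
# plausible configuration, not the only one.
scons -j8 os=linux arch=arm64-v8a neon=1 opencl=1 debug=0 asserts=0
```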
The library implements building blocks for convolutional neural networks, including the layer types found in AlexNet, VGG, ResNet, and mobile-focused designs such as MobileNet. It provides low-level APIs for tensor manipulation, convolutions, pooling, batch normalization, activation functions (for example, ReLU and sigmoid), and element-wise operations. Interoperability with higher-level frameworks, including TensorFlow, Caffe, MXNet, and ONNX runtimes, enables its use in deployment pipelines.
Optimizations rely on hand-tuned assembly and intrinsic implementations that exploit ARM microarchitectural features. Techniques include cache-aware tiling, loop unrolling, and vectorization comparable to the auto-vectorization strategies of the LLVM and GCC compilers. The library provides benchmarking harnesses used in comparative studies alongside libraries such as Eigen, the Intel Math Kernel Library, and cuDNN to measure throughput and latency on vision workloads; profiling integrations exist for Arm's development tools.
Development is hosted in a public repository, with contributions governed by a documented review process. The project is released under the MIT License, enabling commercial use in consumer products as well as integration into research software stacks. Build, test, and deployment workflows are documented for use in CI/CD pipelines and for scaling inference workloads in production.
Category:Software libraries Category:ARM