LLMpedia: The first transparent, open encyclopedia generated by LLMs

Apache TVM

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: cuDNN (Hop 5)
Expansion Funnel: Raw 110 → Dedup 0 → NER 0 → Enqueued 0
Apache TVM
Name: Apache TVM
Developer: Apache Software Foundation
Initial release: 2018
Latest release: 2024
Programming languages: C++, Python
Operating systems: Linux, Windows, macOS
License: Apache License 2.0

Apache TVM is an open-source deep learning compiler stack designed to optimize and deploy machine learning models across diverse hardware targets. It provides graph-level and tensor-level intermediate representations used to lower models from multiple frontends into optimized kernels for backends including CPUs, GPUs, and accelerators. The project emphasizes automated optimization, portability, and an extensible infrastructure for research and production deployments.

History

TVM originated in research at the University of Washington, with engagement from projects and institutions such as Microsoft Research, Amazon Web Services, NVIDIA, Intel Corporation, ARM Holdings, and Facebook AI Research. Early prototypes were influenced by research papers and toolchains developed at Berkeley AI Research and Stanford University, and by collaborations with contributors from Google Research and Tencent. The project entered incubation under the Apache Software Foundation and graduated to a top-level project, attracting contributions from corporations such as IBM, Samsung Electronics, Qualcomm, and Xilinx. Release milestones aligned with conferences and workshops such as NeurIPS, ICML, CVPR, and MLSys, while technique cross-pollination drew on work from TensorFlow, PyTorch, ONNX, TVM Relay, and Halide research.

Architecture and components

TVM's architecture comprises several layers: a model import and frontend layer, an intermediate representation layer, a tensor expression and scheduling layer, a code generation and runtime layer, and an RPC-based deployment layer. The intermediate representation connects to ecosystems such as ONNX, TensorFlow, PyTorch, MXNet, and Keras. The scheduling and codegen engines incorporate ideas from LLVM, CUDA, OpenCL, ROCm, and Vulkan. Components like the Relay IR, the Tensor Expression (TE) language, the AutoTVM tuner, the Ansor auto-tuning system, and the TVM runtime interact with external tools including CMake, Bazel, GitHub, Travis CI, and Jenkins. The stack integrates with hardware-specific runtimes from vendors such as Arm NN, Intel oneAPI, NVIDIA CUDA Toolkit, and Xilinx Vitis.
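The tensor-expression layer separates *what* to compute (an index expression) from *how* to compute it (the generated loop nest). The idea can be sketched in plain Python; this is an illustrative miniature, not TVM's actual `te.compute`/schedule API:

```python
# Minimal sketch of the tensor-expression declaration/lowering split.
# Illustrative only: TVM's real TE API and lowering differ substantially.

def compute(shape, fcompute):
    """Declare a tensor by an index -> value expression (TE-style)."""
    return {"shape": shape, "f": fcompute}

def lower_and_run(tensor):
    """'Lower' the declaration into explicit loops and materialize values."""
    rows, cols = tensor["shape"]
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):        # generated outer loop
        for j in range(cols):    # generated inner loop
            out[i][j] = tensor["f"](i, j)
    return out

# Declare C[i, j] = A[i, j] + B[i, j] without writing loops by hand.
A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
C = compute((2, 2), lambda i, j: A[i][j] + B[i][j])
result = lower_and_run(C)
```

In TVM proper, a schedule can then retile, vectorize, or parallelize the generated loops without touching the declaration, which is what makes auto-tuning over schedules possible.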

Supported frontends and backends

TVM accepts models from frontends and model formats including TensorFlow, PyTorch, ONNX, MXNet, Keras, Chainer, Caffe, and Theano. It emits code and binaries targeting backend toolchains and accelerators maintained by NVIDIA, AMD, Intel Corporation, ARM Holdings, Qualcomm, Google, Huawei, Xilinx, MediaTek, and Cadence Design Systems. TVM supports execution via backends such as CUDA, ROCm, OpenCL, Vulkan, Metal, WebAssembly, and native x86 and ARM ABIs. Integration points exist for inference runtimes like TensorRT, OpenVINO, NNAPI, Core ML, and vendor SDKs from NVIDIA DriveWorks and Google Coral.

Compilation workflow

The compilation workflow transforms frontend graphs into optimized binaries using passes and tools rooted in research from LLVM, Halide, TVM Relay, and auto-tuning systems developed in collaboration with teams from Amazon Web Services and Microsoft Research. Steps include graph lowering, operator fusion, memory planning, schedule generation, and code emission targeting toolchains such as gcc, clang, nvcc, and hipcc. The end-to-end pipeline is orchestrated alongside continuous integration services provided by GitHub Actions, Travis CI, and Azure DevOps, while reproducibility and benchmarking often leverage suites like MLPerf, DAWNBench, and hardware testbeds from NVIDIA DGX, Google TPU Pod, and Intel Xeon clusters.
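Of the passes listed above, operator fusion is the easiest to illustrate: consecutive elementwise operators are merged into one kernel so their intermediates never round-trip through memory. A pure-Python sketch of the greedy grouping idea (not TVM's Relay fusion pass) on a linear operator sequence:

```python
# Sketch of greedy operator fusion: adjacent elementwise ops are merged
# into one fused segment. Illustrative model, not TVM's fusion pass.

ELEMENTWISE = {"add", "mul", "relu"}  # ops treated as fusible here

def fuse(graph):
    """Group consecutive elementwise ops into fused segments (tuples)."""
    fused, current = [], []
    for op in graph:
        if op in ELEMENTWISE:
            current.append(op)          # extend the running fused segment
        else:
            if current:                 # flush any pending elementwise run
                fused.append(tuple(current))
                current = []
            fused.append((op,))         # non-fusible op stands alone
    if current:
        fused.append(tuple(current))
    return fused

segments = fuse(["conv2d", "add", "relu", "conv2d", "mul"])
# The "add" and "relu" following the first conv2d end up in one segment.
```

Real fusion operates on a dataflow graph with pattern rules (e.g. injective ops fusing into a preceding reduction), but the memory-traffic motivation is the same.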

Performance, optimization and auto-tuning

TVM emphasizes performance through multi-level optimizations, drawing on algorithms and heuristics from research at Berkeley AI Research, Stanford DAWN Project, and laboratories such as Facebook AI Research. Auto-tuning frameworks like AutoTVM and Ansor perform search-based optimization inspired by work from NeurIPS and ICLR communities, leveraging reinforcement learning and evolutionary strategies discussed at ICML and AAAI. Performance engineering relies on vendor profilers such as NVIDIA Nsight, Intel VTune, and ROCm Profiler, and on benchmarking against suites curated by MLCommons. Tactics include operator fusion, layout transformation, mixed-precision support popularized by NVIDIA Tensor Cores and Intel MKL-DNN, and memory optimizations akin to those in XLA.
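The search loop behind auto-tuners like AutoTVM and Ansor can be reduced to: sample schedule parameters, score each candidate, keep the best. The sketch below uses a synthetic cost function as a stand-in for real on-device measurements or a learned cost model; everything here is illustrative, not the AutoTVM API:

```python
# Sketch of search-based auto-tuning: sample candidate schedule
# parameters (tile sizes), score them, keep the cheapest.
# The cost function is a synthetic stand-in for real measurements.
import random

def cost(tile):
    # Hypothetical cost model: cheapest near tile size 32, with a
    # penalty for tile sizes that are not powers of two.
    return abs(tile - 32) + (0 if tile & (tile - 1) == 0 else 5)

def tune(candidates, trials=16, seed=0):
    """Randomly sample up to `trials` candidates and return the cheapest."""
    rng = random.Random(seed)
    sampled = rng.sample(candidates, min(trials, len(candidates)))
    return min(sampled, key=cost)

tile_sizes = [2 ** k for k in range(1, 8)]  # 2, 4, ..., 128
best_tile = tune(tile_sizes)
```

Production tuners replace random sampling with guided search (simulated annealing, evolutionary search, learned cost models) and measure candidates on the actual target hardware, but the structure of the loop is the same.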

Use cases and adoption

TVM is used for inference and deployment workflows by companies and projects in autonomous systems, cloud services, edge computing, mobile platforms, and data centers, including adopters like Amazon Web Services, Microsoft Azure, Google Cloud Platform, Facebook, Bytedance, Huawei, SenseTime, Baidu, DJI, and Tesla research teams. Typical applications span computer vision models from ResNet, YOLO, and Mask R-CNN to natural language models such as BERT, GPT-2, and sequence models popularized by OpenAI and Google Brain. TVM enables optimizations for embedded platforms like Raspberry Pi, Jetson Xavier, Coral Dev Board, and specialized accelerators from NVIDIA Drive and Intel Movidius.

Community and governance

The project governance follows the Apache Software Foundation model with a Project Management Committee and a diverse contributor base including engineers and researchers from Amazon Web Services, Microsoft, NVIDIA, Intel Corporation, Huawei, Xilinx, ARM Holdings, IBM Research, Facebook AI Research, and academic groups at University of Washington, UC Berkeley, and Tsinghua University. Community activities include developer meetings at conferences such as NeurIPS, ICML, CVPR, and MLSys, collaborative workshops with IEEE and ACM events, and public mailing lists and repositories hosted on GitHub. The project accepts contributions under the Apache License 2.0 and organizes mentorship programs aligned with initiatives like Google Summer of Code.

Category:Compilers
Category:Machine learning software