| MLPerf | |
|---|---|
| Name | MLPerf |
| Founded | 2018 |
| Focus | Artificial intelligence benchmarking |
| Key people | David Patterson, Vijay Janapa Reddi |
| Website | mlperf.org |
MLPerf is a suite of standardized benchmarks for measuring the performance of machine learning hardware, software, and services. Introduced in 2018 by a coalition of leading academic and industry researchers, it aims to provide fair, reproducible, and representative evaluations to guide innovation and purchasing decisions. The benchmarks cover a wide range of tasks, from computer vision and natural language processing to recommendation systems and reinforcement learning.
The primary goal is to establish a rigorous, vendor-neutral framework for assessing the speed and efficiency of machine learning systems. The initiative was launched by researchers from institutions including Stanford University, Harvard University, and the University of California, Berkeley, alongside engineers from major technology companies. The benchmarks are designed to reflect real-world workloads, moving beyond peak-throughput figures to evaluate full training and inference pipelines. This provides valuable data for comparing offerings from vendors such as NVIDIA, Google, Intel, and Amazon Web Services.
The benchmark suite is divided into several categories, each targeting a critical area of modern artificial intelligence research and deployment. The training benchmarks measure the wall-clock time required to train a model to a target quality on tasks such as image classification with ResNet on the ImageNet dataset and language modeling with BERT on a Wikipedia corpus. Inference benchmarks evaluate how quickly trained models serve predictions, covering deployment categories such as data center servers, edge computing devices, and mobile platforms. Additional benchmarks address emerging areas, including the HPC-oriented CosmoFlow and the recommendation model DLRM.
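To make the time-to-train methodology concrete, the following is a minimal sketch assuming a generic training loop; the function names, the simulated quality curve, and the 0.759 target value are illustrative placeholders, not part of any official MLPerf reference implementation.

```python
import random
import time

# Illustrative sketch of the time-to-train metric used by MLPerf Training.
# All names and numbers here are hypothetical placeholders; the official
# rules and target qualities are defined per benchmark by MLCommons.

TARGET_QUALITY = 0.759  # illustrative top-1 accuracy target, not an official value


def train_one_epoch(state):
    """Stand-in for one real training epoch: nudge model quality upward."""
    state["quality"] += random.uniform(0.05, 0.15)


def evaluate(state):
    """Stand-in for the benchmark's quality metric (e.g., top-1 accuracy)."""
    return state["quality"]


def time_to_train(max_epochs=100):
    # MLPerf Training scores a run by the wall-clock time from the start
    # of training until the model first reaches the target quality.
    state = {"quality": 0.0}
    start = time.monotonic()
    for _ in range(max_epochs):
        train_one_epoch(state)
        if evaluate(state) >= TARGET_QUALITY:
            return time.monotonic() - start  # reported result: seconds to target
    raise RuntimeError("target quality not reached within max_epochs")


if __name__ == "__main__":
    print(f"time to target quality: {time_to_train():.6f} s")
```

Measuring time to a fixed quality target, rather than raw throughput, rewards end-to-end efficiency: a system that processes more samples per second but converges more slowly can still score worse.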
Published results, submitted by participants and peer-reviewed for rule compliance, have become a key reference point in the industry, driving competition and transparency. These submissions have highlighted the rapid performance improvements in hardware such as NVIDIA's A100 and H100 GPUs, Google's TPU v4, and specialized accelerators from startups. The data informs procurement decisions for cloud computing resources and enterprise AI infrastructure. Furthermore, the focus on power efficiency has spurred innovation in green computing, pushing vendors to optimize not just for speed but also for performance per watt (for example, a system sustaining 10,000 inferences per second at 400 W delivers 25 inferences per second per watt).
The project is stewarded by MLCommons, a nonprofit open engineering consortium formed to oversee its development and related initiatives. MLCommons includes a broad membership of industry leaders, academic institutions, and researchers who contribute to the benchmark definitions and rules. Technical working groups, comprising experts from organizations like Google, Intel, Facebook AI Research, and Microsoft, are responsible for developing and updating the specific benchmarks. This collaborative governance model ensures the benchmarks remain relevant and reflect the evolving landscape of machine learning.
The effort was publicly announced in May 2018 by a founding group that included David Patterson of Google and the University of California, Berkeley, and Vijay Janapa Reddi of Harvard University. The first set of training benchmarks was released later that year, with the inaugural inference benchmarks following in 2019. Since its inception, the suite has expanded significantly, adding new domains such as reinforcement learning with the MiniGo benchmark, as well as the MLPerf Storage benchmark for evaluating data pipeline performance. Its success has led MLCommons to launch sister projects such as MLPerf HPC and MLPerf Tiny for ultra-low-power devices.
Category:Artificial intelligence organizations
Category:Computer benchmarks
Category:Technology consortia