| Criterion (Rust) | |
|---|---|
| Name | Criterion |
| Author | Brook Heisler |
| Developer | Criterion Developers |
| Released | 2017 |
| Programming language | Rust |
| Platforms | Linux, macOS, Windows |
| License | Apache-2.0 or MIT |
Criterion (Rust) is a benchmarking framework for the Rust ecosystem designed to provide statistically rigorous measurements of performance for libraries, applications, and microbenchmarks. It integrates techniques from statistical analysis, reproducible benchmarking, and continuous integration to produce comparable results across machines and revisions. Criterion aims to help contributors to projects such as Servo, ripgrep, tokio, Diesel, and Hyper detect regressions, quantify improvements, and attribute performance changes to commits, builds, or configuration differences.
Criterion began as a Rust port of the Haskell criterion library, also drawing on benchmarking tools from other ecosystems such as Google Benchmark (C++) and BenchmarkDotNet (.NET). Its design emphasizes statistically sound sampling, outlier detection, and automated plotting to reduce noise from operating-system scheduling on Linux, macOS, and Windows, CPU frequency scaling on Intel and AMD processors, virtualization under Docker, and variation in code generated by LLVM and the Rust compiler. Criterion is commonly used alongside Cargo and continuous-integration services such as Travis CI, GitHub Actions, and Azure Pipelines to track performance over time.
Criterion provides:

- Automated warmup and measurement phases with configurable sample counts and durations, so that caches, branch predictors, and CPU frequency states stabilize before timing begins.
- Statistical analysis, including the mean, median, standard deviation, and bootstrapped confidence intervals, presented as robust summaries suitable for comparing runs.
- Outlier classification that flags samples distorted by scheduling or other interference.
- Regression detection against saved baselines, making it practical to compare performance across Git commits on hosting platforms such as GitHub and GitLab.
- HTML reports with plots, plus machine-readable results on disk that external dashboards (for example, Grafana) can consume.
- Extension points for custom measurements and external profilers, which can be used to wire in memory-profiling tools such as Valgrind or heaptrack and correlate time with allocation patterns.
- Integration with ecosystem tooling, including the criterion_group! and criterion_main! macros and the cargo bench workflow used by projects such as Clippy and rustfmt.
To add Criterion to a Rust project, developers typically add the criterion crate as a dev-dependency in Cargo.toml and create benchmark targets in the benches directory used by cargo bench. Typical workflows involve running cargo bench locally, generating comparison reports against saved baselines, and running the same benchmarks in CI services such as CircleCI or GitHub Actions. Developers of performance-sensitive crates such as regex, Serde, Rayon, and Hyper use Criterion to write repeatable microbenchmarks, detect regressions introduced by pull requests, and give maintainers evidence for optimization efforts. Criterion suites can be parameterized to exercise large datasets, for example from wrappers around SQLite, PostgreSQL, or Redis, and combined with fuzzing tools such as AFL and libFuzzer for performance-oriented testing.
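A typical Cargo.toml setup for such a project looks like the following sketch (the bench name and version are illustrative):

```toml
[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "fib"      # corresponds to benches/fib.rs
harness = false   # disable libtest's harness so Criterion's main function runs
```

With this in place, `cargo bench` compiles and runs the Criterion benchmark and writes its reports and saved baselines under the target directory.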
Criterion emphasizes reproducibility and statistical validity over raw throughput. Its reports present distribution summaries of the kind used in performance research at venues such as USENIX, PLDI, and OOPSLA. In practice, Criterion's sampling and outlier analysis mitigate common sources of measurement error: processor turbo boost (on Intel and AMD Ryzen CPUs), background daemons on Ubuntu, macOS, and Windows Server builds, and allocator effects when switching between jemalloc and the system allocator. Results are often compared against baselines saved from earlier runs or from releases of rustc and cargo, and visualized alongside historical data from repositories such as rust-lang/rust and servo/servo.
Criterion differs from simpler harnesses such as the unstable libtest bench interface (the #[bench] attribute and test::Bencher, historically used in Rust) and tools such as cargo-benchcmp by offering richer statistical analysis, automated plotting, and CI-friendly change detection. Compared with BenchmarkDotNet in the .NET ecosystem or Google Benchmark in C++, Criterion focuses on Rust idioms and integrates tightly with Cargo and the crates.io ecosystem. Projects that require finer-grained profiling may complement Criterion with system profilers such as perf, Instruments on macOS, or the Windows Performance Toolkit for sampling, while still relying on Criterion for baseline distributions and regression alerts.
Notable uses include performance work in ripgrep (measuring search throughput across character encodings), tokio (comparing executor scheduling strategies), Servo (tracking rendering pipeline regressions), and database-related crates like Diesel (query serialization overhead). Case studies published by maintainers often show side-by-side plots comparing commits, with annotations referencing pull requests and issues on GitHub and discussions on Mozilla Discourse or users.rust-lang.org. In continuous benchmarking setups, teams combine Criterion outputs with dashboards powered by Grafana and alerting integrated with Slack and PagerDuty to notify contributors of performance regressions tied to specific pull requests or merges.