UndefinedBehaviorSanitizer

UndefinedBehaviorSanitizer
Name	UndefinedBehaviorSanitizer
Developer	Google (company), Clang (compiler), LLVM Project
Released	2012
Programming language	C (programming language), C++
Operating system	Linux, macOS, Microsoft Windows
License	Apache License 2.0 with LLVM Exceptions

Contents

Overview
Supported Undefined Behaviors
Implementation and Architecture
Usage and Integration
Performance and Limitations
Examples and Diagnostics

UndefinedBehaviorSanitizer (UBSan) is a runtime checking tool integrated into Clang (compiler) and the LLVM Project toolchain. The compiler instruments the program at build time, and the inserted checks report operations that the C (programming language) and C++ language standards leave undefined when those operations execute. It complements AddressSanitizer, ThreadSanitizer, and MemorySanitizer, which target memory errors, data races, and uninitialized reads respectively, by focusing on undefined behavior as specified in C11, C++11, and later revisions. Major contributors and users include Google (company), Apple Inc., and many open-source projects hosted on GitHub. GCC also implements an undefined behavior sanitizer, and the tool is available on systems such as FreeBSD.

Overview

UndefinedBehaviorSanitizer combines compile-time instrumentation with a runtime library: the compiler inserts checks before operations whose behavior the C11 and C++11 standards (and their successors) leave undefined, and the runtime reports any check that fails. It targets errors such as signed integer overflow, invalid shifts, misaligned or null pointer use, and out-of-range conversions, bug classes that recur in large C and C++ codebases like those of Chromium (web browser) and Mozilla. Like the other Sanitizer (tool) projects in the LLVM Project ecosystem, it finds bugs dynamically, and it complements static analysis tools such as Clang Static Analyzer.

Supported Undefined Behaviors

The sanitizer detects a broad set of undefined behaviors defined by the C11 and C++11 standards and later, including arithmetic errors, pointer misuse, and invalid type conversions. Examples include:
- Signed integer overflow, which C11 leaves undefined and which optimizers in LLVM and GCC may exploit when transforming code.
- Misaligned pointer access and null pointer dereference, whose effects differ across platforms like x86-64, ARM, and PowerPC.
- Dynamic type violations in C++, such as calling a member function through a pointer to an object of the wrong dynamic type. General strict aliasing (type punning) violations are not among UBSan's checks.
- Illegal shifts and division by zero, relevant to arithmetic-heavy projects like OpenSSL and LibreSSL.
- Out-of-bounds indexing of arrays whose bounds are statically known.
Uninitialized variable use, use-after-return, and use-after-scope fall outside its scope; MemorySanitizer and AddressSanitizer detect those.
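The arithmetic cases above can be made concrete with a few guarded helpers. This is a minimal sketch, not code from the sanitizer: the names `checked_add`, `checked_shl`, and `checked_div` are hypothetical, and each function tests exactly the precondition whose violation would be undefined behavior in the unguarded expression.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Signed addition: "a + b" is undefined when the mathematical result does
   not fit in int. Returns false instead of overflowing. */
bool checked_add(int a, int b, int *out) {
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Left shift: undefined when the count is negative or not smaller than the
   width of the (promoted) left operand. */
bool checked_shl(unsigned value, int count, unsigned *out) {
    assert(out != NULL);
    if (count < 0 || count >= (int)(sizeof(unsigned) * CHAR_BIT))
        return false;
    *out = value << count;
    return true;
}

/* Division: undefined for a zero divisor and for INT_MIN / -1, whose
   result is not representable in int. */
bool checked_div(int a, int b, int *out) {
    assert(out != NULL);
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *out = a / b;
    return true;
}
```

Writing the plain expressions `a + b`, `value << count`, or `a / b` with the rejected inputs is what the corresponding UBSan checks are designed to report at run time.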

Implementation and Architecture

The implementation embeds checks via compiler instrumentation: the Clang (compiler) front end emits the checks while generating LLVM IR, which is then optimized and lowered like the rest of the program. It relies on a runtime support library linked into binaries, similar to the runtimes shipped with AddressSanitizer and ThreadSanitizer; these are maintained in the llvm-project sources alongside the other sanitizer runtimes. The architecture includes:
- Compile-time insertion of guards before potentially undefined operations. Later LLVM optimization passes, such as Loop invariant code motion and Dead code elimination, can hoist or remove checks that are provably redundant.
- Runtime error-reporting components that integrate with platform-specific output on Linux, macOS, and Microsoft Windows.
- Integration points for build systems such as CMake, Bazel (software), and Make (software), and for continuous integration services such as Travis CI, GitLab CI, and Jenkins.
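The guard-plus-handler structure described above can be modeled by hand. The following is a conceptual sketch only: `instrumented_add`, `report_add_overflow`, and `report_count` are hypothetical names, and the real runtime's handlers have different names and signatures. It assumes a GCC- or Clang-compatible compiler, since it uses the `__builtin_add_overflow` builtin.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Counts reports so the model's behavior can be observed (hypothetical). */
static int report_count = 0;

/* Hypothetical stand-in for a runtime handler: prints the source location
   and the operands, then returns so that execution can continue. */
static void report_add_overflow(const char *file, int line, int a, int b) {
    assert(file != NULL);
    ++report_count;
    fprintf(stderr, "%s:%d: model report: %d + %d overflows int\n",
            file, line, a, b);
}

/* Hand-written model of what an instrumented "a + b" does: perform the
   operation with overflow detection, branch to the handler on failure,
   and otherwise fall through on the fast path. */
int instrumented_add(int a, int b, const char *file, int line) {
    int result;
    if (__builtin_add_overflow(a, b, &result))
        report_add_overflow(file, line, a, b); /* slow path: report */
    return result; /* the builtin stores the wrapped value on overflow */
}
```

In this model the handler returns, which mirrors a recoverable check; a build that should stop at the first report would end the handler with a call such as `abort()` instead.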

Usage and Integration

Developers enable the tool by passing the -fsanitize=undefined flag to Clang (compiler) when compiling and linking, or by setting the equivalent options in build systems like CMake and Bazel (software). It is commonly combined with AddressSanitizer or ThreadSanitizer in the same build. Integration patterns appear in large projects such as Chromium (web browser), LLVM Project, Android (operating system), and server software like nginx and Apache HTTP Server. Typical workflows:
- Instrumenting debug, test, or fuzzing builds, with fuzzers like American Fuzzy Lop and libFuzzer, to catch undefined behavior in continuous integration.
- Combining with static analyzers such as Clang Static Analyzer and with formal verification tools of the kind used in seL4 and CompCert-adjacent efforts.
- Using sanitizer suppression files, analogous to AddressSanitizer suppressions, to silence known issues in third-party libraries like OpenSSL or zlib.
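A fuzzing workflow of the kind listed above might look like the following sketch. The function `parse_length` is a hypothetical function under test, and the file names `app.c`, `fuzz_target.c`, and `ubsan.supp` in the comment are placeholders; `LLVMFuzzerTestOneInput` is the entry point that libFuzzer expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build and run sketches (Clang; file names are placeholders):
 *   clang -g -O1 -fsanitize=undefined app.c                 UBSan only
 *   clang -g -O1 -fsanitize=address,undefined app.c         with AddressSanitizer
 *   clang -g -O1 -fsanitize=fuzzer,undefined fuzz_target.c  libFuzzer + UBSan
 *   UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 ./a.out
 *   UBSAN_OPTIONS=suppressions=ubsan.supp ./a.out
 */

/* Hypothetical function under test: reads up to four bytes as a
   big-endian length field. */
static uint32_t parse_length(const uint8_t *data, size_t size) {
    assert(data != NULL || size == 0);
    uint32_t value = 0;
    size_t n = size < 4 ? size : 4;
    for (size_t i = 0; i < n; ++i)
        value = (value << 8) | data[i];
    return value;
}

/* libFuzzer calls this once per generated input; any undefined behavior
   reached inside parse_length is reported by the sanitizer runtime. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)parse_length(data, size);
    return 0;
}
```

Keeping the fuzz target a thin wrapper around the code under test means the same function can also be exercised by ordinary unit tests in a sanitized build.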

Performance and Limitations

Runtime overhead varies with the enabled checks and program behavior; as with AddressSanitizer and MemorySanitizer, enabling more checks yields slower and larger binaries. Overhead is also influenced by architecture specifics, such as x86-64 versus ARMv8 instruction sets, and by the optimizations applied in GCC versus Clang (compiler). Limitations include:
- Incomplete coverage where constructs are removed or rewritten by aggressive optimization, such as link-time optimization or transformations in LLVM IR that assume undefined behavior cannot occur.
- False negatives when code paths are not exercised: the sanitizer reports only undefined behavior that actually executes. Complementary approaches such as unit testing, fuzzing with libFuzzer, and static analysis reduce these blind spots.
- Interaction complexity with third-party runtimes like glibc, with language runtimes for Rust (programming language) and Go (programming language), and with kernel-mode components where runtime instrumentation may be infeasible.
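The false-negative limitation can be illustrated with a latent bug that ordinary tests never reach. This is a sketch under stated assumptions: `magnitude_unchecked`, `magnitude`, and `run_typical_tests` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <limits.h>

/* Latent bug: the negation is undefined only for x == INT_MIN, because
   -INT_MIN is not representable in int. */
int magnitude_unchecked(int x) {
    return x < 0 ? -x : x;
}

/* A typical test suite. It passes cleanly in a sanitized build because it
   never supplies INT_MIN, so the undefined negation is never executed
   and therefore never reported. */
void run_typical_tests(void) {
    assert(magnitude_unchecked(5) == 5);
    assert(magnitude_unchecked(-5) == 5);
    assert(magnitude_unchecked(0) == 0);
}

/* One possible fix: compute the magnitude in unsigned arithmetic, which
   is defined for every input including INT_MIN. */
unsigned magnitude(int x) {
    return x < 0 ? 0u - (unsigned)x : (unsigned)x;
}
```

A fuzzer or a boundary-value test that supplies `INT_MIN` would exercise the faulty path and let the sanitizer report it. For deployments where overhead matters, Clang also documents lower-cost modes such as `-fsanitize-trap=undefined` and `-fsanitize-minimal-runtime`.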

Examples and Diagnostics

Typical diagnostic output gives the source file, line, and column of the failing operation together with a description of the violation; stack traces can be enabled through runtime options. Developers at organizations such as Google (company), Mozilla, and Apple Inc. use these reports to locate and fix the underlying defect. Example categories:
- Signed integer overflow reports, for example in index or size arithmetic around std::vector or in libraries like Boost (software).
- Misaligned access reports in architecture-specific code, such as Linux kernel device drivers and virtualization layers such as QEMU and KVM.
- Reports arising from type punning through pointer casts, which typically surface as misaligned accesses, in projects like OpenSSL and FFmpeg (software). The usual remedy is to copy the bytes with memcpy, in line with the language rules maintained by ISO/IEC JTC 1/SC 22.
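Reports follow the general shape `file:line:column: runtime error: description`; for a signed overflow the description reads along the lines of `signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'`. The sketch below shows the memcpy-based remedy for the misaligned-access and type-punning categories. The helper names `load_u32` and `float_bits` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value starting at an arbitrary byte offset. Casting
   (buf + offset) to uint32_t* and dereferencing it would be a misaligned
   load whenever the address is not suitably aligned, which the alignment
   check reports. memcpy has no alignment requirement. */
uint32_t load_u32(const unsigned char *buf, size_t offset) {
    uint32_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Reinterprets the bytes of a float as an integer without a pointer cast.
   The assertion documents the assumption that both types have the same
   size. */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers commonly lower these fixed-size memcpy calls to a single load or move, so the defined-behavior version need not cost more than the cast it replaces.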

Category:Software debugging tools