| Flamegraph (perf) | |
|---|---|
| Developer | Brendan Gregg |
| Released | 2011 |
| Programming language | C, Perl, Python |
| Operating system | Linux |
| Genre | Performance analysis |
Flamegraph (perf) is a visualization technique and toolchain that maps sampled call stacks from the Linux perf profiler into an interactive stacked graph. It enables engineers to spot hot paths in CPU-bound workloads and to correlate runtime behavior with system events, libraries, and kernel subsystems. The approach has been adopted across industry and academia for performance tuning of applications running on the Linux kernel, on cloud platforms such as Amazon Web Services, and in enterprise systems maintained by organizations like Google, Facebook, and Netflix.
Flamegraph (perf) converts sampled stack traces into a layered, color-coded representation where width corresponds to cumulative sample count; this makes bottlenecks visually prominent in software such as nginx, PostgreSQL, MySQL, the Java Virtual Machine, and runtime engines like V8 (JavaScript engine). The visualization links samples to system-level artifacts including libraries such as the GNU C Library, kernel functions exposed by the Linux kernel, and framework code in Django, Rails, or Spring Framework. Engineering teams at companies including Microsoft and IBM use flamegraphs to investigate performance regressions introduced by commits in repositories hosted on GitHub or GitLab.
The technique evolved from sampling profilers and visualization work by practitioners dealing with performance issues in datacenter-scale services operated by Yahoo!, Netflix, and high-frequency trading firms on NASDAQ. Early inspirations include call-graph visualizations in tools like gprof and tracing systems such as DTrace and SystemTap. The modern flamegraph presentation was popularized in performance talks at conferences like USENIX and LinuxCon, and adopted in tooling ecosystems maintained by Red Hat and cloud providers such as Google Cloud Platform and Microsoft Azure.
Integration relies on the Linux perf utility to collect samples via hardware events such as CPU cycles or software events such as context switches and page faults. Engineers record perf data through the Linux kernel's perf_events subsystem, which maps samples to binaries and shared objects using ELF metadata and tools from GNU Binutils. After collection, stack-collapse and visualization stages use scripts that run on distributions like Ubuntu and Fedora, and are integrated into CI pipelines hosted on platforms such as Jenkins and Travis CI for regression detection.
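A minimal collect-collapse-render pipeline of this kind, assuming Brendan Gregg's FlameGraph scripts (stackcollapse-perf.pl and flamegraph.pl) are available in the working directory; a sketch of one common workflow, not the only one:

```shell
# Sample on-CPU stacks system-wide at 99 Hz for 30 seconds,
# capturing call graphs (-g) for every sample.
perf record -F 99 -a -g -- sleep 30

# Resolve symbols and dump the raw sampled stacks as text.
perf script > out.perf

# Collapse stacks into the semicolon-separated folded format,
# then render an interactive SVG (scripts from the FlameGraph repo).
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > flame.svg
```

The 99 Hz rate is a common choice because it avoids lockstep sampling with periodic 100 Hz kernel activity.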
The pipeline accepts perf data as sampled stacks annotated with instruction pointers and symbol names resolved by GNU Binutils tools such as addr2line and nm. Collation produces a stack-frequency text format in which semicolon-separated frames followed by a count encode the aggregation, akin to output from gprof-style profilers but oriented toward visualization. The rendering stage emits interactive SVG viewable in browsers such as Chromium and Mozilla Firefox, and can interoperate with binary instrumentation frameworks such as Intel PIN and dynamic tracers like eBPF for richer metadata fusion.
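The folded stack-frequency format is simple enough to aggregate directly; a minimal sketch (with hypothetical frame names) that merges duplicate stacks the way the collapse stage does:

```python
from collections import Counter

def collapse(lines):
    """Sum sample counts per unique stack from folded
    'frame1;frame2;... count' lines."""
    totals = Counter()
    for line in lines:
        stack, _, count = line.rpartition(" ")
        totals[stack] += int(count)
    return totals

# Folded input: semicolon-separated frames, root first, then a sample count.
samples = [
    "main;parse_request;read 5",
    "main;parse_request;read 3",
    "main;handle;compute 7",
]
print(collapse(samples)["main;parse_request;read"])  # 8
```

In a real flamegraph, the merged count for each stack prefix determines the width of the corresponding frame.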
Interpreters examine wide frames to identify "hot" code paths within runtimes like OpenJDK or in native modules of Node.js applications. Color palettes and grouping strategies map frames to components such as libc, kernel code from Linus Torvalds's mainline tree, or third-party libraries distributed via Maven or npm. Cross-referencing with changelogs in Git repositories, or with release notes from vendors like Canonical and SUSE, helps correlate performance shifts to specific commits, system updates, or library upgrades.
Common workflows begin with baseline collection on staging instances provisioned in Amazon EC2 or Google Compute Engine, followed by targeted sampling during load tests executed by tools such as JMeter and Locust. Teams instrument deployments orchestrated by Kubernetes and monitor metrics in systems like Prometheus and Grafana while using flamegraphs to triage hotspots. Postmortem analyses combine perf-derived flamegraphs with traces from distributed tracing systems like Jaeger and Zipkin to align local CPU hotspots with remote latency effects.
Sampling-based flamegraphs depend on statistical coverage and can miss short-lived or I/O-bound events that are prominent in subsystems such as NVMe drivers or encrypted storage stacks from vendors like Intel Corporation. Alternatives and complements include deterministic tracers like SystemTap and DTrace, instrumentation-driven profilers such as Java Flight Recorder, and eBPF-based tools developed within projects of the Linux Foundation. For memory-centric analysis, tools like Valgrind and heap profilers integrated with GDB or LLDB may be preferred.
Category:Performance analysis tools