LLMpedia: the first transparent, open encyclopedia generated by LLMs

pt-query-digest

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Maatkit (hop 4)
Expansion funnel: 61 extracted → 0 after dedup → 0 after NER → 0 enqueued
pt-query-digest
Name: pt-query-digest
Developer: Percona
Released: 2007
Programming language: Perl
Platform: Unix-like
License: GNU GPL

pt-query-digest is a command-line tool for analyzing query logs and profiling database performance, developed by Percona as part of the Percona Toolkit; it originated as mk-query-digest in the earlier Maatkit suite. It parses slow query logs, general logs, binary logs, and tcpdump captures to summarize hotspots, fingerprint queries, and prioritize optimization, and is used in production environments alongside tooling from Oracle Corporation, MySQL AB, and MariaDB Corporation Ab. System administrators and database administrators pair it with monitoring stacks such as Prometheus, Grafana, and Zabbix to identify regressions and triage incidents in databases such as MySQL, MariaDB, and their forks offered as cloud services by Amazon Web Services, Google Cloud Platform, and Microsoft Azure.

Overview

pt-query-digest reads query sources and produces ranked summaries of query classes by metrics such as query time, lock time, rows sent, and rows examined. It fingerprints queries, collapsing syntactically different statements into one normalized class so that cost can be attributed per class; this aids capacity planning for clusters managed with Facebook-inspired sharding solutions, replication topologies similar to those used by Twitter and LinkedIn, and transactional systems orchestrated with Kubernetes. The tool is commonly cited in workflows that combine troubleshooting with Sysdig, observability practices shaped by incidents at Netflix, and change-control approaches from GitHub and GitLab.
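As a rough illustration of the fingerprinting idea, the sketch below collapses literals and IN-lists so that syntactic variants map to one normalized class. It is a toy approximation for explanatory purposes, not Percona's actual normalization rules:

```python
import re

def fingerprint(query: str) -> str:
    """Collapse a SQL statement into a normalized query class.

    A simplified version of the idea behind pt-query-digest's
    fingerprinting: literals become '?', whitespace is collapsed,
    and IN-lists of placeholders are folded so that variants of
    the same statement map to one class.
    """
    q = query.strip().lower()
    q = re.sub(r"'[^']*'", "?", q)                        # quoted strings -> ?
    q = re.sub(r"\b\d+\b", "?", q)                        # bare numbers -> ?
    q = re.sub(r"\s+", " ", q)                            # collapse whitespace
    q = re.sub(r"\bin\s*\((\s*\?\s*,?)+\)", "in (?+)", q) # fold IN-lists
    return q
```

With this normalization, `WHERE id IN (1,2)` and `WHERE id IN (7, 8, 9)` produce the same fingerprint, which is what lets cost be aggregated per query class.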

Installation and usage

pt-query-digest ships with the Percona Toolkit and is written in Perl, so installation generally requires a Perl runtime plus the toolkit package from repositories maintained by Debian, Ubuntu, or Red Hat, or a build from source in environments provisioned with Ansible, Chef, or Puppet. Typical invocations accept input from files, standard input, or network captures produced by tcpdump; community guides reference using it alongside utilities such as mysqldump, mysqlbinlog, and Percona's pt-stalk helper. Operators running virtual machines on OpenStack or containers on Docker often bake the tool into images alongside client tools from Oracle Corporation's MySQL and MariaDB Corporation.
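The slow-log input mentioned above is a simple line-oriented format in which `# Query_time:` header lines precede each statement. A minimal reader for that format can be sketched as follows; this is a toy that handles only the one header line, whereas the real tool parses many more attributes and edge cases:

```python
import re

# Matches the MySQL slow-log metrics header, e.g.:
# "# Query_time: 0.500000  Lock_time: 0.000100 Rows_sent: 1  Rows_examined: 100"
HEADER = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)\s+"
    r"Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)

def parse_slow_log(lines):
    """Yield (metrics, statement) pairs from MySQL slow-log lines."""
    metrics, stmt = None, []
    for line in lines:
        m = HEADER.match(line)
        if m:
            if metrics and stmt:
                yield metrics, " ".join(stmt)
            metrics = {k: float(v) for k, v in m.groupdict().items()}
            stmt = []
        elif line.startswith("#"):
            continue  # other header lines (# Time:, # User@Host:, ...)
        elif metrics is not None and line.strip():
            stmt.append(line.strip())
    if metrics and stmt:
        yield metrics, " ".join(stmt)
```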

Output formats and reports

pt-query-digest emits human-readable summaries as well as machine-friendly output such as JSON for ingestion by analytics systems including Elasticsearch, Splunk, or time-series stores like InfluxDB. Reports contain aggregated sections: query classes (fingerprints), sample queries, and latency histograms that are useful in incident retrospectives referencing playbooks used at Facebook and Google. Output can be piped into visualization tools such as Kibana or Grafana dashboards, or attached to ticketing and alerting systems like Jira or PagerDuty for follow-up by teams influenced by site reliability engineering practices at Etsy.
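The overall shape of such a report can be sketched as a ranked JSON aggregation over query classes. The field names below are illustrative, not pt-query-digest's exact output schema:

```python
import json
from collections import defaultdict

def digest(entries):
    """Aggregate (fingerprint, query_time, sample_sql) tuples into a
    ranked report: per-class count, total/max time, and a worst-case
    sample, loosely mirroring a pt-query-digest summary."""
    classes = defaultdict(lambda: {"count": 0, "total_time": 0.0,
                                   "max_time": 0.0, "sample": None})
    for fp, qt, sql in entries:
        c = classes[fp]
        c["count"] += 1
        c["total_time"] += qt
        if qt >= c["max_time"]:
            c["max_time"], c["sample"] = qt, sql  # keep the slowest sample
    ranked = sorted(classes.items(),
                    key=lambda kv: kv[1]["total_time"], reverse=True)
    return json.dumps(
        [{"fingerprint": fp, **stats} for fp, stats in ranked], indent=2)
```

Ranking by total time rather than average is the key design choice: a fast query executed millions of times can dominate a cluster's load just as much as one slow outlier.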

Configuration and options

pt-query-digest exposes options for filtering and scoring, including thresholds on minimum total time, sample sizes, and Perl-expression filters for including or ignoring particular query classes, allowing policies similar to those recommended in O'Reilly Media books and in database tuning guides from practitioners at Percona and Oracle Corporation. The tool supports custom grouping via regular expressions and normalization rules that mirror patterns discussed at conferences such as Percona Live, FOSDEM, and USENIX events. Configuration can be version-controlled in repositories on GitHub, audited with SonarQube, and rolled out through CI pipelines in Jenkins or GitLab CI.
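A minimal sketch of how threshold and ignore policies of this kind act on aggregated query classes; the parameter names here are illustrative and do not correspond to the tool's actual command-line flags:

```python
def apply_report_policy(classes, min_total_time=0.0, limit=10, ignore=()):
    """Filter and cap a list of query-class dicts: drop classes below a
    minimum aggregate time, skip fingerprints matching ignore patterns,
    and keep only the top `limit` classes by total time."""
    kept = [c for c in classes
            if c["total_time"] >= min_total_time
            and not any(pat in c["fingerprint"] for pat in ignore)]
    kept.sort(key=lambda c: c["total_time"], reverse=True)
    return kept[:limit]
```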

Integration and workflows

In production workflows pt-query-digest is often embedded in scheduled analysis cron jobs, incident playbooks, or automated pipelines that forward summaries to monitoring stacks like Prometheus or log stores like Graylog. It pairs with Percona utilities such as pt-duplicate-key-checker and pt-table-checksum and complements observability stacks derived from engineering practices at Netflix and LinkedIn. Organizations feed its JSON output into analytics pipelines built on Apache Kafka and Apache Spark for trend analysis, or into change-management processes run with Atlassian tools and deployment workflows inspired by Martin Fowler's writing on Continuous Delivery.
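A scheduled-analysis workflow of this kind often boils down to diffing two digest runs. The `regressions` helper below and its growth-factor threshold are hypothetical, not part of the Percona Toolkit; they only sketch the comparison step:

```python
def regressions(previous, current, factor=2.0):
    """Compare two digest reports (fingerprint -> total_time) and flag
    query classes whose aggregate time grew by `factor` or more, or
    that appeared for the first time."""
    flagged = []
    for fp, now in current.items():
        before = previous.get(fp)
        if before is None:
            flagged.append((fp, "new"))
        elif before > 0 and now / before >= factor:
            flagged.append((fp, f"{now / before:.1f}x slower"))
    return flagged
```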

Performance and limitations

pt-query-digest is efficient for offline analysis of large logs but is not a continuous real-time profiler; for live sampling, engineers prefer lightweight samplers and instrumentation frameworks of the kind used at Google or Facebook. Its accuracy depends on how representative the input logs are: binlog extracts or tcpdump captures may miss context such as application-level retries or transaction boundaries, as documented in case studies from Twitter and GitHub. Large-scale deployments should weigh the cost of single-process Perl parsing on high-volume datasets and may delegate real-time aggregation to streaming platforms like Apache Flink or Apache Kafka Streams, retaining pt-query-digest for forensic post-mortem analysis akin to incident reviews at Uber.

Category:Database administration tools