Thanos (software)

Name	Thanos
Developer	Multiple organizations and contributors
Released	2019
Programming language	Go
Repository	GitHub
License	Apache License 2.0

Contents

Overview
Architecture and Components
Deployment and Operation
Use Cases and Integrations
Performance and Scalability
Security and Reliability
Development and Community

Thanos is an open-source project that extends Prometheus with long-term storage, a global query view, and high-availability features for metrics. It works with object storage from cloud providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, and complements monitoring stacks built on Grafana, Kubernetes, and Helm. Initially developed at Improbable, with contributors from companies such as Red Hat and CoreOS, it has been adopted by organizations including SoundCloud, Monzo, and Slack.

Overview

Thanos provides a set of components that enable scalable, durable observability across clusters and regions, in line with practices popularized by projects such as Prometheus, Cortex, and OpenTelemetry. It focuses on global querying, downsampling, and storing older metrics in object storage, with support for backends such as Amazon S3, Google Cloud Storage, and MinIO. Thanos is a Cloud Native Computing Foundation project and follows the open governance practices of that foundation, which is itself part of the Linux Foundation.

Architecture and Components

Thanos is composed of discrete services that interoperate in a microservices pattern, an approach also used by Kubernetes, Envoy, and Istio. Core components include a sidecar that runs next to each Prometheus server, a store gateway that reads historical data from object storage, a compactor that handles compaction, retention, and downsampling, and a query layer that fans out to the other components and merges their results into a single view, serving a purpose comparable to Prometheus federation. Additional pieces include a receiver for ingesting pushed metrics, a ruler for evaluating recording and alerting rules, and the shipper logic that uploads data to object storage. These work alongside systems such as Alertmanager, VictoriaMetrics, and Cortex to provide alerting, ingestion, and long-term retention.
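To make the merging role of the query layer concrete, the following is a minimal sketch of how series from two highly available Prometheus replicas could be collapsed into one. It is not Thanos's actual algorithm: the `replica` label name, the dictionary layout of a series, and the "first value wins" rule for overlapping timestamps are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed name of the external label that distinguishes HA replicas.
REPLICA_LABEL = "replica"


def dedup_series(series_list, replica_label=REPLICA_LABEL):
    """Merge series that differ only in the replica label.

    Each series is a dict with "labels" (dict of str to str) and
    "samples" (list of (timestamp, value) tuples). When replicas report
    the same timestamp, the first value seen is kept; this simple rule
    is an illustrative assumption, not the production heuristic.
    """
    merged = defaultdict(dict)
    for series in series_list:
        # Identity of a series once the replica label is ignored.
        key = tuple(sorted(
            (k, v) for k, v in series["labels"].items() if k != replica_label
        ))
        for ts, value in series["samples"]:
            merged[key].setdefault(ts, value)
    return [
        {"labels": dict(key), "samples": sorted(points.items())}
        for key, points in merged.items()
    ]
```

Given two replicas that scraped the same target with a gap in each, the merged series fills the gaps from whichever replica has the sample, which is the practical benefit of querying both replicas through one layer.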

Deployment and Operation

Thanos deployments commonly follow patterns established by Kubernetes, with packaging via Helm charts and continuous delivery pipelines using tools like Argo CD and Flux. Typical topologies place sidecars alongside Prometheus instances, point store gateways at Amazon S3 or Google Cloud Storage, and run query replicas behind load balancers such as NGINX, HAProxy, or managed services from Amazon Web Services and Google Cloud Platform. Operational practices draw on runbooks and incident response procedures from site reliability engineering (SRE), tooling such as PagerDuty, and the published experience of large operators like Netflix.

Use Cases and Integrations

Organizations use Thanos for multi-cluster observability across platforms including Kubernetes, OpenShift, and bare-metal deployments managed by Ansible. It integrates with visualization tools like Grafana, alerting platforms such as Alertmanager and PagerDuty, and storage backends like Amazon S3, Google Cloud Storage, Azure Blob Storage, and Ceph. Enterprises often combine Thanos with tracing solutions like Jaeger and metrics ecosystems including Prometheus, Cortex, and VictoriaMetrics to implement unified telemetry strategies akin to those advocated by observability practitioners at Google and Microsoft.
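Tools such as Grafana can treat the Thanos query component as a Prometheus data source because it exposes a Prometheus-compatible HTTP API. The sketch below only builds a request URL for an instant query and performs no network call. The host name and the helper `build_query_url` are hypothetical, and the `dedup` parameter should be checked against the documentation of the deployed Thanos version.

```python
from urllib.parse import urlencode


def build_query_url(base_url, promql, time=None, dedup=True):
    """Build an instant-query URL for a Prometheus-compatible endpoint.

    base_url: address of the query service (hypothetical in this sketch).
    promql:   the PromQL expression to evaluate.
    time:     optional evaluation timestamp in Unix seconds.
    dedup:    whether to ask the query layer to merge replica series
              (assumed to be supported by the target endpoint).
    """
    params = {"query": promql, "dedup": "true" if dedup else "false"}
    if time is not None:
        params["time"] = str(time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# Hypothetical endpoint; replace with the address of a real query service.
url = build_query_url("http://thanos-query.example:10902/", "up", time=1700000000)
```

The resulting URL can be passed to any HTTP client, and the response follows the same JSON structure as a Prometheus instant query.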

Performance and Scalability

Thanos scales horizontally by adding store gateway replicas and query nodes, a pattern of distributed availability also seen in systems such as Cassandra, etcd, and CockroachDB. Compaction and downsampling reduce storage usage and the cost of long-range queries, using techniques comparable to the time-series optimizations in InfluxDB and OpenTSDB. Operator reports describe cluster sizes and retention windows similar to deployments at Spotify, Airbnb, and Shopify, with designs that accommodate petabyte-scale cold storage in object stores such as Amazon S3 and distributed storage systems like Ceph.
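The idea behind downsampling can be sketched as grouping raw samples into fixed time windows and keeping a few aggregates per window, so that a query over months of data reads far fewer points. The code below is an illustration under stated assumptions rather than the compactor's real implementation: the 300-second default window and the choice of `count`, `sum`, `min`, and `max` as stored aggregates are picked for the example.

```python
def downsample(samples, window_seconds=300):
    """Aggregate (timestamp, value) samples into fixed-size windows.

    Timestamps are Unix seconds. Returns one dict per non-empty window,
    ordered by window start, holding count, sum, min, and max so that
    averages and extremes can still be computed from the reduced data.
    """
    buckets = {}
    for ts, value in samples:
        # Align the sample to the start of its window.
        start = ts - (ts % window_seconds)
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {
                "start": start, "count": 1,
                "sum": value, "min": value, "max": value,
            }
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return [buckets[start] for start in sorted(buckets)]
```

With a 15-second scrape interval, a 300-second window replaces roughly twenty raw samples with a single aggregated record, which is where the reduction in query cost comes from.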

Security and Reliability

Thanos supports Transport Layer Security (TLS) for communication between components, and deployments typically add access controls based on OAuth 2.0, OpenID Connect (OIDC), and the role-based patterns of Kubernetes RBAC. Reliability strategies include multi-region redundancy, read replicas, and consistent compaction schedules, mirroring high-availability designs from Google Cloud Platform and Amazon Web Services. Backup and restore workflows rely on snapshotting and lifecycle policies comparable to those used in Velero and the object lifecycle features of Amazon S3.

Development and Community

The project is developed in Go, with source hosted on GitHub and contributions coordinated through issues and pull requests, a model shared with projects like Kubernetes, Prometheus, and Helm. The community includes contributors from companies like Improbable, Red Hat, and CoreOS, as well as users who take part in Cloud Native Computing Foundation events, meetups, and conferences such as KubeCon and CloudNativeCon. Roadmaps and release processes follow practices common to projects that use Semantic Versioning and continuous integration workflows built on GitHub Actions and CircleCI.
