| Horizontal Pod Autoscaler | |
|---|---|
| Name | Horizontal Pod Autoscaler |
| Developer | Kubernetes |
| First release | 2015 (Kubernetes 1.1, beta) |
| Programming language | Go |
| License | Apache License 2.0 |
Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) is a Kubernetes controller that automatically adjusts the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed resource and custom metrics. It integrates with the Kubernetes control plane and the Metrics Server to respond to workload demand, aiming to maintain performance objectives and resource efficiency across clusters. Drawing on ideas from cloud auto-scaling and container orchestration, the controller interacts with the API server, the kube-controller-manager, and custom metrics adapters to reconcile desired replica counts.
The controller operates within the Kubernetes ecosystem, interacting with the Kubernetes API to change replica counts for objects such as Deployments, ReplicaSets, and StatefulSets; resource metrics originate from kubelet agents and are aggregated by the Metrics Server. External adapters such as the Prometheus Adapter let it consume custom application metrics alongside CPU and memory usage. Historically influenced by auto-scaling services from Amazon Web Services, Google Cloud Platform, and Microsoft Azure, the autoscaler reflects patterns established in systems like Amazon EC2 Auto Scaling and the Google Cloud autoscaler. The feature was designed alongside cluster management tools such as kubeadm and integrates with logging and observability stacks including Prometheus, Grafana, and the Elastic Stack.
The HPA comprises a controller loop in the kube-controller-manager that reads target resource definitions from the Kubernetes API and writes updated replica counts back. Core components include the HPA controller, metrics backends such as the Metrics Server, external adapters like the Prometheus Adapter, and HorizontalPodAutoscaler API objects in the autoscaling API group (a built-in API type rather than a CustomResourceDefinition, though custom metrics adapters typically register aggregated APIs). Its design draws on controller patterns used in projects such as etcd, Helm, and the Operator pattern. The controller relies on reconciliation concepts from engineering control loops and integrates with scheduling concepts exemplified by kube-scheduler and resource accounting provided by cAdvisor. For authorization it uses Role-Based Access Control (RBAC) and interacts with service accounts managed by kube-system components.
Users declare an HPA via a YAML manifest referencing a target object such as a Deployment and specify metrics like CPU utilization or custom metrics exposed through adapters. Common fields include scaleTargetRef, minReplicas, maxReplicas, and metrics (of type Resource, Pods, External, or Object). Typical workflows involve CI/CD tools such as Jenkins, Tekton, Argo CD, and Flux to apply manifests. Cluster administrators often combine HPA with the Cluster Autoscaler and cloud provider autoscaling services, including those of Google Kubernetes Engine, Amazon EKS, and Azure Kubernetes Service, to manage node capacity. Policies and manifests are versioned with Git and managed alongside infrastructure-as-code tools like Terraform and Pulumi.
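A minimal autoscaling/v2 manifest illustrating these fields might look like the following sketch; the workload name web-frontend is a placeholder:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa     # hypothetical name
  namespace: default
spec:
  scaleTargetRef:            # the workload whose replica count HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend       # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale to keep average CPU at ~70% of requests
```

Applying this with kubectl apply causes the controller to keep average CPU utilization near the target while staying within the replica bounds.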
HPA supports resource metrics (CPU, memory) via the Metrics Server and custom metrics via adapters such as the Prometheus Adapter, which queries Prometheus servers. External metrics can be pulled from services like AWS CloudWatch, Google Cloud Monitoring (formerly Stackdriver), and Datadog through the respective adapters and exporters. Scaling policies may include stabilization windows, behavior parameters, and scaling rules that mirror techniques from control theory and mechanisms used in other auto-scaling systems. Operators tune thresholds based on SLOs and SLIs tracked in tools like New Relic, Dynatrace, and Sentry to avoid oscillation and ensure steady-state performance.
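As a sketch, a per-pod custom metric served through an adapter such as the Prometheus Adapter is targeted in the spec.metrics list; the metric name http_requests_per_second is an assumed example that depends on what the adapter actually exposes:

```yaml
  # fragment of an HPA spec.metrics list (autoscaling/v2)
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # assumed metric exposed via an adapter
      target:
        type: AverageValue
        averageValue: "100"              # target ~100 requests/s per pod
```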
The controller uses a periodic reconciliation loop that samples metrics and computes the desired replica count using a formula analogous to the proportional term of a PID controller (HPA implements primarily proportional logic with rate-limiting features). It calculates desiredReplicas from current and target metric values, applies the bounds set by minReplicas and maxReplicas, and then applies scaling policies such as stabilization windows and a tolerance that suppresses insignificant changes. Metrics retrieval uses the Custom Metrics API and External Metrics API extension points, enabling integration with systems like Prometheus, InfluxDB, and OpenTelemetry. HPA objects are persisted in the etcd datastore through the Kubernetes API, and the kube-controller-manager uses optimistic concurrency and leader election to ensure high availability.
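The core calculation, as documented for the HPA, is:

```
desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )
```

Rounding up avoids undershooting the target, and when the current/desired ratio is within the configured tolerance (0.1 by default) the controller skips scaling, which damps oscillation.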
Best practices include combining HPA with the Cluster Autoscaler and node pool strategies offered by Google Kubernetes Engine and Amazon EKS so that nodes can satisfy increased pod counts, setting conservative scaling policies to avoid thrashing, and exposing stable custom metrics through Prometheus or OpenTelemetry. Limitations include reliance on metrics freshness from the Metrics Server, delayed reaction to sudden traffic spikes, difficulty scaling stateful workloads such as those managed by StatefulSets, and potential interactions with admission controllers like Open Policy Agent (OPA) Gatekeeper. Operators must also consider quota and limit constraints defined by ResourceQuota and LimitRange objects and plan for cold-start characteristics common in serverless patterns such as Knative's.
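Conservative scaling policies of the kind recommended above are expressed through the autoscaling/v2 behavior field; a sketch, with values chosen as illustrative assumptions rather than recommendations:

```yaml
  # fragment of an HPA spec (autoscaling/v2)
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # require 5 min of lower recommendations before shrinking
      policies:
      - type: Percent
        value: 50                        # remove at most 50% of current replicas...
        periodSeconds: 60                # ...per minute
    scaleUp:
      stabilizationWindowSeconds: 0      # react to spikes immediately
      policies:
      - type: Pods
        value: 4                         # add at most 4 pods per minute
        periodSeconds: 60
```

The long scale-down window trades slower cost reduction for protection against thrashing when metrics oscillate.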
Common use cases include web service scaling for frameworks and platforms such as Django, Ruby on Rails, Node.js, and Spring Boot; batch processing with tools like Apache Spark; event-driven processing with Apache Kafka and RabbitMQ; and microservice fleets managed by Istio or Linkerd. Example scenarios: an e-commerce frontend deployed via Helm on Amazon EKS with metrics scraped by Prometheus and alerts routed through Alertmanager; a GitLab CI runner fleet scaled through runner autoscaling; and data processing pipelines orchestrated by Argo Workflows that scale workers based on queue depth in RabbitMQ or Amazon SQS. Integration patterns often pair HPA with observability stacks including Grafana, Prometheus, and Loki for dashboards, and incident response coordinated with PagerDuty or Opsgenie.
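For the queue-depth scenario, an External metric target is one way to express the scaling rule; the metric name and label below are assumptions that depend entirely on the metrics adapter in use:

```yaml
  # fragment of an HPA spec.metrics list (autoscaling/v2)
  metrics:
  - type: External
    external:
      metric:
        name: rabbitmq_queue_messages_ready   # assumed name; varies by adapter/exporter
        selector:
          matchLabels:
            queue: work-items                 # hypothetical queue label
      target:
        type: AverageValue
        averageValue: "30"                    # aim for ~30 ready messages per replica
```

Using AverageValue divides the external metric by the replica count, so the worker fleet grows roughly in proportion to backlog.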