Auto Scaling
Per-workload autoscaling for Kubernetes pods and standalone Docker containers, from one policy surface.
Auto Scaling turns KubeWatch from an observability product into an observability plus control product. Alongside watching your workloads, it can scale them. You set a per-workload policy (thresholds, replica bounds, cooldowns) and the Scaling-Engine acts on the same metrics you already see on your dashboards.
The same policy model backs both runtimes, but what happens underneath differs sharply:
| | Kubernetes mode | Docker mode |
|---|---|---|
| Native primitive | HorizontalPodAutoscaler + Karpenter NodePool | None. KubeWatch is the orchestrator |
| What KubeWatch writes | Native API objects (declarative) | Direct Docker Engine API calls |
| Who executes the scale | The cluster's own controllers | The KubeWatch agent, on the host |
| Traffic routing on scale-out | Service / kube-proxy (already exists) | KubeWatch-managed load balancer |
| If KubeWatch is down | HPA keeps working as a native object | Scaling pauses; running containers keep serving |
Creating a policy
Navigate to Auto Scaling, then New policy:
| Field | Description | Example |
|---|---|---|
| Name | Human-readable policy name | checkout-api autoscaler |
| Runtime | Kubernetes or Docker (sets the strategy) | kubernetes |
| Target | namespace/deployment (K8s) or host-pool/group (Docker) | prod/checkout-api |
| CPU target % | Desired average CPU utilization | 70 |
| Memory target % | Desired average memory utilization (optional) | 75 |
| Min / Max replicas | Hard floor and ceiling, the blast-radius cap | 2 / 10 |
| Step size | Replicas added or removed per action | 1 |
| Scale-up / Scale-down cooldown | Minimum time between actions, asymmetric by default | 30s / 300s |
| Node-pressure target % | Kubernetes only. Drives a Karpenter NodePool (if installed) | 80 |
| Max replicas per host | Docker only. Caps replicas of a group on any one host | 2 |
| Require approval | Hold each action for human approval before it executes | off |
New policies start in dry-run (see below). Cooldowns default to an asymmetric pattern, quick to scale up and slow to scale down, because reacting late to a load spike is worse than keeping a spare replica around a few extra minutes.
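As a rough illustration of how the fields above fit together, the sketch below models one policy as a Python dataclass with a few sanity checks. The class name, attribute names, and validation rules are assumptions made for this example, not KubeWatch's actual schema. Only the field meanings and the defaults (step 1, 30s / 300s cooldowns, dry-run on) come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical in-memory form of one Auto Scaling policy."""
    name: str
    runtime: str                      # "kubernetes" or "docker"
    target: str                       # "namespace/deployment" or "host-pool/group"
    cpu_target_pct: int
    min_replicas: int
    max_replicas: int
    memory_target_pct: Optional[int] = None
    step_size: int = 1
    scale_up_cooldown_s: int = 30     # quick to scale up
    scale_down_cooldown_s: int = 300  # slow to scale down
    node_pressure_target_pct: Optional[int] = None  # Kubernetes only
    max_replicas_per_host: Optional[int] = None     # Docker only
    approval_required: bool = False
    dry_run: bool = True              # new policies start in dry-run

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the policy looks sane."""
        errors = []
        if self.runtime not in ("kubernetes", "docker"):
            errors.append("runtime must be 'kubernetes' or 'docker'")
        if self.target.count("/") != 1:
            errors.append("target must contain exactly one '/'")
        if not 1 <= self.cpu_target_pct <= 100:
            errors.append("cpu_target_pct must be between 1 and 100")
        # Assumption for this sketch: a policy always keeps at least one replica.
        if not 1 <= self.min_replicas <= self.max_replicas:
            errors.append("need 1 <= min_replicas <= max_replicas")
        if self.step_size < 1:
            errors.append("step_size must be at least 1")
        if self.runtime == "docker" and self.node_pressure_target_pct is not None:
            errors.append("node_pressure_target_pct is Kubernetes only")
        if self.runtime == "kubernetes" and self.max_replicas_per_host is not None:
            errors.append("max_replicas_per_host is Docker only")
        return errors


policy = ScalingPolicy(
    name="checkout-api autoscaler",
    runtime="kubernetes",
    target="prod/checkout-api",
    cpu_target_pct=70,
    memory_target_pct=75,
    min_replicas=2,
    max_replicas=10,
    node_pressure_target_pct=80,
)
print(policy.validate())
```

Making the dataclass frozen is a design choice for the sketch: a policy change becomes a new object rather than a mutation, which keeps any record that references the old one meaningful.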
Dry-run first
Every policy can run with dry_run on: the Scaling-Engine evaluates the thresholds against live data and records exactly what it would do, including the rendered HorizontalPodAutoscaler YAML or the docker.scale command and chosen host, but applies nothing. We recommend leaving a new policy in dry-run for its first week. When you are confident in its decisions, toggle Go live.
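The difference between dry-run and live can be pictured as one evaluation path with the final apply step switched off. The sketch below is a minimal illustration under stated assumptions: the step-based decision rule, the `Decision` record, and the `apply` callback are all hypothetical and do not describe the Scaling-Engine's real internals.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    current: int
    desired: int
    reason: str
    dry_run: bool
    applied: bool = False


def decide(current: int, avg_cpu_pct: float, cpu_target_pct: float,
           min_replicas: int, max_replicas: int, step: int = 1) -> tuple[int, str]:
    """Return (desired_replicas, reason) using a simple step rule (an assumption)."""
    if avg_cpu_pct > cpu_target_pct:
        return min(current + step, max_replicas), "cpu above target"
    smaller = current - step
    # Scale down only if the same load spread over fewer replicas stays under target.
    if smaller >= max(min_replicas, 1) and avg_cpu_pct * current / smaller < cpu_target_pct:
        return smaller, "projected cpu stays under target after scale-down"
    return current, "within target"


def evaluate(current: int, avg_cpu_pct: float, cpu_target_pct: float,
             min_replicas: int, max_replicas: int, step: int,
             dry_run: bool, apply: Callable[[int], None],
             log: list[Decision]) -> Decision:
    """Evaluate once; record what would happen, and apply it only when live."""
    desired, reason = decide(current, avg_cpu_pct, cpu_target_pct,
                             min_replicas, max_replicas, step)
    decision = Decision(current, desired, reason, dry_run)
    if desired != current:
        if not dry_run:
            apply(desired)
            decision.applied = True
        log.append(decision)  # previews and applies are both recorded
    return decision


log: list[Decision] = []
applied: list[int] = []
preview = evaluate(3, 90, 70, 2, 10, 1, dry_run=True, apply=applied.append, log=log)
live = evaluate(3, 90, 70, 2, 10, 1, dry_run=False, apply=applied.append, log=log)
print(preview.desired, preview.applied, live.applied, applied)
```

Because both modes share `decide`, a dry-run preview in this sketch is by construction the same decision the live policy would have made on the same inputs.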
Kubernetes: HPA and Karpenter
For a live Kubernetes policy, KubeWatch renders a standard autoscaling/v2 HorizontalPodAutoscaler (and a Karpenter NodePool when a node-pressure target is set) and the in-cluster agent server-side-applies it with the kubewatch-scaling-engine field manager. The cluster's own HPA controller then moves the replica count. KubeWatch never writes the replica count directly, so it never fights the HPA, and scaling keeps working even if KubeWatch is unreachable.
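To make the rendering step concrete, here is a minimal sketch that builds an autoscaling/v2 HorizontalPodAutoscaler manifest from policy values. The function name and its parameters are hypothetical. Mapping step size to a `Pods` scaling policy with a 60-second period, and naming the HPA after the deployment, are assumptions of this example rather than documented KubeWatch behaviour. The manifest field names themselves follow the standard autoscaling/v2 API.

```python
import json
from typing import Optional


def render_hpa(namespace: str, deployment: str,
               min_replicas: int, max_replicas: int,
               cpu_target_pct: int, memory_target_pct: Optional[int] = None,
               step: int = 1,
               scale_up_cooldown_s: int = 30,
               scale_down_cooldown_s: int = 300) -> dict:
    """Build an autoscaling/v2 HPA manifest as a plain dict."""

    def resource_metric(name: str, pct: int) -> dict:
        return {
            "type": "Resource",
            "resource": {
                "name": name,
                "target": {"type": "Utilization", "averageUtilization": pct},
            },
        }

    def rule(window_s: int) -> dict:
        # Cooldown becomes the stabilization window; step becomes a Pods policy.
        return {
            "stabilizationWindowSeconds": window_s,
            "policies": [{"type": "Pods", "value": step, "periodSeconds": 60}],
        }

    metrics = [resource_metric("cpu", cpu_target_pct)]
    if memory_target_pct is not None:
        metrics.append(resource_metric("memory", memory_target_pct))

    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": metrics,
            "behavior": {
                "scaleUp": rule(scale_up_cooldown_s),
                "scaleDown": rule(scale_down_cooldown_s),
            },
        },
    }


hpa = render_hpa("prod", "checkout-api", 2, 10, 70, memory_target_pct=75)
print(json.dumps(hpa, indent=2))  # JSON is also valid YAML
```

Note that the manifest carries bounds and targets but no replica count, which matches the point above: the cluster's HPA controller, not KubeWatch, decides the actual number of pods.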
KubeWatch-generated NodePools default to the conservative WhenEmpty consolidation policy, so that cost-saving node consolidation cannot terminate a disruption-sensitive workload that you have not opted in. Node-level scaling requires Karpenter to already be installed; if it is not, the option is shown as unavailable.
Docker: placement and the managed load balancer
Standalone Docker has no HPA, so KubeWatch owns the whole loop. On scale-up the engine chooses a placement (the host with the most CPU/memory headroom, subject to Max replicas per host) and then enqueues a docker.scale command. To add containers, the agent clones an existing replica (same image, env, and mounts); to remove them, it stops the newest first.
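The placement rule can be sketched in a few lines. This is an illustration only: the `Host` record is hypothetical, and treating headroom as the scarcer of free CPU and free memory is an assumption, since the text does not say how the two are combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    cpu_free_pct: float
    mem_free_pct: float
    replicas_of_group: int  # replicas of this container group already on the host


def pick_host(hosts: list[Host], max_replicas_per_host: Optional[int]) -> Optional[Host]:
    """Pick the eligible host with the most headroom, or None if all are full."""
    eligible = [
        h for h in hosts
        if max_replicas_per_host is None or h.replicas_of_group < max_replicas_per_host
    ]
    if not eligible:
        return None
    # Assumption: headroom is the scarcer of the two resources.
    return max(eligible, key=lambda h: min(h.cpu_free_pct, h.mem_free_pct))


hosts = [
    Host("host-a", cpu_free_pct=60, mem_free_pct=20, replicas_of_group=0),
    Host("host-b", cpu_free_pct=50, mem_free_pct=45, replicas_of_group=2),
    Host("host-c", cpu_free_pct=40, mem_free_pct=35, replicas_of_group=1),
]
chosen = pick_host(hosts, max_replicas_per_host=2)
print(chosen.name if chosen else "no eligible host")
```

In this example host-b has the most headroom but already holds two replicas, so the per-host cap excludes it and host-c is chosen instead.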
Because Docker has no Service to route traffic, KubeWatch can run a managed load-balancer pool (opt-in, set KUBEWATCH_MANAGED_LB=true on the agent): a lightweight Caddy reverse proxy on the host, one upstream pool per container group, with active health checks so a new replica receives no traffic until it passes. When the pool is active, the policy shows routing managed. Without it, KubeWatch adjusts the container count and you route to the replicas yourself.
Rollback
Every action that changed something gets a Roll back action in the history, retained for 7 days. The behaviour differs by runtime, and the UI says so at the point of action:
- Kubernetes: a clean, unqualified restore. KubeWatch re-applies the previous HPA/NodePool spec and the cluster reconciles. No pods are recreated.
- Docker: restores the previous replica count. Rolling back a scale-up cleanly removes the newest containers; rolling back a scale-down starts fresh containers from the image (the originals are gone), and the confirmation states this plainly.
Safety
- Cooldowns prevent flapping, the single most important field. Kubernetes inherits this as HPA stabilizationWindowSeconds; Docker enforces it in the engine.
- Hard bounds cap the blast radius. Setting max = current is a legitimate "scaling paused" configuration.
- Approval gates (approval_required) hold high-stakes actions for a human, delivered through your existing notification channels.
- The decision log is the single source of truth for what happened and why: every evaluation that resulted in an action, including dry-run previews, applies, and rollbacks, in chronological order. It is append-only, so a rollback is a new entry referencing the one it reverts, never a rewrite.
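Two of the safety properties above, cooldowns and the append-only decision log, can be sketched together. The class and function names are hypothetical, and counting a rollback as a change that restarts the cooldown is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    seq: int
    kind: str                      # "preview", "apply", or "rollback"
    replicas_from: int
    replicas_to: int
    at: float                      # timestamp in seconds
    reverts: Optional[int] = None  # seq of the entry a rollback undoes


class DecisionLog:
    """Append-only: entries are immutable and never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def append(self, kind: str, replicas_from: int, replicas_to: int,
               at: float, reverts: Optional[int] = None) -> LogEntry:
        entry = LogEntry(len(self._entries), kind, replicas_from, replicas_to, at, reverts)
        self._entries.append(entry)
        return entry

    def rollback(self, seq: int, at: float) -> LogEntry:
        """Record a rollback as a new entry that points at the one it reverts."""
        original = self._entries[seq]
        return self.append("rollback", original.replicas_to,
                           original.replicas_from, at, reverts=seq)

    def entries(self) -> tuple[LogEntry, ...]:
        return tuple(self._entries)


def cooldown_ok(log: DecisionLog, direction: str, now: float,
                up_s: float = 30, down_s: float = 300) -> bool:
    """True when the cooldown for this direction has elapsed since the last change."""
    changes = [e for e in log.entries() if e.kind in ("apply", "rollback")]
    if not changes:
        return True
    window = up_s if direction == "up" else down_s
    return now - changes[-1].at >= window


log = DecisionLog()
first = log.append("apply", 3, 4, at=0.0)
print(cooldown_ok(log, "up", now=10.0), cooldown_ok(log, "down", now=120.0))
undo = log.rollback(first.seq, at=400.0)
print(undo.reverts, len(log.entries()))
```

The asymmetric windows mean that, shortly after a change, a further scale-up is allowed again well before a scale-down is.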
How agents receive instructions
KubeWatch agents are push-only and never accept inbound connections, a property worth preserving for a self-hosted product. The autoscaler keeps it: agents receive commands over an agent-initiated long-poll (one more outbound HTTPS request, indistinguishable at the firewall from the existing metrics push). The Kubernetes path does not even use this channel; it applies native objects directly. See the Scaling API for the endpoints.
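The shape of an agent-initiated long-poll can be sketched without any networking by injecting the transport. Everything here is hypothetical: the `fetch` callable stands in for the agent's outbound HTTPS request, and the command payload is invented for illustration. The real endpoints are described in the Scaling API.

```python
from typing import Callable, Optional

Command = dict


def poll_loop(fetch: Callable[[float], Optional[Command]],
              handle: Callable[[Command], None],
              max_polls: int, wait_s: float = 25.0) -> int:
    """Run up to max_polls agent-initiated polls; return how many commands ran."""
    handled = 0
    for _ in range(max_polls):
        try:
            # Outbound request; the server may hold it open for up to wait_s.
            command = fetch(wait_s)
        except TimeoutError:
            continue  # nothing arrived in time, so poll again
        if command is None:
            continue  # server answered "nothing to do"
        handle(command)
        handled += 1
    return handled


# Stand-in transport that replays canned responses instead of calling a server.
responses = [None, {"type": "docker.scale", "group": "web", "replicas": 3}, TimeoutError()]


def fake_fetch(wait_s: float) -> Optional[Command]:
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


received: list[Command] = []
print(poll_loop(fake_fetch, received.append, max_polls=3), received)
```

The agent always makes the call and the server only ever answers, so no inbound port is needed on the host.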
Required Kubernetes RBAC
The in-cluster agent needs a small set of additional permissions to apply autoscaling objects, and nothing more. Notably it has no verb on pods or deployments directly, and no delete:
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["karpenter.sh"]
  resources: ["nodepools"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "list"]
The KubeWatch agent Helm chart includes these rules.