Auto Scaling

Per-workload autoscaling for Kubernetes pods and standalone Docker containers, from one policy surface.

Auto Scaling turns KubeWatch from an observability product into an observability-plus-control product: alongside watching your workloads, it can scale them. You set a per-workload policy (thresholds, replica bounds, cooldowns) and the Scaling-Engine acts on the same metrics you already see on your dashboards.

The same policy model backs both runtimes, but what happens underneath differs sharply:

| | Kubernetes mode | Docker mode |
| --- | --- | --- |
| Native primitive | HorizontalPodAutoscaler + Karpenter NodePool | None. KubeWatch is the orchestrator |
| What KubeWatch writes | Native API objects (declarative) | Direct Docker Engine API calls |
| Who executes the scale | The cluster's own controllers | The KubeWatch agent, on the host |
| Traffic routing on scale-out | Service / kube-proxy (already exists) | KubeWatch-managed load balancer |
| If KubeWatch is down | HPA keeps working as a native object | Scaling pauses; running containers keep serving |

The runtime is not a choice you make in the policy editor. A workload already lives on exactly one runtime, and that determines the only valid scaling strategy (`pods` for Kubernetes, `containers` for Docker). The editor only ever offers the thresholds and bounds that are meaningful for the workload in front of you.

Creating a policy

Navigate to Auto Scaling, then New policy:

| Field | Description | Example |
| --- | --- | --- |
| Name | Human-readable policy name | checkout-api autoscaler |
| Runtime | Kubernetes or Docker (sets the strategy) | kubernetes |
| Target | namespace/deployment (K8s) or host-pool/group (Docker) | prod/checkout-api |
| CPU target % | Desired average CPU utilization | 70 |
| Memory target % | Desired average memory utilization (optional) | 75 |
| Min / Max replicas | Hard floor and ceiling, the blast-radius cap | 2 / 10 |
| Step size | Replicas added or removed per action | 1 |
| Scale-up / Scale-down cooldown | Minimum time between actions, asymmetric by default | 30s / 300s |
| Node-pressure target % | Kubernetes only. Drives a Karpenter NodePool (if installed) | 80 |
| Max replicas per host | Docker only. Caps replicas of a group on any one host | 2 |
| Require approval | Hold each action for human approval before it executes | off |

New policies start in dry-run (see below). Cooldowns default to an asymmetric pattern, quick to scale up and slow to scale down, because reacting late to a load spike is worse than keeping a spare replica around a few extra minutes.
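The fields in the table combine into one decision per evaluation. The sketch below is one plausible reading of that, not KubeWatch's actual engine: the `Policy` dataclass, its field names, the `decide` function, and the rule of comparing average CPU directly against the target (with no tolerance band) are all assumptions made for illustration. The defaults are the example values from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical field names mirroring the policy table.
    cpu_target_pct: float = 70.0
    min_replicas: int = 2
    max_replicas: int = 10
    step_size: int = 1
    scale_up_cooldown_s: float = 30.0
    scale_down_cooldown_s: float = 300.0


def decide(policy: Policy, current: int, avg_cpu_pct: float,
           seconds_since_last_action: float) -> int:
    """Return the replica count to move to (may equal `current`)."""
    if avg_cpu_pct > policy.cpu_target_pct:
        cooldown, delta = policy.scale_up_cooldown_s, policy.step_size
    elif avg_cpu_pct < policy.cpu_target_pct:
        cooldown, delta = policy.scale_down_cooldown_s, -policy.step_size
    else:
        return current

    # Asymmetric cooldown gate: the short window applies to scale-up,
    # the long window to scale-down.
    if seconds_since_last_action < cooldown:
        return current

    # Hard bounds: never leave [min_replicas, max_replicas].
    return max(policy.min_replicas, min(policy.max_replicas, current + delta))
```

With these defaults, a CPU spike 45 seconds after the last action scales up (45s exceeds the 30s cooldown), while a CPU drop at the same moment does nothing, because the 300s scale-down cooldown has not yet elapsed.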

Dry-run first

Every policy can run with dry_run on: the Scaling-Engine evaluates the thresholds against live data and records exactly what it would do, including the rendered HorizontalPodAutoscaler YAML or the docker.scale command and chosen host, but applies nothing. Running in dry-run is recommended for the first week of any policy's life. When you are confident in what it records, toggle Go live.

Kubernetes: HPA and Karpenter

For a live Kubernetes policy, KubeWatch renders a standard autoscaling/v2 HorizontalPodAutoscaler (and a Karpenter NodePool when a node-pressure target is set) and the in-cluster agent server-side-applies it with the kubewatch-scaling-engine field manager. The cluster's own HPA controller then moves the replica count. KubeWatch never writes the replica count directly, so it never fights the HPA, and scaling keeps working even if KubeWatch is unreachable.
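To make the rendered object concrete, here is a minimal sketch of what such an autoscaling/v2 manifest could look like for the example policy. The `render_hpa` function, its parameter names, and the choice to name the HPA after the deployment are assumptions for illustration; the manifest fields themselves are standard autoscaling/v2 fields. The mapping of cooldowns to `stabilizationWindowSeconds` follows the Safety section; step size is left out because this page does not say how it is mapped.

```python
import json


def render_hpa(namespace, deployment, min_replicas, max_replicas,
               cpu_target_pct, memory_target_pct=None,
               scale_up_cooldown_s=30, scale_down_cooldown_s=300):
    """Build an autoscaling/v2 HorizontalPodAutoscaler as a plain dict."""

    def resource_metric(name, pct):
        # Target the average utilization of a resource across all pods.
        return {
            "type": "Resource",
            "resource": {
                "name": name,
                "target": {"type": "Utilization", "averageUtilization": pct},
            },
        }

    metrics = [resource_metric("cpu", cpu_target_pct)]
    if memory_target_pct is not None:
        metrics.append(resource_metric("memory", memory_target_pct))

    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        # Assumption: the HPA shares the deployment's name.
        "metadata": {"name": deployment, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": metrics,
            "behavior": {
                "scaleUp": {"stabilizationWindowSeconds": scale_up_cooldown_s},
                "scaleDown": {"stabilizationWindowSeconds": scale_down_cooldown_s},
            },
        },
    }


# JSON is valid YAML, so this output can be applied as a manifest.
print(json.dumps(render_hpa("prod", "checkout-api", 2, 10, 70), indent=2))
```

Note that the spec carries bounds and targets but no replica count, which is what lets the cluster's HPA controller own that value. A rough manual equivalent of the agent's apply step would be `kubectl apply --server-side --field-manager=kubewatch-scaling-engine -f hpa.json`.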

KubeWatch-generated NodePools default to the conservative WhenEmpty consolidation policy, so that cost-saving node consolidation cannot terminate a disruption-sensitive workload that you have not opted in. Node-level scaling requires Karpenter to already be installed; if it is not, the option is shown as unavailable.

Docker: placement and the managed load balancer

Standalone Docker has no HPA, so KubeWatch owns the whole loop. On scale-up the engine picks a placement (the host with the most CPU/memory headroom, subject to max replicas per host) and then enqueues a docker.scale command. The agent clones an existing replica (same image, env, and mounts) to add containers, or stops the newest to remove them.
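The placement rule can be sketched in a few lines. This is an illustration under stated assumptions, not the engine's real code: the `Host` type and `pick_host` function are hypothetical, and "headroom" is taken here to mean the scarcer of free CPU and free memory, which this page does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Host:
    # Hypothetical view of a host as the engine might see it.
    name: str
    cpu_free_pct: float
    mem_free_pct: float
    replicas_of_group: int  # replicas of this container group already here


def pick_host(hosts: List[Host], max_replicas_per_host: int) -> Optional[Host]:
    """Pick the host with the most headroom that is still under the cap."""
    eligible = [h for h in hosts if h.replicas_of_group < max_replicas_per_host]
    if not eligible:
        return None  # every host is at the per-host cap; nothing to place
    # Assumption: headroom is the scarcer of the two free resources.
    return max(eligible, key=lambda h: min(h.cpu_free_pct, h.mem_free_pct))
```

For example, with hosts `a` (80% CPU free, 70% memory free, 2 replicas), `b` (40%, 60%, 1 replica) and `c` (55%, 30%, 0 replicas) and a cap of 2, host `a` is excluded by the cap and `b` wins, because its scarcer resource (40% free) beats `c`'s (30% free).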

Because Docker has no Service to route traffic, KubeWatch can run a managed load-balancer pool (opt-in: set KUBEWATCH_MANAGED_LB=true on the agent). This is a lightweight Caddy reverse proxy on the host, with one upstream pool per container group and active health checks, so a new replica receives no traffic until it passes them. When the pool is active, the policy shows routing managed. Without it, KubeWatch adjusts the container count and you route to the replicas yourself.

Rollback

Every action that changed something gets a Roll back action in the history, retained for 7 days. The behaviour differs by runtime, and the UI says so at the point of action:

  • Kubernetes: a clean, unqualified restore. KubeWatch re-applies the previous HPA/NodePool spec and the cluster reconciles. No pods are recreated.
  • Docker: restores the previous replica count. Rolling back a scale-up cleanly removes the newest containers; rolling back a scale-down starts fresh containers from the image (the originals are gone), and the confirmation states this plainly.
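The Docker rollback rules reduce to a comparison of two replica counts plus the 7-day retention window. The sketch below shows that logic only; the function names and the action labels it returns are hypothetical, not identifiers from KubeWatch.

```python
from datetime import datetime, timedelta

# Roll back actions are retained for 7 days.
ROLLBACK_RETENTION = timedelta(days=7)


def rollback_available(action_time: datetime, now: datetime) -> bool:
    """True while the action is still inside the retention window."""
    return now - action_time <= ROLLBACK_RETENTION


def plan_docker_rollback(previous_replicas: int, current_replicas: int):
    """Return (action, count) needed to get back to the previous count."""
    diff = current_replicas - previous_replicas
    if diff > 0:
        # Undoing a scale-up: remove the newest containers.
        return ("stop_newest", diff)
    if diff < 0:
        # Undoing a scale-down: the originals are gone, so new containers
        # are started from the image.
        return ("start_fresh_from_image", -diff)
    return ("noop", 0)
```

Rolling back a scale-up from 3 to 5 replicas yields `("stop_newest", 2)`, while rolling back a scale-down from 5 to 3 yields `("start_fresh_from_image", 2)`.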

Safety

  • Cooldowns prevent flapping, which makes them the single most important field. Kubernetes inherits the cooldown as HPA stabilizationWindowSeconds; Docker enforces it in the engine.
  • Hard bounds cap the blast radius. Setting max to the current replica count is a legitimate "scale-out paused" configuration; setting min to the same value as well pauses scaling in both directions.
  • Approval gates (approval_required) hold high-stakes actions for a human, delivered through your existing notification channels.
  • The decision log is the single source of truth for what happened and why: every evaluation that resulted in an action, including dry-run previews, applies, and rollbacks, in chronological order. It is append-only, so a rollback is a new entry referencing the one it reverts, never a rewrite.
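The append-only property of the decision log can be illustrated with a small in-memory model. This is a sketch of the idea, not KubeWatch's storage format: the `Entry` and `DecisionLog` names, the fields, and the integer ids are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)  # frozen: an entry cannot be modified once written
class Entry:
    id: int
    kind: str                      # "dry_run", "apply", or "rollback"
    detail: str
    reverts: Optional[int] = None  # id of the entry a rollback undoes


class DecisionLog:
    """Append-only: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, kind: str, detail: str,
               reverts: Optional[int] = None) -> Entry:
        if reverts is not None and not any(e.id == reverts for e in self._entries):
            raise ValueError(f"cannot revert unknown entry {reverts}")
        entry = Entry(len(self._entries) + 1, kind, detail, reverts)
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[Entry, ...]:
        # Return an immutable snapshot in chronological order.
        return tuple(self._entries)
```

A rollback is then recorded as `log.append("rollback", "prod/checkout-api 4 -> 3", reverts=applied.id)`: the original apply entry stays untouched, and the new entry points back at it.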

How agents receive instructions

KubeWatch agents are push-only and never accept inbound connections, a property worth preserving for a self-hosted product. The autoscaler keeps it: agents receive commands over an agent-initiated long-poll (one more outbound HTTPS request, indistinguishable at the firewall from the existing metrics push). The Kubernetes path does not even use this channel; it applies native objects directly. See the Scaling API for the endpoints.

Required Kubernetes RBAC

The in-cluster agent needs a small set of additional permissions to apply autoscaling objects, and nothing more. Notably it has no verb on pods or deployments directly, and no delete:

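# Create and update HorizontalPodAutoscalers (no delete verb)
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
# Create and update Karpenter NodePools (no delete verb)
- apiGroups: ["karpenter.sh"]
  resources: ["nodepools"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
# Read-only CRD lookup (presumably to detect whether Karpenter is installed)
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "list"]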

The KubeWatch agent Helm chart includes these rules.