Platform teams on clustervps Mac clusters need event-driven canaries: scale when the build queue spikes, gate traffic on gateway slices, and broadcast webhook failures before KEDA adds replicas. This guide is a minimal reproduction covering ScaledObject triggers tied to your queue, multi-AZ gateway canary slices, merged latency and error probes, failure-summary broadcast, skill-pack locks, and 1TB/2TB disk watermarks. It also includes a decision matrix that contrasts KEDA with our Flagger canary walkthrough and Argo Rollouts AnalysisRun guide.

Pain points before the first KEDA scale event

KEDA answers how many gateway workers to run; it does not replace canary judgment on OpenClaw Macs that still tail logs and promote artifacts.

  • Queue-only scaling: replicas climb while a hot canary slice still serves the wrong skill hash.
  • Split metrics: HPA sees low CPU while gateway 5xx rates rise on the VIP your webhook never scores.
  • Webhook storms: ScaledObject polling plus unbounded retries stampede every clustervps node behind the load balancer.

KEDA ScaledObject triggers and the build queue

Point the ScaledObject trigger at the same depth signal your OpenClaw build lane exports—Redis list length, NATS pending messages, or a Prometheus gauge from the coordinator on a clustervps gateway Mac. Cap maxReplicaCount during canary so scale-out cannot outrun probe budgets.

| Trigger | Starter value | Canary note |
| --- | --- | --- |
| Queue depth | Scale at ≥ 8 pending jobs | Hold max replicas +1 until merged JSON passes. |
| Cooldown | 120s scale-down delay | Prevents thrash when webhooks retry during analysis. |
| minReplicaCount | 1 stable + 0 canary workers | Canary workers use a separate Deployment label. |
| Activation | 0 → idle gateways sleep | Wake only the AZ slice under test. |
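The starter values in the table can be sketched as a ScaledObject manifest built in Python and emitted as JSON, which `kubectl apply -f` accepts alongside YAML. This is a minimal sketch under assumptions: the Deployment name `openclaw-worker`, the Prometheus address, the gauge name `openclaw_build_queue_depth`, the 30s polling interval, and the default `stable_max` of 6 are all hypothetical, and "max replicas +1" is read here as the current stable replica count plus one. The 120s delay is written to both `cooldownPeriod` and the HPA scale-down stabilization window, because `cooldownPeriod` on its own governs scale-to-zero; check both against your installed KEDA version.

```python
import json


def build_scaled_object(stable_replicas: int, canary_active: bool,
                        stable_max: int = 6) -> dict:
    """Return a KEDA ScaledObject manifest as a plain dict."""
    # Assumption: during canary, cap scale-out at stable + 1 replicas.
    max_replicas = stable_replicas + 1 if canary_active else stable_max
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": "openclaw-build-lane"},
        "spec": {
            # Hypothetical worker Deployment name.
            "scaleTargetRef": {"name": "openclaw-worker"},
            "pollingInterval": 30,   # assumed value, in seconds
            "cooldownPeriod": 120,   # applies to scale-to-zero
            "minReplicaCount": 1,
            "maxReplicaCount": max_replicas,
            "advanced": {
                "horizontalPodAutoscalerConfig": {
                    "behavior": {
                        # Delays ordinary scale-down by 120 seconds.
                        "scaleDown": {"stabilizationWindowSeconds": 120}
                    }
                }
            },
            "triggers": [{
                "type": "prometheus",
                "metadata": {
                    # Hypothetical server address and gauge name.
                    "serverAddress": "http://prometheus.example.internal:9090",
                    "query": "max(openclaw_build_queue_depth)",
                    # Trigger metadata values are strings.
                    "threshold": "8",
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_scaled_object(2, canary_active=True), indent=2))
```

Note that KEDA treats the Prometheus `threshold` as a per-replica target, so "scale at ≥ 8 pending jobs" is an approximation of how the autoscaler will behave once more than one worker is running.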

Align promotion locks with Nomad build-lock patterns so rsync never runs while KEDA scales the canary Deployment.

Multi-AZ gateway canary slices

Pin gateway_version and az labels per Mac. Route canary webhooks to a tagged hostname, as described in multi-AZ gateway webhooks, not to the stable pool operators use for SSH. Reuse traffic ratios from multi-AZ canary skills and per-node fragments from fragment merge workflows. Pick regions on the home page before adding a fourth gateway node.
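Selecting the slice under test from pinned labels might look like the following. It is an illustrative sketch only: the `GatewayMac` record, the hostnames, the AZ names, and the version strings are hypothetical, and a real fleet would read these labels from its own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayMac:
    hostname: str
    az: str
    gateway_version: str


def canary_slice(fleet, az_under_test, canary_version):
    """Return only the Macs in the AZ under test that run the canary version."""
    return [
        mac for mac in fleet
        if mac.az == az_under_test and mac.gateway_version == canary_version
    ]


# Hypothetical fleet used for illustration.
FLEET = [
    GatewayMac("gw-a1.example.internal", "az-a", "1.4.0"),
    GatewayMac("gw-a2.example.internal", "az-a", "1.5.0-canary"),
    GatewayMac("gw-b1.example.internal", "az-b", "1.5.0-canary"),
]
```

Calling `canary_slice(FLEET, "az-a", "1.5.0-canary")` yields just `gw-a2`, which keeps canary webhooks away from the stable pool and from canary nodes in other zones.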

Metric probes: latency and error-rate thresholds

Return one merged JSON document per ScaledObject polling interval—same discipline as Rollouts or Flagger, but scored by your gateway webhook before KEDA raises replicas.

| Signal | Starter threshold | Fail when |
| --- | --- | --- |
| Canary 5xx rate | ≤ 0.5% over five minutes | Two consecutive windows above ceiling. |
| Gateway p99 latency | ≤ 220 ms on canary VIP | Regression > 15% vs stable baseline. |
| Queue depth | ≤ 12 pending jobs | Depth grows while canary weight increases. |
| degraded flag | HTTP 200 with explicit boolean | degraded: true fails closed. |

{
  "status": "healthy",
  "keda": "openclaw-build-lane",
  "canary": { "5xx_rate": 0.003, "p99_ms": 158 },
  "gateway": { "disk_ok": true, "queue_depth": 5, "skill_hash": "c4e1…" },
  "degraded": false
}
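One way to score a merged document against the starter thresholds is sketched below. The function name `score_probe`, the `stable_p99_ms` baseline argument, and the `prev_5xx_rate` argument are assumptions introduced for illustration. The rule that queue depth must not grow while canary weight increases is left out, because it needs weight history that the sample document does not carry.

```python
FIVE_XX_CEILING = 0.005    # 0.5%
P99_CEILING_MS = 220.0
P99_REGRESSION = 1.15      # 15% over the stable baseline
QUEUE_CEILING = 12


def score_probe(doc, stable_p99_ms, prev_5xx_rate=None):
    """Return the list of failed signals; an empty list means the probe passes."""
    failures = []
    canary = doc.get("canary", {})
    gateway = doc.get("gateway", {})

    # A missing flag is treated like degraded: true (fail closed).
    if doc.get("degraded", True):
        failures.append("degraded")

    # 5xx fails only when two consecutive windows exceed the ceiling.
    rate = canary.get("5xx_rate", 1.0)
    if (prev_5xx_rate is not None
            and rate > FIVE_XX_CEILING
            and prev_5xx_rate > FIVE_XX_CEILING):
        failures.append("5xx_rate")

    # p99 fails on the absolute ceiling or a >15% regression.
    p99 = canary.get("p99_ms", float("inf"))
    if p99 > P99_CEILING_MS or p99 > stable_p99_ms * P99_REGRESSION:
        failures.append("p99_ms")

    if gateway.get("queue_depth", float("inf")) > QUEUE_CEILING:
        failures.append("queue_depth")

    if not gateway.get("disk_ok", False):
        failures.append("disk_ok")

    return failures
```

With the sample document above and a stable p99 of 150 ms, the function returns an empty list; the same document against a 130 ms baseline fails on `p99_ms`, because 158 ms is more than 15% above it.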

Webhook failure-summary broadcast

Mount dual bearer secrets with overlap for at least one full KEDA polling window. On non-success classifications, batch a digest to the notifier Mac using cluster logs and webhook digests—operators read one summary while the scaler retries.

  • Primary token: ScaledObject custom metric / gateway webhook header.
  • Overlap token: accepted seven days after rotation.
  • Retry cap: three gateway attempts with jitter; polling interval ≥ 60s during maintenance.
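The token overlap and the retry cap from the list above could be sketched as follows. The function names, the one-second base delay, and the choice of full-jitter exponential backoff are assumptions; the source only specifies a seven-day overlap and three attempts with jitter.

```python
import hmac
import random
import time
from datetime import datetime, timedelta, timezone

OVERLAP = timedelta(days=7)


def _same(a: str, b: str) -> bool:
    # Constant-time comparison to avoid leaking token contents.
    return hmac.compare_digest(a.encode(), b.encode())


def token_accepted(presented, primary, previous, rotated_at, now):
    """Accept the primary token, or the previous one inside the overlap."""
    if _same(presented, primary):
        return True
    return (now - rotated_at) <= OVERLAP and _same(presented, previous)


def call_with_retry(send, attempts=3, base_delay=1.0,
                    sleep=time.sleep, rng=random.random):
    """Call send() up to `attempts` times, sleeping with jitter in between."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except OSError:
            if attempt == attempts - 1:
                raise  # retry cap reached; surface the failure
            # Random delay in [0, base_delay * 2**attempt).
            sleep(rng() * base_delay * 2 ** attempt)
```

Injecting `sleep` and `rng` keeps the retry loop testable without real delays. The final failure is re-raised so a caller can fold it into the digest sent to the notifier Mac rather than retrying without bound.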

Skill-pack version lock and rollback

Freeze skill-pack hashes while canary_active=true. On abort, revert hash, set KEDA maxReplicaCount to the stable lane value, and release rsync flock locks per the artifact rsync matrix. Doctor failures still fail the merged JSON—see Doctor deep checks before widening traffic.
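The lock and rollback sequence might be sketched with an advisory `flock`, assuming rsync wrappers on the same Mac take the same lock file before promoting artifacts. The state keys (`stable_skill_hash`, `stable_max_replicas`, `max_replica_count`) and the helper names are hypothetical; applying the reverted `maxReplicaCount` to the cluster is left to your own tooling.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def promotion_lock(path):
    """Hold an exclusive, non-blocking flock for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Raises BlockingIOError if another holder already has the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def abort_canary(state: dict) -> dict:
    """Return the state after an abort: stable hash, stable cap, canary off."""
    return {
        **state,
        "skill_hash": state["stable_skill_hash"],
        "max_replica_count": state["stable_max_replicas"],
        "canary_active": False,
    }
```

A canary controller would hold `promotion_lock` for as long as `canary_active` is true, so a concurrent rsync wrapper fails fast with `BlockingIOError` instead of promoting artifacts mid-analysis.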

1TB / 2TB disk watermarks on gateway Macs

Include disk_ok in merged probes. Fail closed when APFS crosses yellow gates during scale-out.

| Tier | Yellow gate | Action |
| --- | --- | --- |
| 1TB gateway | ≥ 82% used | Block KEDA scale-up; broadcast digest. |
| 2TB gateway | ≥ 78% used | Same; allow stable lane only. |
| Red gate | ≥ 90% either tier | Scale to min replicas; drain canary VIP. |
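The gates in the table reduce to a small classifier, sketched here. The tier keys `"1TB"` and `"2TB"` and the function names are assumptions, and `shutil.disk_usage` is only an approximation of how APFS accounts for space, so treat the reading as a starting point.

```python
import shutil

YELLOW_GATE = {"1TB": 82.0, "2TB": 78.0}   # percent used
RED_GATE = 90.0                            # percent used, either tier


def used_percent(path="/"):
    """Percent of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_gate(tier, used_pct):
    """Classify a reading as 'green', 'yellow', or 'red' for the given tier."""
    if used_pct >= RED_GATE:
        return "red"
    if used_pct >= YELLOW_GATE[tier]:
        return "yellow"
    return "green"


def disk_ok(tier, used_pct):
    """Value for the disk_ok field: anything past a yellow gate fails closed."""
    return disk_gate(tier, used_pct) == "green"
```

For example, a 2TB gateway at 80% used is already yellow, while a 1TB gateway at the same reading is still green, which reflects the earlier gate on the larger tier.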

KEDA vs Flagger vs Argo Rollouts (what this guide adds)

Flagger and Rollouts shift traffic weights on a mostly fixed replica count. KEDA shifts capacity when queues or custom metrics demand it—ideal for burst builds on clustervps parallel Mac lanes.

  • KEDA (this article): ScaledObject triggers, queue coupling, scale caps during canary.
  • Flagger: Canary CRD with analysis webhooks and metric checks; see our Flagger guide.
  • Argo Rollouts: AnalysisRun on Rollouts objects—not Argo CD—see Rollouts probes.
  • Flux: GitOps image automation—see Flux canary walkthrough.

Pick one upstream caller per gateway measurement URL. Never double-fire the same handler from KEDA scale events and a Rollouts AnalysisRun in the same minute.
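A gateway can enforce the single-caller rule with a small guard like the one below. The class name `CallerGuard`, the caller labels, and the 60-second window are assumptions chosen to match "the same minute"; timestamps are passed in explicitly so the logic stays deterministic.

```python
class CallerGuard:
    """Reject a second upstream caller hitting the same URL within the window."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self._last = {}  # url -> (caller, timestamp of last accepted call)

    def allow(self, url, caller, now):
        last = self._last.get(url)
        if last is not None:
            last_caller, last_seen = last
            # A different caller inside the window is a double-fire.
            if last_caller != caller and now - last_seen < self.window_s:
                return False
        self._last[url] = (caller, now)
        return True
```

Rejected calls do not update the record, so the original caller keeps ownership of the URL until it has been quiet for a full window.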

Minimal reproducible rollout (seven steps)

  1. Install KEDA and confirm ScaledObject targets your OpenClaw worker Deployment—not the gateway DaemonSet.
  2. Wire the trigger to build-queue depth with cooldown and a canary-specific max replica cap.
  3. Expose /keda/metrics on a canary-tagged gateway Mac with bearer auth.
  4. Return merged JSON with latency, error rate, queue depth, disk_ok, and skill_hash.
  5. Lock skill packs and pause rsync until probes pass or abort.
  6. Enable failure broadcast to your notifier path; rehearse scale-down on degraded true.
  7. Validate by curling the endpoint from a bastion while KEDA holds replicas at the canary cap.
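Steps 3, 4, and 7 can be rehearsed locally with a standard-library endpoint before touching a real gateway. This is a sketch under assumptions: the helper names `handle`, `make_handler`, and `serve`, the port, and the token are hypothetical, and a production gateway would terminate TLS and load the secret from its mounted bearer tokens.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle(path, auth_header, token, probe):
    """Pure request logic: return (status_code, body_bytes)."""
    if path != "/keda/metrics":
        return 404, b""
    expected = f"Bearer {token}".encode()
    if not hmac.compare_digest((auth_header or "").encode(), expected):
        return 401, b""
    return 200, json.dumps(probe()).encode()


def make_handler(token, probe):
    """Build an HTTP handler class bound to a token and a probe callable."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = handle(
                self.path, self.headers.get("Authorization"), token, probe)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep rehearsal output quiet

    return Handler


def serve(token, probe, port=8080):
    """Blocking helper for local rehearsal only."""
    HTTPServer(("127.0.0.1", port), make_handler(token, probe)).serve_forever()
```

With `serve("change-me", my_probe)` running, where `my_probe` is your own callable returning the merged document, a request such as `curl -H "Authorization: Bearer change-me" http://127.0.0.1:8080/keda/metrics` should return that document, and the same request without the header should return 401.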

Citable guardrails

  • Measurement contract: one merged JSON schema versioned in Git per gateway fleet.
  • Scale freeze: no KEDA scale-up while degraded: true for two polling windows.
  • Promotion freeze: no delete-heavy rsync while canary_active is true.
Operational guidance only. KEDA, Kubernetes, and OpenClaw APIs evolve; validate ScaledObject triggers and webhook payloads against your installed versions before production.