Platform teams running OpenClaw across clustervps multi-node clusters win when templates merge the same way on every Mac, workflows stay isolated during promotions, and canary traffic is a dial you can turn back. The smallest reproducible setup is four directories per tenant, one merge contract, and one readiness surface that already understands Doctor plus synthetic gateway checks.

This HowTo assumes two or more gateway-class nodes behind a load balancer plus a build or staging Mac that owns git. The only tooling you need is rsync or your configuration-management tool, numbered YAML fragments, and saved load-balancer weight bookmarks; no bespoke orchestration is required. Read it alongside the multi-AZ canary and skill-pack guide for ratio-first promotions, the tenant split Doctor and webhook digest playbook for directory discipline, and the multi-AZ gateway and webhook notes when traffic crosses regions. Together they form the spine for fragment merges without surprise downtime.

Per-tenant configuration directories

Mirror the same tree on every clustervps host before you load the gateway’s launchd job: /etc/openclaw/tenants/<tenant>/ with gateway.d, workflows.d, and doctor.d as drop-in directories. Gateway fragments describe listeners, upstream pools, and per-tenant rate limits. Workflow fragments reference only logical queue names and retention policies; the heavy SQLite state lives under /var/db/openclaw/workflows/<tenant>/ on each node and is never rsynced during a template promotion. Doctor fragments capture the invariants your on-call expects to hold after every merge.
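
A minimal sketch of laying out that tree, assuming Python 3.9 or later is available on each host. The function name layout_tenant and its root parameter are hypothetical; only the directory names come from the paragraph above. The root prefix exists so the layout can be rehearsed in a scratch directory before touching /etc.

```python
from pathlib import Path

# Drop-in directories named in the text above.
DROP_INS = ("gateway.d", "workflows.d", "doctor.d")


def layout_tenant(tenant: str, root: Path = Path("/")) -> list[Path]:
    """Create the per-tenant drop-in directories and the node-local state directory.

    `root` is a hypothetical prefix so the layout can be tested outside /etc.
    """
    config_base = root / "etc/openclaw/tenants" / tenant
    state_dir = root / "var/db/openclaw/workflows" / tenant
    created = [config_base / name for name in DROP_INS] + [state_dir]
    for path in created:
        # exist_ok keeps the call idempotent when the tree is already present.
        path.mkdir(parents=True, exist_ok=True)
    return created
```

Running the same function from your CM tool on every node is one way to keep the tree identical; the state directory is created here but never populated by a template sync.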

  • Prefix order: Give files numeric prefixes such as 10-base.yaml and 20-canary.yaml so merge order is explicit and two platform squads cannot fight over precedence.
  • Parity gate: Run openclaw config hash --tenant <id> on all gateways; refuse promotion when digests diverge.
  • Instant rollback: Keep tenants/<tenant>.prev as a symlink to the last known-good tree so operators recover with a single atomic flip.
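
The atomic flip in the last bullet can be sketched as follows. This assumes tenants/<tenant> is itself a symlink to a versioned tree, which the article implies but does not state; the function names and the .tmp suffix are hypothetical. The swap relies on rename semantics: renaming a fresh symlink over the old one replaces it in a single step, so readers never observe a missing path.

```python
import os
from pathlib import Path


def flip_symlink(link: Path, target: Path) -> None:
    """Point `link` at `target` by creating a temporary symlink and renaming it."""
    tmp = link.with_name(link.name + ".tmp")  # hypothetical scratch name
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)
    # os.replace renames the link itself; it does not follow it.
    os.replace(tmp, link)


def rollback(tenants_dir: Path, tenant: str) -> Path:
    """Repoint tenants/<tenant> at whatever tenants/<tenant>.prev references."""
    prev = tenants_dir / f"{tenant}.prev"
    known_good = Path(os.readlink(prev))
    flip_symlink(tenants_dir / tenant, known_good)
    return known_good
```

If tenants/<tenant> is a real directory rather than a symlink on your hosts, the rename fails instead of swapping, so convert the layout before relying on this path.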

Secrets stay out of git: mount /var/db/openclaw/secrets/<tenant> with POSIX ACLs identical on each node so fragments remain diffable in code review. When you add another dedicated gateway, size it from the public pricing page before opening a finance thread.

Template merge strategy

Treat merges as a three-layer cake: a shared 00-platform.yaml every tenant inherits, numbered tenant overlays, and an optional 99-canary.yaml that exists only on the canary node’s disk image. The daemon walks directories in lexical order; later files win on scalar keys but must extend, not redefine, composite blocks so openclaw config lint --tenant can prove the graph is still acyclic. Commit the merged preview from your build Mac’s dry-run output next to the ticket so reviewers see the effective config without SSH access.
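
To make the precedence rules concrete, here is a small sketch that works on already-parsed fragments (plain dictionaries) so it needs no YAML library. It is one interpretation of "later files win on scalars, composites may only be extended", not the daemon's actual implementation; the function names and the sample keys in the usage note are hypothetical.

```python
from typing import Any


def merge_fragment(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Merge one later fragment onto the effective config built so far."""
    merged = dict(base)
    for key, value in overlay.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Mappings merge key by key.
            merged[key] = merge_fragment(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Lists are extended; existing items are never dropped.
            merged[key] = current + [item for item in value if item not in current]
        elif isinstance(current, (dict, list)):
            # A composite block may not be replaced by a different type.
            raise ValueError(f"{key!r}: composite block redefined")
        else:
            # Scalars (and new keys): the later file wins.
            merged[key] = value
    return merged


def merge_directory(fragments: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Apply fragments in lexical filename order, as the text describes."""
    effective: dict[str, Any] = {}
    for name in sorted(fragments):
        effective = merge_fragment(effective, fragments[name])
    return effective
```

For example, merging {"rate_limit": 100, "upstreams": ["a"]} from 10-base.yaml with {"rate_limit": 50, "upstreams": ["b"]} from 20-canary.yaml yields a rate limit of 50 and both upstreams, which is the kind of preview worth attaching to the ticket.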

2026 gateway hot-reload boundaries. OpenClaw 2026 builds can hot-apply upstream weight changes, additional fragment includes, and notifier fan-out budgets without dropping established TLS sessions. Bind addresses, client-authenticated listener ports, mTLS trust bundles, and workflow root paths still require a drained restart: set MAINTENANCE=drain in your gateway fragment, wait until in-flight jobs finish, restart the gateway’s launchd job, then clear the flag. Never rely on reload for secret rotation: swap files atomically and reread environment files on a clean process boundary.

Document which keys are reload-class versus restart-class in your platform README so junior engineers do not ship a Friday listener move expecting SIGHUP magic. Pair every merge with openclaw.lock parity across nodes; mixed semver is how fragment schemas silently fork.
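
One way to make the reload-versus-restart table executable is to diff the old and new merged configs and check which keys moved. The sketch below is illustrative only: the dotted key names are hypothetical stand-ins for whatever your fragment schema calls upstream weights, includes, and notifier budgets, and any key not listed as reload-class is conservatively treated as needing a drained restart.

```python
# Hypothetical dotted key names for the reload-class settings named above.
RELOAD_CLASS = {"upstreams.weights", "includes", "notifier.fanout_budget"}

_MISSING = object()


def flatten(config: dict, prefix: str = "") -> dict:
    """Turn nested mappings into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def in_class(path: str, class_keys: set) -> bool:
    """True when `path` is a class key or nested beneath one."""
    return any(path == key or path.startswith(key + ".") for key in class_keys)


def classify_change(old: dict, new: dict) -> tuple:
    """Return (action, blocking_keys) for a config change."""
    before, after = flatten(old), flatten(new)
    changed = {
        path
        for path in before.keys() | after.keys()
        if before.get(path, _MISSING) != after.get(path, _MISSING)
    }
    if not changed:
        return ("noop", [])
    # Restart-class and unknown keys both block a hot reload.
    blocking = sorted(p for p in changed if not in_class(p, RELOAD_CLASS))
    return ("drained-restart", blocking) if blocking else ("reload", [])
```

Wiring a check like this into review means a listener move shows up as "drained-restart" with the offending key listed, before anyone ships it expecting a reload.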

Canary node selection and rollback

Pick exactly one gateway Mac per promotion window, usually the node that already carries 99-canary.yaml, and shift roughly five percent of load-balancer weight to it while stable peers keep the previous template symlink. Watch error rate, p95 queue latency, and merged readiness JSON for at least two probe intervals before doubling the slice. Isolation matters: the canary must dequeue only from its own workflow subdirectory so a bad merge cannot poison production SQLite files on siblings.
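
The hold-then-double rule can be written down as a tiny step function. This is a sketch under assumptions: the five percent start and the two-interval hold come from the paragraph above, while the ceiling parameter (where the ramp stops) is a hypothetical knob the article does not specify.

```python
def next_canary_weight(current: int, healthy_intervals: int,
                       start: int = 5, min_intervals: int = 2,
                       ceiling: int = 50) -> int:
    """Return the canary's traffic share (percent) for the next step."""
    if start <= 0:
        raise ValueError("start must be positive")
    if current <= 0:
        return start                      # first shift: roughly five percent
    if healthy_intervals < min_intervals:
        return current                    # keep holding until probes agree
    return min(current * 2, ceiling)      # double, but never past the ceiling


def ramp_schedule(start: int = 5, ceiling: int = 50) -> list[int]:
    """List the weights a fully healthy ramp would pass through."""
    weights, weight = [], 0
    while weight < ceiling:
        weight = next_canary_weight(weight, healthy_intervals=2,
                                    start=start, ceiling=ceiling)
        weights.append(weight)
    return weights
```

With the defaults, a healthy ramp steps through 5, 10, 20, 40, and 50 percent; any interval that is not healthy leaves the weight where it is.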

Rollback is operational hygiene, not shame: store the previous balancer weights in your ticket tool, export them to JSON before each change, and rehearse one-click restore monthly. If merged health flips red, move weight to zero on the canary, flip tenants/<tenant> back to .prev, and restart only that host if the merge touched restart-class keys. Peers should never need a coordinated restart for a failed canary unless you promoted a shared lockfile bump too early.
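
A sketch of the export, zero-out, and restore steps, assuming balancer weights can be represented as a simple node-to-weight mapping. How the weights are read from or pushed to your load balancer is deliberately left out, since that API is vendor-specific; the function names and file layout here are hypothetical.

```python
import json
from pathlib import Path


def export_weights(weights: dict[str, int], path: Path) -> None:
    """Save the current weights before any change so restore is one step."""
    path.write_text(json.dumps(weights, indent=2, sort_keys=True))


def zero_canary(weights: dict[str, int], canary: str) -> dict[str, int]:
    """Return a copy with the canary at weight zero; peers keep their weights."""
    if canary not in weights:
        raise KeyError(canary)
    return {node: (0 if node == canary else w) for node, w in weights.items()}


def restore_weights(path: Path) -> dict[str, int]:
    """Load the weights saved by export_weights."""
    return json.loads(path.read_text())
```

Attaching the exported JSON file to the ticket keeps the restore target next to the change record, which is what makes the monthly rehearsal cheap.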

Merged Doctor and health checks

Expose a single readiness URL per gateway that returns JSON fusing openclaw doctor --json, synthetic loopback calls through the gateway’s own listener, and lightweight disk or APFS pressure checks. Load balancers and humans curl the same surface; no shadow endpoints that disagree during incidents. When Doctor is yellow but synthetic traffic succeeds, emit HTTP 200 with "status":"degraded" so traffic keeps flowing while paging fires.
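
The fusion rule can be sketched as a pure function. Only one case is stated in the text (Doctor yellow with healthy synthetic traffic returns 200 and "degraded"); the other mappings below, including the green/yellow/red level names, the 503 for red or failed synthetic checks, and treating disk pressure as degraded, are assumptions to adapt to your own policy.

```python
def merged_readiness(doctor_level: str, synthetic_ok: bool,
                     disk_ok: bool = True) -> tuple[int, dict]:
    """Map Doctor, synthetic, and disk signals onto (http_status, body)."""
    if not synthetic_ok or doctor_level == "red":
        status = "down"        # assumption: take the node out of rotation
    elif doctor_level == "yellow" or not disk_ok:
        status = "degraded"    # keep serving traffic while paging fires
    else:
        status = "ok"
    http_status = 503 if status == "down" else 200
    body = {
        "status": status,
        "doctor": doctor_level,
        "synthetic_ok": synthetic_ok,
        "disk_ok": disk_ok,
    }
    return http_status, body
```

Keeping the decision in one function means the load balancer and a human running curl get the same answer from the same inputs.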

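#!/usr/bin/env bash
# Fuse Doctor output and a synthetic gateway probe into one readiness JSON line.
set -euo pipefail
export TENANT="${TENANT:-demo}"
# Private scratch directory, removed on exit, so runs for different tenants never collide.
WORK="$(mktemp -d)"; export WORK
trap 'rm -rf "$WORK"' EXIT
/usr/local/bin/openclaw doctor --tenant "$TENANT" --json >"$WORK/doctor.json"
/usr/bin/curl -fsS --max-time 2 "https://127.0.0.1:8443/healthz" -o "$WORK/gateway.json"
/usr/bin/python3 - <<'PY'
import json, os
work = os.environ["WORK"]
with open(os.path.join(work, "doctor.json")) as f:
    d = json.load(f)
with open(os.path.join(work, "gateway.json")) as f:
    g = json.load(f)
# Report the tenant that was actually probed rather than a hard-coded name.
print(json.dumps({"tenant": os.environ["TENANT"], "doctor_ok": d.get("ok"), "gateway": g.get("status")}))
PY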

Schedule the script from launchd every minute on each node. Because it prints the merged JSON to standard output, either redirect that output to a file your edge proxy can scrape or wrap the script with a tiny HTTP helper. Extend the JSON with webhook digest fields when you adopt the notifier pattern from the linked tenant guide.
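
A tiny HTTP helper of the kind mentioned above might look like this, using only the Python standard library. The port, the function names, and the idea of serving a file written by the scheduled script are all assumptions; the helper only proves the file exists and parses as JSON, and leaves any mapping from its fields to status codes to your policy.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def load_readiness(path: Path) -> tuple[int, bytes]:
    """Return (http_status, payload) for the readiness file at `path`."""
    try:
        body = json.loads(path.read_text())
    except (OSError, ValueError):
        # Missing or unparseable file: fail closed.
        return 503, b'{"status": "unknown"}'
    return 200, json.dumps(body).encode()


def serve(path: Path, port: int = 9099) -> None:
    """Serve the readiness file on loopback until interrupted."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            code, payload = load_readiness(path)
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve(Path("/path/to/readiness.json")) from a separate launchd job would expose the latest scheduled result; the path shown is a placeholder, not a location the article prescribes.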

Minimal reproducible checklist

  1. Bootstrap parity. Install identical macOS builds, lay out per-tenant directories, and verify openclaw version --json matches openclaw.lock on every clustervps node.
  2. Sync templates only. Rsync numbered fragments from git; exclude workflow SQLite paths and secret mounts.
  3. Lint and hash. Run openclaw config lint --tenant <id> then openclaw config hash --tenant <id> cluster-wide.
  4. Shift canary weight. Move five percent of traffic, hold, then promote or restore saved balancer JSON.
  5. Prove merged health. Fail a synthetic path on purpose in staging and confirm readiness JSON surfaces the regression once.
  6. Log the drill. Append ticket id, actor, and checksum to your promotion audit log so the next on-call inherits context.
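
Steps 3 and 6 can be tied together with a small gate: collect the digest each node reports, refuse to continue if they differ, and record the shared digest in the audit log. The sketch assumes the digests have already been gathered into a node-to-digest mapping (how you collect them is up to your tooling), and the record field names are hypothetical.

```python
import json


def parity_gate(digests: dict[str, str]) -> str:
    """Return the shared digest, or raise if nodes disagree."""
    if not digests:
        raise ValueError("no digests collected")
    groups: dict[str, list[str]] = {}
    for node, digest in digests.items():
        groups.setdefault(digest, []).append(node)
    if len(groups) > 1:
        detail = "; ".join(
            f"{digest[:12]}: {', '.join(sorted(nodes))}"
            for digest, nodes in sorted(groups.items())
        )
        raise RuntimeError(f"config digests diverge, refusing promotion ({detail})")
    return next(iter(groups))


def audit_record(ticket: str, actor: str, checksum: str) -> str:
    """Build one JSON line for the promotion audit log."""
    return json.dumps({"ticket": ticket, "actor": actor, "checksum": checksum},
                      sort_keys=True)
```

Appending audit_record(ticket, actor, parity_gate(digests)) to the log only when the gate passes gives the next on-call a checksum that every node agreed on.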

When the checklist is green, you have a clustervps multi-node pattern you can rehearse monthly without customer impact. If your team is outgrowing hand-rolled gateways, add capacity from the public purchase flow, using the same specs your engineers already validated in staging, and keep help center articles open beside the console for onboarding. That pairing turns this HowTo into a repeatable platform program instead of a one-off hero script.

Operational guidance only. OpenClaw deployment details vary by release; validate flags against your installed build. Network timings depend on ISP paths and are engineering targets, not contractual guarantees.