Rolling OpenClaw 2026.4.x on clustervps needs one npm -g order, matching peer trees, Doctor on every node, a single merged readiness JSON, failure broadcasts, and explicit rollback bookmarks before canary traffic widens.

Read alongside canary ratios, gateway webhooks, and tenant Doctor merge; here we focus on install order before traffic moves. After binaries match, use the rsync artifact matrix so golden trees stay consistent across regions.

Why rolling upgrades still fail on small Mac clusters

Incidents cluster around split semver across gateways, probe sprawl that hides stale Doctor data, and silent peers that never saw the same npm -g tree. Without rollback bookmarks between npm, launchd, and load balancer steps—and without a shared failure signal—rolling upgrades look healthy while ratios drift.

The fix is boring operations: freeze evidence, upgrade deliberately, prove Doctor on every host, collapse probes, broadcast reds, and only then touch canary percentages using the companion traffic guides linked above.

What recent 2026.4.x notes imply—without the hype

2026.4.x changelogs mention clearer plugin peer handling and registry link fixes when private scopes reorder tarballs, plus refreshed optional image generation defaults (temp paths, GPU hints). Treat both as review items: reinstall on every node and benchmark disk latency on your clustervps hosts instead of assuming mirror parity.

Decision matrix: big-bang night versus rolling canary

Match strategy to how many gateways you can drain and how long tenants can tolerate mixed builds.

Dimension Big-bang window Rolling canary Operator note
Peer consistency Simple: everything moves together. Harder: enforce lockfile parity per hop. Rolling needs explicit stop rules.
Blast radius All gateways bounce at once. One node carries risk first. Pair with LB ratio discipline.
Doctor signal One maintenance banner for users. Requires merged probes to avoid false green. Never widen while probes disagree.

Whatever you pick, store LB weight JSON, openclaw.lock hashes, and composite probe checksums beside each rollback bookmark. Incident reviews go faster when the spreadsheet row—not Slack memory—is the source of truth.

npm -g order that keeps peer graphs honest

  1. Freeze: npm ls -g --depth=0, openclaw.lock, LB weights.
  2. Core then plugins: openclaw@2026.4.x first, then first-party plugin bundles; rerun npm ls -g --depth=1 and fail on duplicates.
  3. Restart: bounce launchd only after local Doctor passes (checklist below).
# Example only—pin the exact version your lockfile allows
sudo npm install -g openclaw@2026.4.3
sudo npm install -g @openclaw/plugin-bundle@2026.4.3
/usr/local/bin/openclaw doctor --json | /usr/bin/tee /tmp/doctor.post-upgrade.json

Rolling checklist with explicit rollback points

Rolling mode: one node at a time. After each step, confirm merged readiness matches archived good JSON.

  1. Drain: shift LB weight off the target. Rollback: restore weight snapshot.
  2. Upgrade: run the npm sequence. Rollback: reinstall prior semver from cache.
  3. Doctor + restart: openclaw doctor --json per tenant slice; restart gateway; Doctor again warm. Rollback: prior build; restore skill symlink if used.
  4. Probe merge: ship one readiness JSON (Doctor, queue, digest). Rollback: LB off if hash ≠ peers.
  5. Widen: nudge ratios only after peers acknowledge green. Rollback: weight + symlink together.
  6. Next node: repeat; avoid adjacent gateways on mismatched cores without a ratio.
Failure broadcast: on red merged readiness, post node, semver, and first bad Doctor block to chat plus a webhook so peers freeze ratio ramps.

Merge probes first, then trust traffic moves

Expose one readiness URL whose JSON merges Doctor, queue snapshots, and digest totals; if on-disk semver ≠ openclaw.lock, fail hard even when webhooks look fine.

#!/usr/bin/env bash
set -euo pipefail
/usr/local/bin/openclaw doctor --json >/tmp/doctor.json
/usr/bin/curl -fsS --max-time 3 http://127.0.0.1:9099/v1/queue-snapshot -o /tmp/queue.json
/usr/bin/curl -fsS --max-time 3 http://127.0.0.1:9099/v1/webhook-digest -o /tmp/digest.json
/usr/bin/python3 - <<'PY'
import hashlib, json, pathlib
parts = [pathlib.Path(p).read_bytes() for p in ("/tmp/doctor.json","/tmp/queue.json","/tmp/digest.json")]
print(json.dumps({"ready_probe_sha256": hashlib.sha256(b"".join(parts)).hexdigest()}))
PY

Hash mismatch versus peers should fire the same failure broadcast and optionally an lb freeze hook before automation widens ratios.

FAQ

Plugins before core? Only if release notes demand it; otherwise peer warnings may hide until runtime.

Image helpers after 2026.4.x? Expect different temp disk churn or GPU scheduling hints; watch APFS during the first canary hour and cross-check capacity with disk watermark guidance if batch jobs share the host.

Cluster size? Three clustervps Macs cover stable, canary, and witness roles.

Operational guidance only. Verify flags, mirrors, and LB APIs against your stack; read the patch notes you actually ship.
Runbooks need metal behind them

Stage OpenClaw upgrades on dedicated Mac mini M4 nodes

Before rehearsing 2026.4.x, browse the homepage, help, and purchase flows. More cluster context: canary skills, split DNS, rsync matrix, Mosh vs SSH.

Rent Mac cluster nodes Help center