On clustervps Mac gateways, three things hide real drift: duplicate launchd Labels, a green shallow doctor --json, and canary ports that collide with stable ones. This guide complements the ratio-first guides with doctor --deep cleanup, port slices, compact webhook failure summaries, and per-step rollback.

Read it alongside the guides on canary ratios, rolling upgrades, gateway webhooks, and fragment merges. Those articles optimize load-balancer math and semver slices; this one keeps the macOS service inventory honest, so merged readiness never lies after a plist is copy-pasted across AZs.

Three pain patterns Doctor --json alone will miss

  1. Duplicate Labels: two plists share one Label but point at different binaries. launchctl keeps whichever loaded first, and probes stay green.
  2. Port overlap: after hand-copied plists, canary and stable are both configured for 127.0.0.1:9099. Whichever process wins the bind answers merged readiness, so probes can scrape the wrong PID.
  3. Webhook noise: raw webhook logs bury the short failure digest that should freeze LB ramps.

After template merges or golden-image clones, run openclaw doctor --deep --json on every node and diff the inventory block across AZs. If any path differs from the witness while shallow Doctor is still green, stop: you are one launchctl race away from promoting the wrong binary under canary traffic.
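A minimal Python sketch of that cross-AZ diff. It assumes, hypothetically, that the deep JSON carries an `inventory` list whose entries have `label`, `path`, and `port` keys; those names are illustrative, so check the schema your pinned OpenClaw actually emits before relying on them. Duplicate Labels inside one document are reported first, because indexing by Label would otherwise hide them.

```python
import json
from collections import Counter
from pathlib import Path

# ASSUMPTION: hypothetical schema {"inventory": [{"label": ..., "path": ..., "port": ...}, ...]}.
# Rename the keys to match your pinned OpenClaw's real deep-Doctor output.


def _index(doc: dict) -> dict:
    """Index inventory entries by launchd Label (last entry wins on duplicates)."""
    return {entry["label"]: entry for entry in doc.get("inventory", [])}


def _duplicates(doc: dict) -> list[str]:
    """Labels that appear more than once in a single inventory."""
    counts = Counter(entry["label"] for entry in doc.get("inventory", []))
    return sorted(label for label, n in counts.items() if n > 1)


def diff_inventory(witness: dict, canary: dict) -> list[str]:
    """Return drift lines between witness and canary; an empty list means no drift."""
    drift = [
        f"duplicate label on {side}: {label}"
        for side, doc in (("witness", witness), ("canary", canary))
        for label in _duplicates(doc)
    ]
    w, c = _index(witness), _index(canary)
    for label in sorted(w.keys() | c.keys()):
        if label not in c:
            drift.append(f"missing on canary: {label}")
        elif label not in w:
            drift.append(f"extra on canary: {label}")
        else:
            for key in ("path", "port"):
                if w[label].get(key) != c[label].get(key):
                    drift.append(f"{label}: {key} {w[label].get(key)!r} != {c[label].get(key)!r}")
    return drift


def diff_files(witness_path: str, canary_path: str) -> list[str]:
    """Convenience wrapper over two saved deep-Doctor JSON files."""
    return diff_inventory(
        json.loads(Path(witness_path).read_text()),
        json.loads(Path(canary_path).read_text()),
    )
```

A non-empty result is the "stop" signal described above, whatever shallow Doctor reports.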

Decision matrix: shallow Doctor versus --deep on clustervps nodes

Pick evidence depth before editing launchd on rented Macs.

| Signal | doctor --json | doctor --deep --json | Operator note |
|---|---|---|---|
| Speed | Seconds; safe in tight loops. | Slower; walks plist-derived discovery. | Run deep after plist edits or AMI-style clones. |
| Duplicate labels | Often invisible. | Surfaces conflicting units. | Diff deep JSON from the witness against the canary before nudging LB weights. |
| Canary gate | Daily smoke. | Required after launchd edits. | Pair with merged probes in the rolling-upgrade guide. |

Archive shallow and deep JSON beside openclaw.lock and the LB snapshots, so rollback is a tarball plus a weight restore rather than a forensic thread. Keep one row per node in your promotion sheet linking file names to ticket IDs.
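A minimal sketch of that archive step using Python's standard `tarfile`. The file names in the usage line (`<host>.doctor.json`, `lb-weights.json`, and so on) are hypothetical; pass whatever your pipeline actually writes. The function refuses to write a partial bundle, so a missing file fails loudly instead of producing a tarball that cannot restore anything.

```python
import tarfile
import time
from pathlib import Path


def archive_evidence(out_dir: str, host: str, files: list[str]) -> Path:
    """Bundle shallow/deep Doctor JSON, openclaw.lock, and an LB snapshot into one tarball.

    Raises FileNotFoundError if any input is missing, so no incomplete bundle is written.
    """
    missing = [f for f in files if not Path(f).is_file()]
    if missing:
        raise FileNotFoundError(f"evidence missing: {missing}")
    stamp = time.strftime("%Y%m%dT%H%M%S")
    out = Path(out_dir) / f"{host}.evidence.{stamp}.tar.gz"
    with tarfile.open(str(out), "w:gz") as tar:
        for f in files:
            # Store by basename so the bundle unpacks flat; inputs must have distinct names.
            tar.add(f, arcname=Path(f).name)
    return out
```

Usage, with hypothetical paths: `archive_evidence("/tmp", "mac-a1", ["/tmp/mac-a1.doctor.json", "/tmp/mac-a1.doctor.deep.json", "openclaw.lock", "lb-weights.json"])`. The returned file name is what goes into the promotion-sheet row next to the ticket ID.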

Minimal reproducible checklist (each step ends with its rollback)

Run on one canary host first, and keep a witness host for the JSON diff.

  1. Freeze: tarball the LaunchAgents plists that mention OpenClaw, plus the LB weights JSON. Rollback: restore the tarball only.
  2. Baseline: write doctor --json and doctor --deep --json output to /tmp. Rollback: abort before any plist edits.
  3. Dedupe Labels: run launchctl list, then plutil -p on each plist; keep exactly one plist per Label. Rollback: bootout the test unit, then bootstrap the archived plist.
  4. Port slice: keep stable on 9099 and move canary to a high block (e.g. 19199+) via plist EnvironmentVariables; update the host firewall and any security group notes for your region. Rollback: revert the env keys, then bootout/bootstrap the prior plist.
  5. Reload + deep Doctor: bootout/bootstrap once; deep JSON must match the witness paths and ports. Rollback: prior semver + plist bundle.
  6. Digest broadcast: when merged readiness plus the digest crosses your threshold, POST a one-line summary (see the tenant webhook merge guide). Rollback: disable the notifier and restore the digest window.
  7. Widen: only if the probe SHA256 matches peers, per the canary guide. Rollback: LB snapshot from step 1.
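
```bash
# List OpenClaw plists at the top level of both LaunchAgents dirs (case-insensitive name match).
/usr/bin/find ~/Library/LaunchAgents /Library/LaunchAgents -maxdepth 1 -type f -iname '*openclaw*' -print
# Capture deep Doctor JSON per host so the witness and canary can be diffed.
/usr/local/bin/openclaw doctor --deep --json | /usr/bin/tee "/tmp/$(/bin/hostname -s).doctor.deep.json"
```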
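Step 3 can be checked read-only before anyone touches launchctl. A minimal Python sketch, assuming (as the find command does) that OpenClaw plists carry `openclaw` in their file names; the Label `ai.openclaw.gateway` in the test is hypothetical. It uses the standard-library `plistlib`, which reads both XML and binary plists, and only reports duplicates, leaving bootout/bootstrap to the operator.

```python
import plistlib
from collections import defaultdict
from pathlib import Path

# Same directories the find command scans; LaunchDaemons are out of scope here.
LAUNCH_DIRS = [Path.home() / "Library/LaunchAgents", Path("/Library/LaunchAgents")]


def labels_by_plist(dirs=LAUNCH_DIRS, needle: str = "openclaw") -> dict[str, list[str]]:
    """Map each launchd Label to every plist file (matching needle) that declares it."""
    seen: dict[str, list[str]] = defaultdict(list)
    for d in dirs:
        if not Path(d).is_dir():
            continue
        for path in sorted(Path(d).glob("*.plist")):
            if needle not in path.name.lower():
                continue
            with path.open("rb") as fp:
                data = plistlib.load(fp)  # format (XML or binary) is auto-detected
            seen[data.get("Label", "<no Label>")].append(str(path))
    return dict(seen)


def duplicate_labels(dirs=LAUNCH_DIRS, needle: str = "openclaw") -> dict[str, list[str]]:
    """Labels declared by more than one plist: exactly the ones step 3 must dedupe."""
    return {label: paths for label, paths in labels_by_plist(dirs, needle).items() if len(paths) > 1}
```

An empty dict from `duplicate_labels()` is the precondition for step 5; anything else names the plist files to archive and bootout.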

Canary port slices versus stable listeners

Keep stable gateways on loopback 9099 so existing curl checks stay boring. Bind canary admin, queue, and digest helper endpoints on a documented high block, so merged readiness and webhook summaries always scrape the canary PID, even when operators duplicate plists during a rushed incident. Log the slice beside the fragment merge rows so automation and humans share one truth table.
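One way to make the slice checkable is to derive each AZ's block from a base and a width and validate planned ports against it. This sketch assumes consecutive per-AZ blocks of width 10 starting at 19199 (the example figures used in this article); the endpoint names `admin`, `queue`, and `digest` are hypothetical labels, not OpenClaw settings.

```python
STABLE_PORT = 9099
CANARY_BASE = 19199  # example start of the canary high block
BLOCK_WIDTH = 10     # example block width per AZ


def canary_block(az_index: int, base: int = CANARY_BASE, width: int = BLOCK_WIDTH) -> range:
    """Ports reserved for canary listeners in the AZ at position az_index (0-based)."""
    start = base + az_index * width
    return range(start, start + width)


def check_slice(assignments: dict[str, int], az_index: int) -> list[str]:
    """Validate planned canary ports (endpoint name -> port) for one AZ; return problems."""
    block = canary_block(az_index)
    problems = []
    for name, port in sorted(assignments.items()):
        if port == STABLE_PORT:
            problems.append(f"{name}: collides with stable {STABLE_PORT}")
        elif port not in block:
            problems.append(f"{name}: {port} outside {block.start}-{block.stop - 1}")
    ports = list(assignments.values())
    for port in sorted({p for p in ports if ports.count(p) > 1}):
        problems.append(f"port {port} assigned to more than one endpoint")
    return problems
```

Running `check_slice` against the values you are about to write into plist EnvironmentVariables catches a copy-pasted 9099 before launchd ever sees it.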

Broadcast payload: hostname, AZ, digest failure-class counts, and merged SHA256. Freeze ratios until deep JSON matches the witness.
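A sketch of that one-line payload, assuming a JSON body POSTed to a notifier URL you control. The field names are illustrative, not an OpenClaw format, and `post_line` uses only the standard-library `urllib.request`.

```python
import json
import urllib.request


def broadcast_line(hostname: str, az: str, failure_classes: dict[str, int], merged_sha256: str) -> str:
    """Compact single-line JSON: hostname, AZ, per-class failure counts, merged SHA256."""
    return json.dumps(
        {
            "hostname": hostname,
            "az": az,
            "failure_classes": failure_classes,
            "merged_sha256": merged_sha256,
        },
        separators=(",", ":"),  # no spaces: keeps the chat/pager line short
        sort_keys=True,         # stable field order across hosts
    )


def post_line(url: str, line: str, timeout: float = 3.0) -> int:
    """POST the summary to your own notifier endpoint (hypothetical URL) and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=line.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```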

Keep merged probes; add digest context after deep Doctor passes

Reuse the rolling-upgrade hash pattern; point curls at the canary loopback port from your slice.

```bash
#!/usr/bin/env bash
set -euo pipefail
# Canary loopback port from your slice; exported so the Python step reads the same value.
export CANARY_PORT="${CANARY_PORT:-19199}"
# Deep Doctor first, then the canary queue and webhook-digest endpoints.
/usr/local/bin/openclaw doctor --deep --json >/tmp/doctor.deep.json
/usr/bin/curl -fsS --max-time 3 "http://127.0.0.1:${CANARY_PORT}/v1/queue-snapshot" -o /tmp/queue.json
/usr/bin/curl -fsS --max-time 3 "http://127.0.0.1:${CANARY_PORT}/v1/webhook-digest" -o /tmp/digest.json
# Hash the three artifacts in a fixed order so peers produce comparable digests.
/usr/bin/python3 - <<'PY'
import hashlib
import json
import os
import pathlib

paths = ("/tmp/doctor.deep.json", "/tmp/queue.json", "/tmp/digest.json")
digest = hashlib.sha256(b"".join(pathlib.Path(p).read_bytes() for p in paths)).hexdigest()
print(json.dumps({"ready_probe_sha256": digest, "canary_port": os.environ["CANARY_PORT"]}))
PY
```

If the SHA256 diverges from peers while digest totals climb, broadcast before widening the LB; this is the same freeze used in the gateway webhook drills.
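The widen gate from step 7 and the freeze rule above fit in a few lines. This is a sketch under stated assumptions: "matches peers" is read strictly (every peer's `ready_probe_sha256` equals the local one), "digest totals climb" means the latest window's failure total exceeds the previous one, and the returned strings are hypothetical labels for your automation.

```python
def widen_decision(local_sha: str, peer_shas: list[str], digest_totals: list[int]) -> str:
    """Return 'widen', 'hold', or 'broadcast_and_freeze'.

    local_sha: this canary's ready_probe_sha256.
    peer_shas: the same hash from peer nodes.
    digest_totals: webhook failure totals per digest window, oldest first.
    """
    if not peer_shas:
        return "hold"  # nothing to compare against: never widen blind
    diverged = any(sha != local_sha for sha in peer_shas)
    climbing = len(digest_totals) >= 2 and digest_totals[-1] > digest_totals[-2]
    if diverged and climbing:
        return "broadcast_and_freeze"
    if diverged:
        return "hold"
    return "widen"
```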

Rollback ladder

  1. LB: zero canary weight.
  2. Plists: restore tarball; bootout/bootstrap.
  3. Binary: reinstall witness-good semver.
  4. Proof: shallow and deep Doctor both match the witness before ratios return.
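For rung 2, a small helper that builds the bootout/bootstrap pair for a per-user LaunchAgent keeps the reload to exactly one cycle. It uses launchctl's `gui/<uid>` domain target with a plist path; LaunchDaemons would use the `system` domain instead, which this sketch does not handle. It defaults to a dry run that only prints the commands.

```python
import os
import subprocess
from typing import Optional


def reload_commands(plist_path: str, uid: Optional[int] = None) -> list[list[str]]:
    """argv lists for one bootout/bootstrap cycle of a per-user LaunchAgent."""
    domain = f"gui/{os.getuid() if uid is None else uid}"
    return [
        ["/bin/launchctl", "bootout", domain, plist_path],
        ["/bin/launchctl", "bootstrap", domain, plist_path],
    ]


def reload_agent(plist_path: str, dry_run: bool = True) -> None:
    """Print (dry run) or execute one reload cycle for the restored plist."""
    for argv in reload_commands(plist_path):
        if dry_run:
            print(" ".join(argv))
            continue
        # bootout fails if the job is not loaded; tolerate that, but not a failed bootstrap.
        subprocess.run(argv, check=argv[1] == "bootstrap")
```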
Reference numbers:

  - 1: active LaunchAgent plist per OpenClaw Label after cleanup, never two.
  - 10: example high-port block width per AZ beside stable 9099.
  - 5m: digest window before summarizing webhook failures.
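A sketch of the 5-minute digest: count webhook failures per class inside a trailing window. It assumes failures arrive as `(timestamp, failure_class)` tuples with timezone-aware datetimes; where they come from (logs or the canary's webhook-digest endpoint) depends on your deployment.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DIGEST_WINDOW = timedelta(minutes=5)


def digest(events, now=None, window=DIGEST_WINDOW) -> dict:
    """Count failures per class with timestamps in (now - window, now].

    events: iterable of (timezone-aware datetime, failure_class) tuples.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    return dict(Counter(cls for ts, cls in events if cutoff < ts <= now))
```

The resulting dict is what the one-line broadcast carries as per-class counts; older events simply fall out of the next window.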

FAQ

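bootout or delete? Use bootout: it unloads the job and leaves the plist on disk. Deleting a plist file does not unload a job that is already loaded. Keep plist tarballs for restore either way.

Skip --deep? Risky after any hand-edited plist or cloned image.

Batch jobs on the same host? See the Nomad build locks guide.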

Operational guidance only. Validate paths and flags against your pinned OpenClaw in openclaw.lock.
