Pain points before the first Gluster mount
Who should read: platform and storage owners running Xcode or CI plus cross-AZ artifact mirrors. Bottom line: replica factor three needs one brick per AZ, rsync must use bounded backoff, and build locks must cover the longest sync chain. Compare object paths in the JuiceFS matrix, MinIO matrix, and OpenClaw canary probes; map regions on the home page.
Parallel clusters rarely fail on CPU. They fail when brick layout, rsync, and heal share one NVMe without a throttle contract.
- Brick drift: mismatched APFS paths per AZ break replica quorum during promotion.
- rsync without backoff: artifact storms fill one brick while self-heal adds sequential IO.
- Missing build locks: parallel lanes write the same promotion tree—split-brain risk rises.
Replica volumes and brick planning
Name the production volume artifacts-rep3 as replica 3. Give each brick a dedicated APFS data disk—never share the system volume. Observer nodes run glusterd and metrics only. Cap client cache with performance.cache-size so Xcode indexing and self-heal do not contend on the same NVMe during compile peaks.
| Layout | Brick pattern | Best for |
|---|---|---|
| replica 3 | 3 AZ × 1 brick each | Production parallel CI and shared DerivedData |
| replica 2 | 2 Mac nodes | Lab clusters; cannot survive dual-host loss |
| arbiter | 2 data + 1 witness brick | Tight disks; more complex write paths |
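The replica 3 row can be sketched as a dry-run planner that only prints the Gluster CLI calls rather than running them. The host names, the brick path, and the 256MB cache value below are hypothetical placeholders, not values from this guide; only the volume name and the performance.cache-size option come from the text above.

```shell
# Sketch only: prints commands for review, runs nothing.
# BRICK_PATH, the host names, and CACHE_SIZE are hypothetical placeholders.
VOLUME="artifacts-rep3"
BRICK_PATH="/Volumes/gluster-data/brick1"   # dedicated data disk, not the system volume
CACHE_SIZE="256MB"

# Build the "host:/path" brick arguments, one per AZ host.
brick_args() {
  local host out=""
  for host in "$@"; do
    out="${out}${host}:${BRICK_PATH} "
  done
  printf '%s\n' "${out% }"
}

# Print the create / tune / start sequence for a replica 3 volume.
plan_volume() {
  if [ "$#" -ne 3 ]; then
    echo "replica 3 needs exactly three hosts, one per AZ" >&2
    return 1
  fi
  echo "gluster volume create ${VOLUME} replica 3 $(brick_args "$@")"
  echo "gluster volume set ${VOLUME} performance.cache-size ${CACHE_SIZE}"
  echo "gluster volume start ${VOLUME}"
}
```

Calling `plan_volume mac-az1 mac-az2 mac-az3` prints three lines that can be reviewed before anyone pastes them into a shell on a storage node.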
Artifact rsync backoff matrix
Object blobs may live on MinIO or JuiceFS; directory artifacts still rsync into the Gluster mount. Backoff means lower bwlimit, fewer concurrent jobs, and longer retry spacing—not a full stop.
| Scenario | Starter parameters | Backoff action |
|---|---|---|
| Daytime delta | --bwlimit=35000 | Yellow disk gate → 22000; red gate → pause delete phases |
| Initial seed | --partial-dir=/mnt/scratch/partial | Stagger thirty minutes away from heal windows |
| CI peak | Concurrency 1; --timeout=600 | Three failures → cap retry at 900 seconds |
```shell
ionice -c2 -n4 rsync -az --delete-delay --bwlimit=35000 --timeout=600 \
  --partial-dir=/mnt/scratch/partial "${GOLDEN}:/artifacts/" /mnt/gluster/artifacts/
```
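One way to turn the matrix into wrapper logic is a set of small helpers that pick rsync settings from the current disk gate and space out retries. This is a sketch under assumptions: the gate names (green, yellow, red), the 225-second base delay, and the doubling schedule are illustrative choices, picked so that the third failure lands on the 900-second cap. The 35000 and 22000 limits and the cap itself come from the table.

```shell
# Sketch: map a disk gate to rsync throttle settings and bound retry spacing.

# Echo the --bwlimit value (KB/s) for a gate.
bwlimit_for_gate() {
  case "$1" in
    green)      echo 35000 ;;
    yellow|red) echo 22000 ;;
    *)          echo "unknown gate: $1" >&2; return 1 ;;
  esac
}

# Echo the delete flag for a gate; the red gate pauses delete phases.
delete_flag_for_gate() {
  case "$1" in
    red) echo "" ;;
    *)   echo "--delete-delay" ;;
  esac
}

# Echo the wait in seconds after failure number $1 (1-based).
# Assumed schedule: 225, 450, 900, then held at the 900-second cap.
retry_delay() {
  local failures=$1 delay=225 i=1
  while [ "$i" -lt "$failures" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge 900 ]; then
      delay=900
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}
```

A wrapper would then call rsync with `--bwlimit="$(bwlimit_for_gate "$gate")"` and sleep for `retry_delay "$failures"` seconds between attempts, which slows the transfer without stopping it.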
Multi-project build locks
Use flock on /var/locks/build-${tenant} and bound each hold with a 1200-second TTL. While the lock is held, block a second rsync into the same Gluster subtree. Pair locks with the artifact rsync matrix so delete phases never overlap promotions. Freeze canary weight increases when any brick crosses the red watermark; mirror the probes in the OpenClaw Flagger canary guide.
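A minimal sketch of the per-tenant lock, assuming util-linux `flock` and coreutils `timeout` are installed on the brick hosts. `flock` has no TTL of its own, so the 1200-second bound is approximated here by killing the wrapped command with `timeout`. The `LOCK_DIR` and `LOCK_TTL` variables and the exit code 75 are hypothetical conventions, not part of this guide.

```shell
# Sketch: run a command under a per-tenant build lock with a bounded hold time.
# LOCK_DIR and LOCK_TTL are hypothetical overrides; defaults follow the text above.

lock_path() {
  echo "${LOCK_DIR:-/var/locks}/build-$1"
}

# Usage: with_build_lock <tenant> <command> [args...]
# Fails fast with exit 75 if another holder already has the tenant lock.
with_build_lock() {
  local tenant=$1
  shift
  local lock_file
  lock_file="$(lock_path "$tenant")"
  (
    # Non-blocking exclusive lock on file descriptor 9.
    flock -n 9 || { echo "lock busy: ${lock_file}" >&2; exit 75; }
    # timeout ends the command after the TTL so a stuck job cannot hold the lock forever.
    timeout "${LOCK_TTL:-1200}" "$@"
  ) 9>"${lock_file}"
}
```

A second rsync into the same tenant subtree then becomes `with_build_lock acme rsync ...`, which refuses to start while a promotion holds the lock.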
Rollout steps (seven)
- Inventory volumes, bricks, mount points, and rsync source DAGs per tenant.
- Create replica 3 and verify gluster volume info shows three healthy bricks.
- Write boot mounts plus client cache limits so Xcode indexing does not starve heal.
- Apply the rsync matrix with a dedicated partial disk per node.
- Deploy build locks and disk watermark cron on every brick host.
- Wire OpenClaw probes that merge heal queue depth with APFS utilization.
- Rehearse failover: offline one brick, validate read-only CI behavior, then heal after quorum freeze.
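The brick check in step two can be scripted by counting the `BrickN:` lines that `gluster volume info` prints. This is a sketch, and it only confirms that three bricks are listed. Whether each brick is actually online is reported by `gluster volume status`, which this helper does not parse.

```shell
# Sketch: count "BrickN: host:/path" lines from `gluster volume info` on stdin.
count_bricks() {
  # grep -c exits non-zero when the count is 0, so mask the exit status.
  grep -c -E '^Brick[0-9]+: ' || true
}

# Succeeds only when exactly three bricks are listed.
verify_three_bricks() {
  local n
  n="$(count_bricks)"
  if [ "$n" -eq 3 ]; then
    echo "ok: 3 bricks listed"
  else
    echo "expected 3 bricks, found ${n}" >&2
    return 1
  fi
}
```

On a storage node this would run as `gluster volume info artifacts-rep3 | verify_three_bricks`.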
1TB and 2TB expansion and disk watermark acceptance
Expand by adding bricks with gluster volume add-brick, in multiples of the replica count (three at a time for replica 3); do not rely on swapping a single disk in place. Acceptance must include inode counts, snapshots, and partial-dir size, not capacity percent alone.
| SKU | Yellow gate | Red gate | Action |
|---|---|---|---|
| 1TB | ~78% | ~88% | Cut rsync; schedule heal off peak |
| 2TB | ~72% | ~84% | Drain partial-dir; pause non-critical volumes |
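The watermark table maps directly onto a small classifier that a cron job could call. This sketch assumes the thresholds are inclusive (78% on a 1TB disk already counts as yellow), which the table does not state, and it covers capacity only. Inode counts, snapshots, and partial-dir size would need their own checks.

```shell
# Sketch: classify integer disk utilization for a SKU as green, yellow, or red.
# Thresholds come from the table above; treating them as inclusive is an assumption.
disk_gate() {
  local sku=$1 used=$2 yellow red
  case "$sku" in
    1TB) yellow=78; red=88 ;;
    2TB) yellow=72; red=84 ;;
    *)   echo "unknown SKU: $sku" >&2; return 1 ;;
  esac
  if [ "$used" -ge "$red" ]; then
    echo red
  elif [ "$used" -ge "$yellow" ]; then
    echo yellow
  else
    echo green
  fi
}

# Read the used-capacity percent of a mount point from POSIX-format df output.
used_pct() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}
```

A watermark cron entry could then run `disk_gate 1TB "$(used_pct /path/to/brick)"`, where the brick path is a placeholder, and feed the result to the rsync wrapper.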
Failover FAQ
GlusterFS vs MinIO vs JuiceFS? Large immutable objects stay on MinIO; metadata-heavy S3 caches use JuiceFS; Gluster holds POSIX workspaces and incremental artifact trees. Never mount two primaries on the same prodiv path.
May rsync continue with one brick offline? Replica 3 can keep accepting writes, but cut bwlimit immediately. Replica 2 should go read-only and fail CI until quorum returns.
Heal or rsync first after a red gate? Pause rsync, drain partial and build scratch, wait for yellow, then run gluster volume heal so IO never double-peaks.
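The one-brick-offline answer can be expressed as a small decision helper. The mode names (normal, throttled, read-only) are hypothetical labels, and treating replica 3 with only one brick online as read-only is an assumption. The FAQ covers only the single-brick-offline cases.

```shell
# Sketch: choose the rsync/CI mode from the replica count and bricks currently online.
# Usage: degraded_mode <replica_count> <bricks_online>
degraded_mode() {
  local replica=$1 online=$2
  if [ "$online" -ge "$replica" ]; then
    echo normal
  elif [ "$replica" -eq 3 ] && [ "$online" -eq 2 ]; then
    echo throttled    # keep accepting writes, cut bwlimit immediately
  else
    echo read-only    # fail CI until quorum returns
  fi
}
```

For example, `degraded_mode 3 2` prints `throttled`, while `degraded_mode 2 1` prints `read-only`.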
Citable guardrails
- rsync baseline: start at 35000 KB/s; drop to 22000 at the yellow gate.
- Build lock TTL: 1200 seconds per tenant on shared prodiv paths.
- Disk contract: 1TB yellow 78% / red 88%; 2TB yellow 72% / red 84%.
Land GlusterFS topology on clustervps parallel lanes
Cross-region Mac Mini M4 clusters need multiple bare-metal nodes to host bricks and CI. Open the cluster purchase page for multi-node packages, compare plans on pricing, and validate disks with the dual-threshold tables above.