1. WAN contention. Uncapped uploads steal bandwidth from git fetch and registry pulls, stretching every lane in the cluster.
2. Hidden APFS pressure. Pack writers and snapshot metadata can shrink free space faster than df-based alerts react, because temporary files linger until upload completes.
3. Lock collisions. Nightly timers that ignore Nomad or flock lanes replay the same failure modes described for artifact promotion—only now your backup job holds the disk lock.
Backup topology for parallel regions
Each Mac keeps Derived Data, Simulator roots, and checkouts on one APFS volume. Reuse the fan-out discipline from the artifact matrix: give restic its own subdirectory per host or AZ, or give rclone distinct prefixes, so uploads never share one repository lock.
Place the primary repository nearest the bucket region and stream incremental packs only in the quiet window; record owner metadata in your CMDB when you add another Mac Mini M4 node.
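The per-host layout above can be sketched as a pair of small shell helpers. This is a minimal illustration, not a prescribed layout: the bucket name `example-ci-backups`, the `restic/<az>/<host>` path scheme, and the function names are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch: derive one restic repository and one rclone prefix per host.
# BUCKET_URL and the az/host layout are illustrative assumptions.

BUCKET_URL="${BUCKET_URL:-s3:s3.amazonaws.com/example-ci-backups}"

# Print the restic repository path for a given AZ and host.
restic_repo_for() {
  local az="$1" host="$2"
  printf '%s/restic/%s/%s\n' "$BUCKET_URL" "$az" "$host"
}

# Print the rclone destination prefix for a given remote, AZ, and host.
rclone_prefix_for() {
  local remote="$1" az="$2" host="$3"
  printf '%s:team-artifacts/%s/%s\n' "$remote" "$az" "$host"
}
```

A job could then export `RESTIC_REPOSITORY="$(restic_repo_for "$AZ" "$(hostname -s)")"` before calling restic, so that each host only ever contends for its own repository lock.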
Rate limits, concurrency, and chunk sizing
Cap uplinks before backups compete with SSH. Chart a few nights of vm_stat and interface counters, then tune the limits in the tables below.
| Goal | Example restic invocation | Notes |
|---|---|---|
| Cap upload throughput | `restic backup --limit-upload 5120 /Volumes/ci` | KiB/s units; add `--limit-download` for restores. |
| Reduce concurrent packers | `restic backup --read-concurrency 2 --pack-size 8` | Smaller packs tame RAM on 16 GB hosts. |
| Skip noisy paths | `restic backup --exclude='*.pcm' --exclude='CoreSimulator/**'` | Document excludes beside Simulator policy. |
| Prune after forget | `restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --prune` | Run heavy prune on a maintenance SKU, not peak CI. |
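Because `--limit-upload` takes KiB/s while uplink budgets are usually quoted in Mbit/s, a small conversion helper avoids off-by-eight mistakes. The sketch below assumes decimal megabits and rounds down; the function names and the 40 Mbit/s figure are illustrative only. For reference, the 5120 KiB/s cap in the table works out to roughly 42 Mbit/s.

```shell
#!/usr/bin/env bash
# Sketch: convert an uplink budget in Mbit/s to the KiB/s value that
# restic's --limit-upload expects. Function names are hypothetical.

# Convert decimal megabits per second to whole KiB per second (rounded down).
mbit_to_kib() {
  local mbit="$1"
  echo $(( mbit * 1000000 / 8 / 1024 ))
}

# Run a backup of PATH capped at the given Mbit/s budget.
capped_backup() {
  local mbit="$1" path="$2"
  restic backup --limit-upload "$(mbit_to_kib "$mbit")" "$path"
}
```

For example, `capped_backup 40 /Volumes/ci` would invoke restic with `--limit-upload 4882`.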
| Goal | Example rclone invocation | Notes |
|---|---|---|
| Throttle WAN copy | `rclone copy --bwlimit 8M --transfers 4 --checkers 8 ~/Artifacts s3:team-artifacts/prod` | Use copy for additive trees; avoid destructive sync on shared caches. |
| Chunk large files | `rclone copy --s3-chunk-size 64M --s3-upload-concurrency 2` | Watch APFS snapshots if Time Machine shares the disk. |
| Verify integrity | `rclone copy --checksum --immutable` | Immutable mode blocks silent overwrites—ideal for tarball promotions. |
Repository retention versus rsnapshot rotations
rsnapshot chains hard links on one tree—fast locally, awkward when many Macs rotate the same mount. restic writes content-addressed packs to object storage with dedupe across snapshots.
| Dimension | rsnapshot style tree | restic repository |
|---|---|---|
| Locking | Single writer assumed; manual flock discipline | Per-repo lock file; serialize forget/prune windows |
| Retention math | Rotation counts tied to local inode budget | forget policies expressed as daily, weekly, and monthly keep counts |
| Off-site readiness | Requires rsync of entire snapshot tree | Native S3 or MinIO backend with TLS |
Alert when prune lags behind forget: forgotten snapshots keep occupying pack space until prune runs. If compliance still needs flat mirrors, run rclone only after restic finishes.
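One way to serialize these steps is a single function that stops at the first failure. This is a sketch under assumptions: `nightly_retention`, `MIRROR_SRC`, and `MIRROR_DST` are hypothetical names, and the keep counts and rclone flags are simply copied from the tables earlier in this article.

```shell
#!/usr/bin/env bash
# Sketch: run forget, then prune, then the optional flat mirror, in order.
# MIRROR_SRC and MIRROR_DST are hypothetical placeholders.

MIRROR_SRC="${MIRROR_SRC:-$HOME/Artifacts}"
MIRROR_DST="${MIRROR_DST:-s3:team-artifacts/prod}"

nightly_retention() {
  # Drop snapshot references first; pack data stays until prune runs.
  restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 || return 1
  # Reclaim pack space as a separate step so it can be scheduled on its own.
  restic prune || return 1
  # Mirror only after both restic steps have succeeded.
  rclone copy --bwlimit 8M --transfers 4 --checkers 8 "$MIRROR_SRC" "$MIRROR_DST"
}
```

Splitting forget from prune, rather than passing `--prune`, makes it easier to log how long each phase takes and to alert when the prune step is skipped or falls behind.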
Stagger backups away from CI build locks
Build locks from Nomad-style lanes often peak late evening. Offset launchd backups by ninety minutes after the merge queue drains, or pick the AZ that is already quiet.
- Inventory cron and timer windows. Export every plist or systemd unit that touches disk-heavy work.
- Map lock holders. Record which lane owns DerivedData promotion so backups never run mid-rsync.
- Simulate overlap once. Deliberately collide backup and CI on a canary host to capture latency numbers.
- Encode offsets per region. UTC-based schedules rarely match human quiet hours across Tokyo and Virginia pairs.
- Automate pager hooks. If backup duration exceeds twice the rolling median, page infra before disks hit red.
- Document rollback. Keep a one-command pause file so on-call can stop restic without orphaning half-written packs.
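Two of the items above, the pause file and the duration pager, can be sketched in a few lines of shell. The file paths, the convention that CI lanes signal their lock by creating a directory, and both function names are assumptions for illustration; adapt them to whatever your lanes actually create.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight guard and duration check for a nightly backup.
# PAUSE_FILE and CI_LOCK_DIR are hypothetical paths.

PAUSE_FILE="${PAUSE_FILE:-/usr/local/var/backup.pause}"
CI_LOCK_DIR="${CI_LOCK_DIR:-/usr/local/var/ci-build.lock}"

# Return 0 only when no pause file exists and CI does not hold its lock.
backup_may_start() {
  [ ! -e "$PAUSE_FILE" ] && [ ! -e "$CI_LOCK_DIR" ]
}

# Return 0 when the latest duration exceeds twice the median of prior runs.
# Usage: duration_is_anomalous LATEST PRIOR1 [PRIOR2 ...]
duration_is_anomalous() {
  local latest="$1"; shift
  # Without history there is no median to compare against.
  [ "$#" -gt 0 ] || return 1
  local median
  median=$(printf '%s\n' "$@" | sort -n | awk '{ v[NR] = $1 }
    END { if (NR % 2) print v[(NR + 1) / 2];
          else print (v[NR / 2] + v[NR / 2 + 1]) / 2 }')
  awk -v l="$latest" -v m="$median" 'BEGIN { exit !(l > 2 * m) }'
}
```

The guard runs before restic starts, so touching the pause file never interrupts a pack that is already being written; it only prevents the next run from beginning. With prior durations of 50, 60, and 70 minutes, a 130-minute run would trip the pager check, while a 120-minute run would not.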
1TB / 2TB APFS acceptance checklist
Reuse the yellow-red framing from Pulumi disk notes: pack writers hide large temp files until upload completes.
| Check | 1TB tier gate | 2TB tier gate |
|---|---|---|
| Free space after backup peak | ≥ 90 GiB sustained | ≥ 180 GiB sustained |
| Snapshot or pack staging headroom | Dedicated 40 GiB scratch volume | Dedicated 80 GiB scratch volume |
| Inode pressure from tiny pack files | df -i under seventy percent | df -i under sixty percent |
- Citable guardrail: keep at least fifteen percent raw APFS free space before any restic prune window.
- Citable guardrail: cap simultaneous restic processes at one per Mac unless the host has thirty-two gigabytes of unified memory.
- Citable guardrail: log upload megabytes per minute per AZ so finance can compare egress invoices against CI throughput charts.
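The fifteen percent free-space guardrail can be checked before a prune window with a short gate. This is a minimal sketch: the function names are hypothetical, and it assumes `df -Pk` reports the volume on a single line, which can break for filesystem names that contain spaces.

```shell
#!/usr/bin/env bash
# Sketch: gate a prune window on the free-space guardrail.

# Print the integer percentage of free space on the volume holding the path.
free_pct() {
  # -P forces POSIX single-line output; -k reports 1024-byte blocks.
  # Column 2 is total blocks and column 4 is available blocks.
  df -Pk "$1" | awk 'NR == 2 && $2 > 0 { printf "%d\n", ($4 * 100) / $2 }'
}

# Return 0 when the free percentage meets the floor (default 15).
meets_free_floor() {
  local pct="$1" floor="${2:-15}"
  [ "$pct" -ge "$floor" ]
}
```

A maintenance job might then run `meets_free_floor "$(free_pct /Volumes/ci)" && restic prune`, skipping the prune and alerting instead when the volume is already below the floor.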
FAQ
Same plist for restic and rclone? Chain on success only; never parallel both against one tree without excludes.
Forget during CI? Prune spikes IOPS—treat it like a lock and run Sunday afternoon.
Per-customer remotes? Yes—prefix isolation keeps deletion scopes tight.