Scenarios & bottleneck checklist
Classify where the time goes first: metadata RTT to huggingface.co, Xet shard reconstruction, or APFS noise from symlink-heavy caches on shared runners.
- Import-order env drift: huggingface_hub reads env vars at import time, so exporting HF_HUB_CACHE after import huggingface_hub silently does nothing (see the sketch after this list).
- Wrong disk tier: pointing HF_HUB_CACHE at a network home folder turns every chunk into random SMB latency; NVMe local paths win for Xet parallel writes.
- Mirror without contract: setting HF_ENDPOINT to a community mirror may violate compliance or break gated repos; obtain written allowance and test HF_TOKEN scopes.
- Legacy transfer flags: cargo-culting HF_HUB_ENABLE_HF_TRANSFER=1 after the Hub migrated to Xet wastes review cycles; align with HF_XET_HIGH_PERFORMANCE instead when supported.
- Cache poisoning: reusing one GitHub Actions cache key across branches that pin different revision hashes yields flaky “model works here, fails there” tickets.
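A minimal sketch of the safe ordering, reusing the example paths from the profile later in this article:

```python
import os

# Set cache locations BEFORE huggingface_hub is imported; several values are
# frozen into huggingface_hub.constants at import time.
os.environ.setdefault("HF_HOME", "/usr/local/ci/huggingface")
os.environ.setdefault("HF_HUB_CACHE", os.environ["HF_HOME"] + "/hub")

from huggingface_hub import constants  # import only after the env is final

print(constants.HF_HUB_CACHE)  # echoes the path exported above
```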
Environment variable reference table
Export these variables in the shell before import huggingface_hub runs; values are read at import time. Defaults follow upstream docs.
| Variable | Role | Typical CI value / note |
|---|---|---|
| HF_HUB_ENABLE_HF_TRANSFER | Legacy fast path via hf_transfer; deprecated in favor of Xet-backed transfers in modern huggingface_hub. | Leave unset unless you are pinned to an older stack; if your security team still mandates it, set 1 only after proving Xet is unavailable. |
| HF_XET_HIGH_PERFORMANCE | Raises CPU and network saturation for hf-xet; analogous intent to the legacy high-throughput transfer mode. | 1 on dedicated M4 runners with spare CPU; keep unset on noisy neighbors to avoid starving Xcode compiles in the same pool. |
| HF_XET_NUM_CONCURRENT_RANGE_GETS | Concurrent byte-range fetches per Xet-backed file (default 16). | Try 8 on shared hosts; raise toward 16–24 only when nettop shows headroom and disk queue stays flat. |
| HF_ENDPOINT | Hub API base URL (default https://huggingface.co). | Private Hub or approved mirror base, e.g. an org-provided host; verify LFS and Xet both honor the override in your SDK version. |
| HF_HOME | Root for the token, the default hub cache parent, the Xet chunk cache, and assets. | /usr/local/ci/huggingface on fast APFS; avoids cluttering portable home directories. |
| HF_HUB_CACHE | Snapshot and blob store for models, datasets, spaces (default $HF_HOME/hub). | $HF_HOME/hub explicitly; never an SMB mount. |
| HF_XET_CACHE | Xet chunk storage (default $HF_HOME/xet). | Co-locate with HF_HOME on NVMe; large multi-repo pools may set a separate volume with monitoring. |
| HF_HUB_DISABLE_SYMLINKS | Disable symlink tricks in the cache (duplicates files). | 1 when the cache path is on NAS or cross-OS shares; prefer local APFS instead when possible. |
| HF_HUB_DOWNLOAD_TIMEOUT | Per-download HTTP timeout in seconds (default 10). | 120–300 for cross-border cold pulls; lower in preflight jobs that should fail fast. |
| HF_HUB_ETAG_TIMEOUT | Metadata / ETag probe timeout in seconds (default 10). | 30–60 when warm caches exist but metadata calls still traverse a slow path. |
| HF_TOKEN | User access token for gated models. | Inject via the CI secret store; file permission 600 if written to disk; never log it. |
Concurrent downloads: Xet uses HF_XET_NUM_CONCURRENT_RANGE_GETS; also cap snapshot_download(..., max_workers=4) on shared Macs until telemetry is flat.
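A conservative pull on a shared runner might look like this sketch; the repo id and revision are placeholders:

```python
from huggingface_hub import snapshot_download

# max_workers caps concurrent file downloads; per-file Xet range-GET
# concurrency is governed separately by HF_XET_NUM_CONCURRENT_RANGE_GETS.
path = snapshot_download(
    repo_id="org/model",  # placeholder; use your pinned repo
    revision="main",      # pin a commit SHA in real CI
    max_workers=4,
)
print(path)  # resolved snapshot directory under HF_HUB_CACHE
```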
```bash
# macOS remote agent: source before `python -c "import transformers"`
export HF_HOME="/usr/local/ci/huggingface"
export HF_HUB_CACHE="$HF_HOME/hub"
export HF_XET_CACHE="$HF_HOME/xet"
export HF_ENDPOINT="https://huggingface.co"
export HF_HUB_DOWNLOAD_TIMEOUT=180
export HF_HUB_ETAG_TIMEOUT=45
# export HF_XET_HIGH_PERFORMANCE=1
# export HF_XET_NUM_CONCURRENT_RANGE_GETS=12
```
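After sourcing that profile, a quick sanity check catches drift; the constant names below match recent huggingface_hub releases and may differ on older pins:

```python
# Constants are resolved from the environment at import time, so this only
# works if the profile was sourced before Python started.
from huggingface_hub import constants

print(constants.ENDPOINT)                 # expect https://huggingface.co
print(constants.HF_HUB_DOWNLOAD_TIMEOUT)  # expect 180
print(constants.HF_HUB_ETAG_TIMEOUT)      # expect 45
```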
Decision matrix: pick one column at a time
Change endpoint policy or concurrency per experiment—not both at once.
| Scenario | Endpoint / auth | Transfer mode | Concurrency starter | Cache directory |
|---|---|---|---|---|
| Public OSS weights, weak cross-border | Default HF_ENDPOINT or org-approved mirror; no token | Default Xet; optional HF_XET_HIGH_PERFORMANCE=1 off-peak | HF_XET_NUM_CONCURRENT_RANGE_GETS=8, max_workers=4 | HF_HUB_CACHE on local NVMe |
| Gated commercial model | Default endpoint + HF_TOKEN from vault | Avoid unapproved mirrors; keep defaults until stable | Conservative: range GETs 8, workers 2–4 | Dedicated HF_HOME per tenant if compliance requires isolation |
| Shared build pool + mixed jobs | Same as policy above | Skip HF_XET_HIGH_PERFORMANCE | Defaults or slightly lower | Single shared HF_HUB_CACHE with LRU pruning job |
| Legacy pin (pre-Xet stack) | Mirror only if legal approves | HF_HUB_ENABLE_HF_TRANSFER=1 only when verified compatible | Let hf_transfer internal chunking run; watch CPU | Local SSD; monitor partial files manually |
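One way to keep jobs honest about the matrix is a small profile map; everything below is illustrative policy invented for this article, not an official huggingface_hub API:

```python
import os

# Hypothetical profiles mirroring the matrix rows; apply one per job.
PROFILES = {
    "public-oss": {"HF_XET_NUM_CONCURRENT_RANGE_GETS": "8"},
    "gated": {"HF_XET_NUM_CONCURRENT_RANGE_GETS": "8"},  # HF_TOKEN comes from the vault
    "shared-pool": {},  # defaults only; no high-performance flag
    "legacy-pin": {"HF_HUB_ENABLE_HF_TRANSFER": "1"},  # only when verified compatible
}

def apply_profile(name: str) -> None:
    """Export one profile's variables; call before importing huggingface_hub."""
    for key, value in PROFILES[name].items():
        os.environ.setdefault(key, value)

apply_profile("public-oss")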
Resume behavior & CI cache key design
Resume: Clients reuse partial blobs under HF_HUB_CACHE when revision matches; wiping cache or swapping HF_ENDPOINT resets progress. Pin revision across retries.
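A minimal retry sketch under that rule; the repo id and SHA are placeholders:

```python
import time
from huggingface_hub import snapshot_download

REPO_ID = "org/model"      # placeholder
REVISION = "deadbeefcafe"  # placeholder; pin a real commit SHA

# Keeping REVISION constant across attempts lets the client reuse partially
# downloaded blobs under HF_HUB_CACHE instead of starting over.
for attempt in range(3):
    try:
        snapshot_download(REPO_ID, revision=REVISION)
        break
    except Exception:
        if attempt == 2:
            raise
        time.sleep(30 * (attempt + 1))  # back off, then resume the same revision
```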
CI cache (e.g. GitHub Actions): tarball $HF_HUB_CACHE after warm jobs; key parts:
- Lockfile or manifest hash, e.g. the hash of a checked-in models.lock.json listing repo_id@revision pairs.
- Runner OS slice, e.g. macos-14-arm64 or your pool id, so ARM caches are not mixed with x86.
- Hub endpoint fingerprint: a short SHA of the exact HF_ENDPOINT string to avoid cross-mirror collisions.
Weaker fallback: hf-hub-${{ hashFiles('**/requirements.txt', '**/pyproject.toml') }} only if those files gate weights. See CI cache strategy for sizing.
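A sketch of composing such a key, assuming a checked-in models.lock.json as described above:

```python
import hashlib
import os
import platform

# Composite key: manifest digest + runner slice + endpoint fingerprint.
with open("models.lock.json", "rb") as fh:
    manifest = hashlib.sha256(fh.read()).hexdigest()[:12]

endpoint = hashlib.sha256(
    os.environ.get("HF_ENDPOINT", "https://huggingface.co").encode()
).hexdigest()[:8]

print(f"hf-hub-{platform.machine()}-{manifest}-{endpoint}")
# e.g. hf-hub-arm64-<manifest hash>-<endpoint hash>
```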
FAQ: failure retries & timeout thresholds
When should I raise HF_HUB_DOWNLOAD_TIMEOUT versus HF_HUB_ETAG_TIMEOUT on a remote Mac runner?
Raise HF_HUB_DOWNLOAD_TIMEOUT first when large shard or LFS-style downloads stall mid-stream; values such as 120 to 300 seconds are common on congested cross-border paths. Raise HF_HUB_ETAG_TIMEOUT when metadata probes to huggingface.co time out but blobs are already local—start near 30 to 60 seconds so cold jobs still resolve revisions, while warm cache hits stay fast.
Is HF_HUB_ENABLE_HF_TRANSFER still the right switch in 2026?
Official huggingface_hub documentation marks HF_HUB_ENABLE_HF_TRANSFER as deprecated because Hub transfers increasingly use the hf-xet stack. Treat it as a legacy compatibility knob only; prefer HF_XET_HIGH_PERFORMANCE=1 plus tuned HF_XET_NUM_CONCURRENT_RANGE_GETS when hf-xet is installed, and fall back to default hub downloads if your org forbids saturating CPU or disk.
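A defensive pattern, assuming the hf_xet module name shipped by the hf-xet wheel, is to enable the knobs only when the package is importable:

```python
import importlib.util
import os

# Tune Xet only when hf-xet is installed; otherwise leave defaults rather
# than resurrecting the deprecated hf_transfer path.
if importlib.util.find_spec("hf_xet") is not None:
    os.environ.setdefault("HF_XET_HIGH_PERFORMANCE", "1")
    os.environ.setdefault("HF_XET_NUM_CONCURRENT_RANGE_GETS", "12")
# import huggingface_hub only after this block has run
```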
How do I design a CI cache key so branches do not poison each other's Hugging Face snapshots?
Never key only on runner OS: include a content hash of revision selectors such as a pinned commit SHA in your model card workflow, the digest of a manifest file listing repo@revision pairs, or the hash of poetry.lock or requirements.txt when those files gate which weights you fetch. Pair that with a stable HF_HUB_CACHE path on NVMe on the runner plus optional Actions cache restore keyed the same way.
What should I do when HF_HUB_CACHE lives on a network volume or mixed OS clients?
Set HF_HUB_DISABLE_SYMLINKS=1 to avoid broken symlink semantics across SMB or Linux to macOS mounts at the cost of duplicated large files. Better: keep HF_HUB_CACHE on local APFS NVMe for each remote Mac agent and export only metrics, not the cache directory, to observability stacks.
Summary
Put HF_HOME / HF_HUB_CACHE on NVMe; use HF_ENDPOINT only with approved mirrors; prefer Xet over legacy HF_HUB_ENABLE_HF_TRANSFER; raise HF_HUB_DOWNLOAD_TIMEOUT before blaming the network. Manifest-based cache keys prevent branch poisoning. More ops detail: tiered model cache, instant model pulling.
Rent remote Mac capacity for warm HF_HUB_CACHE and stable egress.
Treat Hugging Face pulls like any other CI artifact: pin revisions, measure metadata versus payload time separately, and spend parallelism budget where telemetry proves it helps.
Remote Mac for HF Hub & ML CI
Dedicated M-series hosts with fast local cache paths, useful when large weights and Xcode jobs share the same pool.