feat(capsule): auto-prune to bound disk usage of capsules cache #10382
davidfirst wants to merge 15 commits into
Adds origin markers, fast delete, prune command, and a once-per-24h auto-trigger so the global capsules cache stops growing unbounded. Workspace caps are deleted on each prune; aspect-version and scope caps are evicted by last-used age (default 30d). New configs: capsules_max_size_gb (10), capsules_max_age_days (30), capsules_auto_prune (true).
Pull request overview
Adds bounded-disk-usage management to the global capsules cache by tracking per-capsule origin metadata, fast-deleting via a .trash rename + detached rm -rf, and introducing a bit capsule prune command plus an automatic ~24h trigger gated by a stamp file. Also enriches bit capsule list with cache size/orphan/stale-aspect stats and adds three new user config keys.
Changes:
- New `CapsulePruneCmd` and origin-marker bookkeeping in `IsolatorMain` (kind, originPath, last-used mtime), plus prune/eviction/size-target logic.
- Fast-delete pipeline: move capsule dir into sibling `.trash/<uuid>/` and detach `rm -rf`.
- Auto-prune `onBeforeExit` hook gated by `.last-capsule-prune` stamp, with three new config keys.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scopes/workspace/workspace/workspace.main.runtime.ts | Wires CapsulePruneCmd into the capsule command group. |
| scopes/workspace/workspace/capsule.cmd.ts | Adds prune subcommand, formatBytes, and richer list output (size/orphans/stale). |
| scopes/component/isolator/isolator.main.runtime.ts | Core: origin markers, fast-delete via trash, prune/size-target/LRU logic, auto-prune hook. |
| scopes/component/isolator/index.ts | Re-exports new types and constants from the isolator runtime. |
| components/legacy/constants/constants.ts | Adds capsules_max_size_gb, capsules_max_age_days, capsules_auto_prune config keys. |
Comments suppressed due to low confidence (1)
scopes/component/isolator/isolator.main.runtime.ts:1252
discrepancy_with_pr_description: The PR description states that auto-prune is "spawned detached so it never blocks foreground", but `maybeAutoPrune` runs `pruneCapsules` in-process and awaits it inside the `onBeforeExit` hook. Either the implementation needs to actually detach (e.g., spawn a child process), or the PR description should be corrected.
const report = await this.pruneCapsules({
  olderThanDays,
  sizeTargetGb,
  includeOrphans: true,
});
this.logger.debug(
  `[auto-prune] removed ${report.removed.length} capsule(s), freed ${report.totalRemovedBytes} bytes`
);
- read `--no-orphans` as `opts.noOrphans` (bit CLI doesn't apply commander negation); previously the flag was a no-op.
- replace `rm -rf` spawn with portable `node -e fs.rmSync` so the trash sweep also works on Windows.
- detach auto-prune via a spawned `bit capsule prune` child so the slow size walk doesn't delay every bit command's exit once per day.
The dated-capsules dir (`<root>/dated-capsules/<YYYY-M-D>/<uuid>/`) holds in-flight isolation runs. Its parent mtime is bumped on every new isolation, so it never aged out under the standard rule. Walk one level deep and prune individual date subdirs older than `--older-than`. Honors the `capsules_scopes_aspects_dated_dir` config.
Dated capsules are recreated on every isolation, so anything that isn't today's YYYY-M-D subdir is leftover from a previous run and safe to delete regardless of age. Today's subdir is preserved to avoid racing a concurrent bit process. Drops the unused age cutoff from this path.
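The "keep only today's subdir" rule can be sketched as a pure function. This is an illustration only: `datedDirName` and `staleDatedDirs` are hypothetical names, and the unpadded `YYYY-M-D` format is inferred from the path pattern quoted above rather than taken from the PR's code.

```typescript
// Hypothetical helper: formats a date as the unpadded YYYY-M-D name assumed for dated subdirs.
function datedDirName(date: Date): string {
  // getMonth() is zero-based, hence the +1.
  return `${date.getFullYear()}-${date.getMonth() + 1}-${date.getDate()}`;
}

// Returns every subdir name that is not today's, i.e. the candidates for deletion.
function staleDatedDirs(subdirNames: string[], now: Date = new Date()): string[] {
  const today = datedDirName(now);
  return subdirNames.filter((name) => name !== today);
}
```

Called with `['2025-5-6', '2025-5-7']` on 7 May 2025 (local time), `staleDatedDirs` would return only `'2025-5-6'`, leaving today's directory alone for any concurrent bit process.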
- deriveCapsuleKind: detect bare-scope host via duck-type check on bitMap so non-aspect scope isolations get kind=scope (warm-cache) instead of workspace (deleted unconditionally).
- pruneCapsules: project totalSizeAfter from removed bytes in dry-run so the report no longer reads "46GB → 46GB (freed 12GB)".
- scheduleFastDelete: when dir is the global capsules root itself, skip the rename-to-trash dance (would put .trash inside the doomed dir) and just remove directly.
- CapsuleListCmd: walking size/orphan/stale stats is now gated behind `--with-stats`. Default `bit capsule list` is back to near-instant.
- CapsuleListCmd: the stale-aspect cutoff now reads CFG_CAPSULES_MAX_AGE_DAYS instead of hard-coding 30.
- registerAutoPruneHook JSDoc: clarify that the prune now runs out-of-process via a detached child, not in-process.
- maybeAutoPrune: accept both string `'false'` and boolean false for capsules_auto_prune so JSON-config users can disable it too.
- pruneDatedCapsulesChildren: drop the misleading getMonth() < 12 branch (always true) and use the same getMonth()+1 expression directly.
- CapsulePruneCmd: read CFG_CAPSULES_MAX_AGE_DAYS and CFG_CAPSULES_MAX_SIZE_GB as fallbacks for the CLI flags so manual prune and auto-prune use the same effective thresholds when the user has overridden the defaults.
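A minimal sketch of the `capsules_auto_prune` check described in the maybeAutoPrune item, assuming the config store may hand back either a string or a boolean. `isAutoPruneEnabled` is a hypothetical name, not a function from the PR.

```typescript
// Hypothetical helper: the value may arrive as the string 'false' or as a JSON boolean false.
function isAutoPruneEnabled(raw: unknown): boolean {
  // Only an explicit false, in either representation, disables auto-prune;
  // an unset value keeps the default (enabled).
  return raw !== false && raw !== 'false';
}
```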
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (3)
scopes/workspace/workspace/capsule.cmd.ts:345
- Same numeric-coercion issue for `sizeTargetGb`: `Number(opts.sizeTarget)` / `Number(sizeTargetFromConfig)` can yield `NaN`, which then breaks the size-target logic in `pruneCapsules` (no evictions will happen because the computed threshold becomes `NaN`). Validate that the parsed value is finite (and > 0) before passing it through.
sizeTargetGb:
  opts.sizeTarget !== undefined
    ? Number(opts.sizeTarget)
    : sizeTargetFromConfig !== undefined
      ? Number(sizeTargetFromConfig)
      : undefined,
dryRun: opts.dryRun === true,
scopes/component/isolator/isolator.main.runtime.ts:1507
`pruneCapsules()` computes `totalSizeBefore` via `getCapsulesTotalSize()` (which calls `listAllCapsuleRoots()` and walks sizes), and then immediately calls `listAllCapsuleRoots()` again. This doubles the most expensive part of pruning. Consider calling `listAllCapsuleRoots({ withSizes: true })` once, deriving `totalSizeBefore` from the returned `roots`, and reusing that array for the prune loop.
const datedDirName = this.configStore.getConfig(CFG_CAPSULES_SCOPES_ASPECTS_DATED_DIR) || 'dated-capsules';
const totalSizeBefore = await this.getCapsulesTotalSize();
const roots = await this.listAllCapsuleRoots();
const removed: PruneCapsulesReport['removed'] = [];
scopes/component/isolator/isolator.main.runtime.ts:1662
`applySizeTarget()` calls `listAllCapsuleRoots()` without `{ withSizes: false }`, so it will (by default) compute full directory sizes for every top-level cache entry again, even though this method only needs the root classification/paths and then computes sizes for each aspect child explicitly. Passing `{ withSizes: false }` here would avoid an unnecessary full-cache walk.
const targetBytes = sizeTargetGb * 1024 * 1024 * 1024;
const removedPaths = new Set(removed.map((r) => r.path));
// Re-walk what's left to find the oldest aspect-version children.
const roots = await this.listAllCapsuleRoots();
const aspectChildren: Array<{ path: string; lastUsedMs: number; sizeBytes: number }> = [];
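For readers following the size-target discussion, the LRU step can be sketched roughly as below. This is not the PR's `applySizeTarget`; `selectForEviction` and its parameters are hypothetical, and only the `aspectChildren` element shape and the GB-to-bytes conversion are taken from the snippet above.

```typescript
interface AspectChild {
  path: string;
  lastUsedMs: number;
  sizeBytes: number;
}

// Picks least-recently-used children until the projected cache size fits the target.
function selectForEviction(
  children: AspectChild[],
  currentTotalBytes: number,
  sizeTargetGb: number
): AspectChild[] {
  const targetBytes = sizeTargetGb * 1024 * 1024 * 1024;
  // Copy before sorting so the caller's array is left untouched; oldest first.
  const oldestFirst = children.slice().sort((a, b) => a.lastUsedMs - b.lastUsedMs);
  const evicted: AspectChild[] = [];
  let projected = currentTotalBytes;
  for (const child of oldestFirst) {
    if (projected <= targetBytes) break;
    evicted.push(child);
    projected -= child.sizeBytes;
  }
  return evicted;
}
```

Note that a `NaN` target makes `projected <= targetBytes` always false, so this sketch would evict everything, whereas the review comment above reports that the real code evicts nothing. Either way, the threshold needs validating before it reaches this step.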
- toFiniteNumber helper: guard against empty/non-numeric config values (which would otherwise become NaN and silently disable age/size enforcement). Used in both maybeAutoPrune and CapsulePruneCmd.
- listAllCapsuleRoots: switch unbounded Promise.all to bounded pMap so large caches (hundreds of subdirs with many files each) don't risk EMFILE from concurrent recursive size walks.
- computeDirSize: same bounded-concurrency switch inside the recursive walk itself.

E2E coverage for `bit capsule prune` is a worthwhile follow-up but not included in this PR.
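The PR does not show the body of `toFiniteNumber`; the following is one plausible shape, assuming it returns `undefined` for anything that is not a finite number so callers can fall back with `??` as in `toFiniteNumber(maxAgeRaw) ?? 30`.

```typescript
// Sketch only: the real helper's body is not shown in this PR.
function toFiniteNumber(raw: unknown): number | undefined {
  if (typeof raw !== 'string' && typeof raw !== 'number') return undefined;
  // Number('') and Number('  ') are 0, not NaN, so empty strings need an explicit check.
  if (typeof raw === 'string' && raw.trim() === '') return undefined;
  const parsed = Number(raw);
  return isFinite(parsed) ? parsed : undefined;
}
```

The earlier review comment also asked for a `> 0` check on the size target. That is left out here and would be an additional guard at the call site.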
- CapsuleListCmd --with-stats: use toFiniteNumber + Math.max(0,...) for capsules_max_age_days so a corrupt config value can't yield NaN/0d in the printed cutoff label.
- bit capsule list -j: omit totalSizeBytes/allRoots unless --with-stats is set, instead of returning misleading zeros.
- pruneCapsules: walk the cache once. listAllCapsuleRoots already reports sizeBytes, so derive totalSizeBefore from that rather than calling getCapsulesTotalSize (which would re-walk).
- readOriginMarker: validate kind against the known set. Markers with an unknown kind (corrupt or from a future version) now fall through to the unmarked path instead of being silently skipped by prune.
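The readOriginMarker validation might look roughly like this. The kind names in `KNOWN_KINDS` are assumptions (the PR mentions workspace, scope, and aspect-related kinds but never lists the exact set), and `parseOriginMarker` is a hypothetical stand-in that works on the marker's JSON text rather than on a file.

```typescript
// Assumed kind names, for illustration only.
const KNOWN_KINDS = ['workspace', 'scope', 'scope-aspects'];

interface OriginMarker {
  kind: string;
  originPath: string;
  createdAt?: string;
}

// Returns undefined for corrupt JSON or an unknown kind, so the caller can
// treat the directory as unmarked instead of skipping it.
function parseOriginMarker(json: string): OriginMarker | undefined {
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch {
    return undefined;
  }
  if (typeof data !== 'object' || data === null) return undefined;
  const { kind, originPath, createdAt } = data as Record<string, unknown>;
  if (typeof kind !== 'string' || KNOWN_KINDS.indexOf(kind) === -1) return undefined;
  if (typeof originPath !== 'string') return undefined;
  return { kind, originPath, createdAt: typeof createdAt === 'string' ? createdAt : undefined };
}
```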
bit capsule prune was blocking 5+ minutes on multi-GB caches because
pruneCapsules called listAllCapsuleRoots({ withSizes: true }), which
recursively lstats every file before doing any deletion. The deletes
themselves are O(1) renames (scheduleFastDelete), so the size walk was
the only slow part.
New behavior:
- pruneCapsules takes a withSizes flag (default false). When false, all
sizeBytes are reported as 0 and the cache walk skips computeDirSize.
- --size-target still forces sizes on (mandatory for LRU enforcement).
- CapsulePruneCmd exposes a --with-sizes flag for users who want byte
accounting in the report.
- The report omits the byte summary line when sizes weren't computed
and hints how to get it.
Auto-prune is unaffected (it sets --size-target, which still walks).
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
try {
  const stat = await fs.stat(stampPath);
  if (Date.now() - stat.mtime.getTime() < ONE_DAY_MS) return;
} catch {
  // missing: first run, fall through and prune
}
// Write the stamp first to claim the slot; even if the spawn fails, we don't retry
// within 24h. The detached child sees this recent stamp on its own exit and skips its
// own auto-prune, so no recursion.
await fs.outputFile(stampPath, '');

// Guard against non-numeric or empty config values: a stray string would otherwise
// become NaN and silently disable age/size enforcement.
const maxAgeRaw = this.configStore.getConfig(CFG_CAPSULES_MAX_AGE_DAYS);
const olderThanDays = toFiniteNumber(maxAgeRaw) ?? 30;
const maxSizeRaw = this.configStore.getConfig(CFG_CAPSULES_MAX_SIZE_GB);
const sizeTargetGb = toFiniteNumber(maxSizeRaw) ?? 10;

this.logger.debug(
  `[auto-prune] spawning detached child. olderThanDays=${olderThanDays}, sizeTargetGb=${sizeTargetGb}`
);
this.spawnDetachedAutoPrune(olderThanDays, sizeTargetGb);
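The stamp-file gate in the hunk above reduces to a small time comparison. The sketch below isolates that decision so it can be reasoned about without touching the filesystem. `isAutoPruneDue` is a hypothetical name; the 24h window matches `ONE_DAY_MS` in the hunk.

```typescript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// stampMtimeMs is undefined when the stamp file does not exist yet (first run).
function isAutoPruneDue(stampMtimeMs: number | undefined, nowMs: number): boolean {
  if (stampMtimeMs === undefined) return true;
  // Mirrors the hunk: skip while the stamp is younger than one day.
  return nowMs - stampMtimeMs >= ONE_DAY_MS;
}
```

Because the real code writes the stamp before spawning, the detached child evaluates this check as false on its own exit, which is what prevents recursion.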
async pruneCapsules(opts: PruneCapsulesOptions = {}): Promise<PruneCapsulesReport> {
  const olderThanDays = opts.olderThanDays ?? 30;
  const includeOrphans = opts.includeOrphans !== false;
  const keepWorkspaceCaps = opts.keepWorkspaceCaps === true;
  const dryRun = opts.dryRun === true;
  // Size accounting requires an expensive recursive lstat across the whole cache. Skip
  // it by default so the foreground command returns in ms (deletes are O(1) renames);
  // force on for size-target enforcement and when the caller asks for byte accounting.
  const computeSizes = opts.withSizes === true || opts.sizeTargetGb !== undefined;
  const ageCutoffMs = Date.now() - olderThanDays * 24 * 60 * 60 * 1000;
  const datedDirName = this.configStore.getConfig(CFG_CAPSULES_SCOPES_ASPECTS_DATED_DIR) || 'dated-capsules';
for (const entry of entries) {
  if (!entry.isDirectory() || entry.name === 'node_modules' || entry.name.startsWith('.')) continue;
  const childPath = path.join(rootPath, entry.name);
  const markerPath = path.join(childPath, '.bit-capsule-origin.json');
  try {
    const stat = await fs.stat(markerPath);
Two bugs were combining to thrash disk:
1. orphan-check wrongly flagged every scope-aspect subdir. Their marker stores `originPath` = the *logical* scope-aspects path (e.g. <scope.path>-aspects) used only to hash a capsule dir name; it does not have to exist as a real directory. The old check treated any non-existent originPath as orphan and deleted the capsule. With per-aspect-version subdirs that meant we deleted every current aspect cap on every prune. Now: scope-aspect children are pruned purely by marker mtime (touched on every aspect load).
2. sweepTrashAsync ran unconditionally on every isolator construction, so each `bit` invocation (server-forever, e2e workers, compile, etc.) spawned its own detached `rm -rf .trash`. We observed 1,409 concurrent sweep processes saturating disk I/O, blocking foreground bit commands. Now: a PID-stamped lock file ensures at most one sweep runs across all bit processes, and the sweep is skipped entirely when `.trash` is empty. The child clears the lock on exit; stale locks (dead PID) are reclaimed.
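The stale-lock reclaim in item 2 depends on telling a live PID from a dead one. A minimal sketch, assuming the lock file holds the holder's PID as plain text (the actual lock format is not shown in this PR): `isPidAlive` and `canClaimSweepLock` are hypothetical names, while `process.kill(pid, 0)` is the standard Node.js existence probe.

```typescript
// Returns true if a process with this PID currently exists.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only tests for existence; nothing is delivered
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === 'EPERM';
  }
}

// lockContent is undefined when no lock file exists.
function canClaimSweepLock(lockContent: string | undefined): boolean {
  if (lockContent === undefined) return true;
  const pid = parseInt(lockContent.trim(), 10);
  if (!isFinite(pid) || pid <= 0) return true; // corrupt lock, treat as reclaimable
  return !isPidAlive(pid); // a lock held by a dead process is stale
}
```

On its own this is a check-then-act sequence and therefore racy. A real implementation would also need an atomic claim step, for example creating the lock file with an exclusive-create flag.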
The global capsules cache (`~/Library/Caches/Bit/capsules`) grows unbounded: measured 46 GB across 490 subdirs on a real machine, oldest 4+ weeks old. There was zero automatic eviction. Each hashed subdir is tied to a host path, so workspaces that move/disappear leave orphans forever, and aspect-version subdirs pile up (3 versions of one env at ~450 MB each is typical).
Capsule kinds, handled differently
Every top-level subdir under the capsules root falls into one of four kinds, and each is treated differently by the prune logic:
- […] `bit build`). Cheap to regenerate from a live workspace, so these are deleted unconditionally on every prune run.
- […] `--older-than` (default 30d) or the origin path no longer exists.
- […] `teambit.node_envs_node-babel-mocha@0.2.6`, `@0.2.7`, `@0.2.8`). The root itself is never deleted; instead each per-aspect-version child is checked individually: children whose last-used marker mtime is older than the threshold (or whose origin is gone) are evicted, current versions stay warm. This avoids a cold pnpm-install on every aspect load while still bounding version accumulation.
- `<root>/dated-capsules/<YYYY-M-D>/<uuid>/` holds in-flight isolation runs (used when `use_dated_capsules` is enabled). Dated capsules are recreated on every isolation, so anything that isn't today's date subdir is leftover from a previous run and gets deleted regardless of age. Today's subdir is preserved to avoid racing a concurrent bit process. Honors the `capsules_scopes_aspects_dated_dir` config name.
Mechanism
- `.bit-capsule-origin.json` per capsule dir (`kind` + `originPath` + `createdAt`). Mtime is touched on every reuse so it acts as a reliable "last used" signal independent of filesystem `atime` (which is often disabled).
- […] `.trash/<uuid>/` (O(1) on APFS/ext4) + detached portable Node subprocess running `fs.rmSync`. `bit capsule delete` now returns instantly even for multi-GB dirs.
- `bit capsule prune` with `--older-than`, `--keep-workspace-caps`, `--no-orphans`, `--size-target`, `--dry-run`, `--json`. Legacy unmarked dirs are sniffed by aspect-version pattern to avoid nuking a pre-existing aspect root.
- `onBeforeExit`, gated by a stamp file, spawns a detached `bit capsule prune` child so the parent's exit is never delayed by the size walk.
- `bit capsule list` now reports total cache size, orphan count, and stale-aspect-version count.
- `capsules_max_size_gb` (default 10), `capsules_max_age_days` (default 30), `capsules_auto_prune` (default true).
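The rename-then-detach delete listed under Mechanism can be sketched as follows. This is an assumption-laden illustration, not the PR's `scheduleFastDelete`: `fastDelete` and the `TRASH_TARGET` environment variable are hypothetical, and the special case for deleting the capsules root itself is omitted.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';
import { randomUUID } from 'node:crypto';
import { spawn } from 'node:child_process';

// Moves dir into a sibling .trash/<uuid>/ and removes it from a detached child.
function fastDelete(dir: string): string {
  const trashRoot = path.join(path.dirname(dir), '.trash');
  fs.mkdirSync(trashRoot, { recursive: true });
  const trashed = path.join(trashRoot, randomUUID());
  // Same parent directory, hence same filesystem: the rename is a metadata-only move.
  fs.renameSync(dir, trashed);
  // Portable removal: a tiny Node script instead of `rm -rf`, target passed via env.
  const script = 'require("fs").rmSync(process.env.TRASH_TARGET, { recursive: true, force: true })';
  const child = spawn(process.execPath, ['-e', script], {
    detached: true,
    stdio: 'ignore',
    env: { ...process.env, TRASH_TARGET: trashed },
  });
  child.unref(); // let the parent exit without waiting for the removal
  return trashed;
}
```

The caller sees the directory vanish as soon as `renameSync` returns; the expensive recursive removal happens afterwards in the child, which is why a delete of a multi-GB capsule can return immediately.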