local-mount: speed up init for large projects #102

@willwashburn

Init is slow on large projects because every file is byte-copied serially. Recent fix 868874d (excluding .npm-cache) reduced scope but didn't address the per-file cost. Below are four orthogonal improvements; each can ship independently.

Source: packages/local-mount/src/mount.ts


1. Hardlink read-only files instead of copying

Where: copyMountedFile (mount.ts:310) — currently copyFileSync + chmodSync(safeMountPath, 0o444) for readonly matches.

Read-only files are chmod 0o444 and never written from inside the mount, so a hardlink (fs.linkSync) is semantically equivalent and is a pure metadata op — collapses the dominant cost on repos where most files are gitignored-but-readonly (lockfiles, vendored deps, etc.).

  • Use linkSync(source, target) when isPathMatched(relativePath, readonlyMatcher) is true.
  • Fall back to copyFileSync if link fails with EXDEV (cross-volume) or EPERM.
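  • The existing 0o444 chmod needs rethinking for hardlinks: a link shares its inode, and therefore its mode, with the source, so chmodding the link also changes the source file's mode, while skipping the chmod leaves the link exactly as writable as the source. Verify how read-only is enforced on this path, then drop the chmod.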
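
A minimal sketch of the link-then-fallback step from the bullets above. `linkOrCopy` and `LINK_FALLBACK_CODES` are hypothetical names, not part of mount.ts, and the real copyMountedFile would still run its path-safety checks before calling anything like this.

```typescript
import * as fs from "node:fs";

// Error codes on which we fall back to a byte copy (the two named in the list above).
const LINK_FALLBACK_CODES = new Set(["EXDEV", "EPERM"]);

/**
 * Hypothetical helper: place `source` at `target` via hardlink when possible.
 * Returns the strategy used so the caller can decide whether a chmod still applies.
 */
export function linkOrCopy(source: string, target: string): "linked" | "copied" {
  try {
    // Metadata-only: adds a second directory entry for the same inode.
    fs.linkSync(source, target);
    return "linked";
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    // Anything other than the expected fallback codes (e.g. EEXIST) is rethrown.
    if (code === undefined || !LINK_FALLBACK_CODES.has(code)) throw err;
    // Cross-volume or not permitted: fall back to a regular byte copy.
    fs.copyFileSync(source, target);
    return "copied";
  }
}
```

Returning the strategy keeps the chmod decision with the caller: a `"copied"` file can still be set to 0o444 safely, whereas a `"linked"` file shares its mode with the source.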

2. Parallelize the walk and copy

Where: walkProjectTree (mount.ts:225) — fully synchronous recursion via readdirSync + copyFileSync.

Switch to async fs.promises.copyFile / readdir driven by a bounded concurrency queue (~os.availableParallelism() * 4). With the synchronous calls, the single JS thread blocks on each syscall in turn, so only one I/O operation is ever in flight; bounded async lets multiple copies overlap.
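
One possible shape for the bounded queue, as a sketch only. `runBounded`, `copyAll`, and `CopyJob` are hypothetical names; the default limit uses the multiplier suggested above and assumes a Node version that provides `os.availableParallelism()`.

```typescript
import * as fsp from "node:fs/promises";
import * as os from "node:os";

/**
 * Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
 */
export async function runBounded<T>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0;
  const lane = async (): Promise<void> => {
    // Each lane claims the next unprocessed index. JS runs this on one thread,
    // so `next++` needs no locking.
    while (next < items.length) {
      const item = items[next++] as T;
      await worker(item);
    }
  };
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, lane));
}

export interface CopyJob {
  source: string;
  target: string;
}

/** Copy every job with bounded concurrency; the default limit is a starting point to tune. */
export function copyAll(
  jobs: readonly CopyJob[],
  limit: number = os.availableParallelism() * 4,
): Promise<void> {
  return runBounded(jobs, limit, (job) => fsp.copyFile(job.source, job.target));
}
```

If one copy rejects, `Promise.all` rejects immediately while the remaining lanes keep draining the queue, so a real implementation would probably want an abort flag as well.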

3. Drop redundant per-file syscalls

Where: copyMountedFile (mount.ts:310) and resolveSafeCopyTarget (mount.ts:505).

For every file, the current code does roughly:

  1. resolveSafeCopyTarget → ensureDirectoryWithinRoot → mkdirSync(parent, recursive) + realpathSync(parent)
  2. Another realpathSync(parent) immediately after
  3. resolveVerifiedFilePath → realpathSync(source) + statSync(source)
  4. copyFileSync
  5. statSync(safeSourcePath) again at mount.ts:336 to read mode for chmod
  6. chmodSync

That's ~6 extra syscalls per file beyond the actual copy. Wins:

  • Pre-create the destination directory tree once during the walk (we already visit every dir), then in the file path skip the ensureDirectoryWithinRoot step.
  • Cache parent-dir realpaths (one entry per directory, not per file).
  • For non-readonly files, skip the chmod entirely unless we actually need to preserve a non-default mode (e.g. exec bit). The double statSync on the source can go.
  • Path-safety checks should still happen, but at directory-entry time once, not on every file.

Goal: in the common case, file processing is copyFile + nothing else.
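
A sketch of the per-directory cache idea, assuming the safety check amounts to "the directory's realpath stays inside the mount root". `SafeDirCache` is a hypothetical name; the mkdir-then-realpath order mirrors the ensureDirectoryWithinRoot sequence listed above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/**
 * Hypothetical per-mount cache: verifies each destination directory once,
 * then answers repeat lookups from memory.
 */
export class SafeDirCache {
  private readonly verified = new Map<string, string>();
  private readonly realRoot: string;

  constructor(root: string) {
    this.realRoot = fs.realpathSync(root);
  }

  /** Create `dir` if needed, confirm it resolves inside the root, and cache its realpath. */
  ensure(dir: string): string {
    const key = path.resolve(dir);
    const cached = this.verified.get(key);
    if (cached !== undefined) return cached; // hot path: no syscalls

    fs.mkdirSync(key, { recursive: true });
    const real = fs.realpathSync(key);
    const rel = path.relative(this.realRoot, real);
    if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
      throw new Error(`Directory escapes mount root: ${dir}`);
    }
    this.verified.set(key, real);
    return real;
  }
}
```

With this in place the walk would call `ensure` once per directory, and the per-file path would reduce to the copy itself. The tradeoff is that a directory swapped for a symlink after verification would not be re-checked, so the cache should live no longer than a single init.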

4. Expand default excludes

Where: DEFAULT_EXCLUDED_DIRS (mount.ts:60) — currently ['.git', 'node_modules', '.npm-cache'].

Add common build/cache/venv directories that are never useful inside an agent mount and are often huge:

  • target/ (Rust, Java/Maven)
  • .next/ (Next.js)
  • dist/, build/, out/ (generic build output)
  • __pycache__/, .pytest_cache/, .mypy_cache/, .ruff_cache/ (Python)
  • .venv/, venv/, env/ (Python)
  • .gradle/ (Java/Gradle)
  • coverage/, .nyc_output/ (test coverage)
  • .turbo/, .cache/ (build caches)
  • .DS_Store (macOS metadata files — already a single file, but worth filtering)

Considerations:

  • Keep these as defaults but ensure callers can override (the existing excludeDirs option appends, doesn't replace — verify that semantics are still right, and consider exposing a way to opt out of a specific default).
  • Document the default list in the README.
  • Some of these (dist/, build/) are project-source in some repos. Decide whether to gate by presence of common project markers or just ship the list and let users opt out.
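
One way the opt-out could look, as a sketch. `includeDefaults` and `resolveExcludedDirs` are hypothetical and do not exist today; only the append behavior of excludeDirs comes from the description above, and the default list here is abbreviated.

```typescript
// Current defaults as quoted above, plus a few of the proposed additions (abbreviated).
export const DEFAULT_EXCLUDED_DIRS: readonly string[] = [
  ".git", "node_modules", ".npm-cache",
  "target", ".next", "dist", "build", ".venv", "__pycache__",
];

// Hypothetical option shape for illustration.
export interface ExcludeOptions {
  /** Appended to the defaults (the existing behavior described above). */
  excludeDirs?: readonly string[];
  /** Hypothetical: defaults to re-enable for this mount, e.g. ["dist"]. */
  includeDefaults?: readonly string[];
}

export function resolveExcludedDirs(options: ExcludeOptions = {}): Set<string> {
  const optOut = new Set(options.includeDefaults ?? []);
  const result = new Set(DEFAULT_EXCLUDED_DIRS.filter((dir) => !optOut.has(dir)));
  // Explicit excludes are applied last, so they win over an opt-out of the same name.
  for (const dir of options.excludeDirs ?? []) result.add(dir);
  return result;
}
```

A repo whose dist/ is checked-in source could then call `resolveExcludedDirs({ includeDefaults: ["dist"] })` and keep every other default.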

Out of scope for this issue

Reflinks / clonefile / same-volume mount placement — likely the biggest single win, but it's an architectural change to the mount-path default and warrants its own issue + design discussion.

A native (Rust) walker — possibly worth revisiting if these four together don't get init under ~100ms on representative repos, but not yet.

Suggested verification

  • Add a benchmark test that times createMount against a synthetic tree (e.g. 10k files, 100 dirs, mix of readonly/writable).
  • Capture before/after numbers in the PR descriptions for each item.
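
A possible starting point for the benchmark fixture, as a sketch. `buildSyntheticTree` and `timeMs` are hypothetical helpers; the `.lock` suffix is only an assumed way to give a readonly pattern something to match, and the createMount call is left to the caller because its signature is not shown here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { performance } from "node:perf_hooks";

/**
 * Hypothetical fixture builder: creates `dirs` directories with `filesPerDir` files each.
 * Every `readonlyEvery`-th file gets a `.lock` suffix so a readonly pattern can target it.
 * Returns the number of files written.
 */
export function buildSyntheticTree(
  root: string,
  dirs: number,
  filesPerDir: number,
  readonlyEvery: number,
): number {
  let count = 0;
  for (let d = 0; d < dirs; d++) {
    const dir = path.join(root, `dir-${d}`);
    fs.mkdirSync(dir, { recursive: true });
    for (let f = 0; f < filesPerDir; f++) {
      const ext = count % readonlyEvery === 0 ? "lock" : "txt";
      fs.writeFileSync(path.join(dir, `file-${f}.${ext}`), `content ${d}/${f}\n`);
      count++;
    }
  }
  return count;
}

/** Time a sync or async function and return elapsed wall-clock milliseconds. */
export async function timeMs(fn: () => unknown): Promise<number> {
  const start = performance.now();
  await fn();
  return performance.now() - start;
}
```

For the 10k-file case above, `buildSyntheticTree(root, 100, 100, 2)` gives 100 dirs with a roughly even readonly/writable split; the timed callback would wrap whatever createMount invocation the package exposes.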
