
Optimize Zoekt mirror: direct-to-WSL streaming and scoped reindexing#22

Merged
Joxx0r merged 3 commits into main from optimize-zoekt-mirror-reindex on Feb 8, 2026


Joxx0r (Collaborator) commented Feb 8, 2026

Summary

Closes #15

  • Per-project shard indexing: Each project gets its own Zoekt shard via -shard_prefix. File changes only reindex the affected project (~5-15s) instead of all 468K files (~90s). Uses -incremental to skip unchanged files within each shard.
  • Direct-to-WSL bootstrap: First-time setup streams a tar archive directly from SQLite to WSL via tar-stream, skipping the intermediate Windows filesystem mirror. Reduces bootstrap from ~15min to ~4min. The Windows mirror is populated in the background for the watcher.
  • Watcher project tracking: processPendingUpdates() collects affected project names and passes them to triggerReindex() for scoped reindexing.
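
The watcher-side change can be illustrated with a minimal sketch, assuming each pending update carries a path relative to the mirror root and that the first path segment is the project name (Discovery, Engine, Pioneer, Shared, _assets). projectOf, collectAffectedProjects, and the { relPath, op } update shape are hypothetical; only processPendingUpdates() and triggerReindex(count, affectedProjects) are named in the PR.

```javascript
// Hypothetical helper: the project is the first segment of a path relative
// to the mirror root. Both "/" and "\" are accepted because the watcher runs
// on Windows while indexing happens in WSL.
function projectOf(relPath) {
  const parts = relPath.replace(/\\/g, '/').split('/').filter(Boolean);
  // A file directly under the mirror root (e.g. a marker file) has no project.
  return parts.length > 1 ? parts[0] : null;
}

// Collect the distinct projects touched by a batch of pending updates.
// Each update is assumed (not confirmed by the PR) to look like { relPath, op }.
function collectAffectedProjects(pendingUpdates) {
  const projects = new Set();
  for (const { relPath } of pendingUpdates) {
    const project = projectOf(relPath);
    if (project !== null) projects.add(project);
  }
  return projects;
}

// Sketch of the tail of processPendingUpdates(): after the mirror writes
// (omitted here), pass the batch size and the touched projects to the manager.
function processPendingUpdates(pendingUpdates, zoektManager) {
  const affected = collectAffectedProjects(pendingUpdates);
  // ... write/delete files in the Windows and WSL mirrors here ...
  if (pendingUpdates.length > 0) {
    zoektManager.triggerReindex(pendingUpdates.length, [...affected]);
  }
  return affected;
}
```

Using a Set means a batch of many edits inside one project still produces a single scoped reindex for that project.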

Changed Files

  • src/service/zoekt-manager.js: runIndexForProject(), reindexProjects(), _listMirrorProjects(), modified triggerReindex() with project tracking, bootstrapDirect()
  • src/service/zoekt-mirror.js: bootstrapToStream() for tar streaming; _computePathPrefix() extracted as a shared helper
  • src/service/watcher.js: collects affected project names and passes them to triggerReindex(count, affectedProjects)
  • src/service/index.js: reordered startup (init manager before mirror), direct-to-WSL bootstrap path, background Windows mirror
  • package.json: added tar-stream dependency
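
The "modified triggerReindex() with project tracking" entry could work along the lines below. This is a hypothetical sketch, not the code in zoekt-manager.js: ReindexScheduler, its injected runner callbacks, and the rule that a call without projects means a full reindex are all assumptions. What it shows is coalescing: changes that arrive while an index run is in progress are merged into one pending set instead of starting overlapping zoekt-index runs.

```javascript
// Hypothetical sketch of scoped reindex scheduling (not the PR's actual code).
// Projects passed to triggerReindex() are merged into a pending set; a call
// without projects requests a full reindex. The runner callbacks are injected
// stand-ins for runIndexForProject() and a full-mirror index.
class ReindexScheduler {
  constructor({ runIndexForProject, runFullIndex }) {
    this.runIndexForProject = runIndexForProject;
    this.runFullIndex = runFullIndex;
    this.pendingProjects = new Set();
    this.fullPending = false;
    this.running = null; // promise of the active drain loop, if any
  }

  // `count` matches the PR's triggerReindex(count, affectedProjects) shape
  // but is unused in this sketch.
  triggerReindex(count, affectedProjects) {
    if (!affectedProjects || affectedProjects.length === 0) {
      this.fullPending = true;
    } else {
      for (const p of affectedProjects) this.pendingProjects.add(p);
    }
    // Only one drain loop runs at a time; it picks up work queued meanwhile.
    if (!this.running) this.running = this._drain();
    return this.running;
  }

  async _drain() {
    // Yield once so this.running is assigned before any work starts, and so
    // calls made in the same tick are coalesced into one run.
    await Promise.resolve();
    try {
      while (this.fullPending || this.pendingProjects.size > 0) {
        if (this.fullPending) {
          this.fullPending = false;
          this.pendingProjects.clear(); // a full run covers every project
          await this.runFullIndex();
        } else {
          const projects = [...this.pendingProjects];
          this.pendingProjects.clear();
          for (const project of projects) await this.runIndexForProject(project);
        }
      }
    } finally {
      // Cleared in the same synchronous step as the final loop check, so a
      // later trigger always starts a fresh drain.
      this.running = null;
    }
  }
}
```

Injecting the runners keeps the scheduling logic testable without a zoekt-index binary.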

Expected Performance Impact

  • Incremental reindex: 90s → 5-15s (per-project) or <1s (incremental, unchanged files)
  • First-time bootstrap: ~15min → ~4min (direct-to-WSL)

Test plan

  • Start the server with an existing mirror and verify the per-project sharding log line "Starting per-project index (N projects)"
  • Modify a single .as file and verify the scoped reindex logs show only that project, with a duration under 15s
  • Delete .zoekt-mirror-marker and the WSL mirror, restart, and verify the direct-to-WSL bootstrap path is used
  • Run grep searches with a project filter and verify the results are unchanged
  • Verify watcher incremental updates still work (both the Windows and WSL mirrors are updated)

🤖 Generated with Claude Code

Addresses three major bottlenecks in the Zoekt mirror pipeline:

1. Per-project shard indexing: Instead of reindexing all 468K files on
   every change (~90s), each project gets its own Zoekt shard using
   -shard_prefix. The watcher now tracks affected projects and triggers
   scoped reindex (5-15s per project). Uses -incremental flag to skip
   unchanged files within each shard.

2. Direct-to-WSL bootstrap: On first-time setup, streams tar archive
   directly from SQLite to WSL via tar-stream, skipping the intermediate
   Windows filesystem mirror. Reduces bootstrap from ~15min to ~4min.
   The Windows mirror is populated in the background for watcher use.

3. Watcher project tracking: processPendingUpdates() now collects
   affected project names and passes them to triggerReindex() for
   scoped reindexing instead of full directory walks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
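
The direct-to-WSL bootstrap can be sketched with Node's standard library alone. The PR uses the tar-stream package; to keep this example dependency-free it swaps in a hand-written ustar encoder (regular files only, paths up to 100 bytes). streamToWsl, tarChunks, the { relPath, content } row shape, and the wsl.exe -e tar -x invocation are assumptions for illustration. The pipeline callback also shows one way to treat EPIPE and premature-close errors as benign once tar has exited successfully, as the last commit message describes.

```javascript
const { Readable, pipeline } = require('stream');
const { spawn } = require('child_process');

const BLOCK = 512;

// Write `value` as zero-padded octal filling len-1 bytes, then a NUL.
function writeOctal(buf, offset, len, value) {
  buf.write(value.toString(8).padStart(len - 1, '0') + '\0', offset, len, 'ascii');
}

// Build a 512-byte ustar header for a regular file (no prefix-field support).
function tarHeader(name, size, mtimeSec) {
  if (Buffer.byteLength(name, 'utf8') > 100) throw new Error(`path too long for this sketch: ${name}`);
  const h = Buffer.alloc(BLOCK); // zero-filled
  h.write(name, 0, 100, 'utf8');
  writeOctal(h, 100, 8, 0o644); // mode
  writeOctal(h, 108, 8, 0); // uid
  writeOctal(h, 116, 8, 0); // gid
  writeOctal(h, 124, 12, size); // size in bytes
  writeOctal(h, 136, 12, mtimeSec); // mtime (seconds since epoch)
  h.fill(0x20, 148, 156); // checksum is computed with this field as spaces
  h.write('0', 156, 1, 'ascii'); // typeflag '0' = regular file
  h.write('ustar\0', 257, 6, 'ascii'); // magic
  h.write('00', 263, 2, 'ascii'); // version
  let sum = 0;
  for (const byte of h) sum += byte;
  h.write(sum.toString(8).padStart(6, '0') + '\0 ', 148, 8, 'ascii');
  return h;
}

// Turn an (async) iterable of { relPath, content } rows into tar blocks:
// header, data, zero padding to 512 bytes, then two end-of-archive blocks.
async function* tarChunks(rows, mtimeSec = Math.floor(Date.now() / 1000)) {
  for await (const { relPath, content } of rows) {
    const data = Buffer.isBuffer(content) ? content : Buffer.from(content, 'utf8');
    yield tarHeader(relPath, data.length, mtimeSec);
    if (data.length > 0) yield data;
    const pad = (BLOCK - (data.length % BLOCK)) % BLOCK;
    if (pad > 0) yield Buffer.alloc(pad);
  }
  yield Buffer.alloc(BLOCK * 2);
}

// Stream rows into `tar -x` inside WSL. Assumes wsl.exe is on PATH and the
// target directory already exists in the distro.
function streamToWsl(rows, wslTargetDir) {
  return new Promise((resolve, reject) => {
    const child = spawn('wsl.exe', ['-e', 'tar', '-x', '-f', '-', '-C', wslTargetDir], {
      stdio: ['pipe', 'ignore', 'inherit'],
    });
    pipeline(Readable.from(tarChunks(rows), { objectMode: false }), child.stdin, (err) => {
      // tar may exit right after the end-of-archive blocks; the resulting
      // EPIPE / premature-close errors are ignored and tar's exit code decides.
      if (err && !['EPIPE', 'ERR_STREAM_PREMATURE_CLOSE'].includes(err.code)) reject(err);
    });
    child.on('error', reject);
    child.on('close', (code) => (code === 0 ? resolve() : reject(new Error(`tar exited with ${code}`))));
  });
}
```

Because rows are pulled lazily through an async generator and pipeline() applies backpressure, only a small amount of file content is buffered at a time, and nothing is written to the Windows filesystem on this path.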

github-actions bot commented Feb 8, 2026

@claude Please review this PR. Focus on:

  • Code quality and potential bugs
  • Security issues
  • Test coverage
  • Documentation completeness

Use the opus model for thorough analysis.

Joxx0r and others added 2 commits February 8, 2026 06:15
The installed zoekt-index binary doesn't support these flags. Per-project
sharding works without them — each project subdirectory naturally produces
its own named shard files based on the source directory path.

Tested: all 5 projects (Discovery, Engine, Pioneer, Shared, _assets) index
successfully with per-project shards in 76.9s total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
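
Per-project indexing without extra flags can be sketched as one zoekt-index run per project directory, assuming the project list comes from the top-level directories of a mirror and that zoekt-index is invoked inside WSL via wsl.exe -e. listMirrorProjects, buildIndexCommands, and runIndexCommands are hypothetical names standing in for _listMirrorProjects(), runIndexForProject(), and reindexProjects(); only the -index flag (the output directory for shards) is passed.

```javascript
const fs = require('fs');
const path = require('path');
const { execFile } = require('child_process');
const { promisify } = require('util');

const execFileAsync = promisify(execFile);

// Hypothetical stand-in for _listMirrorProjects(): every non-hidden
// top-level directory of the mirror is treated as one project.
function listMirrorProjects(mirrorRoot) {
  return fs
    .readdirSync(mirrorRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && !entry.name.startsWith('.'))
    .map((entry) => entry.name)
    .sort();
}

// One zoekt-index run per project directory, passing only -index.
// Paths are WSL (POSIX) paths, so path.posix is used to join them.
function buildIndexCommands(projects, wslMirrorRoot, wslIndexDir) {
  return projects.map((project) => ({
    project,
    file: 'wsl.exe',
    args: ['-e', 'zoekt-index', '-index', wslIndexDir, path.posix.join(wslMirrorRoot, project)],
  }));
}

// Run the commands one at a time and log per-project durations.
async function runIndexCommands(commands) {
  for (const { project, file, args } of commands) {
    const started = Date.now();
    await execFileAsync(file, args, { maxBuffer: 64 * 1024 * 1024 });
    const secs = ((Date.now() - started) / 1000).toFixed(1);
    console.log(`[zoekt] indexed ${project} in ${secs}s`);
  }
}
```

A scoped reindex then amounts to calling buildIndexCommands() with only the affected projects instead of the full list.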
- Remove symlink wrapper approach (zoekt-index doesn't follow symlinks)
- Use Zoekt's Repository field to prepend project prefix to file paths
- Update query filters to support both monolithic and per-project shards
  using (file:project/ or repo:project) and -repo:_assets
- Fix bootstrapDirect stream error: suppress EPIPE/premature-close errors
  that occur when the WSL tar process exits after consuming the complete archive
- Add old shard cleanup after successful per-project indexing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
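
The query-filter change can be expressed as a small string builder. This is a hypothetical sketch: buildZoektQuery and escapeRegex are illustrative names, and applying -repo:_assets only to unscoped searches is an assumption. Because Zoekt's file: and repo: atoms are regular expressions, the project name is escaped before it is interpolated, and the user query is parenthesized so any "or" inside it cannot escape the project filter.

```javascript
// Regex-escape a project name before using it in a file: or repo: atom.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: restrict a query to one project in a way that matches
// both monolithic shards (file paths start with "Project/") and per-project
// shards (repository named after the project). Unscoped searches exclude the
// _assets project.
function buildZoektQuery(userQuery, project) {
  if (project) {
    const p = escapeRegex(project);
    return `(file:${p}/ or repo:${p}) (${userQuery})`;
  }
  return `${userQuery} -repo:_assets`;
}
```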
Joxx0r merged commit 8aec11b into main on Feb 8, 2026

Linked issue: Optimize mirror bootstrap: direct-to-WSL streaming and scoped reindexing
