Optimize Zoekt mirror: direct-to-WSL streaming and scoped reindexing #22
Merged
Addresses three major bottlenecks in the Zoekt mirror pipeline:

1. Per-project shard indexing: Instead of reindexing all 468K files on every change (~90s), each project gets its own Zoekt shard using -shard_prefix. The watcher now tracks affected projects and triggers a scoped reindex (5-15s per project). Uses the -incremental flag to skip unchanged files within each shard.
2. Direct-to-WSL bootstrap: On first-time setup, streams a tar archive directly from SQLite to WSL via tar-stream, skipping the intermediate Windows filesystem mirror. Reduces bootstrap from ~15min to ~4min. The Windows mirror is populated in the background for watcher use.
3. Watcher project tracking: processPendingUpdates() now collects affected project names and passes them to triggerReindex() for scoped reindexing instead of full directory walks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
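The watcher project tracking in item 3 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `collectAffectedProjects` and its parameters are hypothetical names, and it assumes that a project is the first path segment under the mirror root and that paths use forward slashes.

```javascript
const path = require('node:path');

// Hypothetical helper (not from the PR): map changed file paths to the set of
// project names they belong to. Assumes each project is a top-level directory
// of the mirror root and that all paths are POSIX-style.
function collectAffectedProjects(mirrorRoot, changedPaths) {
  const projects = new Set();
  for (const changed of changedPaths) {
    const rel = path.posix.relative(mirrorRoot, changed);
    // Ignore the root itself and anything outside the mirror.
    if (rel === '' || rel === '..' || rel.startsWith('../')) continue;
    const segments = rel.split('/');
    // A file sitting directly in the mirror root belongs to no project.
    if (segments.length < 2) continue;
    projects.add(segments[0]);
  }
  return projects;
}

const affected = collectAffectedProjects('/mirror', [
  '/mirror/Engine/src/a.as',
  '/mirror/Engine/src/b.as',
  '/mirror/Shared/util.as',
]);
console.log([...affected]); // two changes in Engine collapse to one entry
```

Using a Set means a burst of edits inside one project still results in a single scoped reindex for that project.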
@claude Please review this PR. Focus on:
Use the opus model for thorough analysis.
The installed zoekt-index binary doesn't support these flags. Per-project sharding works without them: each project subdirectory naturally produces its own named shard files based on the source directory path.

Tested: all 5 projects (Discovery, Engine, Pioneer, Shared, _assets) index successfully with per-project shards in 76.9s total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
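As a rough sketch of indexing each project subdirectory on its own, the helper below builds one zoekt-index invocation per project. `buildIndexCommands` and the directory layout are assumptions made for illustration. The only flag used is `-index` (the output directory), and the source directory is passed as a positional argument.

```javascript
const path = require('node:path');

// Hypothetical helper (not from the PR): one zoekt-index command per project
// subdirectory, with no -shard_prefix or -incremental flags.
function buildIndexCommands(mirrorRoot, indexDir, projects) {
  return projects.map((project) => ({
    project,
    cmd: 'zoekt-index',
    // zoekt-index -index <indexDir> <sourceDir>
    args: ['-index', indexDir, path.posix.join(mirrorRoot, project)],
  }));
}

// Example paths are placeholders, not the PR's real locations.
const commands = buildIndexCommands('/home/user/mirror', '/home/user/.zoekt', [
  'Discovery', 'Engine', 'Pioneer', 'Shared', '_assets',
]);
for (const { cmd, args } of commands) {
  console.log(cmd, args.join(' '));
}
```

Each entry could then be handed to `child_process.spawn(cmd, args)`. Whether the real implementation runs these sequentially or in parallel is not stated here.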
- Remove symlink wrapper approach (zoekt-index doesn't follow symlinks)
- Use Zoekt's Repository field to prepend project prefix to file paths
- Update query filters to support both monolithic and per-project shards using (file:project/ or repo:project) and -repo:_assets
- Fix bootstrapDirect stream error: suppress EPIPE/premature close errors that occur when the WSL tar process exits after consuming the complete archive
- Add old shard cleanup after successful per-project indexing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
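The dual-mode query filter described above might look something like this. The function names and the `excludeAssets` option are hypothetical, and the regex escaping is an added precaution that the commit message does not mention. Only the filter shapes `(file:project/ or repo:project)` and `-repo:_assets` come from the commit.

```javascript
// Escape regex metacharacters, since Zoekt treats file: and repo: as patterns.
// (Assumption: project names may contain such characters; the PR's may not.)
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Matches a project in a monolithic shard (by path prefix) or in a
// per-project shard (by repository name).
function projectFilter(project) {
  const safe = escapeRegex(project);
  return `(file:${safe}/ or repo:${safe})`;
}

// Hypothetical wrapper: append scoping filters to a user query.
function scopeQuery(query, { project = null, excludeAssets = false } = {}) {
  const parts = [query];
  if (project) parts.push(projectFilter(project));
  if (excludeAssets) parts.push('-repo:_assets');
  return parts.join(' ');
}

console.log(scopeQuery('LoadLevel', { project: 'Engine' }));
// LoadLevel (file:Engine/ or repo:Engine)
console.log(scopeQuery('LoadLevel', { excludeAssets: true }));
// LoadLevel -repo:_assets
```

Emitting both alternatives in one `or` group lets the same query work whether the index still holds an old monolithic shard or the new per-project shards.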
Summary
Closes #15
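- Per-project shard indexing: each project gets its own Zoekt shard using -shard_prefix. File changes only reindex the affected project (~5-15s) instead of all 468K files (~90s). Uses -incremental to skip unchanged files within each shard.
- Direct-to-WSL bootstrap: first-time setup streams a tar archive directly from SQLite to WSL via tar-stream, skipping the intermediate Windows filesystem mirror. Reduces bootstrap from ~15min to ~4min. The Windows mirror is populated in the background for the watcher.
- Watcher project tracking: processPendingUpdates() collects affected project names and passes them to triggerReindex() for scoped reindexing.

Changed Files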
- src/service/zoekt-manager.js: runIndexForProject(), reindexProjects(), _listMirrorProjects(), modified triggerReindex() with project tracking, bootstrapDirect()
- src/service/zoekt-mirror.js: bootstrapToStream() for tar streaming, _computePathPrefix() extracted as shared helper
- src/service/watcher.js: triggerReindex(count, affectedProjects)
- src/service/index.js
- package.json: tar-stream dependency

Expected Performance Impact
Test plan
- ... Starting per-project index (N projects))
- ... .as file: verify scoped reindex logs show only that project, duration <15s
- ... .zoekt-mirror-marker + WSL mirror, restart: verify direct-to-WSL bootstrap path used

🤖 Generated with Claude Code