Replace grep with Zoekt code search (235x faster) #14
Merged
Integrate Zoekt (Google/Sourcegraph) for sub-20ms source code search, replacing the SQLite trigram grepInline pipeline.

Key changes:
- zoekt-mirror.js: Bootstrap mirror directory from compressed SQLite content (one-time ~30s for 70K files)
- zoekt-manager.js: Manage Zoekt processes via WSL2 (with WSL-native filesystem for fast I/O, tar pipe bulk sync, rsync incremental)
- zoekt-client.js: HTTP client for Zoekt search API with base64 decoding and result format translation
- api.js: Simplified /grep to use Zoekt only (no grepInline fallback), asset search still via grepInline
- index.js: Zoekt startup integration, deferred watcher start (60s) to avoid event loop blocking from chokidar on slow P4 drives
- watcher.js: Mirror updates + Zoekt re-index on file changes

Performance: Zoekt 2-20ms vs old grepInline 1-3s (100-500x faster)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
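For readers unfamiliar with the "base64 decoding and result format translation" step, here is a minimal sketch of what such a translation might look like. It is not the code in zoekt-client.js. The response shape (Result.Files[].LineMatches[] with a base64-encoded Line field) reflects my understanding of Zoekt's JSON search response, and the function names and the output shape ({ file, line, text }) are assumptions made for illustration.

```javascript
// Sketch only: translateZoektResult and the { file, line, text } output shape
// are hypothetical, not taken from zoekt-client.js.

// Zoekt serializes matched line content as base64 in its JSON response.
function decodeBase64(value) {
  return Buffer.from(value, 'base64').toString('utf8');
}

// Flatten a Zoekt-style response into one grep-like record per matching line.
function translateZoektResult(response) {
  const files = (response && response.Result && response.Result.Files) || [];
  const results = [];
  for (const file of files) {
    for (const match of file.LineMatches || []) {
      results.push({
        file: file.FileName,
        line: match.LineNumber,
        // Drop the trailing newline that line content usually carries.
        text: decodeBase64(match.Line).replace(/\r?\n$/, ''),
      });
    }
  }
  return results;
}

// Hand-built sample response (not real server output).
const sample = {
  Result: {
    Files: [
      {
        FileName: 'Source/Game/Player.cpp',
        LineMatches: [
          {
            LineNumber: 42,
            Line: Buffer.from('void APlayer::Tick()\n').toString('base64'),
          },
        ],
      },
    ],
  },
};

console.log(translateZoektResult(sample));
```

Keeping the translation as a pure function separate from the HTTP call makes it easy to test against canned responses without a running zoekt-webserver.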
Index 398K assets alongside source files in Zoekt mirror (under _assets/ prefix with .uasset extension to avoid EISDIR collisions). The /grep endpoint now runs two parallel Zoekt queries instead of Zoekt + grepInline, eliminating the ~900ms SQLite trigram bottleneck.

Fix webserver restart cascade: replace pkill-in-loop with _waitForPortFree(), add _restartPending flag, delay restartAttempts reset for 10s stability window.

Add comprehensive 65-test stress test covering search, filters, concurrency, and structural queries.

Total /grep latency: 940ms avg -> 4ms avg (235x improvement).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
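The EISDIR point may be easier to follow with an example. Presumably the collision arises when one asset's path is also a parent folder of another asset, so the mirror would need the same name to be both a file and a directory. The sketch below shows one way the mapping could work. The function name assetMirrorPath and the "/Game/..." input format are assumptions, not taken from zoekt-mirror.js.

```javascript
const path = require('path');

// Sketch only: assetMirrorPath and the "/Game/..." input format are hypothetical.
// Maps an asset path to a file path inside the mirror, under the _assets/ prefix.
function assetMirrorPath(assetPath) {
  // Strip leading separators and normalize backslashes to forward slashes.
  const relative = assetPath.replace(/^[\\/]+/, '').replace(/\\/g, '/');
  // Appending .uasset keeps the file name distinct from any sibling directory
  // that shares the asset's base name.
  return path.posix.join('_assets', relative + '.uasset');
}

// "Arena" is an asset and also the parent folder of "Lighting".
console.log(assetMirrorPath('/Game/Maps/Arena'));
console.log(assetMirrorPath('/Game/Maps/Arena/Lighting'));
```

With the extension appended, the first asset is written to _assets/Game/Maps/Arena.uasset while the second lives under the directory _assets/Game/Maps/Arena/, so the two no longer compete for the same name.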
@claude Please review this PR. Focus on:
Use the opus model for thorough analysis.
- Move runIndex() to background so server starts in ~5s instead of ~90s (existing Zoekt shards serve queries immediately)
- Extend /health endpoint with zoekt status (available, indexing, port, lastIndexTime, restartAttempts, useWsl) via new getStatus() method
- Add setup wizard "Check prerequisites" option detecting Go, Zoekt, and WSL2 with install instructions; auto-runs after full setup
- Add 2 stress tests for Zoekt health fields (67/67 passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
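As a rough illustration of the /health extension, the sketch below shows how a getStatus() method could expose the six fields listed above. Only the field names come from the commit message. The class name, the markIndexStarted()/markIndexFinished() helpers, the default port 6070, and the surrounding payload shape are assumptions.

```javascript
// Sketch only: ZoektManagerSketch and its helper methods are hypothetical.
class ZoektManagerSketch {
  constructor({ port = 6070, useWsl = false } = {}) {
    this.port = port; // assumed default; the actual port is not stated in the PR
    this.useWsl = useWsl;
    this.available = false;
    this.indexing = false;
    this.lastIndexTime = null;
    this.restartAttempts = 0;
  }

  // Called when a background index run begins.
  markIndexStarted() {
    this.indexing = true;
  }

  // Called when a background index run completes successfully.
  markIndexFinished(now = new Date()) {
    this.indexing = false;
    this.available = true;
    this.lastIndexTime = now.toISOString();
  }

  // Snapshot of the fields named in the commit message.
  getStatus() {
    return {
      available: this.available,
      indexing: this.indexing,
      port: this.port,
      lastIndexTime: this.lastIndexTime,
      restartAttempts: this.restartAttempts,
      useWsl: this.useWsl,
    };
  }
}

// Hypothetical /health payload builder; the real endpoint likely reports more.
function buildHealthPayload(manager) {
  return { status: 'ok', zoekt: manager ? manager.getStatus() : { available: false } };
}

const manager = new ZoektManagerSketch({ useWsl: true });
manager.markIndexStarted();
console.log(buildHealthPayload(manager));
```

Returning a plain snapshot object means the health handler never needs to touch the manager's internals, and a client can tell "still indexing in the background" apart from "Zoekt not installed".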
Phase A (bug fixes):
- Deduplicate asset search results (one result per asset, matchedFields count)
- Fix WSL incremental mirror sync (propagate watcher changes to WSL mirror)

Phase B (reliability):
- Add bootstrap progress reporting (every 5K files with ETA)
- Add mirror integrity check on startup (auto-rebuild on >5% drift)
- Adaptive reindex debounce (2s-30s based on change volume)

Phase C (quality of life):
- Basic search result ranking (header files + match density scoring)
- Web search UI at / (vanilla HTML, dark theme, vscode:// links)

73/73 stress tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
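The adaptive debounce is described only by its bounds (2s-30s), so the sketch below shows one plausible shape for it: a linear ramp from the minimum to the maximum delay as pending changes accumulate. The linear curve, the saturation point of 500 changes, and the function name are assumptions. The PR does not say how the delay is actually computed.

```javascript
// Bounds from the commit message.
const MIN_DEBOUNCE_MS = 2000;
const MAX_DEBOUNCE_MS = 30000;

// Hypothetical: number of pending changes at which the delay stops growing.
const SATURATION_CHANGES = 500;

// Sketch only: scale the reindex delay linearly with the number of pending
// file changes, clamped to the 2s-30s range.
function reindexDebounceMs(pendingChanges) {
  const clamped = Math.max(pendingChanges, 0);
  const ratio = Math.min(clamped / SATURATION_CHANGES, 1);
  return Math.round(MIN_DEBOUNCE_MS + ratio * (MAX_DEBOUNCE_MS - MIN_DEBOUNCE_MS));
}

console.log(reindexDebounceMs(1));     // single edit: close to the 2s floor
console.log(reindexDebounceMs(250));   // mid-sized batch
console.log(reindexDebounceMs(10000)); // large sync: capped at 30s
```

The intent is that a single saved file is searchable again within a couple of seconds, while a large sync touching thousands of files triggers one reindex at the end rather than many in a row.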
Summary
/grep endpoint
Performance Impact
Architecture
- zoekt-mirror.js: Bootstraps flat file mirror from SQLite (source + assets under _assets/ prefix)
- zoekt-manager.js: Process lifecycle for zoekt-webserver and zoekt-index, WSL2 support on Windows
- zoekt-client.js: HTTP client with search() and searchAssets() methods
- _waitForPortFree() replaces pkill cascade, _restartPending flag prevents concurrent restarts
Prerequisites
Requires Go + Zoekt binaries (go install github.com/sourcegraph/zoekt/cmd/...@latest). Falls back to trigram search if unavailable.
Test plan
🤖 Generated with Claude Code