Replace grep with Zoekt code search (235x faster) #14

Merged
Joxx0r merged 4 commits into main from perf-overhaul on Feb 8, 2026

Conversation

@Joxx0r
Collaborator

@Joxx0r Joxx0r commented Feb 7, 2026

Summary

  • Replace SQLite trigram + grepInline with Zoekt code search engine for the /grep endpoint
  • Index both source files (70K) and assets (398K) in Zoekt, searched in parallel
  • Fix Zoekt webserver restart loop that caused cascading port conflicts during shard reloading
  • Add comprehensive stress test (65 tests across 11 categories)

Performance Impact

| Metric | Before | After | Improvement |
|---|---|---|---|
| Average /grep latency | 940ms | 4ms | 235x |
| p95 latency | 1,188ms | 13ms | 91x |
| 10 concurrent queries | 3,800ms | 24ms | 158x |
| 50 rapid-fire queries (0 errors) | n/a | 203ms total | - |
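
For readers who want to check the Improvement column, the factors are just before/after ratios rounded to a whole number. A minimal sketch (the `measurements` array and `speedup` helper are illustrative names, not code from this PR):

```javascript
// Latencies in milliseconds, copied from the table above.
const measurements = [
  { metric: "Average /grep latency", beforeMs: 940, afterMs: 4 },
  { metric: "p95 latency", beforeMs: 1188, afterMs: 13 },
  { metric: "10 concurrent queries", beforeMs: 3800, afterMs: 24 },
];

// Improvement factor = before / after, rounded to the nearest whole number.
function speedup(beforeMs, afterMs) {
  return Math.round(beforeMs / afterMs);
}

for (const { metric, beforeMs, afterMs } of measurements) {
  console.log(`${metric}: ${speedup(beforeMs, afterMs)}x`);
}
```

The same arithmetic ties the rapid-fire row to the average: 203ms spread over 50 queries is roughly 4ms per query.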

Architecture

  • zoekt-mirror.js — Bootstraps flat file mirror from SQLite (source + assets under _assets/ prefix)
  • zoekt-manager.js — Process lifecycle for zoekt-webserver and zoekt-index, WSL2 support on Windows
  • zoekt-client.js — HTTP client with search() and searchAssets() methods
  • Webserver restart: _waitForPortFree() replaces the pkill cascade, and the _restartPending flag prevents concurrent restarts
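
To make the mirror layout concrete, here is a rough sketch of how a logical path might map into the flat mirror. Only the `_assets/` prefix and the `.uasset` extension come from this PR; `mirrorPath`, its parameters, and the path-traversal check are hypothetical. The extension presumably avoids EISDIR because an asset named `Game/Maps` and another named `Game/Maps/Foo` would otherwise need `Game/Maps` to be a file and a directory at once.

```javascript
const path = require("node:path");

const ASSET_PREFIX = "_assets"; // from the PR description
const ASSET_EXTENSION = ".uasset"; // from the second commit message

// Hypothetical helper: maps a logical path to its location in the flat mirror.
function mirrorPath(mirrorRoot, relPath, { isAsset = false } = {}) {
  // Normalize Windows separators and drop leading slashes so the path stays relative.
  const clean = relPath.replace(/\\/g, "/").replace(/^\/+/, "");
  if (clean === "" || clean.split("/").includes("..")) {
    throw new Error(`refusing to mirror suspicious path: ${relPath}`);
  }
  const target = isAsset ? `${ASSET_PREFIX}/${clean}${ASSET_EXTENSION}` : clean;
  return path.posix.join(mirrorRoot, target);
}

console.log(mirrorPath("/mirror", "Source\\Game\\Actor.cpp"));
console.log(mirrorPath("/mirror", "/Game/Maps/Foo", { isAsset: true }));
```

Keeping assets under their own prefix also means a single file filter can separate source hits from asset hits at query time.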

Prerequisites

Requires Go and the Zoekt binaries (go install github.com/sourcegraph/zoekt/cmd/...@latest). If they are unavailable, /grep falls back to trigram search.

Test plan

  • 65/65 stress tests pass (literal, regex, filters, context, format, assets, concurrent, latency, stability, structural)
  • Source search: 3ms avg Zoekt query time
  • Asset search: returns correct results from Zoekt (was grepInline)
  • Webserver survives full reindex (7 shards, 468K files) without restart loop
  • 50 rapid-fire queries: 0 errors, 4ms avg
  • Graceful fallback if Zoekt unavailable

🤖 Generated with Claude Code

Joxx0r and others added 2 commits February 7, 2026 19:02
Integrate Zoekt (Google/Sourcegraph) for sub-20ms source code search,
replacing the SQLite trigram grepInline pipeline. Key changes:

- zoekt-mirror.js: Bootstrap mirror directory from compressed SQLite
  content (one-time ~30s for 70K files)
- zoekt-manager.js: Manage Zoekt processes via WSL2 (with WSL-native
  filesystem for fast I/O, tar pipe bulk sync, rsync incremental)
- zoekt-client.js: HTTP client for Zoekt search API with base64
  decoding and result format translation
- api.js: Simplified /grep to use Zoekt only (no grepInline fallback),
  asset search still via grepInline
- index.js: Zoekt startup integration, deferred watcher start (60s)
  to avoid event loop blocking from chokidar on slow P4 drives
- watcher.js: Mirror updates + Zoekt re-index on file changes
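
The base64 decoding step exists because Zoekt's JSON output carries matched line content as Go byte slices, which serialize as base64 strings. A minimal sketch of the translation, assuming a response shaped like `Result.Files[].LineMatches[]` with `FileName`, `LineNumber`, and `Line` fields; the function name and the output shape are illustrative, not the actual zoekt-client.js code:

```javascript
// Hypothetical translator from a Zoekt-style JSON response to flat grep-like hits.
// Assumes each LineMatch carries its line content as a base64 string.
function translateZoektResponse(response) {
  const files = response?.Result?.Files ?? [];
  const hits = [];
  for (const file of files) {
    for (const match of file.LineMatches ?? []) {
      const text = Buffer.from(match.Line, "base64").toString("utf8");
      hits.push({
        file: file.FileName,
        line: match.LineNumber,
        text: text.replace(/\r?\n$/, ""), // drop the trailing newline, if any
      });
    }
  }
  return hits;
}

// Usage with a hand-built sample response:
const sample = {
  Result: {
    Files: [
      {
        FileName: "Source/Main.cpp",
        LineMatches: [
          { LineNumber: 12, Line: Buffer.from("int main() {\n").toString("base64") },
        ],
      },
    ],
  },
};
console.log(translateZoektResponse(sample));
```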

Performance: Zoekt 2-20ms vs old grepInline 1-3s (100-500x faster)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Index 398K assets alongside source files in Zoekt mirror (under
_assets/ prefix with .uasset extension to avoid EISDIR collisions).
The /grep endpoint now runs two parallel Zoekt queries instead of
Zoekt + grepInline, eliminating the ~900ms SQLite trigram bottleneck.
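
A sketch of what the two parallel queries could look like, assuming a client object exposing the search() and searchAssets() methods named in the PR description. `grepBoth` and the return shape are hypothetical:

```javascript
// Hypothetical /grep core: issue both Zoekt queries at once and wait for both.
// Total latency is then roughly the slower of the two, not their sum.
async function grepBoth(client, pattern) {
  const [source, assets] = await Promise.all([
    client.search(pattern),
    client.searchAssets(pattern),
  ]);
  return { source, assets };
}

// Usage with a stub client standing in for the real HTTP client:
const stubClient = {
  search: async (pattern) => [`source hit for ${pattern}`],
  searchAssets: async (pattern) => [`asset hit for ${pattern}`],
};
grepBoth(stubClient, "AActor").then((result) => console.log(result));
```

Promise.all rejects as soon as either query fails; a handler that should return partial results would need Promise.allSettled instead.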

Fix webserver restart cascade: replace pkill-in-loop with
_waitForPortFree(), add _restartPending flag, delay restartAttempts
reset for 10s stability window. Add comprehensive 65-test stress
test covering search, filters, concurrency, and structural queries.
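
The restart fix above combines three ideas: wait for the port instead of killing processes in a loop, refuse a restart while one is already in flight, and reset the attempt counter only after a stable period. A minimal sketch under those assumptions; `_waitForPortFree`, `_restartPending`, and `restartAttempts` are names from this PR, while the class, `spawnServer`, `isPortFree`, `canBind`, and the default timings are hypothetical:

```javascript
const net = require("node:net");

// Default probe: the port counts as free if we can briefly bind to it.
function canBind(port, host = "127.0.0.1") {
  return new Promise((resolve) => {
    const probe = net.createServer();
    probe.once("error", () => resolve(false));
    probe.once("listening", () => probe.close(() => resolve(true)));
    probe.listen(port, host);
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class WebserverSupervisor {
  constructor({ port, spawnServer, isPortFree = canBind,
                pollMs = 200, timeoutMs = 10_000, stableMs = 10_000 }) {
    this.port = port;
    this.spawnServer = spawnServer;
    this.isPortFree = isPortFree;
    this.pollMs = pollMs;
    this.timeoutMs = timeoutMs;
    this.stableMs = stableMs;
    this.restartAttempts = 0;
    this._restartPending = false;
    this._stableTimer = null;
  }

  // Poll until the old process has released the port, or give up at the deadline.
  async _waitForPortFree() {
    const deadline = Date.now() + this.timeoutMs;
    while (Date.now() < deadline) {
      if (await this.isPortFree(this.port)) return true;
      await sleep(this.pollMs);
    }
    return false;
  }

  async restart() {
    if (this._restartPending) return false; // another restart is already in flight
    this._restartPending = true;
    clearTimeout(this._stableTimer); // a new crash cancels any pending counter reset
    try {
      this.restartAttempts += 1;
      if (!(await this._waitForPortFree())) {
        throw new Error(`port ${this.port} still busy after ${this.timeoutMs}ms`);
      }
      await this.spawnServer();
      // Reset the counter only once the server has stayed up for stableMs.
      this._stableTimer = setTimeout(() => { this.restartAttempts = 0; }, this.stableMs);
      this._stableTimer.unref();
      return true;
    } finally {
      this._restartPending = false;
    }
  }
}
```

Delaying the counter reset matters because a server that starts and dies within a second would otherwise look healthy each time, so a retry limit would never trigger.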

Total /grep latency: 940ms avg -> 4ms avg (235x improvement).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

github-actions bot commented Feb 7, 2026

@claude Please review this PR. Focus on:

  • Code quality and potential bugs
  • Security issues
  • Test coverage
  • Documentation completeness

Use the opus model for thorough analysis.

Joxx0r and others added 2 commits February 7, 2026 22:58
- Move runIndex() to background so server starts in ~5s instead of ~90s
  (existing Zoekt shards serve queries immediately)
- Extend /health endpoint with zoekt status (available, indexing, port,
  lastIndexTime, restartAttempts, useWsl) via new getStatus() method
- Add setup wizard "Check prerequisites" option detecting Go, Zoekt, and
  WSL2 with install instructions; auto-runs after full setup
- Add 2 stress tests for Zoekt health fields (67/67 passing)
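
A sketch of how background indexing and the status fields might fit together. The field names and getStatus() come from the commit message above; the class body, `startBackgroundIndex`, `lastError`, and the default port 6070 are assumptions made for illustration:

```javascript
// Hypothetical slice of a manager exposing the status fields listed above.
class ZoektManager {
  constructor({ port = 6070, useWsl = false } = {}) {
    this.port = port;
    this.useWsl = useWsl;
    this.available = false;
    this.indexing = false;
    this.lastIndexTime = null;
    this.restartAttempts = 0;
    this.lastError = null;
  }

  // Kick off indexing without blocking startup; callers do not await this.
  startBackgroundIndex(runIndex) {
    this.indexing = true;
    return runIndex()
      .then(() => { this.lastIndexTime = new Date().toISOString(); })
      .catch((err) => { this.lastError = String(err); })
      .finally(() => { this.indexing = false; });
  }

  getStatus() {
    return {
      available: this.available,
      indexing: this.indexing,
      port: this.port,
      lastIndexTime: this.lastIndexTime,
      restartAttempts: this.restartAttempts,
      useWsl: this.useWsl,
    };
  }
}

// A /health handler could then embed the snapshot:
function buildHealthPayload(manager) {
  return { status: "ok", zoekt: manager.getStatus() };
}
```

Because existing shards can serve queries while a reindex runs, reporting `indexing: true` alongside `available: true` is a valid state rather than an error.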

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase A (bug fixes):
- Deduplicate asset search results (one result per asset, matchedFields count)
- Fix WSL incremental mirror sync (propagate watcher changes to WSL mirror)
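
The deduplication in Phase A can be pictured as grouping raw hits by asset and counting them. A minimal sketch; only the `matchedFields` name comes from the commit message, and the hit shape (`asset`, `text`) is an assumption:

```javascript
// Hypothetical dedupe: collapse multiple hits on the same asset into one result
// that records how many fields matched.
function dedupeAssetResults(hits) {
  const byAsset = new Map(); // Map keeps first-seen order, so ranking order survives
  for (const hit of hits) {
    const existing = byAsset.get(hit.asset);
    if (existing) {
      existing.matchedFields += 1;
    } else {
      byAsset.set(hit.asset, { asset: hit.asset, firstMatch: hit.text, matchedFields: 1 });
    }
  }
  return [...byAsset.values()];
}

console.log(
  dedupeAssetResults([
    { asset: "/Game/BP_Door", text: "ParentClass: ADoor" },
    { asset: "/Game/BP_Door", text: "Name: BP_Door" },
    { asset: "/Game/BP_Window", text: "Name: BP_Window" },
  ])
);
```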

Phase B (reliability):
- Add bootstrap progress reporting (every 5K files with ETA)
- Add mirror integrity check on startup (auto-rebuild on >5% drift)
- Adaptive reindex debounce (2s-30s based on change volume)
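
Two of the Phase B items reduce to small pure functions. The sketch below assumes a linear scale between the 2s and 30s debounce bounds and measures drift as a relative file-count difference. The commit message gives only the bounds and the 5% threshold, so the scaling rule, the saturation point, and both function names are guesses:

```javascript
const MIN_DEBOUNCE_MS = 2_000; // lower bound from the commit message
const MAX_DEBOUNCE_MS = 30_000; // upper bound from the commit message
const SATURATION_CHANGES = 1_000; // assumption: this many pending changes hits the cap

// Hypothetical adaptive debounce: the more files changed, the longer we wait
// before reindexing, so a large sync triggers one reindex instead of many.
function reindexDebounceMs(pendingChanges) {
  const fraction = Math.min(Math.max(pendingChanges, 0) / SATURATION_CHANGES, 1);
  return Math.round(MIN_DEBOUNCE_MS + fraction * (MAX_DEBOUNCE_MS - MIN_DEBOUNCE_MS));
}

const DRIFT_THRESHOLD = 0.05; // ">5% drift" from the commit message

// Hypothetical integrity check: compare the file count the database expects
// with the count actually present in the mirror.
function needsRebuild(expectedCount, mirrorCount) {
  if (expectedCount === 0) return mirrorCount !== 0;
  return Math.abs(expectedCount - mirrorCount) / expectedCount > DRIFT_THRESHOLD;
}

console.log(reindexDebounceMs(1), reindexDebounceMs(500), reindexDebounceMs(5000));
console.log(needsRebuild(468000, 440000));
```

A count comparison is cheap enough to run on every startup, though it would miss a mirror whose files exist with stale contents.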

Phase C (quality of life):
- Basic search result ranking (header files + match density scoring)
- Web search UI at / (vanilla HTML, dark theme, vscode:// links)
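
As an illustration of the ranking item, here is one way header preference and match density could combine into a score. The commit message names only the two signals; the weight, the extension list, and the input shape (`path`, `matchCount`, `lineCount`) are assumptions:

```javascript
const HEADER_BONUS = 1; // assumption: outweighs any density, which is at most 1 here

// Hypothetical score: header files first, then files where matches are denser.
function scoreFile(file) {
  const isHeader = /\.(h|hpp|inl)$/i.test(file.path);
  const density = file.lineCount > 0 ? file.matchCount / file.lineCount : 0;
  return (isHeader ? HEADER_BONUS : 0) + density;
}

// Sort a copy so the caller's array is left untouched.
function rankResults(files) {
  return [...files].sort((a, b) => scoreFile(b) - scoreFile(a));
}

const ranked = rankResults([
  { path: "Source/Util.cpp", matchCount: 1, lineCount: 100 },
  { path: "Source/Actor.cpp", matchCount: 8, lineCount: 10 },
  { path: "Source/Actor.h", matchCount: 1, lineCount: 100 },
]);
console.log(ranked.map((file) => file.path));
```

Favoring headers is a reasonable default for C++ codebases, where declarations are usually what a searcher wants to land on first.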

73/73 stress tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Joxx0r Joxx0r merged commit 56d46cb into main Feb 8, 2026