Conversation

@tnm (Contributor) commented on Jan 7, 2026

Summary

Two performance optimizations from Tier 2 (a third change, the module map rewrite, was dropped after benchmarking showed pathlib overhead made it slower; see Removed below):

1. Vector search collection reset (vector_searcher.py)

  • Track reset state to avoid redundant resets on multiple add() calls
  • Use delete_collection as primary fast path (avoids count() call)
  • Only fall back to expensive ID enumeration when necessary
  • ~2-5x faster build_index() restarts (see the reset sketch after this list)

2. Context extractor file caching (context_extractor.py)

  • Add mtime-based file content cache shared across methods
  • chunk_file_by_lines, chunk_file_by_symbols, extract_context_around_line now share cached file reads
  • Simple FIFO eviction when cache exceeds 100 files
  • 10-94x faster for repeated reads of the same file (see the cache sketch after this list)
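
A minimal sketch of the reset fast path from item 1, assuming a chromadb-style client; the names here (`VectorSearcher`, `_reset_done`) are illustrative, not the actual kit code:

```python
# A sketch of the reset fast path, assuming a chromadb-style client.
# VectorSearcher/_reset_done are illustrative names, not kit's actual code.
import chromadb


class VectorSearcher:
    def __init__(self, persist_dir: str, collection_name: str = "code"):
        self.client = chromadb.PersistentClient(path=persist_dir)
        self.collection_name = collection_name
        self.collection = self.client.get_or_create_collection(collection_name)
        self._reset_done = False  # reset state, tracked across add() calls

    def _reset_collection(self) -> None:
        if self._reset_done:
            return  # already reset during this build; skip redundant work
        try:
            # Fast path: drop and recreate the collection outright,
            # with no count() call and no ID enumeration.
            self.client.delete_collection(self.collection_name)
            self.collection = self.client.create_collection(self.collection_name)
        except Exception:
            # Fallback: enumerate existing IDs and delete them (expensive).
            existing = self.collection.get()
            if existing["ids"]:
                self.collection.delete(ids=existing["ids"])
        self._reset_done = True
```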
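
And a minimal sketch of the mtime-keyed cache from item 2, with the 100-file FIFO eviction described above; `_read_cached` and the module-level `_file_cache` are assumed names, not kit's actual internals:

```python
# A sketch of the mtime-keyed file cache with FIFO eviction.
# _read_cached and _file_cache are assumed names, not kit's internals.
import os
from collections import OrderedDict

_MAX_CACHED_FILES = 100
_file_cache: "OrderedDict[str, tuple[float, list[str]]]" = OrderedDict()


def _read_cached(path: str) -> list[str]:
    """Return the file's lines, re-reading only when its mtime changes."""
    mtime = os.path.getmtime(path)
    entry = _file_cache.get(path)
    if entry is not None and entry[0] == mtime:
        return entry[1]  # hit: same path, unchanged mtime
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    _file_cache[path] = (mtime, lines)
    if len(_file_cache) > _MAX_CACHED_FILES:
        _file_cache.popitem(last=False)  # FIFO: evict the oldest entry
    return lines
```

With a shared helper like this, chunking and context-extraction methods all read through the same cache, so repeated operations on the same file skip disk entirely.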

Benchmark Results

File Caching (synthetic)

| Reads of same file | Uncached | Cached | Speedup |
| --- | --- | --- | --- |
| 10 | 0.78ms | 0.08ms | 10x |
| 50 | 3.69ms | 0.12ms | 30x |
| 100 | 7.49ms | 0.08ms | 94x |

Real-world (kit codebase)

| Operation | Result |
| --- | --- |
| Context extractor (180 calls on 20 files) | 0.88ms avg per call |

Removed

  • Module map pathlib optimization - Benchmarking showed pathlib object creation overhead (Path(), .parents, str()) made it 10x slower than the simple nested os.path.dirname() loops. The theoretical O(n²) → O(n) improvement was overwhelmed by constant factors.
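
For reference, a standalone micro-benchmark along these lines (illustrative only, not the benchmark run for this PR) reproduces the pathlib-vs-dirname comparison:

```python
# Illustrative micro-benchmark (not the benchmark run for this PR):
# collecting a path's ancestors with pathlib vs. an os.path.dirname() loop.
import os
import timeit
from pathlib import Path

SAMPLE = "src/pkg/subpkg/module/impl/file.py"

def ancestors_pathlib(path: str) -> list[str]:
    # Pays for Path() construction, .parents, and a str() per ancestor.
    return [str(parent) for parent in Path(path).parents]

def ancestors_dirname(path: str) -> list[str]:
    # Plain string work; stops at "" where Path.parents ends at ".".
    out = []
    d = os.path.dirname(path)
    while d:
        out.append(d)
        d = os.path.dirname(d)
    return out

print("pathlib:", timeit.timeit(lambda: ancestors_pathlib(SAMPLE), number=100_000))
print("dirname:", timeit.timeit(lambda: ancestors_dirname(SAMPLE), number=100_000))
```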

Test plan

  • All existing tests pass
  • Formatting/linting passes
  • Benchmark verification completed

@tnm force-pushed the perf/tier2-optimizations branch from f0eafce to 51b4dd0 on January 7, 2026 at 06:40
@tnm changed the title from "perf: Tier 2 optimizations - caching and algorithm improvements" to "perf: Tier 2 optimizations - vector reset and file caching" on Jan 7, 2026
@tnm merged commit 523acf6 into main on Jan 7, 2026
2 checks passed
tnm added a commit that referenced this pull request Jan 7, 2026
Add missing performance improvements from PRs #179 and #180:
- Tier 1: O(n²)→O(1) dependency graph, O(n)→O(log n) line lookup, regex precompile
- Tier 2: Vector search collection reset, context extractor file caching