perf: cache file reads and AST parses during discovery pass#2135
Open
perf: cache file reads and AST parses during discovery pass#2135
Conversation
This was referenced May 7, 2026
5c1cc1e to
828496b
Compare
dcc5bea to
9164a8d
Compare
828496b to
36074a3
Compare
9164a8d to
120ea4b
Compare
36074a3 to
6bbf072
Compare
120ea4b to
beac23d
Compare
Moves path-based checks (test file detection, ignore paths, submodule paths, site-packages, outside module-root) to run BEFORE read_text() is called in get_all_files_and_functions() and get_functions_within_lines(). This avoids unnecessary file I/O for files that would be discarded by filter_functions() anyway. Also fixes pre-existing mypy error in _find_all_functions_via_language_support where discover_functions was called with wrong argument order. Signature changes (backward-compatible, all new params are optional): - get_all_files_and_functions: added tests_root, module_root params - get_functions_within_lines: added tests_root, ignore_paths, module_root - get_functions_within_git_diff: added tests_root, ignore_paths, module_root
Introduces a discovery-scoped cache (`discovery_cache()` context manager) that ensures each file is read from disk at most once and parsed into an AST at most once within a single discovery run. Key changes: - `read_file_cached()`: returns cached file content when `discovery_cache()` is active, falls back to normal read otherwise - `parse_ast_cached()`: returns cached `ast.Module` when active - `get_functions_to_optimize()`: wraps its body with `discovery_cache()` - `find_all_functions_in_file()`: uses `read_file_cached` - `inspect_top_level_functions_or_methods()`: uses `parse_ast_cached` - `get_all_replay_test_functions()`: uses `parse_ast_cached` - JS/TS export helpers: use `read_file_cached` - Removed dead code: `_find_all_functions_via_language_support()` (never called, had a type error passing Path as source str) Signature changes: None. All public function signatures are unchanged. The cache is transparent -- when not inside `discovery_cache()`, all functions behave identically to before (direct reads/parses).
The _is_js_ts_function_exported and _is_js_ts_function_exists_but_not_exported helpers each read the file from disk independently, causing up to 2 extra reads per file during get_functions_to_optimize. Add an optional `source` parameter to both functions so callers can pass pre-read content. The main call site now reads the file once and passes it to both helpers. Also fixes pre-existing mypy error in _find_all_functions_via_language_support where discover_functions was called with wrong argument order. Signatures changed: - _is_js_ts_function_exported(file_path, function_name, source=None) - _is_js_ts_function_exists_but_not_exported(file_path, function_name, source=None)
The --file path was calling file.read_text() directly even though find_all_functions_in_file() already primed the discovery cache. Now uses read_file_cached() to hit the cache with zero disk I/O.
The callers BLOB column was fetched and JSON-decoded for every row during FunctionRanker initialization, but the resulting data was never accessed — FunctionRanker.load_function_stats() discards it as `_callers`. This eliminates O(n) JSON parsing at ranking startup. Also fixes all pre-existing mypy errors in this file by declaring pstats.Stats attributes that the type stubs don't expose.
6bbf072 to
6527780
Compare
beac23d to
93763a3
Compare
aseembits93
reviewed
May 8, 2026
|
|
||
| class TestFiles(BaseModel): | ||
| test_files: list[TestFile] | ||
| _seen_paths: set[Path] = PrivateAttr(default_factory=set) |
Contributor
There was a problem hiding this comment.
wasn't this merged already?
aseembits93
previously approved these changes
May 8, 2026
perf: read JS/TS files once during discovery export checks
…decode perf: skip unused callers JSON decode in ProfileStats
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Parent PR
#2132 — perf: targeted performance improvements for E2E pipeline hot path
Summary
discovery_cache()context manager that caches file content and parsed ASTs for the duration of a single discovery runread_file_cached()ensures each file is read from disk at most onceparse_ast_cached()ensures each file's AST is parsed at most once_find_all_functions_via_language_support()(never called, had pre-existing type error)Changes
get_functions_to_optimize(): wraps body withdiscovery_cache()find_all_functions_in_file(): usesread_file_cachedinspect_top_level_functions_or_methods(): usesparse_ast_cachedget_all_replay_test_functions(): usesparse_ast_cachedread_file_cachedtests/test_discovery_cache.py(12 tests)Stack
mainperf/discovery-prefilter