Skip to content

perf: cache site-packages, shallow copy args, avoid double dict lookup#2143

Open
KRRT7 wants to merge 12 commits intomainfrom
perf/misc-hot-path
Open

perf: cache site-packages, shallow copy args, avoid double dict lookup#2143
KRRT7 wants to merge 12 commits intomainfrom
perf/misc-hot-path

Conversation

@KRRT7
Copy link
Copy Markdown
Collaborator

@KRRT7 KRRT7 commented May 7, 2026

Summary

  • Cache site.getsitepackages() to avoid repeated filesystem resolution
  • Use shallow copy for args/test_cfg in worktree mode instead of deep copy
  • Avoid double dict lookup pattern in line profiler results handling

Stack

Part of #2132 linear stack.

Test plan

  • Existing test suite passes
  • All 3 changes are mechanical micro-optimizations with no behavioral change

@KRRT7 KRRT7 force-pushed the cf-2132-skip-callers-json-decode branch from 426c0ff to 7c12c6c Compare May 7, 2026 17:30
@KRRT7 KRRT7 force-pushed the perf/misc-hot-path branch 2 times, most recently from eb15a7e to 8aaf33b Compare May 7, 2026 17:42
@KRRT7 KRRT7 force-pushed the cf-2132-skip-callers-json-decode branch from 7c12c6c to 89cc54f Compare May 7, 2026 17:42
KRRT7 added 10 commits May 7, 2026 17:46
Moves path-based checks (test file detection, ignore paths, submodule
paths, site-packages, outside module-root) to run BEFORE read_text() is
called in get_all_files_and_functions() and get_functions_within_lines().
This avoids unnecessary file I/O for files that would be discarded by
filter_functions() anyway.

Also fixes pre-existing mypy error in _find_all_functions_via_language_support
where discover_functions was called with wrong argument order.

Signature changes (backward-compatible, all new params are optional):
- get_all_files_and_functions: added tests_root, module_root params
- get_functions_within_lines: added tests_root, ignore_paths, module_root
- get_functions_within_git_diff: added tests_root, ignore_paths, module_root
Introduces a discovery-scoped cache (`discovery_cache()` context
manager) that ensures each file is read from disk at most once and
parsed into an AST at most once within a single discovery run.

Key changes:
- `read_file_cached()`: returns cached file content when
  `discovery_cache()` is active, falls back to normal read otherwise
- `parse_ast_cached()`: returns cached `ast.Module` when active
- `get_functions_to_optimize()`: wraps its body with `discovery_cache()`
- `find_all_functions_in_file()`: uses `read_file_cached`
- `inspect_top_level_functions_or_methods()`: uses `parse_ast_cached`
- `get_all_replay_test_functions()`: uses `parse_ast_cached`
- JS/TS export helpers: use `read_file_cached`
- Removed dead code: `_find_all_functions_via_language_support()` (never
  called, had a type error passing Path as source str)

Signature changes: None. All public function signatures are unchanged.
The cache is transparent -- when not inside `discovery_cache()`, all
functions behave identically to before (direct reads/parses).
The _is_js_ts_function_exported and _is_js_ts_function_exists_but_not_exported
helpers each read the file from disk independently, causing up to 2 extra reads
per file during get_functions_to_optimize. Add an optional `source` parameter
to both functions so callers can pass pre-read content. The main call site now
reads the file once and passes it to both helpers.

Also fixes pre-existing mypy error in _find_all_functions_via_language_support
where discover_functions was called with wrong argument order.

Signatures changed:
- _is_js_ts_function_exported(file_path, function_name, source=None)
- _is_js_ts_function_exists_but_not_exported(file_path, function_name, source=None)
The --file path was calling file.read_text() directly even though
find_all_functions_in_file() already primed the discovery cache.
Now uses read_file_cached() to hit the cache with zero disk I/O.
The callers BLOB column was fetched and JSON-decoded for every row
during FunctionRanker initialization, but the resulting data was
never accessed — FunctionRanker.load_function_stats() discards it
as `_callers`. This eliminates O(n) JSON parsing at ranking startup.

Also fixes all pre-existing mypy errors in this file by declaring
pstats.Stats attributes that the type stubs don't expose.
…ite.getsitepackages

The lru_cache on _get_site_packages_paths() retained stale results
across tests, causing 4 failures when monkeypatch replaced the
underlying site.getsitepackages function.
The `_extract_names_from_annotation` method accessed
`self.definitions[self.current_top_level_name]` without verifying
that `current_top_level_name` exists in the definitions dict. When
visiting methods inside built-in types (e.g., `str.removeprefix`),
the dotted name was set as `current_top_level_name` but never added
to `definitions`, causing a KeyError on CI.
@KRRT7 KRRT7 force-pushed the cf-2132-skip-callers-json-decode branch from 89cc54f to 448ff82 Compare May 7, 2026 22:48
@KRRT7 KRRT7 force-pushed the perf/misc-hot-path branch from 7af0ec7 to 5e4f73b Compare May 7, 2026 22:48
@KRRT7 KRRT7 marked this pull request as draft May 7, 2026 22:53
Base automatically changed from cf-2132-skip-callers-json-decode to perf/discovery-ast-cache May 8, 2026 00:35
@KRRT7 KRRT7 marked this pull request as ready for review May 8, 2026 00:36
@KRRT7 KRRT7 changed the base branch from perf/discovery-ast-cache to main May 8, 2026 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant