Commit 24c8957

refactor: drop CLDK-side analysis cache, rely on codeanalyzer-python
Remove the content-addressed two-tier cache (cache.py) and CLDK's own analysis.json read/write. codeanalyzer-python already caches the virtualenv, CodeQL database, and analysis result under its cache_dir with checksum-based invalidation, so the CLDK layer was a redundant second cache that also caused unbounded ~/.cldk growth and coupled the CodeQL DB to a dependency-hash key. cache_dir/analysis_json_path are now forwarded verbatim to the backend (None -> backend defaults to <project>/.codeanalyzer); eager maps to rebuild_analysis.
1 parent 3292902 commit 24c8957

5 files changed

Lines changed: 42 additions & 253 deletions
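The message above relies on codeanalyzer-python's checksum-based invalidation and maps `eager` to `rebuild_analysis`. The sketch below is a rough illustration of that reuse-or-rebuild pattern, not codeanalyzer-python's actual code. It hashes the project's `.py` files, reuses a stored result when the checksum matches, and rebuilds on a mismatch or when a rebuild is forced. The helper names `source_checksum` and `load_or_build` and the JSON layout are hypothetical; the file name only echoes the README. The cache directory is excluded from the hash because the default `<project>/.codeanalyzer` location sits inside the project, and its virtualenv would otherwise leak into the checksum.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict


def source_checksum(project_dir: Path, exclude: Path) -> str:
    """SHA-256 over every .py file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    excluded = exclude.resolve()
    for path in sorted(project_dir.rglob("*.py")):
        if excluded in path.resolve().parents:
            continue  # skip files inside the cache itself (e.g. a virtualenv)
        digest.update(str(path.relative_to(project_dir)).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def load_or_build(
    project_dir: Path,
    cache_dir: Path,
    build: Callable[[], Dict[str, Any]],
    rebuild_analysis: bool = False,
) -> Dict[str, Any]:
    """Reuse the stored result when the checksum matches, unless a rebuild is forced."""
    # Hypothetical layout: {"checksum": ..., "result": ...}
    cache_file = cache_dir / "analysis_cache.json"
    checksum = source_checksum(project_dir, exclude=cache_dir)
    if cache_file.exists() and not rebuild_analysis:
        cached = json.loads(cache_file.read_text())
        if cached.get("checksum") == checksum:
            return cached["result"]  # hit: sources unchanged since the last build
    result = build()  # miss, stale checksum, or forced rebuild (the "eager" case)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"checksum": checksum, "result": result}))
    return result
```

With a single cache keyed this way, nothing outside the cache directory needs to know about invalidation. That is the redundancy the commit removes from CLDK.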

README.md

Lines changed: 11 additions & 13 deletions
@@ -207,19 +207,17 @@ Each language has a dedicated analysis backend implemented under `cldk.analysis.
 - **Tools:** `codeanalyzer-python` (Jedi + CodeQL, default on), Tree-sitter for source-level parsing
 - **Capabilities:** Symbol table, call graph, class/method resolution, comments/docstrings

-> **Note — analysis cache:** Analysis artifacts are cached under `~/.cldk`
-> (override with `$CLDK_CACHE_DIR`): the backend virtualenv and CodeQL
-> database under `~/.cldk/venvs/<dep_hash>/`, and `analysis.json` under
-> `~/.cldk/cache/<key>/`. **CodeQL is enabled by default**
-> (`use_codeql=True`), so the first analysis of a project builds a CodeQL
-> database and provisions the CodeQL CLI — expect a slow cold run; subsequent
-> runs on the same source tree are cache hits. Pass `use_codeql=False` for
-> Jedi-only analysis. The CodeQL flag is part of the analysis cache key, so
-> toggling it — or upgrading from a version that defaulted it off — triggers
-> a **one-time** rebuild under a new key (no stale data is served). If you
-> instead point `cache_dir` (the backend virtualenv / CodeQL database) or
-> `analysis_json_path` inside a project, add those directories to your
-> `.gitignore` — they are large and environment-specific.
+> **Note — analysis cache:** Caching is owned entirely by
+> `codeanalyzer-python`; CLDK keeps no cache of its own. Artifacts (the
+> backend virtualenv, CodeQL database, and `analysis_cache.json`) live under
+> the backend's `cache_dir`, which defaults to `<project>/.codeanalyzer` and
+> can be redirected with the `cache_dir` argument. **CodeQL is enabled by
+> default** (`use_codeql=True`), so the first analysis of a project builds a
+> CodeQL database and provisions the CodeQL CLI — expect a slow cold run;
+> subsequent runs reuse the backend's checksum-validated cache. Pass
+> `use_codeql=False` for Jedi-only analysis. Add the `cache_dir` location
+> (e.g. `.codeanalyzer/`) to your `.gitignore` — it is large and
+> environment-specific.

 #### C
 - **Backend:** `cldk.analysis.c`
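The following small sketch makes the README's `.gitignore` advice concrete. It rests on two assumptions. First, the cache location follows the documented default: `<project>/.codeanalyzer` when `cache_dir` is None. Second, the project is a Git checkout with a top-level `.gitignore`. Both helpers (`effective_cache_dir`, `ensure_gitignored`) are hypothetical and are not part of CLDK or codeanalyzer-python.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def effective_cache_dir(project_dir: PathLike, cache_dir: Optional[PathLike] = None) -> Path:
    """Where the cache ends up under the documented default (<project>/.codeanalyzer)."""
    return Path(cache_dir) if cache_dir else Path(project_dir) / ".codeanalyzer"


def ensure_gitignored(project_dir: PathLike, entry: str = ".codeanalyzer/") -> bool:
    """Append `entry` to <project>/.gitignore unless present; return True if added."""
    gitignore = Path(project_dir) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False  # already ignored; leave the file untouched
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if not text or text.endswith("\n") else "\n"
    gitignore.write_text(text + prefix + entry + "\n")
    return True
```

Because the append is idempotent, the helper can run on every setup. It is only needed when the cache lives inside the repository; a `cache_dir` redirected elsewhere needs no ignore entry.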

cldk/analysis/python/codeanalyzer/cache.py

Lines changed: 0 additions & 182 deletions
This file was deleted.

cldk/analysis/python/codeanalyzer/codeanalyzer.py

Lines changed: 19 additions & 47 deletions
@@ -33,13 +33,9 @@
 from codeanalyzer.config import OutputFormat
 from codeanalyzer.core import Codeanalyzer
 from codeanalyzer.options import AnalysisOptions
-from codeanalyzer.schema import model_dump_json, model_validate_json
+from codeanalyzer.schema import model_dump_json

 from cldk.analysis import AnalysisLevel
-from cldk.analysis.python.codeanalyzer.cache import (
-default_analysis_dir,
-default_backend_cache_dir,
-)
 from cldk.models.python import (
 PyApplication,
 PyCallEdge,
@@ -59,17 +55,15 @@ class PyCodeanalyzer:
 Args:
 project_dir: Path to the Python project root.
 analysis_level: Analysis level (symbol_table or call_graph).
-analysis_json_path: Directory to persist analysis.json. If the file
-exists and ``eager_analysis`` is False, it is loaded instead of
-re-running the analyzer. When omitted, a content-addressed
-location under the CLDK cache root is used (see
-:mod:`cldk.analysis.python.codeanalyzer.cache`).
-eager_analysis: If True, always re-runs the analyzer even when a
-cached analysis.json is available.
-cache_dir: Cache directory for the analyzer's virtualenv and CodeQL
-artifacts. Forwarded verbatim to ``AnalysisOptions.cache_dir``.
-When omitted, a dependency-hash-keyed location under the CLDK
-cache root is used so the virtualenv survives source edits.
+analysis_json_path: Forwarded verbatim to ``AnalysisOptions.output``.
+``codeanalyzer-python`` owns all caching; CLDK neither reads nor
+writes its own analysis.json.
+eager_analysis: If True, forces the backend to rebuild its analysis
+(``AnalysisOptions.rebuild_analysis``) rather than reuse its cache.
+cache_dir: Cache home for ``codeanalyzer-python`` (its virtualenv,
+CodeQL database, and ``analysis_cache.json``). Forwarded verbatim
+to ``AnalysisOptions.cache_dir``. When None, the backend defaults
+it to ``<project_dir>/.codeanalyzer``.
 target_files: Optional single target file (relative to project_dir).
 When provided, only that file is analyzed.
 """
@@ -92,25 +86,13 @@ def __init__(
 self.target_files = target_files
 self.use_codeql = use_codeql

-# Cache locations. Explicit args win; otherwise fall back to the
-# content-addressed CLDK cache (two independently-keyed tiers).
-if cache_dir:
-self.cache_dir = Path(cache_dir)
-else:
-self.cache_dir = default_backend_cache_dir(self.project_dir)
-if analysis_json_path:
-self.analysis_json_path = Path(analysis_json_path)
-else:
-self.analysis_json_path = default_analysis_dir(
-self.project_dir, analysis_level, use_codeql, target_files
-)
-logger.info(
-"CLDK cache — backend: %s | analysis: %s",
-self.cache_dir,
-self.analysis_json_path,
-)
+# codeanalyzer-python owns all caching. CLDK forwards these paths
+# verbatim; when cache_dir is None the backend defaults it to
+# <project_dir>/.codeanalyzer.
+self.cache_dir = Path(cache_dir) if cache_dir else None
+self.analysis_json_path = Path(analysis_json_path) if analysis_json_path else None

-self.application: PyApplication = self._load_or_run_analyzer()
+self.application: PyApplication = self._run_analyzer()
 # Class-signature → file path lookup, built once.
 self._class_to_file: Dict[str, str] = {}
 for file_path, module in self.application.symbol_table.items():
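The docstring and `__init__` changes above reduce CLDK's role to argument forwarding. `cache_dir` goes to `AnalysisOptions.cache_dir`, `analysis_json_path` goes to `AnalysisOptions.output`, and `eager_analysis` goes to `AnalysisOptions.rebuild_analysis`. The sketch below models only that mapping. It uses a hypothetical stand-in dataclass (`BackendOptionsSketch`) because this diff does not show the full `AnalysisOptions` signature. One detail worth noticing is that `Path(x) if x else None` treats an empty string like None, so `cache_dir=""` falls back to the backend default rather than to the current directory.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


@dataclass
class BackendOptionsSketch:
    """Hypothetical stand-in holding only the AnalysisOptions fields this diff names."""
    output: Optional[Path]
    cache_dir: Optional[Path]
    rebuild_analysis: bool


def to_backend_options(
    analysis_json_path: Optional[PathLike] = None,
    eager_analysis: bool = False,
    cache_dir: Optional[PathLike] = None,
) -> BackendOptionsSketch:
    # Same normalization as the new __init__: falsy values (None or "") become None,
    # which lets codeanalyzer-python apply its own defaults.
    return BackendOptionsSketch(
        output=Path(analysis_json_path) if analysis_json_path else None,
        cache_dir=Path(cache_dir) if cache_dir else None,
        rebuild_analysis=eager_analysis,
    )
```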
@@ -123,13 +105,8 @@ def __init__(
 self.call_graph = None

 # ----------------------------------------------------------------- core
-def _load_or_run_analyzer(self) -> PyApplication:
-"""Load a cached analysis.json when available, else run the analyzer."""
-cached_file = self.analysis_json_path / "analysis.json" if self.analysis_json_path else None
-if cached_file and cached_file.exists() and not self.eager_analysis:
-logger.info(f"Loading cached PyApplication from {cached_file}")
-return model_validate_json(PyApplication, cached_file.read_text())
-
+def _run_analyzer(self) -> PyApplication:
+"""Run codeanalyzer-python; the backend handles its own caching."""
 target_file = None
 if self.target_files:
 if len(self.target_files) > 1:
@@ -151,12 +128,7 @@ def _load_or_run_analyzer(self) -> PyApplication:
 )

 with Codeanalyzer(options) as analyzer:
-app = analyzer.analyze()
-
-if self.analysis_json_path is not None:
-self.analysis_json_path.mkdir(parents=True, exist_ok=True)
-(self.analysis_json_path / "analysis.json").write_text(model_dump_json(app, indent=None))
-return app
+return analyzer.analyze()

 @staticmethod
 def _build_call_graph(edges: List[PyCallEdge]) -> nx.DiGraph:
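The new `_run_analyzer` returns `analyzer.analyze()` directly from inside the `with Codeanalyzer(options) as analyzer:` block. In Python, returning from inside a `with` block still runs the context manager's `__exit__` after the return value has been computed, so whatever cleanup the manager performs on exit still happens. The self-contained demonstration below uses a hypothetical `FakeAnalyzer` in place of the real `Codeanalyzer`.

```python
from typing import List


class FakeAnalyzer:
    """Hypothetical stand-in for a context-managed analyzer; records lifecycle events."""

    def __init__(self, events: List[str]) -> None:
        self.events = events

    def __enter__(self) -> "FakeAnalyzer":
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs even though analyze()'s result is returned from inside the block.
        self.events.append("exit")
        return False  # do not swallow exceptions

    def analyze(self) -> str:
        self.events.append("analyze")
        return "app"


def run_analyzer(events: List[str]) -> str:
    # Same shape as the new _run_analyzer: return straight from the with-block.
    with FakeAnalyzer(events) as analyzer:
        return analyzer.analyze()
```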

cldk/analysis/python/python_analysis.py

Lines changed: 6 additions & 6 deletions
@@ -47,12 +47,12 @@ class PythonAnalysis:

 Args:
 project_dir: Directory path of the project (required).
-cache_dir: Writable directory where the ``codeanalyzer-python``
-backend provisions its virtualenv and CodeQL database (forwarded
-as the backend's ``cache_dir``). If None, a dependency-hash-keyed
-location under the CLDK cache root is used.
-analysis_json_path: Directory to persist analysis.json. If None, the
-analysis is not persisted across runs.
+cache_dir: Cache home for ``codeanalyzer-python`` — its virtualenv,
+CodeQL database, and ``analysis_cache.json`` (forwarded as the
+backend's ``cache_dir``). The backend owns all caching. If None,
+it defaults to ``<project_dir>/.codeanalyzer``.
+analysis_json_path: Forwarded to the backend's ``output``. CLDK keeps
+no cache of its own.
 analysis_level: Analysis level (symbol-table or call-graph).
 target_files: Optional list of target files to constrain analysis.
 eager_analysis: If True, regenerate analysis.json on each run.

cldk/core.py

Lines changed: 6 additions & 5 deletions
@@ -77,11 +77,12 @@ def analysis(
 ``cache_dir`` instead.
 analysis_json_path (str | Path | None): Path to persist the analysis
 database / ``analysis.json``.
-cache_dir (str | Path | None): Python only. Writable directory where
-the ``codeanalyzer-python`` backend provisions its virtualenv and
-CodeQL database (forwarded as the backend's ``cache_dir``). When
-omitted, a dependency-hash-keyed location under the CLDK cache
-root is used. Ignored for other languages.
+cache_dir (str | Path | None): Python only. Cache home for the
+``codeanalyzer-python`` backend — its virtualenv, CodeQL
+database, and ``analysis_cache.json`` (forwarded as the
+backend's ``cache_dir``). The backend owns all caching; when
+omitted it defaults to ``<project_path>/.codeanalyzer``.
+Ignored for other languages.
 use_codeql (bool): Python only, default True. Augments Jedi-resolved
 call edges with CodeQL-resolved edges; set False for a faster,
 Jedi-only analysis. Ignored for other languages.
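The updated `core.py` docstring marks `cache_dir` and `use_codeql` as Python-only and ignored for other languages. The sketch below shows one way to express that gating. `backend_kwargs` is a hypothetical helper, not CLDK's actual dispatch code. It deliberately leaves `cache_dir` as None when omitted, so the backend can apply its own `<project_path>/.codeanalyzer` default.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Union

PathLike = Union[str, Path]


def backend_kwargs(
    language: str,
    cache_dir: Optional[PathLike] = None,
    use_codeql: bool = True,
) -> Dict[str, Any]:
    """Hypothetical dispatch: forward Python-only options, drop them otherwise."""
    if language.lower() != "python":
        return {}  # cache_dir / use_codeql are ignored for other languages
    # cache_dir stays None unless given, so the backend chooses its default location.
    return {"cache_dir": cache_dir, "use_codeql": use_codeql}
```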
