feat(cll): pre-cached full CLL map with full_map API parameter by danyelf · Pull Request #1221 · DataRecce/recce

danyelf · 2026-03-19T06:42:09Z

Summary

Adds build_full_cll_map() to DbtAdapter: pre-computes CLL for every node in the manifest on first call, includes change analysis, caches on adapter instance
Refactors get_cll() to slice from the cached full map instead of computing per-request (no_cll path unchanged)
Adds full_map API parameter to POST /api/cll: returns the complete unfiltered map (99 nodes, 804 columns in jaffle_shop)
No frontend changes — existing API contract preserved exactly

Performance (jaffle_shop, 99 models)

Call	Main	This PR
Impact (cold)	1451ms	2607ms
Impact (warm)	377ms	21ms
Column CLL (warm)	10-81ms	~10ms
full_map	n/a	16ms

Cold impact is ~1.8x slower (computes all models upfront), but warm impact is 18x faster and all subsequent lookups are consistently ~10ms.

This is Part I of the unified CLL project. Part II will update the frontend to call full_map=true once on Impact click, making all column navigation instant (client-side).

Test plan

All 16 existing CLL tests pass unchanged
5 new tests: build_full_cll_map, caching, change analysis, unchanged model CLL, full_map API
Full test suite passes (682 tests)
Manual verification with jaffle_shop_duckdb (Snowflake): Impact button, column clicks match main branch exactly
POST /api/cll {"full_map": true} returns all models and columns
Side-by-side API comparison with main branch: identical output for all call patterns

🤖 Generated with Claude Code

danyelf · 2026-03-19T06:44:04Z

Addresses https://linear.app/recce/issue/DRC-3020/one-big-cll-on-the-backend

codecov · 2026-03-19T06:47:30Z

Codecov Report

❌ Patch coverage is 91.58879% with 18 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/adapter/dbt_adapter/__init__.py	88.37%	15 Missing ⚠️
recce/server.py	33.33%	2 Missing ⚠️
recce/cli.py	66.66%	1 Missing ⚠️

Files with missing lines	Coverage Δ
recce/core.py	`65.15% <ø> (-0.51%)`	⬇️
tests/adapter/dbt_adapter/conftest.py	`100.00% <100.00%> (ø)`
tests/adapter/dbt_adapter/test_dbt_cll.py	`98.94% <100.00%> (+0.24%)`	⬆️
recce/cli.py	`52.76% <66.66%> (+0.04%)`	⬆️
recce/server.py	`63.69% <33.33%> (-0.16%)`	⬇️
recce/adapter/dbt_adapter/__init__.py	`78.16% <88.37%> (+1.34%)`	⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

danyelf · 2026-03-23T19:15:49Z

Added updates in response to @wcchang1115 's feedback:

--cll-full-map CLI flag — opt-in to pre-cached full CLL map behavior
Feature flag plumbing — cll_full_map in server flags, wired through to
get_cll()
Dual code paths — flag off = original per-node CLL, flag on = slice from
cached full map
full_map=True API param — always works regardless of the flag
Cleaner refresh() — deduplicated cache invalidation logic
log_performance labels — [cached_full_map] and [full_map] for new paths,
unchanged for default
Parametrized tests — 13 CLL tests now run with both [cll_per_node] and
[cll_full_map]

- Add build_full_cll_map() to DbtAdapter: pre-computes CLL for every node in the manifest, includes change analysis, caches on instance - Refactor get_cll() to slice from cached full map instead of computing per-request (no_cll path unchanged) - Add full_map parameter to CLL API: returns complete unfiltered map - Add tests for full map building, caching, change analysis, and API Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Danyel Fisher <danyel@gmail.com>

Signed-off-by: Danyel Fisher <danyel@gmail.com>

danyelf · 2026-03-23T19:27:19Z

Code Review — PR #1221

Summary

Pre-computes the full column-level lineage map on first access and caches it on the adapter instance. Adds full_map API parameter to return the complete map, and a cll_full_map_enabled flag that makes per-node get_cll() calls slice from the cached map instead of computing individually. Well-structured change with good test coverage (5 new tests, 11 existing tests parametrized across both paths). CI is green.

Findings

[Warning] Race condition on `_full_cll_map` cache — not thread-safe

File: recce/adapter/dbt_adapter/__init__.py:989-1006 (in diff)
Issue: build_full_cll_map() uses a check-then-set pattern (if self._full_cll_map is not None: return) without synchronization. If two concurrent requests hit this method before the cache is populated, both will compute the full map simultaneously — wasting CPU but not corrupting data since the final assignment is atomic in CPython (GIL). However, FastAPI runs on an async event loop and get_cll() is called from an async def endpoint without await or thread pool dispatch, so concurrent requests to the same endpoint will actually serialize on the GIL. Net risk is low in practice, but worth noting since the existing get_cll_cached uses @lru_cache which is also not thread-safe but at least idempotent by design.

[Warning] `cll_full_map_enabled` parameter baked into adapter method signature

File: recce/adapter/dbt_adapter/__init__.py:996 (in diff)
Issue: get_cll() now takes cll_full_map_enabled as a parameter threaded from the server flag through the API endpoint. This means the adapter method's behavior depends on a server-level configuration flag passed per-call. The adapter already has self access — it could read the flag from its own state or from the context, keeping the method signature stable. As-is, every caller of get_cll() must know to pass this flag. The test fixture works around this with a patched_get_cll wrapper that injects kwargs.setdefault("cll_full_map_enabled", enabled), which is clever but confirms the ergonomic friction. This isn't blocking but could become a maintenance burden as more callers appear.

[Warning] `child_map` type inconsistency: `set` vs `list`

File: recce/adapter/dbt_adapter/__init__.py (build_full_cll_map, lines 71-76 in diff)
Issue: build_full_cll_map() builds child_map with set() values (line 75: child_map[parent] = set()), while the existing child_map construction in get_cll() (line 1129-1131 of current main) also uses sets. However, CllData.child_map is typed as Dict[str, Set[str]] so this is consistent. Just confirming no issue here — the parent_map entries copied in the slice path (line 184: parent_map[key] = set(full_map_data.parent_map[key])) correctly wrap in set() which is important since get_cll_cached returns parent_map values as lists.

[Info] `full_map=True` ignores `change_analysis` flag silently

File: recce/adapter/dbt_adapter/__init__.py:1081-1090 (in diff)
Issue: The comment at line 1082 documents this: "change_analysis flag is ignored — full map always includes change metadata." This is an intentional design decision and well-documented in the code comment. No action needed, but worth noting that this creates a subtle API contract difference — full_map=True always returns change data regardless of the change_analysis parameter. The PR description could mention this for future consumers.

Verification

All CI checks pass (18/19 — smoke-test-cloud failure is unrelated infrastructure issue)
5 new tests cover: build_full_cll_map, caching, change analysis, unchanged model CLL, full_map API
11 existing CLL tests parametrized with cll_full_map_enabled fixture to run both code paths

Verdict

No critical issues. The code is well-structured, thoroughly tested, and the caching strategy is sound. The two warnings above are minor ergonomic/design observations that don't block merge. Approved.

Rename --cll-full-map to --disable-cll-cache with inverted semantics: the pre-cached full CLL map is now on by default. Users can opt out via --disable-cll-cache or DISABLE_CLL_CACHE envvar. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Danyel Fisher <danyel@gmail.com>

danyelf · 2026-03-24T06:35:08Z

Code Review — PR #1221

Summary

Pre-cached full CLL map with full_map API parameter. Well-structured caching layer that builds the full column-level lineage map once and slices from it on subsequent requests. All 771 tests pass, TypeScript types clean.

Findings

No critical issues found.

Architecture notes (informational, not blocking):

The caching approach is sound: build_full_cll_map() computes once per adapter lifecycle, get_cll() slices from the cached map via deepcopy, and all three invalidation points (load_artifacts, refresh, artifact changes) correctly clear the cache.
The parametrized test fixture (disable_cll_cache) is a nice pattern — it runs all 16 existing CLL tests through both the cached and per-node paths, providing strong equivalence coverage.
The refresh() refactor (lines 1640–1662) simplifies the cache invalidation logic by always clearing get_change_analysis_cached and _full_cll_map regardless of which artifact type changed, which is slightly more aggressive but safer than the previous per-artifact-type clearing.

Verdict

Approved — clean implementation with good test coverage across both code paths.

danyelf requested a review from wcchang1115 March 19, 2026 06:43

danyelf self-assigned this Mar 19, 2026

danyelf and others added 2 commits March 23, 2026 12:16

feat(cll): gate full cll map behind --cll-full-map flag

fbbaacc

Signed-off-by: Danyel Fisher <danyel@gmail.com>

danyelf force-pushed the worktree-unified-cll-call branch from ccd558e to fbbaacc Compare March 23, 2026 19:16

Merge branch 'main' into worktree-unified-cll-call

b68e1d6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cll): pre-cached full CLL map with full_map API parameter#1221

feat(cll): pre-cached full CLL map with full_map API parameter#1221
danyelf wants to merge 4 commits intomainfrom
worktree-unified-cll-call

danyelf commented Mar 19, 2026

Uh oh!

danyelf commented Mar 19, 2026

Uh oh!

codecov bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

danyelf commented Mar 23, 2026

Uh oh!

danyelf commented Mar 23, 2026

Uh oh!

danyelf commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

danyelf commented Mar 19, 2026

Summary

Performance (jaffle_shop, 99 models)

Test plan

Uh oh!

danyelf commented Mar 19, 2026

Uh oh!

codecov bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

danyelf commented Mar 23, 2026

Uh oh!

danyelf commented Mar 23, 2026

Code Review — PR #1221

Summary

Findings

[Warning] Race condition on _full_cll_map cache — not thread-safe

[Warning] cll_full_map_enabled parameter baked into adapter method signature

[Warning] child_map type inconsistency: set vs list

[Info] full_map=True ignores change_analysis flag silently

Verification

Verdict

Uh oh!

danyelf commented Mar 24, 2026

Code Review — PR #1221

Summary

Findings

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov bot commented Mar 19, 2026 •

edited

Loading

[Warning] Race condition on `_full_cll_map` cache — not thread-safe

[Warning] `cll_full_map_enabled` parameter baked into adapter method signature

[Warning] `child_map` type inconsistency: `set` vs `list`

[Info] `full_map=True` ignores `change_analysis` flag silently