Skip to content

feat(cll): pre-cached full CLL map with full_map API parameter#1221

Open
danyelf wants to merge 4 commits intomainfrom
worktree-unified-cll-call
Open

feat(cll): pre-cached full CLL map with full_map API parameter#1221
danyelf wants to merge 4 commits intomainfrom
worktree-unified-cll-call

Conversation

@danyelf
Copy link
Contributor

@danyelf danyelf commented Mar 19, 2026

Summary

  • Adds build_full_cll_map() to DbtAdapter: pre-computes CLL for every node in the manifest on first call, includes change analysis, caches on adapter instance
  • Refactors get_cll() to slice from the cached full map instead of computing per-request (no_cll path unchanged)
  • Adds full_map API parameter to POST /api/cll: returns the complete unfiltered map (99 nodes, 804 columns in jaffle_shop)
  • No frontend changes — existing API contract preserved exactly

Performance (jaffle_shop, 99 models)

Call Main This PR
Impact (cold) 1451ms 2607ms
Impact (warm) 377ms 21ms
Column CLL (warm) 10-81ms ~10ms
full_map n/a 16ms

Cold impact is ~1.8x slower (computes all models upfront), but warm impact is 18x faster and all subsequent lookups are consistently ~10ms.

This is Part I of the unified CLL project. Part II will update the frontend to call full_map=true once on Impact click, making all column navigation instant (client-side).

Test plan

  • All 16 existing CLL tests pass unchanged
  • 5 new tests: build_full_cll_map, caching, change analysis, unchanged model CLL, full_map API
  • Full test suite passes (682 tests)
  • Manual verification with jaffle_shop_duckdb (Snowflake): Impact button, column clicks match main branch exactly
  • POST /api/cll {"full_map": true} returns all models and columns
  • Side-by-side API comparison with main branch: identical output for all call patterns

🤖 Generated with Claude Code

@danyelf danyelf requested a review from wcchang1115 March 19, 2026 06:43
@danyelf danyelf self-assigned this Mar 19, 2026
@danyelf
Copy link
Contributor Author

danyelf commented Mar 19, 2026

@codecov
Copy link

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 91.58879% with 18 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/adapter/dbt_adapter/__init__.py 88.37% 15 Missing ⚠️
recce/server.py 33.33% 2 Missing ⚠️
recce/cli.py 66.66% 1 Missing ⚠️
Files with missing lines Coverage Δ
recce/core.py 65.15% <ø> (-0.51%) ⬇️
tests/adapter/dbt_adapter/conftest.py 100.00% <100.00%> (ø)
tests/adapter/dbt_adapter/test_dbt_cll.py 98.94% <100.00%> (+0.24%) ⬆️
recce/cli.py 52.76% <66.66%> (+0.04%) ⬆️
recce/server.py 63.69% <33.33%> (-0.16%) ⬇️
recce/adapter/dbt_adapter/__init__.py 78.16% <88.37%> (+1.34%) ⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@danyelf
Copy link
Contributor Author

danyelf commented Mar 23, 2026

Added updates in response to @wcchang1115 's feedback:

  1. --cll-full-map CLI flag — opt-in to pre-cached full CLL map behavior
  2. Feature flag plumbing — cll_full_map in server flags, wired through to
    get_cll()
  3. Dual code paths — flag off = original per-node CLL, flag on = slice from
    cached full map
  4. full_map=True API param — always works regardless of the flag
  5. Cleaner refresh() — deduplicated cache invalidation logic
  6. log_performance labels — [cached_full_map] and [full_map] for new paths,
    unchanged for default
  7. Parametrized tests — 13 CLL tests now run with both [cll_per_node] and
    [cll_full_map]

danyelf and others added 2 commits March 23, 2026 12:16
- Add build_full_cll_map() to DbtAdapter: pre-computes CLL for every
  node in the manifest, includes change analysis, caches on instance
- Refactor get_cll() to slice from cached full map instead of
  computing per-request (no_cll path unchanged)
- Add full_map parameter to CLL API: returns complete unfiltered map
- Add tests for full map building, caching, change analysis, and API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
@danyelf danyelf force-pushed the worktree-unified-cll-call branch from ccd558e to fbbaacc Compare March 23, 2026 19:16
@danyelf
Copy link
Contributor Author

danyelf commented Mar 23, 2026

Code Review — PR #1221

Summary

Pre-computes the full column-level lineage map on first access and caches it on the adapter instance. Adds full_map API parameter to return the complete map, and a cll_full_map_enabled flag that makes per-node get_cll() calls slice from the cached map instead of computing individually. Well-structured change with good test coverage (5 new tests, 11 existing tests parametrized across both paths). CI is green.

Findings

[Warning] Race condition on _full_cll_map cache — not thread-safe

File: recce/adapter/dbt_adapter/__init__.py:989-1006 (in diff)
Issue: build_full_cll_map() uses a check-then-set pattern (if self._full_cll_map is not None: return) without synchronization. If two concurrent requests hit this method before the cache is populated, both will compute the full map simultaneously — wasting CPU but not corrupting data since the final assignment is atomic in CPython (GIL). However, FastAPI runs on an async event loop and get_cll() is called from an async def endpoint without await or thread pool dispatch, so concurrent requests to the same endpoint will actually serialize on the GIL. Net risk is low in practice, but worth noting since the existing get_cll_cached uses @lru_cache which is also not thread-safe but at least idempotent by design.

[Warning] cll_full_map_enabled parameter baked into adapter method signature

File: recce/adapter/dbt_adapter/__init__.py:996 (in diff)
Issue: get_cll() now takes cll_full_map_enabled as a parameter threaded from the server flag through the API endpoint. This means the adapter method's behavior depends on a server-level configuration flag passed per-call. The adapter already has self access — it could read the flag from its own state or from the context, keeping the method signature stable. As-is, every caller of get_cll() must know to pass this flag. The test fixture works around this with a patched_get_cll wrapper that injects kwargs.setdefault("cll_full_map_enabled", enabled), which is clever but confirms the ergonomic friction. This isn't blocking but could become a maintenance burden as more callers appear.

[Warning] child_map type inconsistency: set vs list

File: recce/adapter/dbt_adapter/__init__.py (build_full_cll_map, lines 71-76 in diff)
Issue: build_full_cll_map() builds child_map with set() values (line 75: child_map[parent] = set()), while the existing child_map construction in get_cll() (line 1129-1131 of current main) also uses sets. However, CllData.child_map is typed as Dict[str, Set[str]] so this is consistent. Just confirming no issue here — the parent_map entries copied in the slice path (line 184: parent_map[key] = set(full_map_data.parent_map[key])) correctly wrap in set() which is important since get_cll_cached returns parent_map values as lists.

[Info] full_map=True ignores change_analysis flag silently

File: recce/adapter/dbt_adapter/__init__.py:1081-1090 (in diff)
Issue: The comment at line 1082 documents this: "change_analysis flag is ignored — full map always includes change metadata." This is an intentional design decision and well-documented in the code comment. No action needed, but worth noting that this creates a subtle API contract difference — full_map=True always returns change data regardless of the change_analysis parameter. The PR description could mention this for future consumers.

Verification

  • All CI checks pass (18/19 — smoke-test-cloud failure is unrelated infrastructure issue)
  • 5 new tests cover: build_full_cll_map, caching, change analysis, unchanged model CLL, full_map API
  • 11 existing CLL tests parametrized with cll_full_map_enabled fixture to run both code paths

Verdict

No critical issues. The code is well-structured, thoroughly tested, and the caching strategy is sound. The two warnings above are minor ergonomic/design observations that don't block merge. Approved.

Rename --cll-full-map to --disable-cll-cache with inverted semantics:
the pre-cached full CLL map is now on by default. Users can opt out
via --disable-cll-cache or DISABLE_CLL_CACHE envvar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
@danyelf
Copy link
Contributor Author

danyelf commented Mar 24, 2026

Code Review — PR #1221

Summary

Pre-cached full CLL map with full_map API parameter. Well-structured caching layer that builds the full column-level lineage map once and slices from it on subsequent requests. All 771 tests pass, TypeScript types clean.

Findings

No critical issues found.

Architecture notes (informational, not blocking):

  • The caching approach is sound: build_full_cll_map() computes once per adapter lifecycle, get_cll() slices from the cached map via deepcopy, and all three invalidation points (load_artifacts, refresh, artifact changes) correctly clear the cache.
  • The parametrized test fixture (disable_cll_cache) is a nice pattern — it runs all 16 existing CLL tests through both the cached and per-node paths, providing strong equivalence coverage.
  • The refresh() refactor (lines 1640–1662) simplifies the cache invalidation logic by always clearing get_change_analysis_cached and _full_cll_map regardless of which artifact type changed, which is slightly more aggressive but safer than the previous per-artifact-type clearing.

Verdict

Approved — clean implementation with good test coverage across both code paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant