feat(cll): client-side CLL filtering with full map API #1244
Draft
Pure function that replicates the backend's anchor + BFS filtering on a full CLL map, enabling client-side filtering without per-click API calls.

Handles three modes:
- Impact overview (no params): returns full map unchanged
- Node-level: anchors based on change_analysis (changed columns as BFS seeds for partial_breaking, node itself for breaking/unknown)
- Column-level: BFS from anchor column through parent/child maps

Includes 45 tests: 15 unit tests + 19 no-change equivalence tests + 11 diff equivalence tests covering non_breaking (added column), partial_breaking (modified column def), and breaking (WHERE clause).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
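The node-level anchoring rule described in this commit can be sketched roughly as below. This is an illustrative Python sketch, not the actual sliceCllMap code (which is TypeScript). The function name, the "<node>.<column>" ID format, and the empty-set fallback for categories the commit message does not describe are all assumptions.

```python
from __future__ import annotations


def select_node_anchors(
    node_id: str, category: str, changed_columns: list[str]
) -> set[str]:
    """Pick the BFS seeds for a node-level slice (hypothetical helper)."""
    if category == "partial_breaking":
        # Only the changed columns seed the traversal.
        # The "<node>.<column>" ID format is an assumption for illustration.
        return {f"{node_id}.{col}" for col in changed_columns}
    if category in ("breaking", "unknown"):
        # The whole node is the anchor.
        return {node_id}
    # Assumption: other categories (e.g. non_breaking) are not described
    # in the commit message, so this sketch returns no seeds for them.
    return set()
```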
One API call fetches the complete CLL map (all nodes with column
lineage and change analysis). All subsequent navigation — impact
overview, node clicks, column clicks — slices client-side via
sliceCllMap() with zero additional API calls.
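
The fetch-once, slice-many pattern described above could look something like this. It is a minimal Python sketch under stated assumptions: FullMapCache and slice_for_node are hypothetical names, and the slice here is a trivial placeholder rather than the real sliceCllMap() logic.

```python
from __future__ import annotations

from typing import Callable, Optional


class FullMapCache:
    """Holds the full CLL map after the first fetch (hypothetical helper)."""

    def __init__(self, fetch: Callable[[], dict]) -> None:
        self._fetch = fetch
        self._full_map: Optional[dict] = None
        self.fetch_count = 0  # counts simulated API calls

    def get(self) -> dict:
        # Only the first call reaches the backend; later calls reuse the map.
        if self._full_map is None:
            self._full_map = self._fetch()
            self.fetch_count += 1
        return self._full_map

    def invalidate(self) -> None:
        # Called when the lineage graph changes externally.
        self._full_map = None


def slice_for_node(full_map: dict, node_id: str) -> dict:
    # Placeholder slice: keeps only the requested node. The real function
    # also applies anchor + BFS filtering.
    return {"nodes": {k: v for k, v in full_map["nodes"].items() if k == node_id}}


cache = FullMapCache(lambda: {"nodes": {"a": {}, "b": {}, "c": {}}})
for clicked in ("a", "b", "c"):
    view = slice_for_node(cache.get(), clicked)
```

After the three simulated clicks, cache.fetch_count is still 1; it only increases again after invalidate().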
Backend changes:
- Add full_map param to CllIn, passed as no_filter to get_cll()
- When full_map=true and no node_id, compute CLL for ALL manifest
nodes (not just changed nodes) so the frontend can slice any path
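
As a rough illustration of the backend rule above, the sketch below uses a plain dataclass as a stand-in for the real CllIn request model and a hypothetical nodes_to_compute helper. The field list and the single-node branch are simplifying assumptions, not the actual get_cll() behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CllIn:
    # Stand-in for the real request model; only fields discussed here.
    node_id: Optional[str] = None
    change_analysis: bool = False
    full_map: bool = False


def nodes_to_compute(
    params: CllIn, all_nodes: list[str], changed_nodes: list[str]
) -> list[str]:
    """Decide which nodes need CLL computed (hypothetical helper)."""
    no_filter = params.full_map  # full_map is passed through as no_filter
    if params.node_id is not None:
        # Simplification: a specific node was requested.
        return [params.node_id]
    if no_filter:
        # Full map: every manifest node, so the frontend can slice any path.
        return list(all_nodes)
    # Default impact overview: changed nodes only.
    return list(changed_nodes)
```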
Frontend changes:
- LineageViewOss: cache full map in ref, slice client-side on clicks,
invalidate only on external lineageGraph changes (not our own patch)
- sliceCllMap: add impact overview filtering (changed node anchors +
BFS), fix upstream/downstream BFS cross-contamination by using
separate visited sets per direction (bfsFromAnchors)
- Full map request uses only {change_analysis, full_map} — no node_id
or directional params
76 equivalence tests (impact + 7 nodes × 5 cols each) confirm
client-side slicing matches server responses exactly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
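
One way to read the "separate visited sets per direction" fix is sketched below in Python. The name bfs_from_anchors mirrors bfsFromAnchors from the commit message, but the body is an assumption about how such a traversal might work, not the actual TypeScript implementation.

```python
from __future__ import annotations

from collections import deque


def walk(seeds: set[str], edges: dict[str, set[str]]) -> set[str]:
    """Breadth-first walk in one direction with its own visited set."""
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited


def bfs_from_anchors(
    anchors: set[str],
    parent_map: dict[str, set[str]],
    child_map: dict[str, set[str]],
) -> set[str]:
    # Upstream follows parents only; downstream follows children only.
    # Keeping the two walks apart means an ancestor reached upstream is
    # never expanded downstream into unrelated siblings.
    upstream = walk(anchors, parent_map)
    downstream = walk(anchors, child_map)
    return upstream | downstream


# a -> b -> d, and a -> c (c is a sibling of b)
parent_map = {"b": {"a"}, "c": {"a"}, "d": {"b"}}
child_map = {"a": {"b", "c"}, "b": {"d"}}
reachable = bfs_from_anchors({"b"}, parent_map, child_map)
```

With "b" as the anchor, reachable contains "a", "b", and "d". The sibling "c" is excluded because the upstream walk never follows child edges from "a".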
- buildSlice: shallow-clone nodes, set impacted flag (true for BFS-reachable, false for extra nodes), filter node.columns to only include columns in the reachable set
- assertEquivalent: now checks impacted and node.columns per node
- Extract fetchAndCacheFullMap helper in LineageViewOss (was duplicated in useLayoutEffect and refreshLayout)
- Remove 8 orphaned diff fixtures (no-upstream/no-downstream variants and unreferenced impacted node fixtures)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
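
The buildSlice behavior listed above can be approximated with the following Python sketch. The data shape (nodes keyed by ID, columns keyed by name with an "id" field), the meaning of "extra nodes" as included-but-not-reachable nodes, and the mixed node/column reachable set are assumptions made for illustration.

```python
from __future__ import annotations


def build_slice(full_map: dict, reachable: set[str], extra_nodes: set[str]) -> dict:
    """Shallow-clone the selected nodes without mutating the cached full map."""
    nodes = {}
    for node_id in reachable | extra_nodes:
        original = full_map["nodes"].get(node_id)
        if original is None:
            continue  # `reachable` may also hold column IDs, which are not nodes
        clone = dict(original)  # shallow clone: the cached node stays untouched
        clone["impacted"] = node_id in reachable
        # Keep only columns whose ID is in the reachable set.
        clone["columns"] = {
            name: col
            for name, col in original.get("columns", {}).items()
            if col["id"] in reachable
        }
        nodes[node_id] = clone
    return {"nodes": nodes}
```

Because only the top-level node dict is cloned and the columns entry is replaced rather than edited, the cached full map can be reused for the next click.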
as_manifest() was called inside get_cll_cached() for every node, converting WritableManifest→Manifest and triggering mashumaro deserialization of ALL manifest nodes each time. On a 1200-model project this made full_map CLL take 11+ minutes.

Fix: use the already-deserialized self.manifest / self.previous_state.manifest instead of re-converting per node. Also replace deepcopy with shallow copy of only the nodes/columns that get mutated (change_status fields), and increase lru_cache from 128 to 4096 to cover large projects.

Result on 1207-model anonymized project:
- full_map cold: 694s → 1.8s (385x faster)
- full_map warm: 713s → 0.23s (3100x faster)
- impact overview: 233s → 0.43s (542x faster)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
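
The combination of a larger lru_cache and targeted shallow copies can be sketched as follows. This is a simplified, self-contained illustration: the Node and Column dataclasses, the with_change_status helper, and the stubbed body of get_cll_cached are assumptions and do not reflect the real adapter code.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from functools import lru_cache
from typing import Optional


@dataclass
class Column:
    name: str
    change_status: Optional[str] = None


@dataclass
class Node:
    id: str
    columns: dict[str, Column] = field(default_factory=dict)
    change_status: Optional[str] = None


@lru_cache(maxsize=4096)
def get_cll_cached(node_id: str) -> Node:
    # Stand-in for the expensive per-node CLL computation.
    return Node(id=node_id, columns={"a": Column("a"), "b": Column("b")})


def with_change_status(
    node_id: str, node_status: str, column_status: dict[str, str]
) -> Node:
    """Annotate a cached result without deep-copying or mutating it."""
    cached = get_cll_cached(node_id)
    node = copy.copy(cached)             # shallow copy of the node only
    node.change_status = node_status
    node.columns = dict(cached.columns)  # new dict, same Column objects
    for name, status in column_status.items():
        col = copy.copy(node.columns[name])  # copy only columns that change
        col.change_status = status
        node.columns[name] = col
    return node
```

Cached objects are shared across calls, so anything that gets mutated has to be copied first; untouched columns are simply shared with the cache entry.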
Problem
Every time a user clicks a node or column in the CLL (Column-Level Lineage) view, the frontend fires a separate API call to the backend. Each call computes CLL for that specific node — involving SQL parsing, dependency resolution, and manifest traversal. On large projects this creates noticeable latency on every click, and the UX feels sluggish as users explore their lineage graph.
Additionally, we discovered that get_cll_cached() was calling as_manifest() on every invocation, which converts a WritableManifest to a Manifest by deserializing every node in the manifest through mashumaro. On a 1200-model project, this made CLL computation take 11+ minutes for a full map request.
Solution
One API call, instant navigation
A new full_map parameter on the CLL API tells the backend to compute CLL for all manifest nodes in a single request. The frontend caches this full map and uses a pure client-side function (sliceCllMap) to extract exactly the nodes/columns/parent_map needed for any given view: impact overview, node-level, or column-level.
After the initial load, every subsequent click (changing selected node, drilling into a column, toggling upstream/downstream) is instant: zero API calls, pure JavaScript object slicing.
sliceCllMap faithfully replicates the backend's anchor + BFS logic for each of these views.
385x performance fix
Replaced as_manifest(self.get_manifest(base)) inside get_cll_cached() with the already-deserialized self.manifest. The old code was converting the entire WritableManifest → Manifest (triggering mashumaro __mashumaro_from_dict__ for every node in the manifest) on every single node in the CLL loop. For a 1200-model project with ~1800 nodes, that's 1800 full-manifest deserializations.
Also replaced deepcopy() of the entire CLL result with targeted shallow copy() of only the node/column objects whose change_status fields get mutated, and bumped lru_cache from 128 → 4096 to avoid cache thrashing on large projects.
Perf results (1207-model anonymized project)
- full_map cold: 694s → 1.8s (385x faster)
- full_map warm: 713s → 0.23s (3100x faster)
- impact overview: 233s → 0.43s (542x faster)
Results verified identical (0 diffs across 1796 nodes, 15520 columns, 17316 parent_map entries).
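A check of this kind could be written roughly as below. This is a hypothetical Python sketch, not the script used for the verification above; it assumes each CLL result is a dict with nodes, columns, and parent_map sections keyed by ID.

```python
from __future__ import annotations

_MISSING = object()  # distinguishes "key absent" from a stored None


def count_diffs(expected: dict, actual: dict) -> dict[str, int]:
    """Count entries that differ between two CLL results, per section."""
    diffs = {}
    for section in ("nodes", "columns", "parent_map"):
        old = expected.get(section, {})
        new = actual.get(section, {})
        diffs[section] = sum(
            1
            for key in old.keys() | new.keys()
            if old.get(key, _MISSING) != new.get(key, _MISSING)
        )
    return diffs
```

A result of zero in every section means the optimized output matches the original entry for entry.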
Changes
Backend
- server.py: full_map param on CllIn, passed as no_filter to get_cll()
- dbt_adapter/__init__.py:
  - With no_filter=True and node_id=None, computes CLL for ALL manifest nodes
  - as_manifest() → self.manifest / self.previous_state.manifest
  - deepcopy → targeted copy() on mutated fields only
  - lru_cache(128) → lru_cache(4096)
Frontend
- sliceCllMap.ts: Pure function: BFS from anchors, buildSlice for shallow-clone + filtering
- LineageViewOss.tsx: fullCllMapRef caches full map, fetchAndCacheFullMap helper, cache invalidation guard
- cll.ts: full_map field on CllInput
Tests
- Equivalence tests: sliceCllMap(fullMap, params) against real server responses
- _set_compiled_code test helper to patch both WritableManifest and Manifest
Test plan
Checklist
🤖 Generated with Claude Code