⚡️ Speed up method DbtAdapter.build_parent_map by 3,421%#861
⚡️ Speed up method DbtAdapter.build_parent_map by 3,421%#861misrasaurabh1 wants to merge 4 commits intoDataRecce:mainfrom
DbtAdapter.build_parent_map by 3,421%#861Conversation
The optimized code achieves a dramatic **3420% speedup** by addressing two critical performance bottlenecks: **Primary Optimization: Avoiding expensive `to_dict()` conversion** - The original code calls `manifest.to_dict()` which consumes 98.4% of the total runtime (499ms out of 507ms) - The optimized version attempts to access `manifest.parent_map` directly, falling back to `to_dict()` only if needed - This eliminates the massive serialization overhead when the parent_map is already available as an attribute **Secondary Optimization: Converting to set for O(1) lookups** - Changes `node_ids = nodes.keys()` to `node_ids = set(nodes)` - Set membership tests (`if k not in node_ids`) are O(1) instead of O(n) for dict_keys views - While this optimization shows smaller gains in the profiler, it provides consistent performance improvements for larger node sets **Performance Impact:** - Total runtime drops from 41.5ms to 1.18ms - The `manifest.to_dict()` bottleneck is completely eliminated in the common case - Set-based membership checks are more efficient, especially as the number of nodes scales **Test Case Benefits:** The optimizations are particularly effective for: - Large dbt projects with many nodes (where set lookups show greater advantage) - Repeated calls to `build_parent_map` (avoiding repeated expensive dict conversions) - Any scenario where the manifest object already has a `parent_map` attribute available The try/except ensures backward compatibility while maximizing performance gains in the common case.
Codecov Report❌ Patch coverage is
... and 9 files with indirect coverage changes 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Hi @misrasaurabh1,
Thanks for the new optimization PR!
Except for the minor comment, others are good to me.
Could you please also fix the flake8 format issue and include your sign-off in the commit to pass the DCO check?
Thanks!
| parent_map_source = manifest.to_dict()["parent_map"] | ||
|
|
||
| node_ids = nodes.keys() | ||
| node_ids = set(nodes) |
There was a problem hiding this comment.
The keys view is set-like, so I think we don't need to change it to set to improve the lookup time.
There was a problem hiding this comment.
Sorry, I mean we can keep the original node_ids = nodes.keys()
|
should be ready to merge |
|
Hi misrasaurabh1, And we need a sign-off in the commit messages to pass the DCO check. Thanks! |
📄 3,421% (34.21x) speedup for
DbtAdapter.build_parent_mapinrecce/adapter/dbt_adapter/__init__.py⏱️ Runtime :
41.5 milliseconds→1.18 milliseconds(best of21runs)📝 Explanation and details
The optimized code achieves a dramatic 3420% speedup by addressing two critical performance bottlenecks:
Primary Optimization: Avoiding expensive
to_dict()conversionmanifest.to_dict()which consumes 98.4% of the total runtime (499ms out of 507ms)manifest.parent_mapdirectly, falling back toto_dict()only if neededSecondary Optimization: Converting to set for O(1) lookups
node_ids = nodes.keys()tonode_ids = set(nodes)if k not in node_ids) are O(1) instead of O(n) for dict_keys viewsPerformance Impact:
manifest.to_dict()bottleneck is completely eliminated in the common caseTest Case Benefits:
The optimizations are particularly effective for:
build_parent_map(avoiding repeated expensive dict conversions)parent_mapattribute availableThe try/except ensures backward compatibility while maximizing performance gains in the common case.
✅ Correctness verification report:
⏪ Replay Tests and Runtime
test_pytest_tests__replay_test_0.py::test_recce_adapter_dbt_adapter___init___DbtAdapter_build_parent_mapTo edit these changes
git checkout codeflash/optimize-DbtAdapter.build_parent_map-med9gwbsand push.