⚡️ Speed up function find_last_node by 22,046%
#224
📄 22,046% (220.46x) speedup for find_last_node in src/algorithms/graph.py
⏱️ Runtime: 78.4 milliseconds → 354 microseconds (best of 250 runs)
📝 Explanation and details
The optimization achieves a 220x speedup by replacing a nested loop with a set-based lookup, dramatically reducing algorithmic complexity from O(n*m) to O(n+m) where n is the number of nodes and m is the number of edges.
Key Changes:
Pre-compute edge sources as a set: Instead of iterating through all edges for each node (nested loop), the optimized code builds a set of all source node IDs upfront: edge_sources = {e["source"] for e in edges}. This single pass through edges takes O(m) time.
O(1) membership test: Set membership (n["id"] not in edge_sources) is O(1) in the average case, versus the original's O(m) check using all(e["source"] != n["id"] for e in edges), which had to compare against every edge in the worst case.
Edge case handling: The code explicitly handles the empty edges case to match the original behavior: when there are no edges, it returns the first node without attempting to access n["id"], avoiding potential KeyErrors on malformed input.
Why It's Faster:
The original code had quadratic-like behavior: for each node, it scanned the edges. With 1000 nodes and 999 edges (linear chain), this meant up to ~1 million comparisons. The optimized version does one pass over the 999 edges plus 1000 set lookups, about 2000 operations in total.
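The change described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the two function names, the `next(..., None)` wrapper, and the node/edge dictionary shapes are assumptions; only the `all(...)` check and the `edge_sources` set comprehension are quoted from the description.

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]
Edge = Dict[str, Any]


def find_last_node_nested(nodes: List[Node], edges: List[Edge]) -> Optional[Node]:
    """Assumed shape of the original: re-scan the edges for every node."""
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_set(nodes: List[Node], edges: List[Edge]) -> Optional[Node]:
    """Assumed shape of the optimized version: one pass to build a set."""
    if not edges:
        # With no edges, all(...) is vacuously true for the first node,
        # so return it without ever touching n["id"].
        return next(iter(nodes), None)
    # Requires hashable "source" values (an assumption of this sketch).
    edge_sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in edge_sources), None)


# Linear chain 0 -> 1 -> 2 -> 3 -> 4: node 4 is the only one with no outgoing edge.
chain_nodes = [{"id": i} for i in range(5)]
chain_edges = [{"source": i, "target": i + 1} for i in range(4)]
print(find_last_node_nested(chain_nodes, chain_edges))
print(find_last_node_set(chain_nodes, chain_edges))
```

Both functions return the first node that never appears as an edge source, or `None` when every node has an outgoing edge (for example, in a cycle).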
Performance Characteristics:
This optimization is particularly valuable when find_last_node is called frequently or on larger graphs, since the gap between O(n*m) and O(n+m) widens as the input grows.
✅ Correctness verification report:
🌀 Generated Regression Tests
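The generated regression tests are not reproduced in this description. As a rough idea of the kind of equivalence check involved, the sketch below compares an assumed set-based version against an assumed nested-loop reference on seeded random graphs. The names `reference`, `optimized`, `random_graph`, and `run_checks` are hypothetical and do not come from the PR.

```python
import random


def reference(nodes, edges):
    # Assumed original behavior: first node that is never an edge source.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def optimized(nodes, edges):
    # Assumed optimized behavior, following the PR description.
    if not edges:
        return next(iter(nodes), None)
    edge_sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in edge_sources), None)


def random_graph(rng, max_nodes=20, max_edges=40):
    # Build a small random graph; edges only exist when there are nodes.
    node_count = rng.randint(0, max_nodes)
    nodes = [{"id": i} for i in range(node_count)]
    edges = []
    if node_count:
        edges = [
            {"source": rng.randrange(node_count), "target": rng.randrange(node_count)}
            for _ in range(rng.randint(0, max_edges))
        ]
    return nodes, edges


def run_checks(trials=500, seed=0):
    # A seeded generator keeps any failure reproducible.
    rng = random.Random(seed)
    for _ in range(trials):
        nodes, edges = random_graph(rng)
        assert optimized(nodes, edges) == reference(nodes, edges)
    return trials


if __name__ == "__main__":
    print(f"{run_checks()} random graphs checked")
```

A check like this only covers well-formed inputs; malformed nodes (for example, a node without an "id" key) would need dedicated cases.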
To edit these changes, git checkout codeflash/optimize-find_last_node-mjj73yso and push.