⚡️ Speed up function `find_last_node` by 16,072% #214

codeflash-ai · 2025-12-23T18:00:57Z

📄 16,072% (160.72x) speedup for `find_last_node` in `src/algorithms/graph.py`

⏱️ Runtime : 91.0 milliseconds → 563 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 160x speedup by eliminating a nested loop that created O(n*m) complexity where n = number of nodes and m = number of edges.

Key Optimization

Original approach: for each node, iterate through all edges to check whether that node is a source.

- For each of the n nodes, check all m edges: `all(e["source"] != n["id"] for e in edges)`
- Time complexity: O(n * m)

Optimized approach: precompute all source node IDs once into a set, then do O(1) lookups.

- Build a set of all source IDs: `sources = {e["source"] for e in edges}`, which is O(m)
- Check each node against the set: `n["id"] not in sources`, which is O(1) per node on average
- Time complexity: O(n + m)
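The description quotes only the key expressions, not the full function bodies. A minimal sketch of the two approaches, assuming the function returns the first node that never appears as an edge source (or `None` if there is none), might look like the following. The names `find_last_node_original` and `find_last_node_optimized` are hypothetical and exist only to show both versions side by side.

```python
from __future__ import annotations


def find_last_node_original(nodes: list[dict], edges: list[dict]) -> dict | None:
    # Assumed shape of the original: rescan the edges for each node, O(n * m).
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes: list[dict], edges: list[dict]) -> dict | None:
    # Collect every source ID once, O(m), then test each node in O(1) on average.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
print(find_last_node_original(nodes, edges))   # {'id': 'C'}
print(find_last_node_optimized(nodes, edges))  # {'id': 'C'}
```

One behavioral difference to keep in mind: the set-based version requires source IDs to be hashable, so an edge whose `source` is a list would raise `TypeError`, while the comparison-based version would accept it. The generated tests use only strings, integers, and tuples as IDs.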

Why This Is Faster

1. Set lookup is O(1) on average, whereas iterating through all edges is O(m).
2. The edges are traversed once, instead of being rescanned for each node.
3. Hash-based membership testing (the `in` operator on sets) is much faster than scanning a list, because it does not visit every element.
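To see the effect locally, a rough timing sketch with `timeit` is shown below. The function names `scan_edges` and `lookup_set` are hypothetical stand-ins built from the expressions quoted above, and the absolute numbers will differ from the ones reported in this PR depending on the machine and the graph size.

```python
import timeit


def scan_edges(nodes, edges):
    # Nested scan: each node is compared against the edges until a match is found.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def lookup_set(nodes, edges):
    # One pass to build the set, then hash-based membership tests.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


N = 300  # smaller than the 1000-node tests, to keep the run short
nodes = [{"id": i} for i in range(N)]
edges = [{"source": i, "target": i + 1} for i in range(N - 1)]

slow_s = timeit.timeit(lambda: scan_edges(nodes, edges), number=5)
fast_s = timeit.timeit(lambda: lookup_set(nodes, edges), number=5)
print(f"nested scan: {slow_s:.6f}s, set lookup: {fast_s:.6f}s")
```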

Performance Impact by Test Case

The optimization shines particularly well with:

- Large graphs with many edges (linear chains: about 33,000% faster; complete graphs: 8,857% faster)
- Graphs where the last node appears late in the nodes list, which forces the original code to check many nodes
- Small graphs show speedups of roughly 50-100%. Graphs with 500+ nodes or edges gain thousands of percent when the original has to scan many nodes, since its cost grows with n * m rather than n + m

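Even on tiny graphs (2-3 nodes), the optimization provides 25-100% speedups, which shows that nested iteration carries overhead even at small scales. The exception is an empty node list, where building the set costs more than it saves: those tests run 5-22% slower, although each still completes in under 1μs.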
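The spread between test cases follows from how many edge comparisons the nested version performs before it returns. The sketch below counts those comparisons for two of the generated test shapes. `count_edge_checks` is a hypothetical helper written for illustration, and it assumes the original short-circuits on the first matching edge, as `all(...)` does.

```python
def count_edge_checks(nodes, edges):
    # Count edge comparisons made by a nested scan that stops at the
    # first node that is not the source of any edge.
    checks = 0
    for node in nodes:
        is_last = True
        for edge in edges:
            checks += 1
            if edge["source"] == node["id"]:
                is_last = False
                break  # same short-circuit as all(...)
        if is_last:
            break
    return checks


N = 1000

# Linear chain 0 -> 1 -> ... -> 999: the answer is the final node.
chain_nodes = [{"id": i} for i in range(N)]
chain_edges = [{"source": i, "target": i + 1} for i in range(N - 1)]

# Star 0 -> 1, 0 -> 2, ...: the answer is the second node.
star_nodes = [{"id": i} for i in range(N)]
star_edges = [{"source": 0, "target": i} for i in range(1, N)]

print(count_edge_checks(chain_nodes, chain_edges))  # 500499
print(count_edge_checks(star_nodes, star_edges))    # 1000
```

The chain needs roughly 500 times as many comparisons as the star, which is consistent with the original code taking 18.3ms on the chain test and 37.5μs on the star test.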

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 41 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests

from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# 1. Basic Test Cases


def test_single_node_no_edges():
    # Only one node, no edges. Should return the node itself.
    nodes = [{"id": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 958ns (26.1% faster)


def test_two_nodes_one_edge():
    # Two nodes, one edge from A to B. Only B is not a source.
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.96μs -> 1.17μs (67.8% faster)


def test_three_nodes_chain():
    # A -> B -> C. Only C is not a source.
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.42μs -> 1.21μs (99.9% faster)


def test_multiple_possible_last_nodes():
    # A->B, C->D. B and D are both possible last nodes, but function returns the first one found.
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [{"source": "A", "target": "B"}, {"source": "C", "target": "D"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.92μs -> 1.21μs (58.7% faster)


def test_cycle_graph():
    # A->B->C->A (cycle). All nodes are sources, so should return None.
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "B", "target": "C"},
        {"source": "C", "target": "A"},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.42μs -> 1.29μs (87.2% faster)


# 2. Edge Test Cases


def test_empty_nodes_and_edges():
    # No nodes, no edges. Should return None.
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 791ns -> 875ns (9.60% slower)


def test_edges_with_nonexistent_nodes():
    # Edges refer to nodes not present in nodes list. No listed node is a source, so the first node (A) should be returned.
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "X", "target": "A"}, {"source": "Y", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 1.17μs (35.7% faster)


def test_node_with_multiple_incoming_edges():
    # D has multiple incoming edges, but is not a source, so should be last node.
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [
        {"source": "A", "target": "D"},
        {"source": "B", "target": "D"},
        {"source": "C", "target": "D"},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.79μs -> 1.33μs (109% faster)


def test_node_with_self_loop():
    # Node with a self-loop is a source, so should not be last node.
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.92μs -> 1.08μs (77.0% faster)


def test_all_nodes_are_sources():
    # Every node is a source in at least one edge, so should return None.
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "B", "target": "C"},
        {"source": "C", "target": "A"},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.29μs -> 1.25μs (83.4% faster)


def test_duplicate_edges():
    # Multiple edges with same source/target. Should not affect result.
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}, {"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.96μs -> 1.25μs (56.6% faster)


def test_nodes_with_extra_properties():
    # Nodes have additional properties, should still match by id only.
    nodes = [{"id": "A", "val": 1}, {"id": "B", "val": 2}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.92μs -> 1.12μs (70.4% faster)


def test_edges_with_extra_properties():
    # Edges have extra properties, should not affect result.
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B", "weight": 10}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.12μs (63.0% faster)


def test_node_id_is_int():
    # Node ids are integers, not strings.
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.12μs (62.9% faster)


def test_node_id_is_tuple():
    # Node ids are tuples.
    nodes = [{"id": (1, 2)}, {"id": (3, 4)}]
    edges = [{"source": (1, 2), "target": (3, 4)}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.04μs -> 1.21μs (69.0% faster)


# 3. Large Scale Test Cases


def test_large_linear_chain():
    # Large chain: 0->1->2->...->999. Only node 999 is not a source.
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.3ms -> 55.2μs (33094% faster)


def test_large_star_graph():
    # Star: 0->1, 0->2, ..., 0->999. All except 0 are not sources. Should return 1.
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": 0, "target": i} for i in range(1, N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 37.5μs -> 19.8μs (88.9% faster)


def test_large_disconnected_graph():
    # 500 isolated nodes, 500 in a chain. First 500 are not sources, so should return node 0.
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(500, N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 20.5μs -> 13.8μs (48.8% faster)


def test_large_complete_graph():
    # Complete graph: every node is a source. Should return None.
    N = 100
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": j} for i in range(N) for j in range(N) if i != j]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 17.0ms -> 190μs (8857% faster)


def test_large_graph_with_multiple_last_nodes():
    # Two separate chains, each of length 500. Both last nodes are not sources, should return first one.
    N = 500
    nodes = [{"id": f"A{i}"} for i in range(N)] + [{"id": f"B{i}"} for i in range(N)]
    edges = [{"source": f"A{i}", "target": f"A{i+1}"} for i in range(N - 1)] + [
        {"source": f"B{i}", "target": f"B{i+1}"} for i in range(N - 1)
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.84ms -> 71.0μs (6722% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# --- Basic Test Cases ---


def test_single_node_no_edges():
    # A single node with no edges should be returned as the last node
    nodes = [{"id": 1, "name": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.25μs -> 1.00μs (25.0% faster)


def test_two_nodes_one_edge():
    # Node 2 is not a source, so it should be returned
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.96μs -> 1.17μs (67.8% faster)


def test_three_nodes_linear_chain():
    # Only the last node in the chain is not a source
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.42μs -> 1.29μs (87.1% faster)


def test_multiple_last_nodes_returns_first():
    # Both 2 and 3 are not sources; function should return the first such node
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.88μs -> 1.12μs (66.7% faster)


def test_no_edges_multiple_nodes():
    # All nodes are not sources; should return the first node
    nodes = [{"id": "x"}, {"id": "y"}, {"id": "z"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.25μs -> 1.00μs (25.0% faster)


# --- Edge Test Cases ---


def test_empty_nodes_and_edges():
    # No nodes, no edges: should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 792ns -> 834ns (5.04% slower)


def test_edges_but_no_nodes():
    # Edges but no nodes: should return None
    nodes = []
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 750ns -> 959ns (21.8% slower)


def test_all_nodes_are_sources():
    # All nodes are sources; none should be returned
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.25μs (46.7% faster)


def test_node_with_multiple_incoming_edges():
    # Node 3 has two incoming edges, but is not a source
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 3}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.29μs -> 1.25μs (83.3% faster)


def test_node_with_self_loop():
    # Node with a self-loop is a source, so should not be returned
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.12μs (62.9% faster)


def test_node_with_nonexistent_source():
    # Edge with a source not in nodes; should ignore and return node 1
    nodes = [{"id": 1}]
    edges = [{"source": 2, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 1.08μs (38.4% faster)


def test_nodes_with_non_integer_ids():
    # IDs are strings
    nodes = [{"id": "foo"}, {"id": "bar"}]
    edges = [{"source": "foo", "target": "bar"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.92μs -> 1.17μs (64.2% faster)


def test_nodes_with_tuple_ids():
    # IDs are tuples
    nodes = [{"id": (1, 2)}, {"id": (3, 4)}]
    edges = [{"source": (1, 2), "target": (3, 4)}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.00μs -> 1.21μs (65.6% faster)


def test_edges_with_extra_keys():
    # Edges have extra keys, should be ignored
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "weight": 5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.88μs -> 1.12μs (66.7% faster)


def test_nodes_with_extra_keys():
    # Nodes have extra keys, should be preserved in output
    nodes = [{"id": 1, "meta": "x"}, {"id": 2, "meta": "y"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.12μs (62.9% faster)


def test_duplicate_nodes():
    # Duplicate node definitions; should return the first non-source node
    nodes = [{"id": 1}, {"id": 2}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.12μs (59.3% faster)


# --- Large Scale Test Cases ---


def test_large_linear_chain():
    # Large chain of 1000 nodes, each points to the next
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.3ms -> 55.5μs (32931% faster)


def test_large_tree_with_multiple_leaves():
    # Binary tree with 10 levels (1023 nodes), all leaves are not sources
    def build_binary_tree(levels):
        nodes = [{"id": i} for i in range(2**levels - 1)]
        edges = []
        for i in range(2 ** (levels - 1) - 1):
            edges.append({"source": i, "target": 2 * i + 1})
            edges.append({"source": i, "target": 2 * i + 2})
        return nodes, edges

    nodes, edges = build_binary_tree(10)
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 9.47ms -> 38.4μs (24546% faster)


def test_large_graph_all_sources():
    # All nodes are sources, so result should be None
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": (i + 1) % N} for i in range(N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.3ms -> 54.6μs (33478% faster)


def test_large_graph_no_edges():
    # All nodes are not sources; should return the first node
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.42μs -> 1.04μs (36.0% faster)


def test_large_graph_multiple_last_nodes():
    # First 500 nodes are sources, last 500 are not
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(500)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.54ms -> 27.9μs (16170% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
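The generated tests above contain no explicit assertions; per the closing comment, the captured output of the original code is compared with that of the optimized code. A minimal sketch of that kind of comparison on random graphs is shown below. The `reference` and `candidate` functions are hypothetical reconstructions from the expressions quoted in the explanation, and `random_graph` is an illustrative helper, not part of Codeflash.

```python
import random


def reference(nodes, edges):
    # Assumed original behavior: first node that is never an edge source.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def candidate(nodes, edges):
    # Set-based version described in this PR.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def random_graph(rng, max_nodes=20, max_edges=40):
    node_count = rng.randint(0, max_nodes)
    nodes = [{"id": i} for i in range(node_count)]
    # Sources may fall outside the node list, as in the nonexistent-node tests.
    edges = [
        {"source": rng.randint(0, max_nodes), "target": rng.randint(0, max_nodes)}
        for _ in range(rng.randint(0, max_edges))
    ]
    return nodes, edges


rng = random.Random(0)  # fixed seed so the run is reproducible
mismatches = 0
for _ in range(500):
    nodes, edges = random_graph(rng)
    if reference(nodes, edges) != candidate(nodes, edges):
        mismatches += 1
print(f"mismatches: {mismatches}")
```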

To edit these changes, run `git checkout codeflash/optimize-find_last_node-mjiw54c4` and push.

codeflash-ai bot requested a review from KRRT7 December 23, 2025 18:01

codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Dec 23, 2025
