Conversation


@codeflash-ai codeflash-ai bot commented Oct 11, 2025

📄 199% (1.99x) speedup for _fallback_prompt_line in pdd/trace.py

⏱️ Runtime: 41.2 milliseconds → 13.8 milliseconds (best of 175 runs)

📝 Explanation and details

The optimized code achieves a 198% speedup through three key optimizations that target the most expensive operations identified in the profiler:

1. Pre-compiled Regular Expressions

  • Moves re.compile() calls to module level (_ws_regex, _nonword_regex) instead of compiling patterns on every function call
  • Eliminates repeated regex compilation overhead, particularly impactful in _normalize_text which is called frequently
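
  A minimal sketch of the pattern follows. The names `_ws_regex` and `_nonword_regex` come from the description above, but the exact patterns and the body of `_normalize_text` in pdd/trace.py are assumptions here, not the verbatim source:

  ```python
  import re

  # Compiled once at import time rather than inside the function body.
  # Patterns are illustrative; the real ones in pdd/trace.py may differ.
  _ws_regex = re.compile(r"\s+")          # collapse runs of whitespace
  _nonword_regex = re.compile(r"[^\w]+")  # split text into word-like tokens

  def _normalize_text_sketch(text) -> str:
      """Illustrative normalization: lowercase and collapse whitespace."""
      if not text:
          return ""
      return _ws_regex.sub(" ", str(text)).strip().lower()
  ```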

2. String Translation Table

  • Replaces 3 chained .replace() calls with a single .translate() operation using a Unicode codepoint mapping
  • translate() is significantly faster than multiple replace() calls as it processes the string in one pass
  • Reduces character replacement time from ~18.5ms to ~6.1ms in _normalize_text
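
  As a sketch of the technique (the specific characters mapped are guesses based on the unicode-quote and non-breaking-space tests below, not the verbatim source):

  ```python
  # One translation table built once; str.translate applies every mapping in a
  # single pass over the string, unlike three separate .replace() scans.
  _CHAR_MAP = str.maketrans({
      "\u201c": '"',  # left double quotation mark
      "\u201d": '"',  # right double quotation mark
      "\u00a0": " ",  # non-breaking space
  })

  def _replace_special_chars(text: str) -> str:
      return text.translate(_CHAR_MAP)

  # Equivalent chained-replace form this replaces:
  # text.replace("\u201c", '"').replace("\u201d", '"').replace("\u00a0", " ")
  ```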

3. Set-based Token Deduplication

  • Converts the tokens list to a set before the inner loop in _fallback_prompt_line
  • Eliminates duplicate token lookups when the same token appears multiple times in the code string
  • Maintains the original substring matching logic (tok in normalized_line) since exact word boundaries aren't required
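
  A sketch of the deduplication step (function and variable names are illustrative, not the exact pdd/trace.py code):

  ```python
  import re

  def _best_matching_line(prompt_lines, code_str) -> int:
      """Illustrative: 1-based index of the prompt line with the most token hits."""
      tokens = re.findall(r"\w+", (code_str or "").lower())
      unique_tokens = {t for t in tokens if len(t) >= 3}  # dedupe before the loop
      best_idx, best_hits = 1, 0
      for idx, line in enumerate(prompt_lines, start=1):
          normalized = (line or "").lower()
          hits = sum(1 for tok in unique_tokens if tok in normalized)  # substring test
          if hits > best_hits:
              best_idx, best_hits = idx, hits
      return best_idx
  ```

  With a set, a token that appears hundreds of times in the code string is checked against each prompt line only once, which is what drives the large win on the long-code-string test below.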

Performance Impact by Test Type:

  • Large-scale tests with many empty lines: 35-42% faster due to reduced normalization overhead
  • Long code strings with repeated tokens: Up to 1624% faster due to set deduplication
  • Basic cases: Mixed results (some 2-12% slower, others 8-27% faster) as the optimization overhead is more noticeable on small inputs, but the benefits scale with input complexity

The optimizations are most effective for scenarios with large prompt lists or code strings containing repeated tokens, which aligns with the substantial overall speedup observed.
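
A rough way to sanity-check the scaling claim locally is a micro-benchmark along these lines; the inputs are hypothetical (mirroring the large-scale tests below) and timings will vary by machine:

```python
import timeit
from pdd.trace import _fallback_prompt_line

# Repeated-token workload that benefits most from the set deduplication.
prompts = ["foo bar baz"] * 1000
code = " ".join(["foo", "bar", "baz"] * 300)

elapsed = timeit.timeit(lambda: _fallback_prompt_line(prompts, code), number=10)
print(f"10 calls took {elapsed:.3f}s")
```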

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 61 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import re
from typing import List, Optional

# imports
import pytest  # used for our unit tests
from pdd.trace import _fallback_prompt_line

# unit tests

# ----------------- BASIC TEST CASES -----------------

def test_basic_exact_match():
    # Single prompt line matches code_str exactly
    prompts = ["def foo(): pass"]
    code = "def foo(): pass"
    codeflash_output = _fallback_prompt_line(prompts, code) # 10.7μs -> 11.4μs (5.60% slower)

def test_basic_partial_token_match():
    # Code string shares a token with the second prompt line
    prompts = ["print('hello')", "def bar(): pass"]
    code = "def bar(): pass"
    codeflash_output = _fallback_prompt_line(prompts, code) # 10.9μs -> 12.2μs (11.2% slower)

def test_basic_no_token_match_returns_first_nonempty():
    # No tokens match, should return first non-empty line
    prompts = ["", "   ", "something here", ""]
    code = "xyz"
    codeflash_output = _fallback_prompt_line(prompts, code) # 13.3μs -> 12.8μs (3.53% faster)

def test_basic_multiple_token_hits_prefers_most_hits():
    # Multiple lines match, but one has more token hits
    prompts = ["foo bar baz", "foo bar", "baz bar foo"]
    code = "foo bar baz"
    # Lines 1 and 3 contain all three tokens; the first line with the max hit count wins
    codeflash_output = _fallback_prompt_line(prompts, code) # 12.0μs -> 11.9μs (0.783% faster)

def test_basic_case_insensitive_matching():
    # Matching should be case-insensitive
    prompts = ["Print('HELLO')", "def Bar(): pass"]
    code = "def bar(): pass"
    codeflash_output = _fallback_prompt_line(prompts, code) # 10.8μs -> 11.3μs (4.68% slower)

# ----------------- EDGE TEST CASES -----------------

def test_edge_empty_prompt_lines_returns_1():
    # Empty prompt_lines should return 1
    prompts = []
    code = "anything"
    codeflash_output = _fallback_prompt_line(prompts, code) # 5.21μs -> 4.86μs (7.20% faster)

def test_edge_all_empty_prompt_lines_returns_1():
    # All prompt lines empty or whitespace
    prompts = ["", "   ", "\t", "\n"]
    code = "something"
    codeflash_output = _fallback_prompt_line(prompts, code) # 11.9μs -> 11.4μs (4.13% faster)

def test_edge_code_str_empty_returns_first_nonempty_prompt():
    # code_str is empty, should return first non-empty prompt line
    prompts = ["", "foo", "bar"]
    code = ""
    codeflash_output = _fallback_prompt_line(prompts, code) # 6.40μs -> 5.04μs (26.9% faster)

def test_edge_code_str_none_returns_first_nonempty_prompt():
    # code_str is None, should return first non-empty prompt line
    prompts = ["", "foo", "bar"]
    code = None
    codeflash_output = _fallback_prompt_line(prompts, code) # 5.78μs -> 4.61μs (25.3% faster)

def test_edge_prompt_lines_with_unicode_and_spaces():
    # Prompt lines include unicode quotes and non-breaking spaces
    prompts = ["\u201cfoo\u201d", "\u2018bar\u2019", "\u00A0baz\u00A0"]
    code = "foo"
    codeflash_output = _fallback_prompt_line(prompts, code) # 11.6μs -> 10.2μs (13.6% faster)
    code = "bar"
    codeflash_output = _fallback_prompt_line(prompts, code) # 6.11μs -> 5.63μs (8.58% faster)
    code = "baz"
    codeflash_output = _fallback_prompt_line(prompts, code) # 4.99μs -> 4.77μs (4.53% faster)

def test_edge_code_str_with_special_characters():
    # Code string contains special characters, should tokenize correctly
    prompts = ["foo_bar-baz", "bar.baz_foo"]
    code = "foo_bar-baz"
    codeflash_output = _fallback_prompt_line(prompts, code) # 9.70μs -> 9.85μs (1.53% slower)

def test_edge_prompt_lines_with_only_short_tokens():
    # Tokens shorter than 3 chars are ignored
    prompts = ["a b c", "de fg hi", "xyz"]
    code = "de fg hi"
    # All tokens in code_str are shorter than 3 chars, so none are usable; return first non-empty line
    codeflash_output = _fallback_prompt_line(prompts, code) # 8.09μs -> 7.60μs (6.35% faster)

def test_edge_prompt_lines_with_duplicate_tokens():
    # Multiple prompt lines share the same tokens, should pick first with max hits
    prompts = ["foo foo bar", "foo bar bar", "bar foo foo"]
    code = "foo bar"
    codeflash_output = _fallback_prompt_line(prompts, code) # 11.4μs -> 11.7μs (2.67% slower)

def test_edge_prompt_lines_with_leading_trailing_whitespace():
    # Prompt lines have leading/trailing whitespace
    prompts = ["   foo bar   ", "\tbar foo\n"]
    code = "foo bar"
    codeflash_output = _fallback_prompt_line(prompts, code) # 9.51μs -> 9.51μs (0.032% faster)

def test_edge_code_str_with_mixed_whitespace():
    # Code string has mixed whitespace, should normalize
    prompts = ["foo bar", "bar foo"]
    code = "  foo   bar  "
    codeflash_output = _fallback_prompt_line(prompts, code) # 9.68μs -> 9.47μs (2.19% faster)

def test_edge_no_nonempty_prompt_lines_returns_1():
    # All prompt lines are empty after normalization
    prompts = ["", "   ", "\u00A0", "\n"]
    code = "irrelevant"
    codeflash_output = _fallback_prompt_line(prompts, code) # 13.0μs -> 11.8μs (10.2% faster)

def test_edge_tokens_with_numbers_and_underscores():
    # Tokens with numbers and underscores should be split and matched
    prompts = ["foo_123_bar", "bar_456_foo"]
    code = "foo_123_bar"
    codeflash_output = _fallback_prompt_line(prompts, code) # 8.99μs -> 9.23μs (2.64% slower)

def test_edge_code_str_with_only_short_tokens():
    # Code string only has tokens < 3 chars, so fallback to first non-empty
    prompts = ["abc", "def"]
    code = "a b c"
    codeflash_output = _fallback_prompt_line(prompts, code) # 7.28μs -> 6.59μs (10.4% faster)

def test_edge_prompt_lines_with_none_values():
    # Prompt lines contain None values (should normalize to empty string)
    prompts = [None, "foo", "bar"]
    code = "foo"
    codeflash_output = _fallback_prompt_line(prompts, code) # 8.45μs -> 8.13μs (3.96% faster)


def test_large_scale_many_prompt_lines_token_match():
    # Large number of prompt lines, only one matches the code_str tokens
    prompts = ["line{}".format(i) for i in range(1, 1001)]
    # Pick a line in the middle to match
    code = "line500"
    codeflash_output = _fallback_prompt_line(prompts, code) # 846μs -> 940μs (10.0% slower)

def test_large_scale_many_prompt_lines_no_token_match():
    # Large number of prompt lines, none match code_str tokens
    prompts = ["foo{}".format(i) for i in range(1, 1001)]
    code = "bar"
    # Should return first non-empty prompt line
    codeflash_output = _fallback_prompt_line(prompts, code) # 838μs -> 866μs (3.25% slower)

def test_large_scale_all_empty_prompt_lines():
    # Large number of empty prompt lines
    prompts = [""] * 1000
    code = "anything"
    codeflash_output = _fallback_prompt_line(prompts, code) # 1.13ms -> 807μs (40.2% faster)

def test_large_scale_multiple_max_token_hits():
    # Several prompt lines have the same max token hits, should pick the first
    prompts = ["foo bar baz"] * 1000
    code = "foo bar baz"
    codeflash_output = _fallback_prompt_line(prompts, code) # 1.21ms -> 1.30ms (7.02% slower)

def test_large_scale_first_nonempty_prompt_line():
    # First non-empty prompt line is far into the list
    prompts = [""] * 999 + ["foo"]
    code = "no match"
    codeflash_output = _fallback_prompt_line(prompts, code) # 1.14ms -> 806μs (41.0% faster)

def test_large_scale_long_code_str_token_match():
    # Very long code_str matches a prompt line
    prompts = ["foo bar baz"] * 999 + ["verylongtoken"]
    code = "verylongtoken"
    codeflash_output = _fallback_prompt_line(prompts, code) # 1.08ms -> 1.19ms (9.54% slower)

def test_large_scale_performance():
    # Performance test: large prompt_lines, should not hang
    prompts = ["foo bar"] * 1000
    code = "foo bar"
    codeflash_output = _fallback_prompt_line(prompts, code); result = codeflash_output # 1.03ms -> 1.07ms (3.70% slower)

# ----------------- ADDITIONAL EDGE CASES -----------------

def test_edge_prompt_lines_with_only_whitespace_and_code_str_empty():
    # All prompt lines empty/whitespace and code_str empty
    prompts = [" ", "\t", "\n"]
    code = ""
    codeflash_output = _fallback_prompt_line(prompts, code) # 6.65μs -> 5.41μs (22.8% faster)

def test_edge_prompt_lines_with_special_unicode_and_code_str():
    # Prompt lines with unicode quotes and code_str with unicode
    prompts = ["\u201cfoo\u201d", "\u2018bar\u2019"]
    code = "\u201cfoo\u201d"
    codeflash_output = _fallback_prompt_line(prompts, code) # 10.7μs -> 9.55μs (12.4% faster)

def test_edge_code_str_with_nonbreaking_space():
    # Code string contains non-breaking space, should normalize
    prompts = ["foo bar", "bar foo"]
    code = "foo\u00A0bar"
    codeflash_output = _fallback_prompt_line(prompts, code) # 10.2μs -> 9.53μs (7.20% faster)

def test_edge_code_str_with_mixed_case_and_unicode():
    # Code string with mixed case and unicode quotes
    prompts = ["foo bar", "bar foo"]
    code = "\u201cFOO BAR\u201d"
    codeflash_output = _fallback_prompt_line(prompts, code) # 10.8μs -> 10.2μs (6.21% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re
from typing import List, Optional

# imports
import pytest  # used for our unit tests
from pdd.trace import _fallback_prompt_line

# unit tests

# ------------------ Basic Test Cases ------------------

def test_basic_exact_match():
    # Test when code_str matches a prompt line exactly
    prompt_lines = ["def foo():", "return bar"]
    code_str = "return bar"
    # "return" and "bar" are tokens, both in line 2
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 10.2μs -> 10.5μs (2.42% slower)

def test_basic_partial_match():
    # Test when code_str partially matches a prompt line
    prompt_lines = ["def foo():", "return bar"]
    code_str = "bar"
    # Only "bar" matches line 2
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 8.97μs -> 8.90μs (0.753% faster)

def test_basic_no_match_returns_first_nonempty():
    # Test when code_str matches none, returns first non-empty line
    prompt_lines = ["", "   ", "def foo():", "return bar"]
    code_str = "baz"
    # No token matches, so first non-empty line is "def foo():", index 3
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 12.9μs -> 13.3μs (3.38% slower)

def test_basic_empty_code_str():
    # Test with empty code_str, returns first non-empty line
    prompt_lines = ["", "def foo():", "return bar"]
    code_str = ""
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 6.85μs -> 5.86μs (16.8% faster)

def test_basic_case_insensitivity():
    # Test that matching is case-insensitive
    prompt_lines = ["def Foo():", "Return Bar"]
    code_str = "return bar"
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 9.99μs -> 10.4μs (3.52% slower)

# ------------------ Edge Test Cases ------------------

def test_edge_all_empty_prompt_lines():
    # All prompt lines are empty or whitespace
    prompt_lines = ["", "   ", "\t"]
    code_str = "anything"
    # All lines normalize to empty, so returns 1
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 10.9μs -> 10.2μs (7.31% faster)

def test_edge_code_str_none():
    # code_str is None, should behave as empty string
    prompt_lines = ["def foo():", "return bar"]
    code_str = None
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 5.61μs -> 4.65μs (20.6% faster)

def test_edge_prompt_lines_none():
    # prompt_lines contains None values
    prompt_lines = [None, "def foo():", "return bar"]
    code_str = "foo"
    # None normalizes to "", so first non-empty is index 2
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 9.63μs -> 9.59μs (0.407% faster)

def test_edge_unicode_normalization():
    # Prompt lines and code_str with unicode quotes and non-breaking spaces
    prompt_lines = ["def foo():", "return\u00A0bar", "“quoted”"]
    code_str = "return bar"
    # "return bar" matches line 2 after normalization
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 12.7μs -> 12.6μs (1.19% faster)

def test_edge_short_tokens_ignored():
    # Tokens shorter than 3 chars are ignored
    prompt_lines = ["def foo():", "do it"]
    code_str = "do it"
    # "do" and "it" are <3 chars, so no tokens, returns first non-empty line
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 7.26μs -> 6.70μs (8.50% faster)

def test_edge_punctuation_only_code_str():
    # code_str is only punctuation, no tokens
    prompt_lines = ["def foo():", "return bar"]
    code_str = "!!!"
    # No tokens, returns first non-empty line
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 6.95μs -> 6.04μs (15.0% faster)

def test_edge_multiple_lines_with_same_hits():
    # Multiple prompt lines have same number of token hits, should pick first
    prompt_lines = ["foo bar baz", "bar baz foo", "baz foo bar"]
    code_str = "foo bar"
    # "foo" and "bar" in all lines, but should pick first with max hits
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 11.4μs -> 12.1μs (6.08% slower)

def test_edge_empty_prompt_lines_list():
    # prompt_lines is an empty list
    prompt_lines = []
    code_str = "foo"
    # No lines, so should default to 1
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 4.89μs -> 4.28μs (14.1% faster)

def test_edge_prompt_lines_with_whitespace_and_none():
    # prompt_lines with mix of whitespace and None
    prompt_lines = ["   ", None, "\n", "\t", "foo"]
    code_str = "bar"
    # First non-empty normalized line is "foo", index 5
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 12.8μs -> 11.7μs (9.14% faster)

def test_edge_tokens_with_numbers():
    # Tokens with numbers, should match as usual
    prompt_lines = ["foo123 bar456", "baz789"]
    code_str = "bar456"
    # "bar456" matches line 1
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 9.29μs -> 9.10μs (2.04% faster)

def test_edge_tokens_with_non_ascii_characters():
    # Non-ascii tokens, should be split and matched
    prompt_lines = ["façade résumé", "naïve café"]
    code_str = "résumé"
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 11.4μs -> 12.3μs (7.69% slower)

def test_edge_tokens_with_apostrophes_and_quotes():
    # Unicode apostrophes/quotes normalized
    prompt_lines = ["‘foo’", '“bar”']
    code_str = "foo"
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 9.50μs -> 8.44μs (12.6% faster)

# ------------------ Large Scale Test Cases ------------------

def test_large_scale_many_prompt_lines_one_match():
    # 1000 prompt lines, only one matches
    prompt_lines = ["line {}".format(i) for i in range(1000)]
    code_str = "line 789"
    # Should match line 790 (1-based index)
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 1.03ms -> 1.15ms (10.3% slower)

def test_large_scale_many_prompt_lines_none_match():
    # 1000 prompt lines, none match, first non-empty line is returned
    prompt_lines = [""] * 500 + ["first nonempty"] + [""] * 499
    code_str = "no match here"
    # Should return index 501
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 915μs -> 675μs (35.5% faster)

def test_large_scale_long_code_str():
    # Very long code_str with many tokens
    prompt_lines = ["foo bar baz"] * 10 + ["special token"] + ["foo bar baz"] * 989
    code_str = " ".join(["foo", "bar", "baz"] * 300) + " special token"
    # "special token" matches line 11
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 28.0ms -> 1.63ms (1624% faster)

def test_large_scale_all_empty_prompt_lines():
    # 1000 empty prompt lines
    prompt_lines = [""] * 1000
    code_str = "foo"
    # All lines empty, returns 1
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 1.14ms -> 804μs (41.5% faster)

def test_large_scale_first_nonempty_at_end():
    # First non-empty line is at the end
    prompt_lines = [""] * 999 + ["foo"]
    code_str = "bar"
    # Should return index 1000
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 1.14ms -> 803μs (41.6% faster)

def test_large_scale_multiple_max_hits():
    # Multiple lines with same max hits, should pick first
    prompt_lines = ["foo bar baz"] * 1000
    code_str = "foo bar baz"
    # All lines have 3 hits, should pick first
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 1.21ms -> 1.30ms (7.17% slower)

# ------------------ Determinism Test ------------------

def test_determinism_repeated_calls():
    # Repeated calls with same input should yield same output
    prompt_lines = ["def foo():", "return bar"]
    code_str = "return bar"
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str); result1 = codeflash_output # 10.00μs -> 10.5μs (5.15% slower)
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str); result2 = codeflash_output # 5.53μs -> 5.37μs (3.07% faster)

# ------------------ Miscellaneous ------------------

def test_mixed_empty_and_nonempty_lines():
    # Mix of empty and non-empty lines, no token match
    prompt_lines = ["", "", "foo", "", "bar"]
    code_str = "baz"
    # First non-empty line is "foo", index 3
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 12.3μs -> 11.3μs (9.16% faster)

def test_code_str_with_leading_trailing_spaces():
    # code_str with spaces should be normalized
    prompt_lines = ["def foo():", "return bar"]
    code_str = "  return bar   "
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 9.96μs -> 10.2μs (2.43% slower)

def test_prompt_lines_with_leading_trailing_spaces():
    # prompt_lines with spaces should be normalized
    prompt_lines = ["   def foo():   ", "   return bar   "]
    code_str = "return bar"
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 9.74μs -> 10.1μs (3.96% slower)

def test_code_str_with_newlines_and_tabs():
    # code_str with newlines and tabs should be normalized
    prompt_lines = ["def foo():", "return bar"]
    code_str = "\nreturn\tbar\n"
    codeflash_output = _fallback_prompt_line(prompt_lines, code_str) # 9.84μs -> 9.76μs (0.799% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from pdd.trace import _fallback_prompt_line

def test__fallback_prompt_line():
    _fallback_prompt_line(['', '“'], '')

def test__fallback_prompt_line_2():
    _fallback_prompt_line([], '')
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_diinpk0o/tmphyp2u3c7/test_concolic_coverage.py::test__fallback_prompt_line | 7.32μs | 5.60μs | 30.7% ✅ |
| codeflash_concolic_diinpk0o/tmphyp2u3c7/test_concolic_coverage.py::test__fallback_prompt_line_2 | 4.15μs | 2.97μs | 39.7% ✅ |

To edit these changes, `git checkout codeflash/optimize-_fallback_prompt_line-mgmximsn` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 11, 2025 23:51
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 11, 2025