⚡️ Speed up function `get_extension` by 316% #11

codeflash-ai · 2025-10-12T00:40:42Z

📄 316% (3.16x) speedup for `get_extension` in `pdd/sync_determine_operation.py`

⏱️ Runtime : 3.10 milliseconds → 745 microseconds (best of 163 runs)

📝 Explanation and details

The key optimization is moving the dictionary definition from inside the function to module scope as `_EXTENSIONS`. This eliminates the overhead of recreating a 24-key dictionary on every function call.

What changed:

- Moved the `extensions` dictionary outside the function as a module-level constant, `_EXTENSIONS`.
- The function now performs only a dictionary lookup and no longer recreates the dictionary.

Why this is faster:
In the original code, Python had to allocate memory and construct a dictionary with 24 key-value pairs every time get_extension() was called. The line profiler shows this dictionary creation took 61.5% of the total execution time (22.1ms out of 36ms). With the optimization, the dictionary is created once at module import time and reused for all function calls.

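Performance characteristics:

- Most test cases run 200-380% faster (roughly 3x to 4.8x). The exceptions are the invalid-input tests that raise `AttributeError` (about 115-180% faster) and the very long strings (about 76-98% faster), presumably because work other than dictionary construction takes a larger share of the time there.
- Particularly effective for high-frequency calls: the more often the function is called, the greater the cumulative benefit.
- The speedup does not depend on whether the language is known or unknown, since the bottleneck was dictionary creation, not lookup.
- Works well for both single calls and batch processing (most large-scale tests show 300-380% improvements).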

This optimization is most beneficial in scenarios where get_extension() is called repeatedly, such as processing multiple files or batch operations, which is evident from the large-scale test results.
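To get a rough feel for the per-call cost on a given machine, a small `timeit` harness along these lines may help. It is only a sketch and not how Codeflash produced the numbers in this PR: the 24 placeholder keys (`lang0` … `lang23`), the constant `DICT_LITERAL`, and the function `measure` are all hypothetical names chosen for illustration.

```python
import timeit

# Source text of a 24-entry dict literal with placeholder keys (hypothetical;
# the size matches the table described in this PR, the contents do not).
DICT_LITERAL = "{" + ", ".join(f"'lang{i}': 'ext{i}'" for i in range(24)) + "}"


def measure(number: int = 200_000) -> tuple[float, float]:
    """Return (seconds per call rebuilding the dict, seconds per call reusing it)."""
    # Variant 1: build the dict literal on every iteration, then look up a key.
    rebuild = timeit.timeit(
        stmt=f"d = {DICT_LITERAL}; d.get('lang3', 'lang3')",
        number=number,
    )
    # Variant 2: build the dict once in setup, then only look up a key.
    shared = timeit.timeit(
        stmt="d.get('lang3', 'lang3')",
        setup=f"d = {DICT_LITERAL}",
        number=number,
    )
    return rebuild / number, shared / number


if __name__ == "__main__":
    rebuild, shared = measure()
    print(f"rebuild per call: {rebuild * 1e9:.0f} ns")
    print(f"shared  per call: {shared * 1e9:.0f} ns")
```

Absolute timings vary with the interpreter version and hardware, so the ratio between the two figures is more informative than either number alone.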

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 3320 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import pytest  # used for our unit tests
from pdd.sync_determine_operation import get_extension

# =========================
# Basic Test Cases
# =========================

def test_basic_python():
    # Test basic known language
    codeflash_output = get_extension('python') # 2.81μs -> 869ns (224% faster)

def test_basic_javascript():
    # Test another known language
    codeflash_output = get_extension('javascript') # 2.17μs -> 676ns (221% faster)

def test_basic_java():
    # Test another known language
    codeflash_output = get_extension('java') # 2.10μs -> 629ns (233% faster)

def test_basic_case_insensitivity():
    # Test case insensitivity
    codeflash_output = get_extension('Python') # 2.24μs -> 680ns (230% faster)
    codeflash_output = get_extension('PYTHON') # 1.31μs -> 350ns (276% faster)
    codeflash_output = get_extension('JaVaScRiPt') # 1.09μs -> 326ns (236% faster)

def test_basic_multiple_known_languages():
    # Test several known languages in one go
    known_languages = {
        'typescript': 'ts',
        'cpp': 'cpp',
        'c': 'c',
        'ruby': 'rb',
        'go': 'go',
        'rust': 'rs',
        'php': 'php',
        'swift': 'swift',
        'kotlin': 'kt',
        'scala': 'scala',
        'csharp': 'cs',
        'css': 'css',
        'html': 'html',
        'sql': 'sql',
        'shell': 'sh',
        'bash': 'sh',
        'powershell': 'ps1',
        'r': 'r',
        'matlab': 'm',
        'lua': 'lua',
        'perl': 'pl',
    }
    for lang, ext in known_languages.items():
        codeflash_output = get_extension(lang) # 21.4μs -> 5.66μs (278% faster)

# =========================
# Edge Test Cases
# =========================

def test_edge_unknown_language():
    # Unknown language returns the lowercased input
    codeflash_output = get_extension('unknownlang') # 2.22μs -> 689ns (222% faster)
    codeflash_output = get_extension('foo') # 1.31μs -> 402ns (227% faster)

def test_edge_case_insensitive_unknown():
    # Unknown language with mixed case returns lowercased input
    codeflash_output = get_extension('UnKnOwNlAnG') # 2.00μs -> 631ns (216% faster)
    codeflash_output = get_extension('FOO') # 1.30μs -> 376ns (245% faster)

def test_edge_empty_string():
    # Empty string returns empty string
    codeflash_output = get_extension('') # 1.97μs -> 579ns (241% faster)

def test_edge_whitespace_string():
    # Whitespace string returns whitespace string (lowercased, but whitespace unaffected)
    codeflash_output = get_extension('   ') # 2.03μs -> 566ns (258% faster)

def test_edge_numeric_string():
    # Numeric string returns numeric string
    codeflash_output = get_extension('12345') # 2.15μs -> 601ns (257% faster)

def test_edge_special_characters():
    # Special characters string returns lowercased special characters
    codeflash_output = get_extension('!@#$%^&*()') # 2.19μs -> 595ns (268% faster)
    codeflash_output = get_extension('PyTh0n!') # 1.42μs -> 402ns (252% faster)

def test_edge_language_with_spaces():
    # Language name with spaces returns lowercased input
    codeflash_output = get_extension('python script') # 2.05μs -> 612ns (235% faster)
    codeflash_output = get_extension('C Sharp') # 1.41μs -> 387ns (263% faster)
    codeflash_output = get_extension('C#') # 1.11μs -> 301ns (270% faster)

def test_edge_language_with_leading_trailing_spaces():
    # Leading/trailing spaces are not stripped
    codeflash_output = get_extension(' python ') # 2.21μs -> 607ns (264% faster)
    codeflash_output = get_extension('  java  ') # 1.28μs -> 301ns (325% faster)

def test_edge_language_with_underscore():
    # Language name with underscore returns lowercased input
    codeflash_output = get_extension('not_a_language') # 2.22μs -> 685ns (224% faster)

def test_edge_language_with_dash():
    # Language name with dash returns lowercased input
    codeflash_output = get_extension('not-a-language') # 2.15μs -> 629ns (243% faster)

def test_edge_language_with_dot():
    # Language name with dot returns lowercased input
    codeflash_output = get_extension('python.script') # 2.38μs -> 652ns (265% faster)

def test_edge_language_is_none():
    # None is not a valid input, should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension(None) # 2.92μs -> 1.30μs (124% faster)

def test_edge_language_is_int():
    # Integer is not a valid input, should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension(123) # 2.89μs -> 1.22μs (136% faster)

def test_edge_language_is_list():
    # List is not a valid input, should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension(['python']) # 2.88μs -> 1.12μs (156% faster)

def test_edge_language_is_dict():
    # Dict is not a valid input, should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension({'language': 'python'}) # 2.77μs -> 1.10μs (152% faster)

# =========================
# Large Scale Test Cases
# =========================

def test_large_scale_known_languages():
    # Test all known languages in upper, lower, and mixed case
    known_languages = [
        'python', 'javascript', 'typescript', 'java', 'cpp', 'c', 'ruby', 'go', 'rust', 'php',
        'swift', 'kotlin', 'scala', 'csharp', 'css', 'html', 'sql', 'shell', 'bash', 'powershell',
        'r', 'matlab', 'lua', 'perl'
    ]
    for lang in known_languages:
        # Lowercase
        codeflash_output = get_extension(lang) # 24.2μs -> 6.88μs (251% faster)
        # Uppercase
        codeflash_output = get_extension(lang.upper())
        # Mixed case
        mixed = ''.join([c.upper() if i % 2 == 0 else c.lower() for i, c in enumerate(lang)]) # 22.3μs -> 5.36μs (317% faster)
        codeflash_output = get_extension(mixed)

def test_large_scale_many_unknown_languages():
    # Test a large number of unknown languages
    for i in range(1000):
        fake_lang = f"unknownlang{i}"
        codeflash_output = get_extension(fake_lang) # 894μs -> 213μs (320% faster)

def test_large_scale_long_strings():
    # Test very long language names
    long_lang = "python" * 100  # 600 chars
    codeflash_output = get_extension(long_lang) # 3.56μs -> 1.80μs (97.8% faster)
    long_unknown = "unknown" * 120  # 840 chars
    codeflash_output = get_extension(long_unknown) # 2.36μs -> 1.34μs (75.7% faster)

def test_large_scale_random_mixed_inputs():
    # Mix known and unknown, various cases
    known = ['python', 'java', 'cpp', 'php']
    for i in range(500):
        if i % 2 == 0:
            # Known language, random case
            lang = known[i % len(known)]
            mixed = ''.join([c.upper() if j % 2 == 0 else c.lower() for j, c in enumerate(lang)])
            codeflash_output = get_extension(mixed)
        else:
            # Unknown language
            fake_lang = f"lang{i}X"
            codeflash_output = get_extension(fake_lang)

def test_large_scale_all_ascii():
    # Test all printable ASCII characters as language names
    import string
    chars = string.printable
    for c in chars:
        codeflash_output = get_extension(c) # 91.9μs -> 21.9μs (320% faster)


#------------------------------------------------
import pytest  # used for our unit tests
from pdd.sync_determine_operation import get_extension

# ------------------------
# Unit Tests for get_extension
# ------------------------

# 1. BASIC TEST CASES

def test_python_lowercase():
    # Basic: Standard language, lowercase
    codeflash_output = get_extension('python') # 3.22μs -> 914ns (252% faster)

def test_python_uppercase():
    # Basic: Standard language, uppercase
    codeflash_output = get_extension('PYTHON') # 2.80μs -> 753ns (272% faster)

def test_python_mixedcase():
    # Basic: Standard language, mixed case
    codeflash_output = get_extension('PyThOn') # 2.68μs -> 700ns (283% faster)

def test_javascript():
    # Basic: Another standard language
    codeflash_output = get_extension('javascript') # 2.65μs -> 746ns (255% faster)

def test_java():
    # Basic: Another standard language
    codeflash_output = get_extension('java') # 2.54μs -> 677ns (274% faster)

def test_cpp():
    # Basic: C++
    codeflash_output = get_extension('cpp') # 2.53μs -> 674ns (275% faster)

def test_c():
    # Basic: C
    codeflash_output = get_extension('c') # 2.58μs -> 722ns (258% faster)

def test_ruby():
    # Basic: Ruby
    codeflash_output = get_extension('ruby') # 2.64μs -> 720ns (267% faster)

def test_go():
    # Basic: Go
    codeflash_output = get_extension('go') # 2.56μs -> 693ns (269% faster)

def test_rust():
    # Basic: Rust
    codeflash_output = get_extension('rust') # 2.53μs -> 676ns (274% faster)

def test_php():
    # Basic: PHP
    codeflash_output = get_extension('php') # 2.53μs -> 706ns (258% faster)

def test_swift():
    # Basic: Swift
    codeflash_output = get_extension('swift') # 2.58μs -> 709ns (263% faster)

def test_kotlin():
    # Basic: Kotlin
    codeflash_output = get_extension('kotlin') # 2.44μs -> 692ns (252% faster)

def test_scala():
    # Basic: Scala
    codeflash_output = get_extension('scala') # 2.41μs -> 667ns (261% faster)

def test_csharp():
    # Basic: C#
    codeflash_output = get_extension('csharp') # 2.39μs -> 683ns (251% faster)

def test_css():
    # Basic: CSS
    codeflash_output = get_extension('css') # 2.49μs -> 665ns (275% faster)

def test_html():
    # Basic: HTML
    codeflash_output = get_extension('html') # 2.46μs -> 700ns (252% faster)

def test_sql():
    # Basic: SQL
    codeflash_output = get_extension('sql') # 2.45μs -> 692ns (254% faster)

def test_shell():
    # Basic: Shell
    codeflash_output = get_extension('shell') # 2.46μs -> 699ns (252% faster)

def test_bash():
    # Basic: Bash (alias for shell)
    codeflash_output = get_extension('bash') # 2.41μs -> 670ns (259% faster)

def test_powershell():
    # Basic: Powershell
    codeflash_output = get_extension('powershell') # 2.43μs -> 687ns (254% faster)

def test_r():
    # Basic: R
    codeflash_output = get_extension('r') # 2.50μs -> 681ns (267% faster)

def test_matlab():
    # Basic: Matlab
    codeflash_output = get_extension('matlab') # 2.45μs -> 689ns (256% faster)

def test_lua():
    # Basic: Lua
    codeflash_output = get_extension('lua') # 2.45μs -> 645ns (279% faster)

def test_perl():
    # Basic: Perl
    codeflash_output = get_extension('perl') # 2.50μs -> 688ns (263% faster)

# 2. EDGE TEST CASES

def test_unknown_language():
    # Edge: Unknown language returns its lowercase name
    codeflash_output = get_extension('unknownlang') # 2.58μs -> 712ns (262% faster)

def test_unknown_language_uppercase():
    # Edge: Unknown language, uppercase, returns lowercase
    codeflash_output = get_extension('FOOBAR') # 2.52μs -> 634ns (298% faster)

def test_empty_string():
    # Edge: Empty string returns empty string
    codeflash_output = get_extension('') # 2.44μs -> 605ns (303% faster)

def test_whitespace_string():
    # Edge: Only whitespace string returns whitespace string
    codeflash_output = get_extension('   ') # 2.33μs -> 566ns (312% faster)

def test_language_with_spaces():
    # Edge: Language name with spaces returns lowercased name with spaces
    codeflash_output = get_extension('My Language') # 2.39μs -> 660ns (263% faster)

def test_language_with_special_chars():
    # Edge: Language name with special characters returns lowercased name
    codeflash_output = get_extension('C++') # 2.42μs -> 639ns (279% faster)
    codeflash_output = get_extension('F#') # 1.56μs -> 362ns (332% faster)
    codeflash_output = get_extension('Objective-C') # 1.14μs -> 315ns (261% faster)

def test_language_with_numbers():
    # Edge: Language name with numbers returns lowercased name
    codeflash_output = get_extension('Python3') # 2.17μs -> 594ns (266% faster)
    codeflash_output = get_extension('C2') # 1.38μs -> 344ns (300% faster)

def test_language_with_leading_trailing_spaces():
    # Edge: Leading/trailing spaces are not stripped
    codeflash_output = get_extension('  python  ') # 2.18μs -> 583ns (274% faster)

def test_language_is_none():
    # Edge: None input should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension(None) # 2.75μs -> 1.28μs (115% faster)

def test_language_is_integer():
    # Edge: Integer input should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension(123) # 2.94μs -> 1.23μs (139% faster)

def test_language_is_list():
    # Edge: List input should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension(['python']) # 2.80μs -> 1.16μs (142% faster)


def test_language_is_bool():
    # Edge: Boolean input should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension(True) # 3.84μs -> 1.44μs (167% faster)

def test_language_is_dict():
    # Edge: Dict input should raise AttributeError
    with pytest.raises(AttributeError):
        get_extension({'language': 'python'}) # 3.22μs -> 1.15μs (180% faster)

def test_case_insensitivity():
    # Edge: All case variations map to correct extension
    codeflash_output = get_extension('PYTHON') # 3.03μs -> 981ns (209% faster)
    codeflash_output = get_extension('Python') # 1.48μs -> 352ns (320% faster)
    codeflash_output = get_extension('pYtHoN') # 966ns -> 223ns (333% faster)

def test_aliases():
    # Edge: Bash and shell both map to 'sh'
    codeflash_output = get_extension('bash') # 2.44μs -> 716ns (241% faster)
    codeflash_output = get_extension('shell') # 1.38μs -> 436ns (217% faster)

def test_powershell_case():
    # Edge: Powershell case insensitivity
    codeflash_output = get_extension('PowerShell') # 2.40μs -> 703ns (241% faster)
    codeflash_output = get_extension('POWERSHELL') # 1.23μs -> 310ns (295% faster)

# 3. LARGE SCALE TEST CASES

def test_large_batch_known_languages():
    # Large Scale: All known languages with various cases
    languages = [
        'python', 'PYTHON', 'Python', 'javascript', 'JAVASCRIPT', 'JavaScript',
        'typescript', 'TypeScript', 'java', 'JAVA', 'cpp', 'CPP', 'c', 'C',
        'ruby', 'RUBY', 'go', 'GO', 'rust', 'RUST', 'php', 'PHP', 'swift', 'SWIFT',
        'kotlin', 'KOTLIN', 'scala', 'SCALA', 'csharp', 'CSHARP', 'css', 'CSS',
        'html', 'HTML', 'sql', 'SQL', 'shell', 'SHELL', 'bash', 'BASH',
        'powershell', 'POWERSHELL', 'r', 'R', 'matlab', 'MATLAB', 'lua', 'LUA',
        'perl', 'PERL'
    ]
    expected = [
        'py', 'py', 'py', 'js', 'js', 'js',
        'ts', 'ts', 'java', 'java', 'cpp', 'cpp', 'c', 'c',
        'rb', 'rb', 'go', 'go', 'rs', 'rs', 'php', 'php', 'swift', 'swift',
        'kt', 'kt', 'scala', 'scala', 'cs', 'cs', 'css', 'css',
        'html', 'html', 'sql', 'sql', 'sh', 'sh', 'sh', 'sh',
        'ps1', 'ps1', 'r', 'r', 'm', 'm', 'lua', 'lua',
        'pl', 'pl'
    ]
    for lang, ext in zip(languages, expected):
        codeflash_output = get_extension(lang) # 48.5μs -> 12.0μs (304% faster)

def test_large_batch_unknown_languages():
    # Large Scale: Many unknown languages, should return lowercased names
    unknowns = [f'Lang{i}' for i in range(100)]
    for i in range(100):
        codeflash_output = get_extension(unknowns[i]) # 91.3μs -> 21.2μs (331% faster)

def test_large_batch_special_char_languages():
    # Large Scale: Unknown languages with special characters
    specials = [f'Lang_{i}-Test' for i in range(100)]
    for i in range(100):
        codeflash_output = get_extension(specials[i]) # 91.8μs -> 21.9μs (319% faster)

def test_large_batch_empty_strings():
    # Large Scale: Many empty strings
    for _ in range(100):
        codeflash_output = get_extension('') # 86.6μs -> 18.0μs (381% faster)

def test_large_batch_whitespace_strings():
    # Large Scale: Many whitespace strings
    for _ in range(100):
        codeflash_output = get_extension('   ') # 89.0μs -> 20.0μs (344% faster)

def test_large_batch_leading_trailing_spaces():
    # Large Scale: Many strings with leading/trailing spaces
    for i in range(100):
        lang = f'  python{i}  '
        codeflash_output = get_extension(lang) # 92.0μs -> 21.4μs (330% faster)

def test_performance_large_known_and_unknown():
    # Large Scale: Mix of known and unknown languages
    known = ['python', 'java', 'cpp', 'go', 'rust', 'php', 'swift', 'kotlin', 'scala', 'csharp']
    unknown = [f'unknownlang{i}' for i in range(990)]
    all_langs = known + unknown
    expected = ['py', 'java', 'cpp', 'go', 'rs', 'php', 'swift', 'kt', 'scala', 'cs'] + [f'unknownlang{i}' for i in range(990)]
    for lang, ext in zip(all_langs, expected):
        codeflash_output = get_extension(lang) # 886μs -> 209μs (323% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from pdd.sync_determine_operation import get_extension

def test_get_extension():
    get_extension('')

🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `codeflash_concolic_diinpk0o/tmpuue_ekul/test_concolic_coverage.py::test_get_extension` | 3.57μs | 947ns | 277%✅ |

To edit these changes, run `git checkout codeflash/optimize-get_extension-mgmza0gh` and push.


codeflash-ai bot requested a review from mashraf-222 October 12, 2025 00:40

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 12, 2025
