⚡️ Speed up function _gridmake2 by 1,039%
#997
Closed
📄 1,039% (10.39x) speedup for `_gridmake2` in `code_to_optimize/discrete_riccati.py`
⏱️ Runtime: 1.06 milliseconds → 93.3 microseconds (best of 96 runs)
📝 Explanation and details
The optimized code achieves a roughly 10x speedup (1,039%) by replacing NumPy's high-level array operations with explicit loops that are JIT-compiled via Numba's `@njit` decorator.

Key Optimizations

1. Numba JIT Compilation with `@njit(cache=True)`
- The `cache=True` flag stores compiled code between runs, avoiding recompilation cost
- … `tile`, `repeat`, and `column_stack` use internally but with Python overhead

2. Preallocated Output Arrays with Explicit Loops
- The original `np.column_stack([np.tile(x1, x2.shape[0]), np.repeat(x2, x1.shape[0])])` creates three temporary arrays (tile result, repeat result, then column_stack result)
- The optimized version preallocates a single output array of shape `(x1.shape[0] * x2.shape[0], 2)` and fills it directly via nested loops

3. Direct Memory Access
- … `np.column_stack` and related operations
- … (`out[idx, 0] = x1[i]`), which Numba compiles to efficient memory writes

Performance Context

From `function_references`, `_gridmake2` is called recursively within `gridmake()` when building Cartesian products of multiple arrays. For `d > 2` dimensions, the function is called `d-1` times in a loop. This means: …

Test Case Suitability

The optimization excels when:
- … `gridmake` case)

The line profiler confirms the bottleneck was NumPy's high-level operations, which this optimization directly addresses through low-level compiled code.
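To make the explanation above concrete, here is a minimal sketch of the two approaches for the case where both inputs are 1-D arrays of the same dtype. The function names `gridmake2_numpy` and `gridmake2_loops` are hypothetical and are not taken from the PR; the actual `_gridmake2` also handles a 2-D first argument, which this sketch leaves out. The PR uses `@njit(cache=True)`; the sketch uses plain `@njit` because Numba's on-disk cache needs the function to live in a file, and it falls back to a no-op decorator if Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # assumed available, as in the PR's environment
except ImportError:
    def njit(*args, **kwargs):
        # Fallback so the sketch still runs without Numba (no speedup then).
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


def gridmake2_numpy(x1, x2):
    """The NumPy expression quoted in the explanation (1-D inputs only)."""
    return np.column_stack([np.tile(x1, x2.shape[0]), np.repeat(x2, x1.shape[0])])


@njit
def gridmake2_loops(x1, x2):
    """Hypothetical loop-based equivalent; assumes x1 and x2 share a dtype."""
    n1 = x1.shape[0]
    n2 = x2.shape[0]
    # Single preallocated output instead of three temporaries.
    out = np.empty((n1 * n2, 2), dtype=x1.dtype)
    idx = 0
    for j in range(n2):      # x2 varies slowest, matching np.repeat
        for i in range(n1):  # x1 varies fastest, matching np.tile
            out[idx, 0] = x1[i]
            out[idx, 1] = x2[j]
            idx += 1
    return out


x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([10.0, 20.0])
grid = gridmake2_loops(x1, x2)
print(grid.shape)                                     # (6, 2)
print(np.array_equal(grid, gridmake2_numpy(x1, x2)))  # True
```

The loop order matters: the outer loop runs over `x2` and the inner loop over `x1`, so the rows come out in the same order as the `tile`/`repeat` version.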
✅ Correctness verification report:
⚙️ Existing Unit Tests
- `test_gridmake2.py::TestGridmake2EdgeCases.test_both_empty_arrays`
- `test_gridmake2.py::TestGridmake2EdgeCases.test_empty_arrays_raise_or_return_empty`
- `test_gridmake2.py::TestGridmake2EdgeCases.test_float_dtype_preserved`
- `test_gridmake2.py::TestGridmake2EdgeCases.test_integer_dtype_preserved`
- `test_gridmake2.py::TestGridmake2NotImplemented.test_1d_first_2d_second_raises`
- `test_gridmake2.py::TestGridmake2NotImplemented.test_both_2d_raises`
- `test_gridmake2.py::TestGridmake2With1DArrays.test_basic_two_element_arrays`
- `test_gridmake2.py::TestGridmake2With1DArrays.test_different_length_arrays`
- `test_gridmake2.py::TestGridmake2With1DArrays.test_float_arrays`
- `test_gridmake2.py::TestGridmake2With1DArrays.test_larger_arrays`
- `test_gridmake2.py::TestGridmake2With1DArrays.test_negative_values`
- `test_gridmake2.py::TestGridmake2With1DArrays.test_result_shape`
- `test_gridmake2.py::TestGridmake2With1DArrays.test_single_element_arrays`
- `test_gridmake2.py::TestGridmake2With1DArrays.test_single_element_with_multi_element`
- `test_gridmake2.py::TestGridmake2With2DFirst.test_2d_first_1d_second`
- `test_gridmake2.py::TestGridmake2With2DFirst.test_2d_multiple_columns`
- `test_gridmake2.py::TestGridmake2With2DFirst.test_2d_single_column`
- `test_gridmake2_torch.py::TestGridmake2TorchCPU.test_2d_and_1d_matches_numpy`
- `test_gridmake2_torch.py::TestGridmake2TorchCPU.test_both_1d_matches_numpy`

To edit these changes, `git checkout codeflash/optimize-_gridmake2-mjq1m0q5` and push.