⚡️ Speed up function _gridmake2 by 884%
#998
Closed
📄 884% (8.84x) speedup for `_gridmake2` in `code_to_optimize/discrete_riccati.py`
⏱️ Runtime: 1.07 milliseconds → 109 microseconds (best of 85 runs)
📝 Explanation and details
Performance Optimization Summary
The optimized code achieves an 884% speedup (from 1.07ms to 109μs) by replacing NumPy's high-level array operations with Numba JIT-compiled explicit loops.
Key Optimizations
1. Numba JIT Compilation (`@njit(cache=True)`)
- The `cache=True` flag stores the compiled version, avoiding recompilation costs on subsequent runs
2. Explicit Loop-Based Construction vs. NumPy Broadcasting
- Original: uses `np.tile()`, `np.repeat()`, and `np.column_stack()`, which create multiple intermediate arrays and perform memory allocations
- Optimized: pre-allocates the output with `np.empty()` and fills it directly using nested loops
3. Why This Works
From the line profiler, the original code spent its time in:
- `np.column_stack([np.tile(...)])`
- `np.repeat()`
- `np.tile()` for the 2D case
These NumPy operations, while convenient, involve the intermediate arrays and extra memory allocations described above.
Numba's compiled loops avoid all of this by directly computing each output element in place.
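The contrast described above can be sketched roughly as follows. This is a minimal illustration for the 1D/1D case only, not the PR's actual diff: the names `gridmake2_numpy` and `gridmake2_loops` are hypothetical, both inputs are assumed to share a dtype, and the decorator is applied as plain `@njit` (the PR reports using `@njit(cache=True)`) with a no-op fallback so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: fall back to plain Python if Numba is missing.
    def njit(func):
        return func


def gridmake2_numpy(x1, x2):
    """Cartesian product of two 1D arrays via tile/repeat/column_stack.

    Each call below allocates its own intermediate array before
    column_stack copies them into the final result.
    """
    first = np.tile(x1, x2.shape[0])      # x1 repeated end to end
    second = np.repeat(x2, x1.shape[0])   # each x2 value repeated len(x1) times
    return np.column_stack([first, second])


@njit
def gridmake2_loops(x1, x2):
    """Same result, written as explicit loops into one pre-allocated array."""
    n1 = x1.shape[0]
    n2 = x2.shape[0]
    out = np.empty((n1 * n2, 2), dtype=x1.dtype)  # assumes x2 has x1's dtype
    for j in range(n2):
        for i in range(n1):
            row = j * n1 + i
            out[row, 0] = x1[i]
            out[row, 1] = x2[j]
    return out


if __name__ == "__main__":
    a = np.array([1.0, 2.0, 3.0])
    b = np.array([10.0, 20.0])
    # Both versions should produce the same (6, 2) grid.
    print(np.array_equal(gridmake2_numpy(a, b), gridmake2_loops(a, b)))
```

The loop version writes each output element exactly once, which is the property the explanation attributes the speedup to. Actual timings will depend on input sizes and on whether the JIT compilation cost has already been paid.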
Impact on Workloads
Based on `function_references`, `_gridmake2` is called from `gridmake()`.
For multi-array scenarios (3+ inputs), the speedup compounds significantly since `_gridmake2` is called multiple times per `gridmake()` invocation. The nearly 9x speedup per call translates to substantial gains in computational economics applications where Cartesian products are frequently computed for state space expansions.
Trade-offs
- JIT compilation overhead; `cache=True` mitigates this for subsequent calls
✅ Correctness verification report:
⚙️ Existing Unit Tests
- test_gridmake2.py::TestGridmake2EdgeCases.test_both_empty_arrays
- test_gridmake2.py::TestGridmake2EdgeCases.test_empty_arrays_raise_or_return_empty
- test_gridmake2.py::TestGridmake2EdgeCases.test_float_dtype_preserved
- test_gridmake2.py::TestGridmake2EdgeCases.test_integer_dtype_preserved
- test_gridmake2.py::TestGridmake2NotImplemented.test_1d_first_2d_second_raises
- test_gridmake2.py::TestGridmake2NotImplemented.test_both_2d_raises
- test_gridmake2.py::TestGridmake2With1DArrays.test_basic_two_element_arrays
- test_gridmake2.py::TestGridmake2With1DArrays.test_different_length_arrays
- test_gridmake2.py::TestGridmake2With1DArrays.test_float_arrays
- test_gridmake2.py::TestGridmake2With1DArrays.test_larger_arrays
- test_gridmake2.py::TestGridmake2With1DArrays.test_negative_values
- test_gridmake2.py::TestGridmake2With1DArrays.test_result_shape
- test_gridmake2.py::TestGridmake2With1DArrays.test_single_element_arrays
- test_gridmake2.py::TestGridmake2With1DArrays.test_single_element_with_multi_element
- test_gridmake2.py::TestGridmake2With2DFirst.test_2d_first_1d_second
- test_gridmake2.py::TestGridmake2With2DFirst.test_2d_multiple_columns
- test_gridmake2.py::TestGridmake2With2DFirst.test_2d_single_column
- test_gridmake2_torch.py::TestGridmake2TorchCPU.test_2d_and_1d_matches_numpy
- test_gridmake2_torch.py::TestGridmake2TorchCPU.test_both_1d_matches_numpy
To edit these changes, `git checkout codeflash/optimize-_gridmake2-mjq2prhv` and push.