⚡️ Speed up function `_gridmake2` by 524% #1001

codeflash-ai · 2025-12-30T21:48:09Z

📄 524% (5.24x) speedup for `_gridmake2` in `code_to_optimize/discrete_riccati.py`

⏱️ Runtime : 3.50 milliseconds → 561 microseconds (best of 82 runs)
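
The headline figure can be reproduced from the two runtimes above. The sketch below assumes the percentage is computed as (original - optimized) / optimized; that formula is inferred from the reported numbers and is not stated in the report itself.

```python
# Runtimes taken from the report, converted to microseconds.
original_us = 3.50 * 1000
optimized_us = 561.0

# Assumed definition of "speedup": time saved relative to the new runtime.
gain = (original_us - optimized_us) / optimized_us

# Plain ratio of old runtime to new runtime, for comparison.
ratio = original_us / optimized_us

print(f"{gain:.0%} faster, {ratio:.2f}x runtime ratio")
```

Under this assumption the 5.24 in parentheses is the relative gain, while the ratio of the two runtimes is about 6.24.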

📝 Explanation and details

Optimization Explanation:
The original implementation uses `np.tile`, `np.repeat`, and `np.column_stack`, each of which allocates an intermediate array. The optimized version compiles the function with Numba's JIT in nopython mode, pre-allocates the output array once, and fills it directly in loops, so the intermediate allocations disappear. In the per-test timings below, supported input shapes run roughly 245% to 2467% faster; the two `NotImplementedError` paths are unchanged.
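
As a rough illustration of the two approaches for the 1D/1D case, here is a minimal sketch. The function names `gridmake2_numpy` and `gridmake2_prealloc` are hypothetical, the PR's actual diff is not reproduced here, and the `njit` fallback exists only so the sketch still runs where Numba is not installed. It also assumes both inputs share a dtype, since the output takes the dtype of `x1`.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba (no speedup in that case).
    def njit(func):
        return func


def gridmake2_numpy(x1, x2):
    # tile/repeat formulation: builds two temporaries, then stacks them.
    return np.column_stack([np.tile(x1, x2.shape[0]), np.repeat(x2, x1.shape[0])])


@njit
def gridmake2_prealloc(x1, x2):
    # Allocate the result once and fill it row by row.
    n1 = x1.shape[0]
    n2 = x2.shape[0]
    out = np.empty((n1 * n2, 2), dtype=x1.dtype)
    for j in range(n2):
        for i in range(n1):
            row = j * n1 + i
            out[row, 0] = x1[i]  # x1 cycles fastest
            out[row, 1] = x2[j]  # x2 is held fixed within each block
    return out


a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
same = np.array_equal(gridmake2_numpy(a, b), gridmake2_prealloc(a, b))
```

With Numba installed, the first call to `gridmake2_prealloc` includes compilation time, so any timing comparison should use a warmed-up function.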

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | ✅ 17 Passed |
| 🌀 Generated Regression Tests | ✅ 27 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `test_gridmake2.py::TestGridmake2EdgeCases.test_both_empty_arrays` | 64.5μs | 2.67μs | 2320%✅ |
| `test_gridmake2.py::TestGridmake2EdgeCases.test_empty_arrays_raise_or_return_empty` | 65.3μs | 3.38μs | 1835%✅ |
| `test_gridmake2.py::TestGridmake2EdgeCases.test_float_dtype_preserved` | 65.0μs | 2.62μs | 2376%✅ |
| `test_gridmake2.py::TestGridmake2EdgeCases.test_integer_dtype_preserved` | 65.5μs | 2.67μs | 2358%✅ |
| `test_gridmake2.py::TestGridmake2NotImplemented.test_1d_first_2d_second_raises` | 48.9μs | 48.7μs | 0.514%✅ |
| `test_gridmake2.py::TestGridmake2NotImplemented.test_both_2d_raises` | 49.1μs | 49.1μs | 0.000%✅ |
| `test_gridmake2.py::TestGridmake2With1DArrays.test_basic_two_element_arrays` | 66.2μs | 3.08μs | 2046%✅ |
| `test_gridmake2.py::TestGridmake2With1DArrays.test_different_length_arrays` | 65.7μs | 2.75μs | 2288%✅ |
| `test_gridmake2.py::TestGridmake2With1DArrays.test_float_arrays` | 65.5μs | 2.62μs | 2395%✅ |
| `test_gridmake2.py::TestGridmake2With1DArrays.test_larger_arrays` | 66.0μs | 2.71μs | 2335%✅ |
| `test_gridmake2.py::TestGridmake2With1DArrays.test_negative_values` | 65.1μs | 2.67μs | 2342%✅ |
| `test_gridmake2.py::TestGridmake2With1DArrays.test_result_shape` | 65.4μs | 3.00μs | 2079%✅ |
| `test_gridmake2.py::TestGridmake2With1DArrays.test_single_element_arrays` | 38.7μs | 2.62μs | 1373%✅ |
| `test_gridmake2.py::TestGridmake2With1DArrays.test_single_element_with_multi_element` | 65.7μs | 2.62μs | 2403%✅ |
| `test_gridmake2.py::TestGridmake2With2DFirst.test_2d_first_1d_second` | 41.2μs | 3.17μs | 1201%✅ |
| `test_gridmake2.py::TestGridmake2With2DFirst.test_2d_multiple_columns` | 12.5μs | 2.67μs | 370%✅ |
| `test_gridmake2.py::TestGridmake2With2DFirst.test_2d_single_column` | 40.9μs | 2.79μs | 1364%✅ |

🌀 Generated Regression Tests

import numpy as np

# imports
import pytest  # used for our unit tests

from code_to_optimize.discrete_riccati import _gridmake2

# unit tests


class TestGridmake2Basic:
    """Basic test cases for fundamental functionality"""

    def test_simple_two_element_vectors(self):
        """Test with two simple 2-element 1D arrays"""
        # Create simple input vectors
        x1 = np.array([1.0, 2.0])
        x2 = np.array([3.0, 4.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 67.1μs -> 2.92μs (2202% faster)

        # Expected output: cartesian product
        # (1,3), (2,3), (1,4), (2,4)
        expected = np.array([[1.0, 3.0], [2.0, 3.0], [1.0, 4.0], [2.0, 4.0]])

    def test_three_element_vectors(self):
        """Test with 3-element 1D arrays"""
        # Create input vectors with 3 elements each
        x1 = np.array([1.0, 2.0, 3.0])
        x2 = np.array([4.0, 5.0, 6.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 66.0μs -> 2.96μs (2131% faster)

        # Expected: 3x3=9 combinations
        expected = np.array(
            [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [1.0, 6.0], [2.0, 6.0], [3.0, 6.0]]
        )

    def test_different_length_vectors(self):
        """Test with 1D arrays of different lengths"""
        # Create vectors of different lengths
        x1 = np.array([1.0, 2.0])
        x2 = np.array([10.0, 20.0, 30.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 65.2μs -> 2.67μs (2345% faster)

        # Expected: 2x3=6 combinations
        expected = np.array([[1.0, 10.0], [2.0, 10.0], [1.0, 20.0], [2.0, 20.0], [1.0, 30.0], [2.0, 30.0]])

    def test_matrix_and_vector(self):
        """Test with 2D matrix as x1 and 1D vector as x2"""
        # Create a 2x2 matrix and a 2-element vector
        x1 = np.array([[1.0, 2.0], [3.0, 4.0]])
        x2 = np.array([5.0, 6.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 41.2μs -> 3.21μs (1185% faster)

        # Expected: 2 rows from x1 repeated for each element in x2
        # Result should have 4 rows (2*2) and 3 columns (2 from x1 + 1 from x2)
        expected = np.array([[1.0, 2.0, 5.0], [3.0, 4.0, 5.0], [1.0, 2.0, 6.0], [3.0, 4.0, 6.0]])

    def test_integer_arrays(self):
        """Test with integer arrays"""
        # Create integer arrays
        x1 = np.array([1, 2], dtype=np.int64)
        x2 = np.array([3, 4], dtype=np.int64)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 65.2μs -> 2.79μs (2237% faster)

        # Expected output
        expected = np.array([[1, 3], [2, 3], [1, 4], [2, 4]])


class TestGridmake2Edge:
    """Edge case tests for unusual conditions"""

    def test_single_element_vectors(self):
        """Test with single-element 1D arrays"""
        # Create single-element vectors
        x1 = np.array([1.0])
        x2 = np.array([2.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 38.2μs -> 2.67μs (1335% faster)

        # Expected: only one combination
        expected = np.array([[1.0, 2.0]])

    def test_single_element_x1_multiple_x2(self):
        """Test with single-element x1 and multiple-element x2"""
        # Create arrays
        x1 = np.array([5.0])
        x2 = np.array([1.0, 2.0, 3.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 65.1μs -> 2.54μs (2460% faster)

        # Expected: x1 repeated for each x2 element
        expected = np.array([[5.0, 1.0], [5.0, 2.0], [5.0, 3.0]])

    def test_multiple_x1_single_x2(self):
        """Test with multiple-element x1 and single-element x2"""
        # Create arrays
        x1 = np.array([1.0, 2.0, 3.0])
        x2 = np.array([10.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 37.5μs -> 2.50μs (1398% faster)

        # Expected: all x1 elements paired with single x2
        expected = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])

    def test_negative_values(self):
        """Test with negative values"""
        # Create arrays with negative values
        x1 = np.array([-1.0, -2.0])
        x2 = np.array([-3.0, -4.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 65.2μs -> 2.54μs (2467% faster)

        # Expected output
        expected = np.array([[-1.0, -3.0], [-2.0, -3.0], [-1.0, -4.0], [-2.0, -4.0]])

    def test_zero_values(self):
        """Test with zero values"""
        # Create arrays with zeros
        x1 = np.array([0.0, 1.0])
        x2 = np.array([0.0, 2.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 64.8μs -> 2.54μs (2449% faster)

        # Expected output
        expected = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 2.0]])

    def test_very_small_values(self):
        """Test with very small floating point values"""
        # Create arrays with very small values
        x1 = np.array([1e-10, 2e-10])
        x2 = np.array([3e-10, 4e-10])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 65.0μs -> 2.54μs (2458% faster)

        # Expected output
        expected = np.array([[1e-10, 3e-10], [2e-10, 3e-10], [1e-10, 4e-10], [2e-10, 4e-10]])

    def test_very_large_values(self):
        """Test with very large floating point values"""
        # Create arrays with very large values
        x1 = np.array([1e10, 2e10])
        x2 = np.array([3e10, 4e10])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 64.8μs -> 2.54μs (2447% faster)

        # Expected output
        expected = np.array([[1e10, 3e10], [2e10, 3e10], [1e10, 4e10], [2e10, 4e10]])

    def test_matrix_single_row(self):
        """Test with matrix having single row and vector"""
        # Create a 1x3 matrix and a 2-element vector
        x1 = np.array([[1.0, 2.0, 3.0]])
        x2 = np.array([4.0, 5.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 41.1μs -> 2.79μs (1371% faster)

        # Expected: single row repeated for each x2 element
        expected = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]])

    def test_matrix_single_column(self):
        """Test with matrix having single column and vector"""
        # Create a 3x1 matrix and a 2-element vector
        x1 = np.array([[1.0], [2.0], [3.0]])
        x2 = np.array([4.0, 5.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 40.8μs -> 2.75μs (1383% faster)

        # Expected: 3 rows repeated for each x2 element
        expected = np.array([[1.0, 4.0], [2.0, 4.0], [3.0, 4.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])

    def test_matrix_with_multiple_columns(self):
        """Test with matrix having multiple columns and vector"""
        # Create a 3x3 matrix and a 2-element vector
        x1 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
        x2 = np.array([10.0, 20.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 40.8μs -> 3.08μs (1224% faster)

        # Expected: 3 rows repeated for each x2 element
        expected = np.array(
            [
                [1.0, 2.0, 3.0, 10.0],
                [4.0, 5.0, 6.0, 10.0],
                [7.0, 8.0, 9.0, 10.0],
                [1.0, 2.0, 3.0, 20.0],
                [4.0, 5.0, 6.0, 20.0],
                [7.0, 8.0, 9.0, 20.0],
            ]
        )

    def test_not_implemented_both_matrices(self):
        """Test that NotImplementedError is raised for two matrices"""
        # Create two 2D matrices
        x1 = np.array([[1.0, 2.0], [3.0, 4.0]])
        x2 = np.array([[5.0, 6.0], [7.0, 8.0]])

        # Should raise NotImplementedError
        with pytest.raises(NotImplementedError):
            _gridmake2(x1, x2)  # 49.0μs -> 49.0μs (0.167% faster)

    def test_not_implemented_vector_matrix(self):
        """Test that NotImplementedError is raised for vector then matrix"""
        # Create a vector and a matrix
        x1 = np.array([1.0, 2.0])
        x2 = np.array([[3.0, 4.0], [5.0, 6.0]])

        # Should raise NotImplementedError
        with pytest.raises(NotImplementedError):
            _gridmake2(x1, x2)  # 49.0μs -> 48.7μs (0.771% faster)

    def test_mixed_positive_negative(self):
        """Test with mix of positive and negative values"""
        # Create arrays with mixed signs
        x1 = np.array([-1.0, 0.0, 1.0])
        x2 = np.array([-2.0, 2.0])

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 66.1μs -> 2.71μs (2341% faster)

        # Expected output
        expected = np.array([[-1.0, -2.0], [0.0, -2.0], [1.0, -2.0], [-1.0, 2.0], [0.0, 2.0], [1.0, 2.0]])


class TestGridmake2LargeScale:
    """Large scale tests for performance and scalability"""

    def test_large_vectors_100_elements(self):
        """Test with two 100-element vectors"""
        # Create large vectors
        x1 = np.arange(100.0)
        x2 = np.arange(100.0, 200.0)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 80.9μs -> 4.62μs (1650% faster)

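        # Verify all x1 values appear 100 times
        unique_x1, counts_x1 = np.unique(result[:, 0], return_counts=True)
        assert np.array_equal(unique_x1, x1)
        assert np.all(counts_x1 == 100)

        # Verify all x2 values appear 100 times
        unique_x2, counts_x2 = np.unique(result[:, 1], return_counts=True)
        assert np.array_equal(unique_x2, x2)
        assert np.all(counts_x2 == 100)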

    def test_large_asymmetric_vectors(self):
        """Test with asymmetric large vectors (200 and 50 elements)"""
        # Create asymmetric vectors
        x1 = np.arange(200.0)
        x2 = np.arange(50.0)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 79.2μs -> 4.33μs (1728% faster)

    def test_large_matrix_and_vector(self):
        """Test with large matrix (100x5) and vector (100)"""
        # Create large matrix and vector
        x1 = np.arange(500.0).reshape(100, 5)
        x2 = np.arange(100.0)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 113μs -> 33.0μs (245% faster)

    def test_wide_matrix(self):
        """Test with wide matrix (10x50) and vector (20)"""
        # Create wide matrix and vector
        x1 = np.arange(500.0).reshape(10, 50)
        x2 = np.arange(20.0)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 46.5μs -> 4.17μs (1015% faster)

    def test_performance_moderate_size(self):
        """Test performance with moderate size arrays (500 elements each)"""
        # Create moderate size vectors
        x1 = np.linspace(0, 1, 500)
        x2 = np.linspace(1, 2, 500)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 807μs -> 191μs (322% faster)

    def test_large_matrix_many_columns(self):
        """Test with matrix having many columns (50x20) and vector (30)"""
        # Create matrix with many columns
        x1 = np.arange(1000.0).reshape(50, 20)
        x2 = np.arange(30.0)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 65.0μs -> 8.04μs (708% faster)

    def test_sequential_values_large(self):
        """Test that sequential values are correctly arranged in large output"""
        # Create sequential vectors
        x1 = np.arange(100.0)
        x2 = np.arange(100.0)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 79.4μs -> 4.83μs (1543% faster)

        # Verify that x1 values cycle correctly
        for i in range(100):
            # For each x2 value, x1 should go from 0 to 99
            block = result[i * 100 : (i + 1) * 100, 0]
            np.testing.assert_array_equal(block, x1)

    def test_floating_point_precision_large(self):
        """Test floating point precision with large number of operations"""
        # Create vectors with values that might cause precision issues
        x1 = np.linspace(0.1, 0.9, 100)
        x2 = np.linspace(0.01, 0.09, 100)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 77.5μs -> 4.62μs (1577% faster)

    def test_memory_efficiency_check(self):
        """Test that function handles memory efficiently with large arrays"""
        # Create large arrays that would use significant memory
        x1 = np.arange(300.0)
        x2 = np.arange(300.0)

        # Call the function
        codeflash_output = _gridmake2(x1, x2)
        result = codeflash_output  # 211μs -> 25.2μs (739% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
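
The comparison described in that comment can be sketched for the 2D-first case as follows. The helper names `reference_2d`, `candidate_2d`, and `outputs_match` are hypothetical and are not part of Codeflash or of the PR; the sketch only illustrates checking that two implementations agree on shape, dtype, and values.

```python
import numpy as np


def reference_2d(x1, x2):
    # tile/repeat formulation for a 2D x1 and a 1D x2.
    first = np.tile(x1, (x2.shape[0], 1))
    second = np.repeat(x2, x1.shape[0])
    return np.column_stack([first, second])


def candidate_2d(x1, x2):
    # Pre-allocated variant: copy x1 into each block, then set the last column.
    n1, ncols = x1.shape
    n2 = x2.shape[0]
    out = np.empty((n1 * n2, ncols + 1), dtype=x1.dtype)
    for j in range(n2):
        out[j * n1 : (j + 1) * n1, :ncols] = x1
        out[j * n1 : (j + 1) * n1, ncols] = x2[j]
    return out


def outputs_match(x1, x2):
    # Two results are treated as equivalent if shape, dtype, and values agree.
    a = reference_2d(x1, x2)
    b = candidate_2d(x1, x2)
    return a.shape == b.shape and a.dtype == b.dtype and np.array_equal(a, b)


m = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([5.0, 6.0])
ok = outputs_match(m, v)
```

The inputs here mirror `test_matrix_and_vector` above, so the candidate output can also be compared against that test's `expected` array.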

To edit these changes, run `git checkout codeflash/optimize-_gridmake2-mjt4c9bf`, commit your edits, and push.


codeflash-ai bot requested a review from aseembits93 December 30, 2025 21:48

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 30, 2025
