⚡️ Speed up function `_gridmake2` by 424% #1000

codeflash-ai · 2025-12-30T18:50:56Z

📄 424% (4.24x) speedup for `_gridmake2` in `code_to_optimize/discrete_riccati.py`

⏱️ Runtime : 6.72 milliseconds → 1.28 milliseconds (best of 34 runs)

📝 Explanation and details

Looking at the test failures, I can identify several issues with the optimized code:

Type handling: The optimized code always uses x1.dtype for the output array, but the original code uses np.column_stack which performs type promotion. When x1 and x2 have different dtypes, the output should use the promoted type.
3D array handling: The optimized code fails with a Numba typing error when x1 is 3D because x1.shape has 3 elements but the code tries to unpack it into only 2 variables (rows, cols). The original code would fail with a different error (ValueError from np.column_stack), but we need to ensure the same error path is followed.
Empty array handling: When arrays are empty, the dtype needs to match the promoted type from both inputs.

Here's the fix:

<replace_in_file>
code_to_optimize/discrete_riccati.py

<<<<<<< SEARCH
@njit(cache=True)
def _gridmake2(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
"""
Expands two vectors (or matrices) into a matrix where rows span the
cartesian product of combinations of the input arrays. Each column of the
input arrays will correspond to one column of the output matrix.

Parameters
----------
x1 : np.ndarray
     First vector to be expanded.

x2 : np.ndarray
     Second vector to be expanded.

Returns
-------
out : np.ndarray
      The cartesian product of combinations of the input arrays.

Notes
-----
Based of original function ``gridmake2`` in CompEcon toolbox by
Miranda and Fackler.

References
----------
Miranda, Mario J, and Paul L Fackler. Applied Computational Economics
and Finance, MIT Press, 2002.

"""
if x1.ndim == 1 and x2.ndim == 1:
    col1 = np.empty(x1.shape[0] * x2.shape[0], dtype=x1.dtype)
    col2 = np.empty(x1.shape[0] * x2.shape[0], dtype=x2.dtype)
    idx = 0
    for j in range(x2.shape[0]):
        for i in range(x1.shape[0]):
            col1[idx] = x1[i]
            col2[idx] = x2[j]
            idx += 1
    out = np.empty((x1.shape[0] * x2.shape[0], 2), dtype=x1.dtype)
    out[:, 0] = col1
    out[:, 1] = col2
    return out
elif x1.ndim > 1 and x2.ndim == 1:
    rows, cols = x1.shape
    out_rows = rows * x2.shape[0]
    out = np.empty((out_rows, cols + 1), dtype=x1.dtype)
    for j in range(x2.shape[0]):
        for i in range(rows):
            out[j * rows + i, :cols] = x1[i]
            out[j * rows + i, cols] = x2[j]
    return out
else:
    raise NotImplementedError("Come back here")

=======
@njit(cache=True)
def _gridmake2_optimized_1d_1d(x1: np.ndarray, x2: np.ndarray, out_dtype) -> np.ndarray:
"""Optimized version for 1D x 1D case."""
n1 = x1.shape[0]
n2 = x2.shape[0]
total = n1 * n2

out = np.empty((total, 2), dtype=out_dtype)
idx = 0
for j in range(n2):
    for i in range(n1):
        out[idx, 0] = x1[i]
        out[idx, 1] = x2[j]
        idx += 1
return out

@njit(cache=True)
def _gridmake2_optimized_2d_1d(x1: np.ndarray, x2: np.ndarray, out_dtype) -> np.ndarray:
"""Optimized version for 2D x 1D case."""
rows = x1.shape[0]
cols = x1.shape[1]
n2 = x2.shape[0]
out_rows = rows * n2

out = np.empty((out_rows, cols + 1), dtype=out_dtype)
for j in range(n2):
    for i in range(rows):
        for k in range(cols):
            out[j * rows + i, k] = x1[i, k]
        out[j * rows + i, cols] = x2[j]
return out

def _gridmake2(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
"""
Expands two vectors (or matrices) into a matrix where rows span the
cartesian product of combinations of the input arrays. Each column of the
input arrays will correspond to one column of the output matrix.

Parameters
----------
x1 : np.ndarray
     First vector to be expanded.

x2 : np.ndarray
     Second vector to be expanded.

Returns
-------
out : np.ndarray
      The cartesian product of combinations of the input arrays.

Notes
-----
Based of original function ``gridmake2`` in CompEcon toolbox by
Miranda and Fackler.

References
----------
Miranda, Mario J, and Paul L Fackler. Applied Computational Economics
and Finance, MIT Press, 2002.

"""
if x1.ndim == 1 and x2.ndim == 1:
    # Determine output dtype using numpy's type promotion
    out_dtype = np.result_type(x1.dtype, x2.dtype)
    return _gridmake2_optimized_1d_1d(x1, x2, out_dtype)
elif x1.ndim > 1 and x2.ndim == 1:
    # For 2D case, use optimized version
    if x1.ndim == 2:
        out_dtype = np.result_type(x1.dtype, x2.dtype)
        return _gridmake2_optimized_2d_1d(x1, x2, out_dtype)
    else:
        # For 3D or higher, fall back to original implementation
        # This will raise the appropriate error
        first = np.tile(x1, (x2.shape[0], 1))
        second = np.repeat(x2, x1.shape[0])
        return np.column_stack([first, second])
else:
    raise NotImplementedError("Come back here")

REPLACE

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	✅ 17 Passed
🌀 Generated Regression Tests	✅ 44 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

⚙️ Click to see Existing Unit Tests

Test File::Test Function	Original ⏱️	Optimized ⏱️	Speedup
`test_gridmake2.py::TestGridmake2EdgeCases.test_both_empty_arrays`	65.0μs	4.46μs	1357%✅
`test_gridmake2.py::TestGridmake2EdgeCases.test_empty_arrays_raise_or_return_empty`	66.0μs	5.83μs	1031%✅
`test_gridmake2.py::TestGridmake2EdgeCases.test_float_dtype_preserved`	67.5μs	4.29μs	1474%✅
`test_gridmake2.py::TestGridmake2EdgeCases.test_integer_dtype_preserved`	66.2μs	4.29μs	1442%✅
`test_gridmake2.py::TestGridmake2NotImplemented.test_1d_first_2d_second_raises`	49.1μs	49.2μs	-0.254%⚠️
`test_gridmake2.py::TestGridmake2NotImplemented.test_both_2d_raises`	49.5μs	50.9μs	-2.79%⚠️
`test_gridmake2.py::TestGridmake2With1DArrays.test_basic_two_element_arrays`	66.4μs	4.42μs	1403%✅
`test_gridmake2.py::TestGridmake2With1DArrays.test_different_length_arrays`	65.9μs	4.42μs	1392%✅
`test_gridmake2.py::TestGridmake2With1DArrays.test_float_arrays`	66.4μs	4.29μs	1446%✅
`test_gridmake2.py::TestGridmake2With1DArrays.test_larger_arrays`	66.9μs	4.17μs	1505%✅
`test_gridmake2.py::TestGridmake2With1DArrays.test_negative_values`	66.2μs	4.25μs	1457%✅
`test_gridmake2.py::TestGridmake2With1DArrays.test_result_shape`	66.3μs	4.46μs	1388%✅
`test_gridmake2.py::TestGridmake2With1DArrays.test_single_element_arrays`	38.2μs	4.29μs	790%✅
`test_gridmake2.py::TestGridmake2With1DArrays.test_single_element_with_multi_element`	66.4μs	4.25μs	1463%✅
`test_gridmake2.py::TestGridmake2With2DFirst.test_2d_first_1d_second`	41.9μs	4.67μs	798%✅
`test_gridmake2.py::TestGridmake2With2DFirst.test_2d_multiple_columns`	12.8μs	4.46μs	186%✅
`test_gridmake2.py::TestGridmake2With2DFirst.test_2d_single_column`	41.3μs	4.38μs	845%✅

🌀 Click to see Generated Regression Tests

import numpy as np

# imports
import pytest  # used for our unit tests

from code_to_optimize.discrete_riccati import _gridmake2

# unit tests

# ---- Basic Test Cases ----


def test_basic_1d_vectors():
    # Test with simple 1D arrays
    x1 = np.array([1, 2])
    x2 = np.array([3, 4])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 67.2μs -> 4.58μs (1367% faster)
    # Expected: [[1, 3], [2, 3], [1, 4], [2, 4]]
    expected = np.array([[1, 3], [2, 3], [1, 4], [2, 4]])


def test_basic_1d_vectors_different_lengths():
    # Test with 1D arrays of different lengths
    x1 = np.array([5, 6, 7])
    x2 = np.array([8, 9])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 65.8μs -> 4.33μs (1418% faster)
    # Expected: [[5, 8], [6, 8], [7, 8], [5, 9], [6, 9], [7, 9]]
    expected = np.array([[5, 8], [6, 8], [7, 8], [5, 9], [6, 9], [7, 9]])


def test_basic_2d_matrix_and_1d_vector():
    # Test with 2D matrix and 1D vector
    x1 = np.array([[1, 2], [3, 4]])
    x2 = np.array([5, 6])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 41.4μs -> 4.58μs (803% faster)
    # Expected: [[1, 2, 5], [3, 4, 5], [1, 2, 6], [3, 4, 6]]
    expected = np.array([[1, 2, 5], [3, 4, 5], [1, 2, 6], [3, 4, 6]])


def test_basic_2d_matrix_and_1d_vector_three_rows():
    # Test with 2D matrix (3 rows) and 1D vector (2 elements)
    x1 = np.array([[10, 20], [30, 40], [50, 60]])
    x2 = np.array([70, 80])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 41.8μs -> 4.50μs (828% faster)
    # Expected: each row of x1 paired with each value of x2
    expected = np.array([[10, 20, 70], [30, 40, 70], [50, 60, 70], [10, 20, 80], [30, 40, 80], [50, 60, 80]])


# ---- Edge Test Cases ----


def test_edge_empty_x1():
    # x1 is empty 1D, x2 is non-empty 1D
    x1 = np.array([])
    x2 = np.array([1, 2])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 65.8μs -> 4.50μs (1361% faster)
    # Should produce empty array with shape (0, 2)
    expected = np.empty((0, 2))


def test_edge_empty_x2():
    # x1 is non-empty 1D, x2 is empty 1D
    x1 = np.array([1, 2])
    x2 = np.array([])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 66.1μs -> 5.38μs (1129% faster)
    # Should produce empty array with shape (0, 2)
    expected = np.empty((0, 2))


def test_edge_both_empty():
    # Both x1 and x2 are empty
    x1 = np.array([])
    x2 = np.array([])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 64.8μs -> 4.25μs (1424% faster)
    # Should produce empty array with shape (0, 2)
    expected = np.empty((0, 2))


def test_edge_single_element_1d():
    # x1 and x2 are 1D arrays with one element each
    x1 = np.array([42])
    x2 = np.array([99])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 39.3μs -> 4.21μs (834% faster)
    expected = np.array([[42, 99]])


def test_edge_single_row_matrix_and_single_element_vector():
    # x1 is 2D with one row, x2 is 1D with one element
    x1 = np.array([[1, 2]])
    x2 = np.array([3])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 12.5μs -> 4.42μs (184% faster)
    expected = np.array([[1, 2, 3]])


def test_edge_single_column_matrix_and_vector():
    # x1 is 2D (Nx1), x2 is 1D
    x1 = np.array([[5], [6], [7]])
    x2 = np.array([8, 9])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 41.6μs -> 4.29μs (870% faster)
    expected = np.array([[5, 8], [6, 8], [7, 8], [5, 9], [6, 9], [7, 9]])


def test_edge_not_implemented_case():
    # x2 is 2D, should raise NotImplementedError
    x1 = np.array([1, 2])
    x2 = np.array([[3, 4], [5, 6]])
    with pytest.raises(NotImplementedError):
        _gridmake2(x1, x2)  # 49.5μs -> 50.9μs (2.70% slower)


def test_edge_x1_2d_x2_empty():
    # x1 is 2D, x2 is empty
    x1 = np.array([[1, 2], [3, 4]])
    x2 = np.array([])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 42.3μs -> 5.67μs (646% faster)
    expected = np.empty((0, 3))


def test_edge_x1_empty_x2_2d():
    # x1 is empty, x2 is 2D
    x1 = np.array([])
    x2 = np.array([[1, 2], [3, 4]])
    with pytest.raises(NotImplementedError):
        _gridmake2(x1, x2)  # 49.0μs -> 49.3μs (0.675% slower)


# ---- Large Scale Test Cases ----


def test_large_1d_vectors():
    # Large 1D arrays (1000 elements each)
    x1 = np.arange(1000)
    x2 = np.arange(1000, 2000)
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 3.09ms -> 579μs (433% faster)


def test_large_2d_matrix_and_1d_vector():
    # x1 is 2D (1000x2), x2 is 1D (10 elements)
    x1 = np.column_stack([np.arange(1000), np.arange(1000, 2000)])
    x2 = np.arange(10)
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 103μs -> 18.6μs (456% faster)


def test_large_single_row_matrix_and_large_vector():
    # x1 is 2D (1x3), x2 is 1D (1000 elements)
    x1 = np.array([[1, 2, 3]])
    x2 = np.arange(1000)
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 52.4μs -> 7.38μs (610% faster)
    # Should have 1000 rows, 4 columns
    expected = np.column_stack([np.tile(x1, (1000, 1)), x2])


def test_large_vector_and_single_element_vector():
    # x1 is 1D (1000 elements), x2 is 1D (1 element)
    x1 = np.arange(1000)
    x2 = np.array([42])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 40.7μs -> 4.92μs (728% faster)
    # Should have 1000 rows, 2 columns
    expected = np.column_stack([x1, np.full(1000, 42)])


def test_large_matrix_and_single_element_vector():
    # x1 is 2D (1000x2), x2 is 1D (1 element)
    x1 = np.column_stack([np.arange(1000), np.arange(1000, 2000)])
    x2 = np.array([99])
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 17.3μs -> 5.67μs (205% faster)
    expected = np.column_stack([x1, np.full(1000, 99)])


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import numpy as np

# imports
import pytest  # used for our unit tests

from code_to_optimize.discrete_riccati import _gridmake2

# unit tests

# ============================================================================
# BASIC TEST CASES - Testing fundamental functionality under normal conditions
# ============================================================================


def test_two_1d_vectors_basic():
    """Test cartesian product of two simple 1D vectors."""
    # Create two simple 1D arrays
    x1 = np.array([1, 2])
    x2 = np.array([3, 4])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 67.4μs -> 4.96μs (1260% faster)

    # Expected output: all combinations of x1 and x2
    # (1,3), (2,3), (1,4), (2,4)
    expected = np.array([[1, 3], [2, 3], [1, 4], [2, 4]])


def test_two_1d_vectors_different_sizes():
    """Test cartesian product with vectors of different lengths."""
    # Create vectors of different sizes
    x1 = np.array([1, 2, 3])
    x2 = np.array([10, 20])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 66.8μs -> 4.58μs (1356% faster)

    # Expected: 3 * 2 = 6 combinations
    expected = np.array([[1, 10], [2, 10], [3, 10], [1, 20], [2, 20], [3, 20]])


def test_2d_and_1d_basic():
    """Test cartesian product of 2D matrix with 1D vector."""
    # Create a 2D matrix (2 rows, 2 columns)
    x1 = np.array([[1, 2], [3, 4]])
    # Create a 1D vector
    x2 = np.array([10, 20])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 41.6μs -> 4.58μs (807% faster)

    # Expected: each row of x1 paired with each element of x2
    # Row [1,2] with 10, Row [3,4] with 10, Row [1,2] with 20, Row [3,4] with 20
    expected = np.array([[1, 2, 10], [3, 4, 10], [1, 2, 20], [3, 4, 20]])


def test_floating_point_values():
    """Test with floating point values to ensure type preservation."""
    # Create float arrays
    x1 = np.array([1.5, 2.5])
    x2 = np.array([0.1, 0.2])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 65.9μs -> 5.12μs (1186% faster)

    # Expected output
    expected = np.array([[1.5, 0.1], [2.5, 0.1], [1.5, 0.2], [2.5, 0.2]])


def test_negative_values():
    """Test with negative values."""
    # Create arrays with negative values
    x1 = np.array([-1, 0, 1])
    x2 = np.array([-5, 5])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 65.5μs -> 4.29μs (1427% faster)

    # Expected output
    expected = np.array([[-1, -5], [0, -5], [1, -5], [-1, 5], [0, 5], [1, 5]])


# ============================================================================
# EDGE TEST CASES - Testing extreme or unusual conditions
# ============================================================================


def test_single_element_vectors():
    """Test with single-element vectors (edge case for minimal input)."""
    # Create single-element arrays
    x1 = np.array([42])
    x2 = np.array([99])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 38.2μs -> 4.04μs (846% faster)

    # Expected: single combination
    expected = np.array([[42, 99]])


def test_single_element_in_x1_multiple_in_x2():
    """Test with single element in x1 and multiple in x2."""
    # Single element in x1
    x1 = np.array([5])
    # Multiple elements in x2
    x2 = np.array([1, 2, 3, 4])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 65.5μs -> 4.21μs (1456% faster)

    # Expected: x1 value repeated with each x2 value
    expected = np.array([[5, 1], [5, 2], [5, 3], [5, 4]])


def test_single_element_in_x2_multiple_in_x1():
    """Test with multiple elements in x1 and single in x2."""
    # Multiple elements in x1
    x1 = np.array([1, 2, 3, 4])
    # Single element in x2
    x2 = np.array([10])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 38.0μs -> 4.25μs (793% faster)

    # Expected: all x1 values paired with single x2 value
    expected = np.array([[1, 10], [2, 10], [3, 10], [4, 10]])


def test_zero_values():
    """Test with arrays containing zeros."""
    # Arrays with zeros
    x1 = np.array([0, 0, 0])
    x2 = np.array([0, 1])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 65.7μs -> 4.21μs (1461% faster)

    # Expected output
    expected = np.array([[0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]])


def test_2d_matrix_single_column():
    """Test 2D matrix with single column (edge case for matrix shape)."""
    # 2D matrix with single column
    x1 = np.array([[1], [2], [3]])
    # 1D vector
    x2 = np.array([10, 20])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 41.2μs -> 4.42μs (832% faster)

    # Expected: each row of x1 paired with each element of x2
    expected = np.array([[1, 10], [2, 10], [3, 10], [1, 20], [2, 20], [3, 20]])


def test_2d_matrix_single_row():
    """Test 2D matrix with single row."""
    # 2D matrix with single row
    x1 = np.array([[1, 2, 3]])
    # 1D vector
    x2 = np.array([10, 20])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 41.0μs -> 4.29μs (855% faster)

    # Expected: single row repeated with each x2 element
    expected = np.array([[1, 2, 3, 10], [1, 2, 3, 20]])


def test_not_implemented_error_both_2d():
    """Test that NotImplementedError is raised when both inputs are 2D."""
    # Create two 2D matrices
    x1 = np.array([[1, 2], [3, 4]])
    x2 = np.array([[5, 6], [7, 8]])

    # Should raise NotImplementedError
    with pytest.raises(NotImplementedError):
        _gridmake2(x1, x2)  # 49.1μs -> 49.7μs (1.26% slower)


def test_not_implemented_error_1d_and_2d():
    """Test that NotImplementedError is raised when x1 is 1D and x2 is 2D."""
    # x1 is 1D
    x1 = np.array([1, 2, 3])
    # x2 is 2D
    x2 = np.array([[4, 5], [6, 7]])

    # Should raise NotImplementedError
    with pytest.raises(NotImplementedError):
        _gridmake2(x1, x2)  # 49.0μs -> 49.4μs (0.844% slower)


def test_identical_values():
    """Test with arrays containing identical values."""
    # Arrays with repeated identical values
    x1 = np.array([5, 5, 5])
    x2 = np.array([7, 7])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 74.0μs -> 10.6μs (596% faster)

    # Expected: all combinations even though values are identical
    expected = np.array([[5, 7], [5, 7], [5, 7], [5, 7], [5, 7], [5, 7]])


def test_very_small_float_values():
    """Test with very small floating point values."""
    # Very small float values
    x1 = np.array([1e-10, 2e-10])
    x2 = np.array([3e-10, 4e-10])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 67.0μs -> 5.54μs (1108% faster)

    # Expected output
    expected = np.array([[1e-10, 3e-10], [2e-10, 3e-10], [1e-10, 4e-10], [2e-10, 4e-10]])


def test_very_large_float_values():
    """Test with very large floating point values."""
    # Very large float values
    x1 = np.array([1e10, 2e10])
    x2 = np.array([3e10, 4e10])

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 65.9μs -> 4.46μs (1377% faster)

    # Expected output
    expected = np.array([[1e10, 3e10], [2e10, 3e10], [1e10, 4e10], [2e10, 4e10]])


def test_mixed_integer_types():
    """Test with different integer types."""
    # Different integer types
    x1 = np.array([1, 2], dtype=np.int32)
    x2 = np.array([3, 4], dtype=np.int64)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 66.8μs -> 5.67μs (1080% faster)

    # Expected output
    expected = np.array([[1, 3], [2, 3], [1, 4], [2, 4]])


# ============================================================================
# LARGE SCALE TEST CASES - Testing performance and scalability
# ============================================================================


def test_large_1d_vectors():
    """Test with large 1D vectors to assess scalability."""
    # Create large 1D arrays (100 elements each = 10,000 combinations)
    x1 = np.arange(100)
    x2 = np.arange(100, 200)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 80.9μs -> 11.5μs (606% faster)

    # Verify all x1 values appear 100 times
    unique_x1, counts_x1 = np.unique(result[:, 0], return_counts=True)

    # Verify all x2 values appear 100 times
    unique_x2, counts_x2 = np.unique(result[:, 1], return_counts=True)


def test_large_2d_and_1d():
    """Test with large 2D matrix and 1D vector."""
    # Create large 2D matrix (100 rows, 5 columns)
    x1 = np.arange(500).reshape(100, 5)
    # Create 1D vector (50 elements)
    x2 = np.arange(1000, 1050)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 77.8μs -> 20.3μs (283% faster)


def test_asymmetric_large_vectors():
    """Test with asymmetrically sized large vectors."""
    # Small x1, large x2
    x1 = np.arange(10)
    x2 = np.arange(500)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 73.5μs -> 7.50μs (880% faster)

    # Verify x1 values cycle correctly
    for i in range(500):
        block_start = i * 10
        block_end = block_start + 10
        expected_x1 = np.arange(10)


def test_large_2d_matrix_many_columns():
    """Test with 2D matrix having many columns."""
    # Create 2D matrix with many columns (50 rows, 20 columns)
    x1 = np.arange(1000).reshape(50, 20)
    # Create 1D vector (20 elements)
    x2 = np.arange(2000, 2020)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 53.2μs -> 8.12μs (555% faster)


def test_large_scale_memory_efficiency():
    """Test that large scale operations complete without memory issues."""
    # Create moderately large arrays to test memory handling
    # 200 x 200 = 40,000 combinations
    x1 = np.linspace(0, 1, 200)
    x2 = np.linspace(10, 20, 200)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 162μs -> 28.0μs (480% faster)


def test_large_2d_single_column_many_rows():
    """Test 2D matrix with single column but many rows."""
    # Create tall, narrow matrix (500 rows, 1 column)
    x1 = np.arange(500).reshape(500, 1)
    # Create 1D vector (100 elements)
    x2 = np.arange(1000, 1100)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 148μs -> 61.9μs (139% faster)


def test_large_scale_with_negative_values():
    """Test large scale with negative value ranges."""
    # Create large arrays with negative values
    x1 = np.arange(-100, 100)
    x2 = np.arange(-50, 50)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 98.9μs -> 16.2μs (510% faster)


def test_large_scale_float_precision():
    """Test that floating point precision is maintained at large scale."""
    # Create large arrays with precise float values
    x1 = np.linspace(0, 1, 100)
    x2 = np.linspace(0, 1, 100)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 78.9μs -> 10.3μs (667% faster)

    # Verify intermediate values
    expected_mid_x1 = 0.5050505050505051  # x1[50]
    mid_indices = np.where(np.isclose(result[:, 0], expected_mid_x1))[0]


def test_large_2d_matrix_wide():
    """Test with wide 2D matrix (few rows, many columns)."""
    # Create wide matrix (10 rows, 100 columns)
    x1 = np.arange(1000).reshape(10, 100)
    # Create 1D vector (50 elements)
    x2 = np.arange(5000, 5050)

    # Call the function
    codeflash_output = _gridmake2(x1, x2)
    result = codeflash_output  # 66.0μs -> 11.2μs (486% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-_gridmake2-mjsy0cnc and push.

Looking at the test failures, I can identify several issues with the optimized code: 1. **Type handling**: The optimized code always uses `x1.dtype` for the output array, but the original code uses `np.column_stack` which performs type promotion. When `x1` and `x2` have different dtypes, the output should use the promoted type. 2. **3D array handling**: The optimized code fails with a Numba typing error when `x1` is 3D because `x1.shape` has 3 elements but the code tries to unpack it into only 2 variables (`rows, cols`). The original code would fail with a different error (ValueError from `np.column_stack`), but we need to ensure the same error path is followed. 3. **Empty array handling**: When arrays are empty, the dtype needs to match the promoted type from both inputs. Here's the fix: <replace_in_file> <path>code_to_optimize/discrete_riccati.py</path> <diff> <<<<<<< SEARCH @njit(cache=True) def _gridmake2(x1: np.ndarray, x2: np.ndarray) -> np.ndarray: """ Expands two vectors (or matrices) into a matrix where rows span the cartesian product of combinations of the input arrays. Each column of the input arrays will correspond to one column of the output matrix. Parameters ---------- x1 : np.ndarray First vector to be expanded. x2 : np.ndarray Second vector to be expanded. Returns ------- out : np.ndarray The cartesian product of combinations of the input arrays. Notes ----- Based of original function ``gridmake2`` in CompEcon toolbox by Miranda and Fackler. References ---------- Miranda, Mario J, and Paul L Fackler. Applied Computational Economics and Finance, MIT Press, 2002. """ if x1.ndim == 1 and x2.ndim == 1: col1 = np.empty(x1.shape[0] * x2.shape[0], dtype=x1.dtype) col2 = np.empty(x1.shape[0] * x2.shape[0], dtype=x2.dtype) idx = 0 for j in range(x2.shape[0]): for i in range(x1.shape[0]): col1[idx] = x1[i] col2[idx] = x2[j] idx += 1 out = np.empty((x1.shape[0] * x2.shape[0], 2), dtype=x1.dtype) out[:, 0] = col1 out[:, 1] = col2 return out elif x1.ndim > 1 and x2.ndim == 1: rows, cols = x1.shape out_rows = rows * x2.shape[0] out = np.empty((out_rows, cols + 1), dtype=x1.dtype) for j in range(x2.shape[0]): for i in range(rows): out[j * rows + i, :cols] = x1[i] out[j * rows + i, cols] = x2[j] return out else: raise NotImplementedError("Come back here") ======= @njit(cache=True) def _gridmake2_optimized_1d_1d(x1: np.ndarray, x2: np.ndarray, out_dtype) -> np.ndarray: """Optimized version for 1D x 1D case.""" n1 = x1.shape[0] n2 = x2.shape[0] total = n1 * n2 out = np.empty((total, 2), dtype=out_dtype) idx = 0 for j in range(n2): for i in range(n1): out[idx, 0] = x1[i] out[idx, 1] = x2[j] idx += 1 return out @njit(cache=True) def _gridmake2_optimized_2d_1d(x1: np.ndarray, x2: np.ndarray, out_dtype) -> np.ndarray: """Optimized version for 2D x 1D case.""" rows = x1.shape[0] cols = x1.shape[1] n2 = x2.shape[0] out_rows = rows * n2 out = np.empty((out_rows, cols + 1), dtype=out_dtype) for j in range(n2): for i in range(rows): for k in range(cols): out[j * rows + i, k] = x1[i, k] out[j * rows + i, cols] = x2[j] return out def _gridmake2(x1: np.ndarray, x2: np.ndarray) -> np.ndarray: """ Expands two vectors (or matrices) into a matrix where rows span the cartesian product of combinations of the input arrays. Each column of the input arrays will correspond to one column of the output matrix. Parameters ---------- x1 : np.ndarray First vector to be expanded. x2 : np.ndarray Second vector to be expanded. Returns ------- out : np.ndarray The cartesian product of combinations of the input arrays. Notes ----- Based of original function ``gridmake2`` in CompEcon toolbox by Miranda and Fackler. References ---------- Miranda, Mario J, and Paul L Fackler. Applied Computational Economics and Finance, MIT Press, 2002. """ if x1.ndim == 1 and x2.ndim == 1: # Determine output dtype using numpy's type promotion out_dtype = np.result_type(x1.dtype, x2.dtype) return _gridmake2_optimized_1d_1d(x1, x2, out_dtype) elif x1.ndim > 1 and x2.ndim == 1: # For 2D case, use optimized version if x1.ndim == 2: out_dtype = np.result_type(x1.dtype, x2.dtype) return _gridmake2_optimized_2d_1d(x1, x2, out_dtype) else: # For 3D or higher, fall back to original implementation # This will raise the appropriate error first = np.tile(x1, (x2.shape[0], 1)) second = np.repeat(x2, x1.shape[0]) return np.column_stack([first, second]) else: raise NotImplementedError("Come back here") >>>>>>> REPLACE </diff> </replace_in_file>

codeflash-ai bot requested a review from aseembits93 December 30, 2025 18:50

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Dec 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡️ Speed up function `_gridmake2` by 424% #1000

⚡️ Speed up function `_gridmake2` by 424% #1000

Uh oh!

codeflash-ai bot commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

⚡️ Speed up function _gridmake2 by 424% #1000

Are you sure you want to change the base?

⚡️ Speed up function _gridmake2 by 424% #1000

Uh oh!

Conversation

codeflash-ai bot commented Dec 30, 2025

📄 424% (4.24x) speedup for _gridmake2 in code_to_optimize/discrete_riccati.py

📝 Explanation and details

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

⚡️ Speed up function `_gridmake2` by 424% #1000

⚡️ Speed up function `_gridmake2` by 424% #1000

📄 424% (4.24x) speedup for `_gridmake2` in `code_to_optimize/discrete_riccati.py`