
Conversation

@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 175% (1.75x) speedup for LinearStateSpace.impulse_response in quantecon/_lss.py

⏱️ Runtime : 1.62 milliseconds → 590 microseconds (best of 250 runs)

📝 Explanation and details

The optimization achieves a 174% speedup by leveraging Numba's Just-In-Time (JIT) compilation to accelerate matrix operations in the core computational loop of impulse_response.

Key optimizations applied:

  1. Numba JIT compilation: The main computational logic is extracted into a separate _impulse_response_numba method decorated with @njit(cache=True). This compiles the matrix multiplication-heavy loop to optimized machine code, dramatically improving performance of the repeated @ operations.

  2. Efficient memory operations: The numba implementation uses explicit .copy() calls and optimized matrix multiplication patterns that JIT can better optimize compared to Python's interpreted execution.

  3. Input validation preservation: The optimized version maintains identical error behavior by validating the j parameter with range(j) before delegating to the numba function, ensuring TypeError is raised for invalid types like floats, strings, or None.
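The loop extraction described in point 1 can be sketched roughly as follows. The helper name and body here are illustrative, not the PR's exact code, and the numba import is made optional so the sketch also runs without numba installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # numba is optional for this sketch; fall back to plain Python
    def njit(**kwargs):
        def decorator(func):
            return func
        return decorator

@njit(cache=True)
def impulse_response_loop(A, C, G, j):
    # Coefficients: xcoef[k] = A^k @ C and ycoef[k] = G @ A^k @ C for k = 0..j.
    # Preallocating 3-D arrays keeps the loop allocation-free inside JIT code.
    n, m = C.shape
    k_out = G.shape[0]
    xcoef = np.empty((j + 1, n, m))
    ycoef = np.empty((j + 1, k_out, m))
    xcoef[0] = C
    ycoef[0] = G @ C
    Apower = np.eye(n)
    for i in range(1, j + 1):
        Apower = Apower @ A       # A^i, built incrementally
        xcoef[i] = Apower @ C
        ycoef[i] = G @ xcoef[i]
    return xcoef, ycoef
```

With `cache=True`, the compiled machine code is persisted to disk, so only the first call in a fresh environment pays compilation cost.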

Performance characteristics:

  • Small systems (scalar/2D): 50-100% speedup due to JIT compilation overhead being offset by faster matrix ops
  • Larger systems (50x50 matrices): 42-100% speedup as JIT benefits scale with computation complexity
  • High iteration counts (j=100-500): 180-250% speedup where the loop-heavy computation sees maximum JIT benefit
  • Error cases: 280-320% speedup for invalid inputs due to early validation without numba overhead
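The error-case speedup comes from the `range(j)` trick noted above: `range()` itself rejects non-integer arguments, so invalid `j` values fail fast before any numba machinery is touched. A tiny illustration (the function name here is made up):

```python
def check_iterations(j):
    """Hypothetical validator mirroring the PR's approach: range() raises
    TypeError for non-integer j (floats, strings, None), so no explicit
    isinstance check is needed."""
    range(j)  # TypeError here if j is not an integer
    return j

check_iterations(5)  # OK
for bad in (2.5, "2", None):
    try:
        check_iterations(bad)
    except TypeError:
        print(f"rejected {bad!r}")
```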

The optimization is particularly effective for the common use cases in linear state space models where impulse response calculations are performed repeatedly with varying matrix sizes and iteration counts. The cached JIT compilation ensures subsequent calls avoid compilation overhead, making this ideal for hot path usage in econometric modeling workflows.
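Independently of the implementation, the coefficients being computed can be cross-checked with a few lines of plain NumPy (the matrices below are arbitrary illustration values, not from the PR):

```python
import numpy as np
from numpy.linalg import matrix_power

# Example system, chosen arbitrarily for illustration
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0],
              [0.5]])
G = np.array([[1.0, 0.0]])

j = 4
# State responses A^k @ C and observation responses G @ A^k @ C, k = 0..j
xcoef = [matrix_power(A, k) @ C for k in range(j + 1)]
ycoef = [G @ xk for xk in xcoef]
```

This naive version recomputes `A^k` from scratch each step; the optimized loop instead carries the running power of `A` forward, which is what the JIT-compiled helper accelerates.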

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 76 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime

# function to test
import numpy as np
# imports
import pytest  # used for our unit tests
from quantecon._lss import LinearStateSpace

# unit tests

# ========== BASIC TEST CASES ==========

def test_impulse_response_basic_scalar():
    # Scalar system: A, C, G all 1x1
    lss = LinearStateSpace(A=[[2]], C=[[1]], G=[[3]])
    xcoef, ycoef = lss.impulse_response(j=3) # 9.04μs -> 4.38μs (107% faster)
    # xcoef: [C, AC, A^2C, A^3C] = [1, 2*1=2, 2^2*1=4, 2^3*1=8]
    expected_x = [np.array([[1.0]]), np.array([[2.0]]), np.array([[4.0]]), np.array([[8.0]])]
    # ycoef: [G*C, G*AC, G*A^2C, G*A^3C] = [3*1=3, 3*2=6, 3*4=12, 3*8=24]
    expected_y = [np.array([[3.0]]), np.array([[6.0]]), np.array([[12.0]]), np.array([[24.0]])]
    for a, b in zip(xcoef, expected_x):
        np.testing.assert_allclose(a, b)
    for a, b in zip(ycoef, expected_y):
        np.testing.assert_allclose(a, b)

def test_impulse_response_basic_2d():
    # 2D system
    A = [[1, 0], [0, 2]]
    C = [[1, 0], [0, 1]]
    G = [[1, 0], [0, 1]]
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=2) # 7.58μs -> 4.12μs (83.9% faster)
    # j=2 returns 3 coefficients each: C, A@C, A@A@C
    assert len(xcoef) == 3 and len(ycoef) == 3

def test_impulse_response_basic_non_square_C():
    # C is n x m, m > 1
    A = [[1, 0], [0, 1]]
    C = [[1, 2], [3, 4]]
    G = [[1, 0], [0, 1]]
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=1) # 5.08μs -> 3.33μs (52.5% faster)
    assert len(xcoef) == 2 and len(ycoef) == 2
    assert xcoef[0].shape == (2, 2)

def test_impulse_response_basic_j_zero():
    # j=0 should return only C and G@C
    lss = LinearStateSpace(A=[[1]], C=[[2]], G=[[3]])
    xcoef, ycoef = lss.impulse_response(j=0) # 2.46μs -> 2.17μs (13.4% faster)
    assert len(xcoef) == 1 and len(ycoef) == 1
    np.testing.assert_allclose(xcoef[0], [[2.0]])
    np.testing.assert_allclose(ycoef[0], [[6.0]])

# ========== EDGE TEST CASES ==========

def test_impulse_response_edge_zero_matrix_A():
    # A is zero matrix
    A = [[0, 0], [0, 0]]
    C = [[1, 2], [3, 4]]
    G = [[1, 0], [0, 1]]
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=2) # 7.42μs -> 3.92μs (89.4% faster)
    np.testing.assert_allclose(xcoef[0], C)
    for arr in xcoef[1:]:
        np.testing.assert_allclose(arr, 0.0)

def test_impulse_response_edge_identity_A():
    # A is identity, so A^k @ C == C for all k
    A = [[1, 0], [0, 1]]
    C = [[5, 6], [7, 8]]
    G = [[1, 0], [0, 1]]
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=4) # 11.9μs -> 5.42μs (119% faster)
    for i in range(1, 5):
        np.testing.assert_allclose(xcoef[i], C)
        np.testing.assert_allclose(ycoef[i], C)

def test_impulse_response_edge_zero_C():
    # C is zero matrix
    A = [[2, 0], [0, 3]]
    C = [[0, 0], [0, 0]]
    G = [[1, 0], [0, 1]]
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=2) # 7.42μs -> 3.92μs (89.4% faster)
    for arr in xcoef:
        np.testing.assert_allclose(arr, 0.0)
    for arr in ycoef:
        np.testing.assert_allclose(arr, 0.0)

def test_impulse_response_edge_zero_G():
    # G is zero, so ycoef always zero
    A = [[2, 0], [0, 3]]
    C = [[1, 2], [3, 4]]
    G = [[0, 0], [0, 0]]
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=2) # 7.38μs -> 3.96μs (86.3% faster)
    for arr in ycoef:
        np.testing.assert_allclose(arr, 0.0)

def test_impulse_response_edge_non_square_A_raises():
    # A must be square
    with pytest.raises(ValueError):
        LinearStateSpace(A=[[1,2,3],[4,5,6]], C=[[1,2],[3,4]], G=[[1,0],[0,1]])

def test_impulse_response_edge_incompatible_C_raises():
    # C must have compatible dimensions with A
    with pytest.raises(ValueError):
        LinearStateSpace(A=[[1,0],[0,1]], C=[[1,2,3],[4,5,6],[7,8,9]], G=[[1,0],[0,1]])

def test_impulse_response_edge_incompatible_G_raises():
    # G must have compatible dimensions with A
    with pytest.raises(ValueError):
        LinearStateSpace(A=[[1,0],[0,1]], C=[[1,2],[3,4]], G=[[1,0,0],[0,1,0]])

def test_impulse_response_edge_negative_j():
    # Negative j should still return at least C and G@C, and then loop zero times
    lss = LinearStateSpace(A=[[1]], C=[[2]], G=[[3]])
    xcoef, ycoef = lss.impulse_response(j=-1) # 2.58μs -> 2.29μs (12.7% faster)
    assert len(xcoef) == 1 and len(ycoef) == 1

def test_impulse_response_edge_j_is_float():
    # j is a float: range(j) only accepts integers, so TypeError is raised
    lss = LinearStateSpace(A=[[1]], C=[[2]], G=[[3]])
    with pytest.raises(TypeError):
        lss.impulse_response(j=2.5) # 2.71μs -> 709ns (282% faster)

def test_impulse_response_edge_j_is_string():
    # j is string, should raise TypeError
    lss = LinearStateSpace(A=[[1]], C=[[2]], G=[[3]])
    with pytest.raises(TypeError):
        lss.impulse_response(j="2") # 2.62μs -> 666ns (294% faster)

def test_impulse_response_edge_j_is_none():
    # j is None, should raise TypeError
    lss = LinearStateSpace(A=[[1]], C=[[2]], G=[[3]])
    with pytest.raises(TypeError):
        lss.impulse_response(j=None) # 2.62μs -> 625ns (320% faster)

def test_impulse_response_edge_j_is_large_zero():
    # j = 0, should return only C and G@C
    lss = LinearStateSpace(A=[[1]], C=[[2]], G=[[3]])
    xcoef, ycoef = lss.impulse_response(j=0) # 2.42μs -> 2.29μs (5.50% faster)
    assert len(xcoef) == 1 and len(ycoef) == 1

# ========== LARGE SCALE TEST CASES ==========

def test_impulse_response_large_scale_10x10():
    # Large 10x10 matrices
    n = 10
    A = np.eye(n)
    C = np.ones((n, n))
    G = np.eye(n)
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=10) # 30.2μs -> 14.0μs (115% faster)
    # All xcoef should be ones
    for arr in xcoef:
        np.testing.assert_allclose(arr, 1.0)
    # All ycoef should be ones
    for arr in ycoef:
        np.testing.assert_allclose(arr, 1.0)

def test_impulse_response_large_scale_random():
    # Random 50x50 matrices
    np.random.seed(42)
    n = 50
    A = np.eye(n) * 0.9 + np.random.randn(n, n) * 0.01
    C = np.random.randn(n, n)
    G = np.eye(n)
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=5) # 65.5μs -> 46.0μs (42.5% faster)
    # Check shapes
    for arr in xcoef:
        assert arr.shape == (n, n)
    for arr in ycoef:
        assert arr.shape == (n, n)

def test_impulse_response_large_scale_high_j():
    # Large j, but not exceeding 1000 steps
    n = 5
    A = np.eye(n)
    C = np.ones((n, n))
    G = np.eye(n)
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=100) # 239μs -> 85.2μs (181% faster)
    for arr in xcoef:
        np.testing.assert_allclose(arr, 1.0)
    for arr in ycoef:
        np.testing.assert_allclose(arr, 1.0)

def test_impulse_response_large_scale_nontrivial():
    # Large, nontrivial system
    n = 20
    m = 20
    k = 20
    np.random.seed(1)
    A = np.eye(n) * 0.5 + np.random.randn(n, n) * 0.01
    C = np.random.randn(n, m)
    G = np.random.randn(k, n)
    lss = LinearStateSpace(A=A, C=C, G=G)
    xcoef, ycoef = lss.impulse_response(j=10) # 53.2μs -> 29.9μs (77.9% faster)
    # Check shapes
    for arr in xcoef:
        assert arr.shape == (n, m)
    for arr in ycoef:
        assert arr.shape == (k, m)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

# function to test
import numpy as np
# imports
import pytest  # used for our unit tests
from quantecon._lss import LinearStateSpace

# unit tests

# ========== BASIC TESTS ==========

def test_impulse_response_scalar_A_C_G():
    # Test with scalar A, C, G (all 1x1 matrices)
    A = [[2.0]]
    C = [[1.0]]
    G = [[3.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=3) # 9.17μs -> 4.42μs (108% faster)
    # xcoef should be: [C, A*C, A^2*C, A^3*C]
    expected_x = [
        [[1.0]],
        [[2.0]],
        [[4.0]],
        [[8.0]]
    ]
    # ycoef should be: [G*C, G*A*C, G*A^2*C, G*A^3*C]
    expected_y = [
        [[3.0]],
        [[6.0]],
        [[12.0]],
        [[24.0]]
    ]
    for i in range(4):
        np.testing.assert_allclose(xcoef[i], expected_x[i])
        np.testing.assert_allclose(ycoef[i], expected_y[i])

def test_impulse_response_2d_A_C_G():
    # Test with 2x2 A, 2x1 C, 1x2 G
    A = [[1.0, 2.0], [0.0, 1.0]]
    C = [[1.0], [0.0]]
    G = [[1.0, 1.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=2) # 7.96μs -> 4.29μs (85.4% faster)
    # xcoef[0] = C
    # xcoef[1] = A*C = [[1.0],[0.0]]
    # xcoef[2] = A^2*C = A*(A*C) = [[1.0],[0.0]]
    expected_x = [
        [[1.0],[0.0]],
        [[1.0],[0.0]],
        [[1.0],[0.0]]
    ]
    # ycoef[0] = G*C = [[1.0]]
    # ycoef[1] = G*A*C = G*[[1.0],[0.0]] = [[1.0]]
    # ycoef[2] = G*A^2*C = [[1.0]]
    expected_y = [
        [[1.0]],
        [[1.0]],
        [[1.0]]
    ]
    for i in range(3):
        np.testing.assert_allclose(xcoef[i], expected_x[i])
        np.testing.assert_allclose(ycoef[i], expected_y[i])

def test_impulse_response_j_zero():
    # Test with j=0 (should return C and G*C only)
    A = [[1.0]]
    C = [[2.0]]
    G = [[3.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=0) # 2.46μs -> 2.17μs (13.5% faster)
    assert len(xcoef) == 1 and len(ycoef) == 1
    np.testing.assert_allclose(xcoef[0], [[2.0]])
    np.testing.assert_allclose(ycoef[0], [[6.0]])

def test_impulse_response_j_one():
    # Test with j=1 (should return C, A*C for x; G*C, G*A*C for y)
    A = [[2.0]]
    C = [[1.0]]
    G = [[1.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=1) # 4.79μs -> 3.08μs (55.4% faster)
    np.testing.assert_allclose(xcoef[0], [[1.0]])
    np.testing.assert_allclose(xcoef[1], [[2.0]])
    np.testing.assert_allclose(ycoef[1], [[2.0]])

def test_impulse_response_non_square_A_raises():
    # Test that non-square A raises ValueError
    with pytest.raises(ValueError):
        LinearStateSpace([[1,2]], [[1],[1]], [[1,1]])

def test_impulse_response_incompatible_C_raises():
    # Test that C with wrong shape raises ValueError
    with pytest.raises(ValueError):
        LinearStateSpace([[1,0],[0,1]], [[1,2]], [[1,1]])

def test_impulse_response_incompatible_G_raises():
    # Test that G with wrong shape raises ValueError
    with pytest.raises(ValueError):
        LinearStateSpace([[1,0],[0,1]], [[1],[0]], [[1,1,1]])

def test_impulse_response_default_j():
    # Test that default j=5 returns 6 elements in xcoef/ycoef
    lss = LinearStateSpace([[1.0]], [[1.0]], [[1.0]])
    xcoef, ycoef = lss.impulse_response() # 13.1μs -> 5.50μs (138% faster)
    assert len(xcoef) == 6 and len(ycoef) == 6

# ========== EDGE TESTS ==========

def test_impulse_response_j_negative():
    # Test that negative j returns only the initial coefficients
    lss = LinearStateSpace([[1.0]], [[1.0]], [[1.0]])
    xcoef, ycoef = lss.impulse_response(j=-1) # 2.42μs -> 2.21μs (9.47% faster)
    assert len(xcoef) == 1 and len(ycoef) == 1

def test_impulse_response_j_large_but_zero_A():
    # Test with A = zero matrix, so all powers of A*C are zero
    A = [[0.0]]
    C = [[5.0]]
    G = [[2.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=3) # 8.92μs -> 4.21μs (112% faster)
    expected_x = [
        [[5.0]],
        [[0.0]],
        [[0.0]],
        [[0.0]]
    ]
    expected_y = [
        [[10.0]],
        [[0.0]],
        [[0.0]],
        [[0.0]]
    ]
    for i in range(4):
        np.testing.assert_allclose(xcoef[i], expected_x[i])
        np.testing.assert_allclose(ycoef[i], expected_y[i])

def test_impulse_response_C_zero():
    # Test with C = zero matrix, so all coefficients are zero
    A = [[2.0]]
    C = [[0.0]]
    G = [[3.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=2) # 6.83μs -> 3.54μs (92.9% faster)
    for arr in xcoef + ycoef:
        np.testing.assert_allclose(arr, 0.0)

def test_impulse_response_G_zero():
    # Test with G = zero matrix, so all ycoef are zero
    A = [[2.0]]
    C = [[1.0]]
    G = [[0.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=2) # 6.83μs -> 3.50μs (95.2% faster)
    for arr in ycoef:
        np.testing.assert_allclose(arr, 0.0)
    # xcoef should be as usual
    expected_x = [
        [[1.0]],
        [[2.0]],
        [[4.0]]
    ]
    for i in range(3):
        np.testing.assert_allclose(xcoef[i], expected_x[i])

def test_impulse_response_multiple_innovations():
    # Test with C as n x m with m > 1 (multiple innovations)
    A = [[1.0,0.0],[0.0,1.0]]
    C = [[1.0,0.0],[0.0,2.0]]
    G = [[1.0,0.0],[0.0,1.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=1) # 5.33μs -> 3.46μs (54.3% faster)
    assert xcoef[0].shape == (2, 2)
    np.testing.assert_allclose(xcoef[0], C)

def test_impulse_response_G_multiple_outputs():
    # Test with G as k x n with k > 1 (multiple outputs)
    A = [[1.0,0.0],[0.0,1.0]]
    C = [[1.0],[2.0]]
    G = [[1.0,0.0],[0.0,1.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=1) # 5.29μs -> 3.50μs (51.2% faster)
    assert xcoef[0].shape == (2, 1)
    assert ycoef[0].shape == (2, 1)

def test_impulse_response_A_identity():
    # Test with A as identity matrix
    A = np.eye(2)
    C = [[1.0],[2.0]]
    G = [[1.0,0.0],[0.0,1.0]]
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=2) # 7.42μs -> 4.04μs (83.5% faster)
    # xcoef should all be C; with G = I, ycoef should all be C too
    for arr in xcoef:
        np.testing.assert_allclose(arr, C)
    for arr in ycoef:
        np.testing.assert_allclose(arr, C)

def test_impulse_response_float_inputs():
    # Test with float inputs (not lists)
    lss = LinearStateSpace(2.0, 1.0, 3.0)
    xcoef, ycoef = lss.impulse_response(j=2) # 6.96μs -> 3.62μs (91.9% faster)
    expected_x = [
        [[1.0]],
        [[2.0]],
        [[4.0]]
    ]
    expected_y = [
        [[3.0]],
        [[6.0]],
        [[12.0]]
    ]
    for i in range(3):
        np.testing.assert_allclose(xcoef[i], expected_x[i])
        np.testing.assert_allclose(ycoef[i], expected_y[i])

# ========== LARGE SCALE TESTS ==========

def test_impulse_response_large_j():
    # Test with large j (but not exceeding 1000)
    lss = LinearStateSpace([[1.1]], [[1.0]], [[2.0]])
    xcoef, ycoef = lss.impulse_response(j=500) # 1.02ms -> 295μs (246% faster)
    assert len(xcoef) == 501 and len(ycoef) == 501

def test_impulse_response_large_dimension():
    # Test with large n (state dimension), but keep under 1000
    n = 50
    A = np.eye(n)
    C = np.ones((n,1))
    G = np.eye(n)
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=3) # 22.6μs -> 15.5μs (45.6% faster)
    # All xcoef and ycoef entries should be ones
    for arr in xcoef:
        np.testing.assert_allclose(arr, 1.0)
    for arr in ycoef:
        np.testing.assert_allclose(arr, 1.0)

def test_impulse_response_large_m():
    # Test with large m (number of innovations), but keep under 1000
    n = 10
    m = 20
    A = np.eye(n)
    C = np.ones((n,m))
    G = np.eye(n)
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=2) # 9.83μs -> 5.92μs (66.2% faster)
    for arr in xcoef:
        np.testing.assert_allclose(arr, 1.0)
    for arr in ycoef:
        np.testing.assert_allclose(arr, 1.0)

def test_impulse_response_large_k():
    # Test with large k (number of outputs), but keep under 1000
    n = 10
    k = 20
    A = np.eye(n)
    C = np.ones((n,1))
    G = np.ones((k,n))
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=2) # 7.92μs -> 4.75μs (66.7% faster)
    # xcoef: all ones
    for arr in xcoef:
        np.testing.assert_allclose(arr, 1.0)
    # ycoef: each entry is sum of n ones = n
    for arr in ycoef:
        np.testing.assert_allclose(arr, float(n))

def test_impulse_response_large_all():
    # Test with all dimensions large but under 1000
    n = 20
    m = 30
    k = 40
    A = np.eye(n)
    C = np.ones((n,m))
    G = np.ones((k,n))
    lss = LinearStateSpace(A, C, G)
    xcoef, ycoef = lss.impulse_response(j=1) # 10.4μs -> 7.46μs (39.1% faster)
    # xcoef: all ones
    for arr in xcoef:
        np.testing.assert_allclose(arr, 1.0)
    # ycoef: each entry is sum of n ones = n
    for arr in ycoef:
        np.testing.assert_allclose(arr, float(n))
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-LinearStateSpace.impulse_response-mja2kt97` and push.


@codeflash-ai codeflash-ai bot requested a review from aseembits93 December 17, 2025 13:51
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 17, 2025

@misrasaurabh1 misrasaurabh1 left a comment


looks good, i would maybe remove the range typeerror checks but otherwise looks good

