⚡️ Speed up method `DiscreteDP.evaluate_policy` by 28% #74

codeflash-ai · 2026-01-01T23:14:09Z

📄 28% (0.28x) speedup for `DiscreteDP.evaluate_policy` in `quantecon/markov/ddp.py`

⏱️ Runtime : 4.60 milliseconds → 3.58 milliseconds (best of 110 runs)
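
As a rough check on the headline figure, the sketch below recomputes the speedup from the two runtimes. It assumes the percentage is defined as old runtime divided by new runtime, minus one; the helper name `speedup` is hypothetical and not part of the PR.

```python
def speedup(old_runtime, new_runtime):
    """Relative speedup, assuming the definition old / new - 1."""
    return old_runtime / new_runtime - 1.0


# Runtimes quoted above, in milliseconds
headline = speedup(4.60, 3.58)
print(f"{headline:.1%}")
```

The same assumed formula also matches the 300-state test reported further down (2.44ms → 1.51ms).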

📝 Explanation and details

The optimized code achieves a 28% speedup by introducing Numba JIT compilation to accelerate key computational bottlenecks in the RQ_sigma and evaluate_policy methods.

Key Optimizations

1. Numba-accelerated indexing operations

Three JIT-compiled functions replace Python/NumPy indexing:

_rq_sigma_sa_pair_numba: Replaces the call to _find_indices followed by array indexing for state-action pair formulation. This eliminates Python overhead in the index lookup loop and subsequent fancy indexing operations.
_rq_sigma_regular_numba: Replaces fancy indexing R[np.arange(num_states), sigma] and Q[np.arange(num_states), sigma] with explicit loops. While NumPy's fancy indexing involves overhead for index validation and temporary array creation, Numba's JIT compilation produces optimized machine code that directly accesses memory locations.
_I_minus_beta_Q_sigma: JIT-compiles the matrix operation I - beta * Q_sigma, so the scaling and subtraction run as compiled code without interpreter overhead.
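
To make the product-form case concrete, here is a minimal sketch of replacing fancy indexing with explicit loops. It is not the PR's code: the function names `rq_sigma_loop` and `rq_sigma_fancy` are hypothetical, and the `njit` fallback is only there so the example also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def rq_sigma_loop(R, Q, sigma):
    """Pick R[s, sigma[s]] and Q[s, sigma[s], :] with explicit loops."""
    n = R.shape[0]
    R_sigma = np.empty(n, dtype=R.dtype)
    Q_sigma = np.empty((n, Q.shape[2]), dtype=Q.dtype)
    for s in range(n):
        a = sigma[s]
        R_sigma[s] = R[s, a]
        for j in range(Q.shape[2]):
            Q_sigma[s, j] = Q[s, a, j]
    return R_sigma, Q_sigma


def rq_sigma_fancy(R, Q, sigma):
    """Reference version using NumPy fancy indexing."""
    rows = np.arange(R.shape[0])
    return R[rows, sigma], Q[rows, sigma]


# Small usage example: 4 states, 3 actions
rng = np.random.default_rng(0)
R = rng.random((4, 3))
Q = rng.random((4, 3, 4))
sigma = np.array([2, 0, 1, 1])
R_loop, Q_loop = rq_sigma_loop(R, Q, sigma)
R_ref, Q_ref = rq_sigma_fancy(R, Q, sigma)
```

Note that `sigma` is passed as a NumPy integer array here, since a compiled function would typically expect an array rather than a Python list.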

2. Why this leads to speedup

For dense arrays (non-sparse case):

Line profiler shows _find_indices taking ~79% of time in RQ_sigma (6.37ms out of 8.05ms)
The Numba version combines index finding and array indexing in a single compiled function, eliminating:
- Python function call overhead
- Intermediate array allocations
- NumPy's indexing validation overhead
For the regular (product form) case, the generated tests show a 16-24% speedup on small problems (1-3 states) and about 8% at 100 states
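
The combined lookup-and-select step for the state-action pair form might look like the sketch below. It assumes the pairs are sorted by state and that an index-pointer array (called `a_indptr` here) marks where each state's actions begin; the function name `rq_sigma_sa_pair` is hypothetical and the PR's actual helper may differ.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def rq_sigma_sa_pair(R, Q, a_indices, a_indptr, sigma):
    """Find the pair (s, sigma[s]) and copy its reward and transition row."""
    n = sigma.shape[0]
    R_sigma = np.empty(n, dtype=R.dtype)
    Q_sigma = np.empty((n, Q.shape[1]), dtype=Q.dtype)
    for s in range(n):
        # Scan only the pairs that belong to state s
        for k in range(a_indptr[s], a_indptr[s + 1]):
            if a_indices[k] == sigma[s]:
                R_sigma[s] = R[k]
                for j in range(Q.shape[1]):
                    Q_sigma[s, j] = Q[k, j]
                break
        # If sigma[s] is not feasible, row s is left uninitialized in this sketch
    return R_sigma, Q_sigma


# Pairs (0, 0), (0, 1), (1, 0), as in the two-state example in the tests
a_indices = np.array([0, 1, 0])
a_indptr = np.array([0, 2, 3])
R = np.array([5.0, 10.0, -1.0])
Q = np.array([[0.5, 0.5], [0.0, 1.0], [0.0, 1.0]])
R_sigma, Q_sigma = rq_sigma_sa_pair(R, Q, a_indices, a_indptr, np.array([0, 0]))
```

Doing the search and the copy in one pass avoids building an intermediate index array before indexing `R` and `Q`.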

For matrix operations:

The I - beta * Q_sigma operation accounts for only 15.1% of runtime but still benefits from Numba's compiled arithmetic; the effect is most visible in the 61.6% speedup on the large-scale performance test (300 states)
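
For reference, the policy value solves the linear system (I - beta * Q_sigma) v = R_sigma, as the sparse test below works out by hand. A minimal sketch of building that matrix in a loop and solving the system follows; the function name `i_minus_beta_q` is hypothetical, and using `np.linalg.solve` for the dense solve is an assumption about the surrounding code.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def i_minus_beta_q(Q_sigma, beta):
    """Build A = I - beta * Q_sigma element by element."""
    n = Q_sigma.shape[0]
    A = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = -beta * Q_sigma[i, j]
        A[i, i] += 1.0  # add the identity on the diagonal
    return A


# Numbers taken from the sparse state-action pair test
Q_sigma = np.array([[0.7, 0.3], [0.4, 0.6]])
R_sigma = np.array([1.0, 2.0])
A = i_minus_beta_q(Q_sigma, 0.8)
v_sigma = np.linalg.solve(A, R_sigma)
```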

3. Impact on workloads

The optimization particularly benefits:

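Dense formulations (product form): 8-24% faster on most generated tests, 2% on the 200-state random-policy test, and up to 61.6% on the 300-state test
Repeated evaluations: Numba compiles each function once per argument signature, so the compilation cost is paid on the first call only and later calls run the compiled code directly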
State-action pair formulations with dense Q: Benefits from combined index finding and selection

The optimization has minimal impact on sparse matrices (0.7-1.5% slower), as scipy.sparse operations are already optimized and cannot be accelerated by Numba, so those code paths fall back to the original implementation.
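
The fallback described above amounts to a type check before choosing a code path. The sketch below illustrates the idea for row selection only; `select_rows` and `_select_rows_dense` are hypothetical names, and the PR's real dispatch logic is not shown in this description.

```python
import numpy as np
import scipy.sparse as sp

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def _select_rows_dense(Q, rows):
    """Compiled path: copy the requested rows of a dense 2-D array."""
    out = np.empty((rows.shape[0], Q.shape[1]), dtype=Q.dtype)
    for i in range(rows.shape[0]):
        for j in range(Q.shape[1]):
            out[i, j] = Q[rows[i], j]
    return out


def select_rows(Q, rows):
    """Use SciPy indexing for sparse input and the compiled loop otherwise."""
    if sp.issparse(Q):
        return Q[rows]  # sparse matrices keep the original indexing path
    return _select_rows_dense(Q, rows)


Q_dense = np.array([[0.7, 0.3], [0.4, 0.6], [0.0, 1.0]])
rows = np.array([2, 0])
dense_out = select_rows(Q_dense, rows)
sparse_out = select_rows(sp.csr_matrix(Q_dense), rows)
```

Keeping the check outside the compiled function matters because SciPy sparse matrices are not a type the compiled loop can accept.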

4. Test case performance patterns

Small problems (1-3 states): 16-24% faster in product form, about 2% faster in state-action pair form - dominated by reduced function call overhead
Medium problems (100 states): about 8% faster (8.43%) - benefits from compiled loops
Large problems (200-300 states): 2.1% faster at 200 states and 61.6% faster at 300 states, so the gain does not grow uniformly with problem size
Sparse matrices: Negligible change (0.7-1.5% slower) - as expected, since sparse operations aren't JIT-compiled

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 28 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests

# imports
import numpy as np
import pytest
import scipy.sparse as sp
from quantecon.markov.ddp import DiscreteDP

# --- Function to test: DiscreteDP.evaluate_policy ---
# (Assume DiscreteDP and its dependencies are defined above, as per the provided code.)

# ------------------------
# 1. Basic Test Cases
# ------------------------

def test_basic_two_state_two_action_product_form():
    # Product form, 2 states, 2 actions, as in docstring example
    R = np.array([[5, 10], [-1, -float('inf')]])
    Q = np.array([
        [[0.5, 0.5], [0, 1]],
        [[0, 1], [0.5, 0.5]]  # Q[1,1] is arbitrary
    ])
    beta = 0.95
    ddp = DiscreteDP(R, Q, beta)
    # Policy: always take action 0
    sigma = [0, 0]
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 53.6μs -> 46.0μs (16.7% faster)

def test_basic_two_state_two_action_sa_pair_form():
    # State-action pair form, as in docstring example
    s_indices = [0, 0, 1]
    a_indices = [0, 1, 0]
    R = np.array([5, 10, -1])
    Q = np.array([
        [0.5, 0.5],
        [0, 1],
        [0, 1]
    ])
    beta = 0.95
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    sigma = [0, 0]  # always action 0
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 48.5μs -> 47.5μs (2.16% faster)

def test_basic_three_state_three_action_product_form():
    # 3 states, 3 actions, product form, simple transition
    R = np.array([
        [1, 2, 3],
        [0, 0, 0],
        [4, 5, 6]
    ])
    # Transition: always go to next state mod 3
    Q = np.zeros((3,3,3))
    for s in range(3):
        for a in range(3):
            Q[s,a,(s+1)%3] = 1
    beta = 0.9
    ddp = DiscreteDP(R, Q, beta)
    sigma = [2, 0, 1]  # actions: s0->2, s1->0, s2->1
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 55.9μs -> 47.3μs (18.3% faster)

# ------------------------
# 2. Edge Test Cases
# ------------------------

def test_edge_zero_discount():
    # Beta = 0, should return immediate rewards
    R = np.array([[1, 2], [3, 4]])
    Q = np.zeros((2,2,2))
    beta = 0.0
    ddp = DiscreteDP(R, Q, beta)
    sigma = [1, 0]
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 53.8μs -> 46.2μs (16.5% faster)

def test_edge_negative_rewards():
    # Negative rewards, check computation
    R = np.array([[-5, -10], [-1, -2]])
    Q = np.array([
        [[1, 0], [0, 1]],
        [[0, 1], [1, 0]]
    ])
    beta = 0.5
    ddp = DiscreteDP(R, Q, beta)
    sigma = [0, 1]
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 56.3μs -> 46.8μs (20.1% faster)

def test_edge_single_state_single_action():
    # Only one state and one action
    R = np.array([[42]])
    Q = np.array([[[1.0]]])
    beta = 0.9
    ddp = DiscreteDP(R, Q, beta)
    sigma = [0]
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 51.6μs -> 43.1μs (19.8% faster)

def test_edge_all_inf_rewards():
    # All rewards -inf except one feasible action
    R = np.array([[1, -np.inf], [-np.inf, 2]])
    Q = np.array([
        [[1, 0], [0, 1]],
        [[1, 0], [0, 1]]
    ])
    beta = 0.5
    ddp = DiscreteDP(R, Q, beta)
    sigma = [0, 1]
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 56.1μs -> 45.7μs (22.8% faster)

def test_edge_sparse_Q_sa_pair():
    # Test with sparse Q in state-action pair form
    s_indices = [0, 1]
    a_indices = [0, 1]
    R = np.array([1, 2])
    Q_dense = np.array([[0.7, 0.3], [0.4, 0.6]])
    Q_sparse = sp.csr_matrix(Q_dense)
    beta = 0.8
    ddp = DiscreteDP(R, Q_sparse, beta, s_indices, a_indices)
    sigma = [0, 1]
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 328μs -> 330μs (0.745% slower)
    # v0 = 1 + 0.8*(0.7*v0 + 0.3*v1)
    # v1 = 2 + 0.8*(0.4*v0 + 0.6*v1)
    # Solve:
    # v0 = 1 + 0.56*v0 + 0.24*v1
    # v1 = 2 + 0.32*v0 + 0.48*v1
    # (1-0.56)v0 - 0.24*v1 = 1
    # -0.32*v0 + (1-0.48)*v1 = 2
    # 0.44*v0 - 0.24*v1 = 1
    # -0.32*v0 + 0.52*v1 = 2
    # Solve manually or numerically:
    # Use numpy.linalg.solve for expected value:
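    A = np.array([[0.44, -0.24], [-0.32, 0.52]])
    b = np.array([1, 2])
    expected_v = np.linalg.solve(A, b)
    # Compare the policy value against the hand-derived linear system
    np.testing.assert_allclose(v_sigma, expected_v)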

def test_edge_beta_one_raises():
    # Beta = 1 should raise NotImplementedError
    R = np.array([[1, 2], [3, 4]])
    Q = np.zeros((2,2,2))
    beta = 1.0
    ddp = DiscreteDP(R, Q, beta)
    sigma = [0, 1]
    with pytest.raises(NotImplementedError):
        ddp.evaluate_policy(sigma) # 1.91μs -> 2.06μs (7.41% slower)

def test_edge_policy_with_invalid_action():
    # Policy selects an action not feasible for a state
    R = np.array([[1, -np.inf], [2, 3]])
    Q = np.array([
        [[1,0], [0,1]],
        [[1,0], [0,1]]
    ])
    beta = 0.9
    ddp = DiscreteDP(R, Q, beta)
    # Action 1 for state 0 is not feasible
    sigma = [1, 0]
    # Should still run, but value for state 0 should be -inf
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 56.7μs -> 45.6μs (24.2% faster)

# ------------------------
# 3. Large Scale Test Cases
# ------------------------

def test_large_scale_dense_product_form():
    # Large n, m, product form
    n = 100
    m = 5
    np.random.seed(42)
    R = np.random.rand(n, m)
    Q = np.zeros((n, m, n))
    for s in range(n):
        for a in range(m):
            # Random transition probabilities
            probs = np.random.rand(n)
            probs /= probs.sum()
            Q[s, a, :] = probs
    beta = 0.95
    ddp = DiscreteDP(R, Q, beta)
    # Policy: always action 0
    sigma = [0]*n
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 218μs -> 201μs (8.43% faster)

def test_large_scale_sparse_sa_pair():
    # Large n, state-action pair form, sparse Q
    n = 100
    m = 3
    np.random.seed(123)
    s_indices = []
    a_indices = []
    R = []
    Q_rows = []
    for s in range(n):
        for a in range(m):
            s_indices.append(s)
            a_indices.append(a)
            R.append(np.random.rand())
            # Random transition, but sparse
            row = np.zeros(n)
            idxs = np.random.choice(n, 2, replace=False)
            vals = np.random.rand(2)
            vals /= vals.sum()
            row[idxs] = vals
            Q_rows.append(row)
    R = np.array(R)
    Q_sparse = sp.csr_matrix(np.vstack(Q_rows))
    beta = 0.9
    ddp = DiscreteDP(R, Q_sparse, beta, s_indices, a_indices)
    # Policy: action 1 for all states
    sigma = [1]*n
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 537μs -> 545μs (1.54% slower)

def test_large_scale_random_policy():
    # Large n, random policy, product form
    n = 200
    m = 4
    np.random.seed(456)
    R = np.random.rand(n, m)
    Q = np.zeros((n, m, n))
    for s in range(n):
        for a in range(m):
            probs = np.random.rand(n)
            probs /= probs.sum()
            Q[s, a, :] = probs
    beta = 0.85
    ddp = DiscreteDP(R, Q, beta)
    sigma = np.random.randint(0, m, size=n)
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 636μs -> 623μs (2.10% faster)

def test_large_scale_performance():
    # Performance test: should not take excessive time
    n = 300
    m = 2
    np.random.seed(789)
    R = np.random.rand(n, m)
    Q = np.zeros((n, m, n))
    for s in range(n):
        for a in range(m):
            probs = np.random.rand(n)
            probs /= probs.sum()
            Q[s, a, :] = probs
    beta = 0.7
    ddp = DiscreteDP(R, Q, beta)
    sigma = [1]*n
    import time
    start = time.time()
    codeflash_output = ddp.evaluate_policy(sigma); v_sigma = codeflash_output # 2.44ms -> 1.51ms (61.6% faster)
    elapsed = time.time() - start
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-DiscreteDP.evaluate_policy-mjw2ak48` and push.


codeflash-ai bot requested a review from aseembits93 January 1, 2026 23:14

codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Jan 1, 2026
