Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Jan 1, 2026

📄 15% (0.15x) speedup for qnwlogn in quantecon/quad.py

⏱️ Runtime : 7.56 milliseconds 6.60 milliseconds (best of 7 runs)

📝 Explanation and details

The optimized code achieves a 14% speedup by introducing Numba JIT compilation for the single-dimension case in qnwnorm, which is the most frequently executed path based on profiling data.

Key Optimization:

The original code spends 99.3% of execution time calling _qnwnorm1 (already JIT-compiled) from pure Python code. The optimization introduces two new JIT-compiled helper functions:

  1. _cholesky_decomp: JIT-compiled wrapper for Cholesky decomposition
  2. _process_single_node: JIT-compiled function that handles the single-dimension case, combining _qnwnorm1 call, Cholesky decomposition, and node transformation into one JIT-compiled block

Why This Works:

When qnwnorm has a single dimension (which the profiling shows is common), the original code makes multiple Python→Numba transitions:

  • Python calls _qnwnorm1 (Numba)
  • Returns to Python for Cholesky decomposition
  • Returns to Python for node transformation

The optimized version wraps this entire sequence in _process_single_node, keeping execution in JIT-compiled code and eliminating the Python interpreter overhead for these transitions. This is particularly effective because:

  • Numba can better optimize the full sequence
  • No marshalling of data between Python and Numba contexts
  • Branch prediction and CPU cache usage improve

Test Results Analysis:

The speedup is most pronounced in univariate (single-dimension) test cases:

  • test_univariate_default_params: 80.7% faster
  • test_univariate_custom_mean_var: 95.0% faster
  • test_negative_mu: 96.9% faster
  • test_highly_skewed_lognormal: 99.7% faster

For multivariate cases, the optimization shows minimal impact or slight regression:

  • test_multivariate_default_params: 1.56% faster
  • Some 2D/3D cases: 1-7% slower

This is expected because multivariate cases don't use _process_single_node - they still follow the original code path. The slight regressions likely come from JIT compilation overhead or additional function call indirection.

Workload Impact:

If the function is called in hot paths (loops, Monte Carlo simulations, repeated integration tasks), the 14% average speedup translates to meaningful wall-clock time savings, especially for workloads dominated by univariate quadrature computations. The optimization is most beneficial for:

  • High-frequency univariate normal/lognormal quadrature
  • Applications with many repeated single-dimension integrations
  • Scenarios where _qnwnorm1 is the bottleneck (as profiling confirms)

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 61 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Click to see Generated Regression Tests
import math
from functools import reduce

import numpy as np
# imports
import pytest
from quantecon.quad import qnwlogn

# --- Unit tests for qnwlogn ---

# Basic Test Cases

def test_univariate_default_params():
    # Test 1d, default mean/variance
    nodes, weights = qnwlogn(3) # 79.4μs -> 43.9μs (80.7% faster)

def test_univariate_custom_mean_var():
    # Test 1d, custom mean and variance
    mu = 1.0
    sig2 = 0.5
    nodes, weights = qnwlogn(5, mu, sig2) # 72.7μs -> 37.3μs (95.0% faster)
    # Check mean and variance of lognormal approx (weighted)
    approx_mean = np.sum(nodes * weights)
    approx_var = np.sum((nodes - approx_mean)**2 * weights)
    # True mean/var of lognormal
    true_mean = math.exp(mu + sig2/2)
    true_var = (math.exp(sig2) - 1) * math.exp(2*mu + sig2)

def test_multivariate_default_params():
    # Test 2d, default mean/variance
    nodes, weights = qnwlogn([2, 2]) # 164μs -> 162μs (1.56% faster)

def test_multivariate_custom_mean_var():
    # Test 2d, custom mean and covariance
    mu = [0.5, 1.0]
    sig2 = [[1.0, 0.3], [0.3, 0.5]]
    nodes, weights = qnwlogn([3, 3], mu, sig2) # 159μs -> 157μs (1.60% faster)

def test_scalar_mean_and_var_broadcast():
    # Scalar mu and sig2 should broadcast in 1d and multidim
    nodes1, weights1 = qnwlogn(4, 0.7, 0.2)
    nodes2, weights2 = qnwlogn([4, 4], 0.7, 0.2)

def test_weights_are_nonnegative():
    # All weights should be nonnegative
    nodes, weights = qnwlogn(7) # 84.0μs -> 47.4μs (77.3% faster)
    nodes, weights = qnwlogn([3, 4]) # 150μs -> 163μs (7.78% slower)

# Edge Test Cases

def test_one_node_univariate():
    # n=1: should return one node, weight=1
    nodes, weights = qnwlogn(1) # 81.2μs -> 43.9μs (84.9% faster)

def test_one_node_multivariate():
    # n=[1,1]: should return one node, weight=1
    nodes, weights = qnwlogn([1, 1]) # 164μs -> 152μs (7.98% faster)

def test_highly_skewed_lognormal():
    # Large mu, large sig2: nodes should be very spread
    mu = 3.0
    sig2 = 2.0
    nodes, weights = qnwlogn(5, mu, sig2) # 74.0μs -> 37.0μs (99.7% faster)
    # Largest node should be much bigger than smallest
    ratio = np.max(nodes) / np.min(nodes)

def test_negative_mu():
    # Negative mu: nodes should be positive but mean < 1
    mu = -2.0
    sig2 = 0.5
    nodes, weights = qnwlogn(5, mu, sig2) # 73.5μs -> 37.3μs (96.9% faster)
    approx_mean = np.sum(nodes * weights)

def test_identity_covariance_multivariate():
    # Multivariate, identity covariance
    mu = [0.0, 0.0]
    sig2 = [[1.0, 0.0], [0.0, 1.0]]
    nodes, weights = qnwlogn([2, 2], mu, sig2) # 166μs -> 212μs (21.9% slower)

def test_nonpositive_definite_covariance():
    # Covariance matrix not positive definite should raise
    mu = [0.0, 0.0]
    sig2 = [[1.0, 2.0], [2.0, 1.0]]  # Not positive definite
    with pytest.raises(Exception):
        qnwlogn([2, 2], mu, sig2) # 144μs -> 153μs (5.90% slower)

def test_large_univariate():
    # n=50: should run and output correct shape
    nodes, weights = qnwlogn(50) # 125μs -> 90.8μs (38.1% faster)

def test_large_multivariate():
    # n=[10,10]: 100 nodes, shape (100,2)
    nodes, weights = qnwlogn([10, 10]) # 182μs -> 175μs (3.88% faster)

def test_large_multivariate_custom_cov():
    # n=[8,8], custom covariance
    mu = [0.5, -0.2]
    sig2 = [[2.0, 0.5], [0.5, 1.5]]
    nodes, weights = qnwlogn([8,8], mu, sig2) # 173μs -> 162μs (6.52% faster)

def test_large_3d():
    # n=[6,6,6]: 216 nodes, shape (216,3)
    mu = [0.0, 0.5, -0.5]
    sig2 = [[1.0, 0.2, 0.1], [0.2, 0.8, 0.0], [0.1, 0.0, 0.5]]
    nodes, weights = qnwlogn([6,6,6], mu, sig2) # 224μs -> 234μs (4.47% slower)

def test_performance_large():
    # n=[20,20]: 400 nodes, check runtime and correctness
    nodes, weights = qnwlogn([20, 20]) # 194μs -> 203μs (4.25% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import math
# function to test
import sys

import numpy as np
# imports
import pytest  # used for our unit tests
from numpy.testing import assert_allclose
from quantecon.quad import qnwlogn

# Helper function to check basic properties of lognormal quadrature
def check_lognormal_properties(nodes, weights, n):
    """
    Helper function to verify basic mathematical properties of lognormal quadrature.
    
    Parameters:
    - nodes: quadrature nodes
    - weights: quadrature weights
    - n: expected number of nodes
    """
    # Convert to arrays for consistent handling
    nodes_arr = np.atleast_1d(nodes)
    weights_arr = np.atleast_1d(weights)
    
    # Check that we have the right number of nodes/weights
    if nodes_arr.ndim == 1:
        expected_count = n if isinstance(n, int) else np.prod(n)
    else:
        expected_count = nodes_arr.shape[0]

# ============================================================================
# BASIC TEST CASES
# ============================================================================

def test_qnwlogn_basic_scalar_defaults():
    """
    Test basic functionality with scalar n and default parameters.
    This is the simplest case: 1D lognormal with standard parameters.
    """
    # Call function with n=3 nodes, default mu=0, sig2=1
    n = 3
    nodes, weights = qnwlogn(n) # 150μs -> 90.6μs (65.7% faster)
    
    # Verify basic properties
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_basic_scalar_custom_mu():
    """
    Test with scalar n and custom mean parameter.
    """
    n = 5
    mu = 1.0  # Custom mean
    nodes, weights = qnwlogn(n, mu=mu) # 105μs -> 59.9μs (76.3% faster)
    
    # Verify basic properties
    check_lognormal_properties(nodes, weights, n)
    
    # With positive mu, nodes should be shifted higher
    # Mean of lognormal is exp(mu + sig2/2), with sig2=1, mean = exp(1.5)
    expected_mean = np.exp(mu + 0.5)
    actual_mean = np.sum(nodes * weights)

def test_qnwlogn_basic_scalar_custom_sig2():
    """
    Test with scalar n and custom variance parameter.
    """
    n = 7
    mu = 0.0
    sig2 = 0.25  # Smaller variance
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 86.8μs -> 45.1μs (92.7% faster)
    
    # Verify basic properties
    check_lognormal_properties(nodes, weights, n)
    
    # With smaller variance, nodes should be more concentrated
    node_std = np.sqrt(np.sum(weights * (nodes - np.sum(nodes * weights))**2))

def test_qnwlogn_basic_multivariate_2d():
    """
    Test basic 2D multivariate case with default parameters.
    """
    n = np.array([3, 3])  # 3 nodes in each dimension
    nodes, weights = qnwlogn(n) # 194μs -> 217μs (10.5% slower)
    
    # Total nodes should be product of dimensions
    total_nodes = np.prod(n)
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_basic_multivariate_custom_params():
    """
    Test 2D case with custom mean and covariance matrix.
    """
    n = np.array([4, 4])
    mu = np.array([0.5, 1.0])  # Different means for each dimension
    sig2 = np.array([[1.0, 0.0], [0.0, 2.0]])  # Diagonal covariance
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 180μs -> 179μs (0.379% faster)
    
    # Verify basic properties
    total_nodes = np.prod(n)
    check_lognormal_properties(nodes, weights, n)

# ============================================================================
# EDGE TEST CASES
# ============================================================================

def test_qnwlogn_edge_n_equals_1():
    """
    Test edge case where n=1 (single node).
    """
    n = 1
    nodes, weights = qnwlogn(n) # 81.5μs -> 45.4μs (79.5% faster)

def test_qnwlogn_edge_negative_mu():
    """
    Test with negative mean (valid for lognormal).
    """
    n = 5
    mu = -2.0  # Negative mean
    nodes, weights = qnwlogn(n, mu=mu) # 80.3μs -> 46.7μs (72.0% faster)
    
    # Verify basic properties
    check_lognormal_properties(nodes, weights, n)
    
    # With negative mu, nodes should be smaller
    # Mean of lognormal is exp(mu + sig2/2), with sig2=1, mean = exp(-1.5)
    expected_mean = np.exp(mu + 0.5)
    actual_mean = np.sum(nodes * weights)

def test_qnwlogn_edge_large_variance():
    """
    Test with large variance parameter.
    """
    n = 10
    mu = 0.0
    sig2 = 4.0  # Large variance
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 76.3μs -> 39.0μs (95.6% faster)
    
    # Verify basic properties
    check_lognormal_properties(nodes, weights, n)
    
    # With large variance, nodes should be more spread out
    node_range = np.max(nodes) - np.min(nodes)

def test_qnwlogn_edge_small_variance():
    """
    Test with very small variance parameter.
    """
    n = 8
    mu = 0.0
    sig2 = 0.01  # Very small variance
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 74.6μs -> 38.3μs (94.5% faster)
    
    # Verify basic properties
    check_lognormal_properties(nodes, weights, n)
    
    # With small variance, nodes should be concentrated around exp(mu)
    expected_center = np.exp(mu)
    node_range = np.max(nodes) - np.min(nodes)

def test_qnwlogn_edge_identity_covariance():
    """
    Test multivariate case with identity covariance matrix (independent dimensions).
    """
    n = np.array([3, 3])
    mu = np.array([0.0, 0.0])
    sig2 = np.eye(2)  # Identity matrix
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 153μs -> 160μs (4.17% slower)
    
    # Verify basic properties
    total_nodes = np.prod(n)
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_edge_nondiagonal_covariance():
    """
    Test with non-diagonal covariance matrix (correlated dimensions).
    """
    n = np.array([4, 4])
    mu = np.array([0.0, 0.0])
    # Covariance with correlation
    sig2 = np.array([[1.0, 0.5], [0.5, 1.0]])
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 153μs -> 154μs (0.530% slower)
    
    # Verify basic properties
    total_nodes = np.prod(n)
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_edge_different_nodes_per_dimension():
    """
    Test multivariate case with different number of nodes per dimension.
    """
    n = np.array([3, 5, 4])  # Different nodes in each dimension
    nodes, weights = qnwlogn(n) # 210μs -> 218μs (3.52% slower)
    
    # Total nodes should be product
    total_nodes = 3 * 5 * 4
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_edge_zero_mean():
    """
    Test explicitly with zero mean (should match default).
    """
    n = 6
    mu = 0.0
    nodes, weights = qnwlogn(n, mu=mu) # 81.3μs -> 45.6μs (78.2% faster)
    
    # Verify basic properties
    check_lognormal_properties(nodes, weights, n)
    
    # Should match default behavior
    nodes_default, weights_default = qnwlogn(n) # 48.3μs -> 17.5μs (176% faster)
    assert_allclose(nodes, nodes_default, rtol=1e-10)
    assert_allclose(weights, weights_default, rtol=1e-10)

# ============================================================================
# MATHEMATICAL PROPERTY TESTS
# ============================================================================

def test_qnwlogn_property_weights_sum_to_one():
    """
    Test that weights always sum to 1 for various configurations.
    """
    test_cases = [
        (5, None, None),
        (10, 0.5, 2.0),
        (np.array([3, 3]), None, None),
        (np.array([4, 5]), np.array([0.0, 1.0]), np.eye(2)),
    ]
    
    for n, mu, sig2 in test_cases:
        nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 342μs -> 302μs (13.2% faster)
        weight_sum = np.sum(weights)

def test_qnwlogn_property_all_positive():
    """
    Test that all nodes and weights are positive (lognormal property).
    """
    test_cases = [
        (7, -1.0, 0.5),
        (5, 2.0, 3.0),
        (np.array([3, 4]), np.array([-1.0, 0.5]), np.eye(2)),
    ]
    
    for n, mu, sig2 in test_cases:
        nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 235μs -> 184μs (27.3% faster)
        nodes_arr = np.atleast_1d(nodes)

def test_qnwlogn_property_exp_relationship():
    """
    Test that lognormal nodes are exponential of normal nodes.
    """
    n = 5
    mu = 0.5
    sig2 = 1.0
    
    # Get lognormal nodes
    logn_nodes, logn_weights = qnwlogn(n, mu=mu, sig2=sig2) # 73.1μs -> 36.0μs (103% faster)
    
    # Get normal nodes (import qnwnorm)
    from quantecon.quad import qnwnorm
    norm_nodes, norm_weights = qnwnorm(n, mu=mu, sig2=sig2)
    
    # Lognormal nodes should be exp of normal nodes
    expected_nodes = np.exp(norm_nodes)
    assert_allclose(logn_nodes, expected_nodes, rtol=1e-10)
    
    # Weights should be the same
    assert_allclose(logn_weights, norm_weights, rtol=1e-10)

def test_qnwlogn_property_mean_approximation():
    """
    Test that quadrature approximates the theoretical mean of lognormal distribution.
    Mean of lognormal(mu, sig2) is exp(mu + sig2/2).
    """
    n = 20  # Use more nodes for better approximation
    mu = 1.0
    sig2 = 0.5
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 80.6μs -> 43.2μs (86.5% faster)
    
    # Compute quadrature mean
    quad_mean = np.sum(nodes * weights)
    
    # Theoretical mean
    theoretical_mean = np.exp(mu + sig2 / 2)
    
    # Should be close (within 1% for n=20)
    relative_error = abs(quad_mean - theoretical_mean) / theoretical_mean

# ============================================================================
# INPUT VARIATION TESTS
# ============================================================================

def test_qnwlogn_input_scalar_n():
    """
    Test that scalar n is handled correctly.
    """
    n = 5  # Scalar int
    nodes, weights = qnwlogn(n) # 79.8μs -> 43.0μs (85.6% faster)
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_input_array_n():
    """
    Test that array n is handled correctly.
    """
    n = np.array([4, 5])  # Array
    nodes, weights = qnwlogn(n) # 161μs -> 158μs (2.09% faster)
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_input_list_n():
    """
    Test that list n is converted properly.
    """
    n = [3, 4, 3]  # List
    nodes, weights = qnwlogn(n) # 209μs -> 213μs (1.93% slower)
    
    total_nodes = 3 * 4 * 3
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_input_scalar_mu_broadcast():
    """
    Test that scalar mu is broadcast to all dimensions.
    """
    n = np.array([3, 3])
    mu = 1.0  # Scalar, should broadcast to [1.0, 1.0]
    
    nodes, weights = qnwlogn(n, mu=mu) # 160μs -> 157μs (1.49% faster)
    
    # Should work without error
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_input_3d():
    """
    Test 3D multivariate case.
    """
    n = np.array([3, 3, 3])
    mu = np.array([0.0, 0.5, 1.0])
    sig2 = np.eye(3)
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 194μs -> 194μs (0.254% faster)
    
    total_nodes = 27
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_input_4d():
    """
    Test 4D multivariate case.
    """
    n = np.array([2, 2, 2, 2])
    nodes, weights = qnwlogn(n) # 236μs -> 240μs (1.68% slower)
    
    total_nodes = 16
    check_lognormal_properties(nodes, weights, n)

# ============================================================================
# LARGE SCALE TEST CASES
# ============================================================================

def test_qnwlogn_large_n_1d():
    """
    Test with large number of nodes in 1D.
    """
    n = 100  # Large number of nodes
    nodes, weights = qnwlogn(n) # 243μs -> 207μs (17.5% faster)
    
    # Should still satisfy basic properties
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_large_n_2d():
    """
    Test with moderately large 2D grid.
    """
    n = np.array([20, 20])  # 400 total nodes
    nodes, weights = qnwlogn(n) # 190μs -> 192μs (1.03% slower)
    
    total_nodes = 400
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_large_dimensions():
    """
    Test with higher dimensional problem (5D).
    """
    n = np.array([3, 3, 3, 3, 3])  # 5D, 243 total nodes
    nodes, weights = qnwlogn(n) # 296μs -> 292μs (1.45% faster)
    
    total_nodes = 243
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_large_variance_range():
    """
    Test with large variance to ensure numerical stability.
    """
    n = 15
    mu = 0.0
    sig2 = 10.0  # Very large variance
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 78.1μs -> 42.2μs (85.0% faster)
    
    # Should still satisfy basic properties despite large variance
    check_lognormal_properties(nodes, weights, n)
    
    # Nodes should span a very wide range
    node_ratio = np.max(nodes) / np.min(nodes)

def test_qnwlogn_large_asymmetric_grid():
    """
    Test with very asymmetric grid (different nodes per dimension).
    """
    n = np.array([2, 50])  # Very asymmetric
    nodes, weights = qnwlogn(n) # 207μs -> 203μs (1.69% faster)
    
    total_nodes = 100
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_large_covariance_matrix():
    """
    Test with larger covariance matrix (4D with correlations).
    """
    n = np.array([3, 3, 3, 3])
    mu = np.zeros(4)
    # Create a valid covariance matrix with correlations
    sig2 = np.eye(4) + 0.3 * np.ones((4, 4))
    np.fill_diagonal(sig2, 1.0)
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 233μs -> 228μs (2.03% faster)
    
    total_nodes = 81
    check_lognormal_properties(nodes, weights, n)

def test_qnwlogn_large_extreme_parameters():
    """
    Test with combination of large n and extreme parameters.
    """
    n = 30
    mu = -3.0  # Negative mean
    sig2 = 5.0  # Large variance
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 90.3μs -> 53.2μs (69.8% faster)
    
    # Should handle extreme parameters
    check_lognormal_properties(nodes, weights, n)
    
    # Mean should still be approximated correctly
    quad_mean = np.sum(nodes * weights)
    theoretical_mean = np.exp(mu + sig2 / 2)
    relative_error = abs(quad_mean - theoretical_mean) / theoretical_mean

def test_qnwlogn_large_precision_check():
    """
    Test numerical precision with large number of nodes.
    """
    n = 50
    mu = 0.0
    sig2 = 1.0
    
    nodes, weights = qnwlogn(n, mu=mu, sig2=sig2) # 116μs -> 79.9μs (45.8% faster)
    
    # Weight sum should be very close to 1
    weight_sum = np.sum(weights)

# ============================================================================
# CONSISTENCY AND DETERMINISM TESTS
# ============================================================================

def test_qnwlogn_deterministic():
    """
    Test that function returns same results on repeated calls (deterministic).
    """
    n = 7
    mu = 0.5
    sig2 = 1.5
    
    # Call multiple times
    nodes1, weights1 = qnwlogn(n, mu=mu, sig2=sig2) # 74.2μs -> 37.5μs (97.9% faster)
    nodes2, weights2 = qnwlogn(n, mu=mu, sig2=sig2) # 34.0μs -> 14.2μs (139% faster)
    nodes3, weights3 = qnwlogn(n, mu=mu, sig2=sig2) # 30.0μs -> 12.6μs (138% faster)
    
    # Should be identical
    assert_allclose(nodes1, nodes2, rtol=1e-15)
    assert_allclose(nodes1, nodes3, rtol=1e-15)
    assert_allclose(weights1, weights2, rtol=1e-15)
    assert_allclose(weights1, weights3, rtol=1e-15)

def test_qnwlogn_consistency_across_dimensions():
    """
    Test that 1D results are consistent when embedded in higher dimensions.
    """
    n_1d = 5
    nodes_1d, weights_1d = qnwlogn(n_1d, mu=0.0, sig2=1.0) # 74.2μs -> 36.9μs (101% faster)
    
    # For independent 2D with same parameters in each dimension,
    # marginal distributions should match 1D
    n_2d = np.array([5, 1])
    nodes_2d, weights_2d = qnwlogn(n_2d, mu=np.array([0.0, 0.0]), sig2=np.eye(2)) # 121μs -> 145μs (16.4% slower)
    
    # First dimension should match 1D case
    unique_nodes_dim0 = np.unique(nodes_2d[:, 0])
    assert_allclose(np.sort(unique_nodes_dim0), np.sort(nodes_1d), rtol=1e-10)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-qnwlogn-mjvsrgz6 and push.

Codeflash Static Badge

The optimized code achieves a **14% speedup** by introducing Numba JIT compilation for the single-dimension case in `qnwnorm`, which is the most frequently executed path based on profiling data.

**Key Optimization:**

The original code spends 99.3% of execution time calling `_qnwnorm1` (already JIT-compiled) from pure Python code. The optimization introduces two new JIT-compiled helper functions:

1. **`_cholesky_decomp`**: JIT-compiled wrapper for Cholesky decomposition
2. **`_process_single_node`**: JIT-compiled function that handles the single-dimension case, combining `_qnwnorm1` call, Cholesky decomposition, and node transformation into one JIT-compiled block

**Why This Works:**

When `qnwnorm` has a single dimension (which the profiling shows is common), the original code makes multiple Python→Numba transitions:
- Python calls `_qnwnorm1` (Numba)
- Returns to Python for Cholesky decomposition
- Returns to Python for node transformation

The optimized version wraps this entire sequence in `_process_single_node`, keeping execution in JIT-compiled code and eliminating the Python interpreter overhead for these transitions. This is particularly effective because:
- Numba can better optimize the full sequence
- No marshalling of data between Python and Numba contexts
- Branch prediction and CPU cache usage improve

**Test Results Analysis:**

The speedup is most pronounced in **univariate (single-dimension) test cases**:
- `test_univariate_default_params`: 80.7% faster
- `test_univariate_custom_mean_var`: 95.0% faster  
- `test_negative_mu`: 96.9% faster
- `test_highly_skewed_lognormal`: 99.7% faster

For **multivariate cases**, the optimization shows minimal impact or slight regression:
- `test_multivariate_default_params`: 1.56% faster
- Some 2D/3D cases: 1-7% slower

This is expected because multivariate cases don't use `_process_single_node` - they still follow the original code path. The slight regressions likely come from JIT compilation overhead or additional function call indirection.

**Workload Impact:**

If the function is called in hot paths (loops, Monte Carlo simulations, repeated integration tasks), the 14% average speedup translates to meaningful wall-clock time savings, especially for workloads dominated by univariate quadrature computations. The optimization is most beneficial for:
- High-frequency univariate normal/lognormal quadrature
- Applications with many repeated single-dimension integrations
- Scenarios where `_qnwnorm1` is the bottleneck (as profiling confirms)
@codeflash-ai codeflash-ai bot requested a review from aseembits93 January 1, 2026 18:47
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash labels Jan 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant