⚡️ Speed up function qnwlogn by 15%
#64
Open
📄 15% (0.15x) speedup for `qnwlogn` in `quantecon/quad.py`
⏱️ Runtime: 7.56 milliseconds → 6.60 milliseconds (best of 7 runs)

📝 Explanation and details
The optimized code achieves a 14% speedup by introducing Numba JIT compilation for the single-dimension case in `qnwnorm`, which is the most frequently executed path based on profiling data.

Key Optimization:
The original code spends 99.3% of execution time calling `_qnwnorm1` (already JIT-compiled) from pure Python code. The optimization introduces two new JIT-compiled helper functions:

- `_cholesky_decomp`: a JIT-compiled wrapper for the Cholesky decomposition
- `_process_single_node`: a JIT-compiled function that handles the single-dimension case, combining the `_qnwnorm1` call, the Cholesky decomposition, and the node transformation into one JIT-compiled block
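The PR names these helpers but does not show their bodies here. A minimal sketch of what they could look like follows; the function bodies, the import path, and the assumption that `_qnwnorm1(n)` returns standard-normal nodes and weights from a nopython-jitted function are all illustrative, not the actual patch.

```python
import numpy as np
from numba import njit

from quantecon.quad import _qnwnorm1  # assumed import path for the existing jitted helper


@njit
def _cholesky_decomp(var):
    # Keep the factorization inside compiled code instead of returning
    # to the interpreter for a NumPy call.
    return np.linalg.cholesky(var)


@njit
def _process_single_node(n, mu, var):
    # Single-dimension case (sketch): standard-normal nodes and weights,
    # then the scale/shift to N(mu, var), all in one compiled block.
    nodes, weights = _qnwnorm1(n)
    sqrt_var = _cholesky_decomp(var)[0, 0]  # var passed as a (1, 1) float64 array
    return mu + sqrt_var * nodes, weights
```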
Why This Works:
When `qnwnorm` has a single dimension (which the profiling shows is common), the original code makes multiple Python→Numba transitions: it calls `_qnwnorm1` (Numba), then returns to the interpreter for the Cholesky decomposition and the node transformation. The optimized version wraps this entire sequence in `_process_single_node`, keeping execution in JIT-compiled code and eliminating the Python interpreter overhead for these transitions, which is particularly effective on this frequently hit path.
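The effect can be seen in isolation with a toy comparison (none of this is the library's code): a split path that calls a jitted step from Python and finishes in the interpreter, versus a fused `@njit` path that keeps the whole sequence compiled. The node builder here is a stand-in, not `_qnwnorm1`.

```python
import numpy as np
from numba import njit
from timeit import timeit


@njit
def _toy_nodes(n):
    # Stand-in for a jitted node builder such as _qnwnorm1.
    return np.linspace(-3.0, 3.0, n)


def split_path(n, mu, sig):
    nodes = _toy_nodes(n)     # Python -> compiled -> Python round trip
    return mu + sig * nodes   # arithmetic finishes in the interpreter


@njit
def fused_path(n, mu, sig):
    nodes = _toy_nodes(n)     # stays inside compiled code
    return mu + sig * nodes


split_path(9, 0.0, 0.1)       # warm-up calls so JIT compilation is not timed
fused_path(9, 0.0, 0.1)
print("split:", timeit(lambda: split_path(9, 0.0, 0.1), number=50_000))
print("fused:", timeit(lambda: fused_path(9, 0.0, 0.1), number=50_000))
```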
Test Results Analysis:
The speedup is most pronounced in univariate (single-dimension) test cases:

- `test_univariate_default_params`: 80.7% faster
- `test_univariate_custom_mean_var`: 95.0% faster
- `test_negative_mu`: 96.9% faster
- `test_highly_skewed_lognormal`: 99.7% faster

For multivariate cases, the optimization shows minimal impact or a slight regression:

- `test_multivariate_default_params`: 1.56% faster

This is expected because multivariate cases do not use `_process_single_node`; they still follow the original code path. The slight regressions likely come from JIT compilation overhead or additional function-call indirection.
Workload Impact:
If the function is called in hot paths (loops, Monte Carlo simulations, repeated integration tasks), the 14% average speedup translates to meaningful wall-clock savings, especially for workloads dominated by univariate quadrature computations. The optimization is most beneficial for workloads where `_qnwnorm1` is the bottleneck (as profiling confirms).
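One rough way to gauge the per-call effect on such a workload is to time the univariate call directly, as sketched below; the snippet assumes `qnwlogn` is importable from `quantecon.quad` and accepts scalar `mu` and `sig2` in the one-dimensional case.

```python
from timeit import timeit

from quantecon.quad import qnwlogn  # assumed import path

# Warm up so JIT compilation is not counted in the measurement.
qnwlogn(21, 0.0, 0.1)

n_calls = 10_000
elapsed = timeit(lambda: qnwlogn(21, 0.0, 0.1), number=n_calls)
print(f"qnwlogn(21, 0.0, 0.1): {elapsed / n_calls * 1e6:.1f} µs per call")
```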
✅ Correctness verification report:
Generated Regression Tests
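The generated tests themselves are collapsed in this view. A univariate check in the same spirit as the test names above might look like the sketch below; the test body, tolerances, and import path are illustrative assumptions, not the generated code.

```python
import numpy as np

from quantecon.quad import qnwlogn  # assumed import path


def test_univariate_default_params():
    # Lognormal quadrature with mu = 0, sig2 = 1 (the defaults assumed here).
    nodes, weights = qnwlogn(25, 0.0, 1.0)
    nodes = np.ravel(nodes)        # flatten defensively for the univariate case
    weights = np.ravel(weights)
    # Weights should form a probability distribution over positive nodes.
    assert np.isclose(weights.sum(), 1.0)
    assert np.all(nodes > 0)
    # The quadrature mean should be close to E[X] = exp(mu + sig2 / 2).
    assert np.isclose(weights @ nodes, np.exp(0.5), rtol=1e-6)
```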
To edit these changes, run `git checkout codeflash/optimize-qnwlogn-mjvsrgz6` and push.