Newton Krylov improvements #2581

pcarruscag · 2025-09-29T04:08:13Z

Proposed Changes

Described in the config template and the Oneram M6 with NK example.
It was also possible to improve performance by not computing the gradients of density and enthalpy for the Roe scheme.

PR Checklist

I am submitting my contribution to the develop branch.
My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
I used the pre-commit hook to prevent dirty commits and used pre-commit run --all to format old commits.
I have added a test case that demonstrates my contribution, if necessary.
I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.

Develop

bigfooted · 2025-09-29T21:50:40Z

Cool! So how large are the improvements?

pcarruscag · 2025-09-29T21:53:55Z

The 30p30n vanv test converges in 1200 iterations from scratch, it used to take several thousand because we needed to limit CFL

rois1995 · 2025-10-02T08:32:10Z

Are there any recommended settings or guidelines when using NK (especially with these changes)? For example, from my experience with the HLPW5, I've seen that using a fixed CFL is worse than using an adaptive one, as it leads to a much larger time to solution, even though the number of iterations is lower.

pcarruscag · 2025-10-02T14:56:21Z

I need to re run those cases. When there are no major convergence issues it's better to have high limit in the adaptive CFL and this relaxation in "auto mode".

pcarruscag · 2025-10-02T14:57:09Z

Without this auto mode I've had to limit the CFL to keep the linear solver happy which slowed things down.

rois1995 · 2025-10-03T09:02:23Z

I have tried comparing the NK implementation on version 8.1, 8.3, and 8.3 on this branch. The comparison is made on the Level B mesh of the Pointwise family (1.r.09). Version 8.1 is included since the implicit discretization of the diffusion terms was not implemented yet. On the left figure, the functional is plotted against the number of iterations, whereas on the right figure, it is plotted against the computational time (cumulative sum of the time-per-iteration).

Density residual:

$\tilde{\nu}$ residual:

Average CFL:

Pitching moment coefficient:

The NK options for the SU2 versions 8.1.0 and 8.3.0 develop are the following:
NEWTON_KRYLOV_IPARAM= (10, 5, 2)
NEWTON_KRYLOV_DPARAM= (1.0, 0.01, -6, 1e-5)

The NK options for the SU2 version 8.3.0 on this branch are the following:
NEWTON_KRYLOV_IPARAM= (200, 0, 0)
NEWTON_KRYLOV_DPARAM= (0, 0.0, 0, 1e-5, -1)

Other options common to all three simulations:
NUM_METHOD_GRAD= GREEN_GAUSS
LINEAR_SOLVER= RESTARTED_FGMRES
LINEAR_SOLVER_PREC= ILU
LINEAR_SOLVER_ERROR= 1e-2
LINEAR_SOLVER_ITER= 60
CFL_ADAPT= YES
CFL_NUMBER= 10.0
CFL_REDUCTION_TURB= 1.0
CFL_ADAPT_PARAM= ( 0.1, 1.1, 10.0, 100.0, 0.01)

The simulation starts for 500 iterations at constant CFL = 1 without NK, then I activate NK with the settings above.

The computational time should not be that different, since the number of cores and the machine on which they are running is the same (at least the first 500 iterations should be the same). Maybe it is related to the fact that the simulation in yellow has access to more cache.

Nevertheless, the simulation with the changes here proposed seems to reach low residual levels much faster, although it then plateaus on the same levels as the standard implementation, which is kind of strange.

The new implementation is able to reach the maximum CFL, although only for a few iterations. Then, they all go down to the lowest value possible (10).

The implicit discretization of the diffusion term (red vs blue curve) does not have much influence.

Should I try the same NK settings used for this branch on the standard implementation?

pcarruscag · 2025-10-03T14:58:19Z

I think that case is not ideal to compare NK settings because the CFL eventually settles at the minimum value, which is very small (10). @Bot-Enigma-0 saw the same behavior. We have to figure out where and why the CFL becomes so limited.

pcarruscag · 2025-10-03T15:03:25Z

In general to make NK as cost effective as possible we need loosely converged linear systems.
This seems a lot, but maybe it's necessary
LINEAR_SOLVER_ERROR= 1e-2
LINEAR_SOLVER_ITER= 60

You looked into this case a lot more than me in recent times, did you try other linear solver settings?
Sometimes it's also helpful to use a higher linear tolerance in the CFL_ADAPT so that the CFL doesn't become limited by the linear solver, e.g. CFL_ADAPT_PARAM= ( 0.1, 1.1, 10.0, 100.0, 0.1)

pcarruscag · 2025-10-04T19:37:19Z

I tried it on the first HLCRM-WBHV mesh, with these settings:

NUM_METHOD_GRAD= GREEN_GAUSS
CFL_NUMBER= 10.0
CFL_ADAPT= YES
CFL_ADAPT_PARAM= ( 1.0, 1.05, 10.0, 50.0 )
%

% -------------------- FLOW NUMERICAL METHOD DEFINITION -----------------------%
%
CONV_NUM_METHOD_FLOW= ROE
ENTROPY_FIX_COEFF= 0.001
TIME_DISCRE_FLOW= EULER_IMPLICIT
%
% ----------- SLOPE LIMITER AND DISSIPATION SENSOR DEFINITION -----------------%
%
MUSCL_FLOW= YES
SLOPE_LIMITER_FLOW= VAN_ALBADA_EDGE
MUSCL_TURB= NO
SLOPE_LIMITER_TURB= VENKATAKRISHNAN_WANG
VENKAT_LIMITER_COEFF= 0.1
%
% -------------------- TURBULENT NUMERICAL METHOD DEFINITION ------------------%
%
% Convective numerical method (SCALAR_UPWIND, BOUNDED_SCALAR)
CONV_NUM_METHOD_TURB= SCALAR_UPWIND
TIME_DISCRE_TURB= EULER_IMPLICIT
CFL_REDUCTION_TURB= 1
%
% ------------------------ LINEAR SOLVER DEFINITION ---------------------------%
%
LINEAR_SOLVER= FGMRES
LINEAR_SOLVER_PREC= ILU
LINEAR_SOLVER_ERROR= 0.2
LINEAR_SOLVER_ITER= 12

NEWTON_KRYLOV= YES
NEWTON_KRYLOV_IPARAM= (400, 0, 1)
NEWTON_KRYLOV_DPARAM= (0.0, 0.0, 0.0, 1e-4, -1)

The convergence looks like this

pcarruscag · 2025-10-04T19:40:53Z

It would be nice if #2570 could point out the source of the relatively low convergence rate, at least the coefficients are converged by 2k iterations (I know it gets a lot slower as the meshes get larger).

rois1995 · 2025-10-05T17:08:06Z

I tried it on the first HLCRM-WBHV mesh, with these settings:

NUM_METHOD_GRAD= GREEN_GAUSS
CFL_NUMBER= 10.0
CFL_ADAPT= YES
CFL_ADAPT_PARAM= ( 1.0, 1.05, 10.0, 50.0 )
%

% -------------------- FLOW NUMERICAL METHOD DEFINITION -----------------------%
%
CONV_NUM_METHOD_FLOW= ROE
ENTROPY_FIX_COEFF= 0.001
TIME_DISCRE_FLOW= EULER_IMPLICIT
%
% ----------- SLOPE LIMITER AND DISSIPATION SENSOR DEFINITION -----------------%
%
MUSCL_FLOW= YES
SLOPE_LIMITER_FLOW= VAN_ALBADA_EDGE
MUSCL_TURB= NO
SLOPE_LIMITER_TURB= VENKATAKRISHNAN_WANG
VENKAT_LIMITER_COEFF= 0.1
%
% -------------------- TURBULENT NUMERICAL METHOD DEFINITION ------------------%
%
% Convective numerical method (SCALAR_UPWIND, BOUNDED_SCALAR)
CONV_NUM_METHOD_TURB= SCALAR_UPWIND
TIME_DISCRE_TURB= EULER_IMPLICIT
CFL_REDUCTION_TURB= 1
%
% ------------------------ LINEAR SOLVER DEFINITION ---------------------------%
%
LINEAR_SOLVER= FGMRES
LINEAR_SOLVER_PREC= ILU
LINEAR_SOLVER_ERROR= 0.2
LINEAR_SOLVER_ITER= 12

NEWTON_KRYLOV= YES
NEWTON_KRYLOV_IPARAM= (400, 0, 1)
NEWTON_KRYLOV_DPARAM= (0.0, 0.0, 0.0, 1e-4, -1)

The convergence looks like this

Isn't 1e-3 the standard value for the CFL increase tolerance? Plus, what was your rationale behind these settings?

TestCases/rans/oneram6/turb_ONERAM6_nk.cfg

pcarruscag · 2025-10-05T20:55:08Z

1e-3 is the default, but that setting only matters if it's larger than the linear solver tolerance option.
The motivation there was that sometimes you want to keep the linear solver going to a fixed number of iterations (low tolerance), but base the CFL adaptation on a different target.
When you are aiming for a high tolerance (to keep cost low) the convergence can be dominated by a few bad eigenvalues of the Jacobian, and so the linear solver hits the target tolerance quickly, but it doesn't do any smoothing on the rest of the spectrum.
Some solvers are more sensitive than others, BiCGSTAB tends to produce terrible solutions at high tolerance, compared to GMRES.

The rationale is getting away with the cheapest linear solve you can while getting a reasonable convergence rate, between approximating the Jacobian (or Jacobian-vector product in the NK case) by making it less second order and/or limiting CFL, and only converging the linear system a little.

I know this is hand-wavy, but it's a very non-linear problem... I do know the "optimum" solution time is not going to be below CFL of 25 or above ~15 linear iterations. If we need that kind of settings for some cases for stability we need to understand what's responsible for it.

pcarruscag · 2025-10-05T22:43:31Z

For completeness, something I never got to try were subspace recycling techniques, each linear system starts from scratch, if we can carry some information from the previous iterations, we can potentially get to a higher CFL at the same cost.
https://personal.math.vt.edu/sturler/publications/SISC_KrylovRecycling_2006.pdf

The other known challenge of Krylov methods is scaling, with more unknowns, you also need larger subspaces because there are more "slow modes". In theory, the solution for that is a preconditioner that can take care of those modes (multigrid).

rois1995 · 2025-10-06T13:10:02Z

I tried it on the first HLCRM-WBHV mesh, with these settings:

NUM_METHOD_GRAD= GREEN_GAUSS
CFL_NUMBER= 10.0
CFL_ADAPT= YES
CFL_ADAPT_PARAM= ( 1.0, 1.05, 10.0, 50.0 )
%

% -------------------- FLOW NUMERICAL METHOD DEFINITION -----------------------%
%
CONV_NUM_METHOD_FLOW= ROE
ENTROPY_FIX_COEFF= 0.001
TIME_DISCRE_FLOW= EULER_IMPLICIT
%
% ----------- SLOPE LIMITER AND DISSIPATION SENSOR DEFINITION -----------------%
%
MUSCL_FLOW= YES
SLOPE_LIMITER_FLOW= VAN_ALBADA_EDGE
MUSCL_TURB= NO
SLOPE_LIMITER_TURB= VENKATAKRISHNAN_WANG
VENKAT_LIMITER_COEFF= 0.1
%
% -------------------- TURBULENT NUMERICAL METHOD DEFINITION ------------------%
%
% Convective numerical method (SCALAR_UPWIND, BOUNDED_SCALAR)
CONV_NUM_METHOD_TURB= SCALAR_UPWIND
TIME_DISCRE_TURB= EULER_IMPLICIT
CFL_REDUCTION_TURB= 1
%
% ------------------------ LINEAR SOLVER DEFINITION ---------------------------%
%
LINEAR_SOLVER= FGMRES
LINEAR_SOLVER_PREC= ILU
LINEAR_SOLVER_ERROR= 0.2
LINEAR_SOLVER_ITER= 12

NEWTON_KRYLOV= YES
NEWTON_KRYLOV_IPARAM= (400, 0, 1)
NEWTON_KRYLOV_DPARAM= (0.0, 0.0, 0.0, 1e-4, -1)

The convergence looks like this

With these settings I am going towards a completely different solution than the other ones, although it is still far from being converged.

RMS density

Pitching coefficient

Linear solver residual

Linear solver turbulence residual

The only other difference is that you have a slope limiter active, but it should not change the solution so much.

rois1995 · 2025-10-06T14:01:15Z

Do you think that the Eisenstat-Walker strategy for choosing the tolerance of the linear solver could be useful?

pcarruscag · 2025-10-06T14:11:24Z

I think those strategies require additional matrix-vector products which is what we are trying to minimize.
Also at CFL in O(10-100) we are far from a Newton method so the criteria probably doesn't apply.

rois1995 · 2025-10-07T12:18:26Z

I can confirm that decreasing so much the linear solver tolerance has not a good impact on the simulation of the HLPW5 case.

Residual Density:

Pitching moment coefficient

Lift coefficient

Maybe at the start of the simulation, the tolerance was ok, but then as the non-linear residual dropped, it had become too large. That is why I was proposing something like the Eisenstat-Walker to relate the linear solver tolerance to the non-linear one, although I understand it might be impractical.

pcarruscag · 2025-10-07T14:44:59Z

Yeah when the convergence is all spiky like that it's usually not a good sign, in any case I think I gave good-enough disclaimers 😅

When there are no major convergence issues it's better to have high limit in the adaptive CFL and this relaxation in "auto mode".
The rationale is getting away with the cheapest linear solve you can while getting a reasonable convergence rate

If tighter tolerances are necessary to achieve convergence then we can try to understand why

When you are aiming for a high tolerance (to keep cost low) the convergence can be dominated by a few bad eigenvalues of the Jacobian, and so the linear solver hits the target tolerance quickly, but it doesn't do any smoothing on the rest of the spectrum.

Some of the NK parameters allow ramping the linear solver tolerance (third IPARAM and third DPARAM).
In any case, I'm not changing the default behavior here, but some cases can converge faster and we have one more degree of freedom to tune besides CFL and linear solver.
So can I get an approval? 😄

bigfooted

Thank you for your service!

Bot-Enigma-0 · 2025-10-08T13:49:27Z

The convergence looks good. Just a few notes on the CFL adaption based on my experimentation, especially with the HLPW cases:

A slow ramping is very helpful in the early iterations, as it helps stabilise the convergence. For the finer meshes its best to use a slow ramping. I find the downfactor and upfactor in the ranges of 0.9-0.99 and 1.02-1.2 respectively work well. For small or coarse meshes this threshold can be relaxed.
Using a slow ramping, setting the CFL_MAX value to a high value is fine, as I have found it will never reach this value. Instead it should settle to a stable CFL, based on the convergence. As for the CFL_MIN, setting it to a low value (0.1-1) should be fine, since it may reset a few times but should pick up rather quickly.
I would recommend not to keep the down factor = 1.0, since:

Convergence of the linear solver is the first flag to enable the adaptive algorithm. So, as long as the CFL satisfies the convergence of the linear solver it will attempt to increase the CFL and only reset when the convergence is not met (as Pedro mentioned we saw).
This means the solver will discard what is happening to the non-linear convergence and simply reset when they appear to have stalled (i.e., no change in the cumulative residual sum over the last 100 iterations), instead of finding a slightly lower stable CFL.

As Pedro mentioned keeping the linear solver tolerance parameter higher than the requested linear solver error is ideal, so the algorithm doesn't over rely on the linear solver's convergence. Keeping the LINEAR_SOLVER_ERROR between e-1, e-2 with 15-20 iterations should be sufficient.

@rois1995, for the last 2 results you posted, how does the CFL compare per iteration? And how do the new force values compare to the workshop results?

bigfooted · 2025-10-09T06:13:54Z

A slow ramping is very helpful in the early iterations, as it helps stabilise the convergence. For the finer meshes its best to use a slow ramping. I find the downfactor and upfactor in the ranges of 0.9-0.99 and 1.02-1.2 respectively work well. For small or coarse meshes this threshold can be relaxed.
Using a slow ramping, setting the CFL_MAX value to a high value is fine, as I have found it will never reach this value. Instead it should settle to a stable CFL, based on the convergence. As for the CFL_MIN, setting it to a low value (0.1-1) should be fine, since it may reset a few times but should pick up rather quickly.

I have the same experience here, that this is a very safe setup. It might be possible that there are setup that benefit from more aggressive settings, but this always seems to work fine. I also use low linear solver convergence criteria for the CFL_UP.

pcarruscag and others added 4 commits August 31, 2025 19:06

Merge pull request #2560 from su2code/develop

708016b

Develop

newton krylov relaxation to help the linear solver converge

5032059

add badge

e1befed

update docs

7059a78

pcarruscag added the changelog:feature label Sep 29, 2025

pcarruscag added 3 commits September 29, 2025 09:07

Merge remote-tracking branch 'upstream/master' into pedro/nk_relaxation

620c27c

Merge remote-tracking branch 'upstream/develop' into pedro/nk_relaxation

a602d27

fix merge

cee6d23

fix

f98e478

pcarruscag added 3 commits October 4, 2025 12:47

reduce number of gradients and limiters for Roe

07494fc

fix var counts

0c6cd0e

update a few tests

f2c50cc

bigfooted reviewed Oct 5, 2025

View reviewed changes

TestCases/rans/oneram6/turb_ONERAM6_nk.cfg Show resolved Hide resolved

same logic for target tol

a5f5513

pcarruscag mentioned this pull request Oct 6, 2025

[WIP] Automating First Order Initialisation When Using MUSCL #2544

Draft

6 tasks

bigfooted approved these changes Oct 7, 2025

View reviewed changes

pcarruscag merged commit d2481dd into develop Oct 10, 2025
36 of 37 checks passed

pcarruscag deleted the pedro/nk_relaxation branch October 10, 2025 04:24

Newton Krylov improvements #2581

Newton Krylov improvements #2581

Uh oh!

Conversation

pcarruscag commented Sep 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed Changes

PR Checklist

Uh oh!

bigfooted commented Sep 29, 2025

Uh oh!

pcarruscag commented Sep 29, 2025

Uh oh!

rois1995 commented Oct 2, 2025

Uh oh!

pcarruscag commented Oct 2, 2025

Uh oh!

pcarruscag commented Oct 2, 2025

Uh oh!

rois1995 commented Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pcarruscag commented Oct 3, 2025

Uh oh!

pcarruscag commented Oct 3, 2025

Uh oh!

pcarruscag commented Oct 4, 2025

Uh oh!

pcarruscag commented Oct 4, 2025

Uh oh!

rois1995 commented Oct 5, 2025

Uh oh!

Uh oh!

pcarruscag commented Oct 5, 2025

Uh oh!

pcarruscag commented Oct 5, 2025

Uh oh!

rois1995 commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rois1995 commented Oct 6, 2025

Uh oh!

pcarruscag commented Oct 6, 2025

Uh oh!

rois1995 commented Oct 7, 2025

Uh oh!

pcarruscag commented Oct 7, 2025

Uh oh!

bigfooted left a comment

Choose a reason for hiding this comment

Uh oh!

Bot-Enigma-0 commented Oct 8, 2025

Uh oh!

bigfooted commented Oct 9, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

pcarruscag commented Sep 29, 2025 •

edited

Loading

rois1995 commented Oct 3, 2025 •

edited

Loading

rois1995 commented Oct 6, 2025 •

edited

Loading