Skip to content

Conversation

@pcarruscag
Copy link
Member

@pcarruscag pcarruscag commented Sep 29, 2025

Proposed Changes

Described in the config template and the Oneram M6 with NK example.
It was also possible to improve performance by not computing the gradients of density and enthalpy for the Roe scheme.

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
  • My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
  • I used the pre-commit hook to prevent dirty commits and used pre-commit run --all to format old commits.
  • I have added a test case that demonstrates my contribution, if necessary.
  • I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.

@bigfooted
Copy link
Contributor

Cool! So how large are the improvements?

@pcarruscag
Copy link
Member Author

The 30p30n vanv test converges in 1200 iterations from scratch, it used to take several thousand because we needed to limit CFL

@rois1995
Copy link
Contributor

rois1995 commented Oct 2, 2025

Are there any recommended settings or guidelines when using NK (especially with these changes)? For example, from my experience with the HLPW5, I've seen that using a fixed CFL is worse than using an adaptive one, as it leads to a much larger time to solution, even though the number of iterations is lower.

@pcarruscag
Copy link
Member Author

I need to re run those cases. When there are no major convergence issues it's better to have high limit in the adaptive CFL and this relaxation in "auto mode".

@pcarruscag
Copy link
Member Author

Without this auto mode I've had to limit the CFL to keep the linear solver happy which slowed things down.

@rois1995
Copy link
Contributor

rois1995 commented Oct 3, 2025

I have tried comparing the NK implementation on version 8.1, 8.3, and 8.3 on this branch. The comparison is made on the Level B mesh of the Pointwise family (1.r.09). Version 8.1 is included since the implicit discretization of the diffusion terms was not implemented yet. On the left figure, the functional is plotted against the number of iterations, whereas on the right figure, it is plotted against the computational time (cumulative sum of the time-per-iteration).

Density residual:
rms_Rho_

$\tilde{\nu}$ residual:
rms_nu_

Average CFL:
AvgCFL

Pitching moment coefficient:
CMy

The NK options for the SU2 versions 8.1.0 and 8.3.0 develop are the following:
NEWTON_KRYLOV_IPARAM= (10, 5, 2)
NEWTON_KRYLOV_DPARAM= (1.0, 0.01, -6, 1e-5)

The NK options for the SU2 version 8.3.0 on this branch are the following:
NEWTON_KRYLOV_IPARAM= (200, 0, 0)
NEWTON_KRYLOV_DPARAM= (0, 0.0, 0, 1e-5, -1)

Other options common to all three simulations:
NUM_METHOD_GRAD= GREEN_GAUSS
LINEAR_SOLVER= RESTARTED_FGMRES
LINEAR_SOLVER_PREC= ILU
LINEAR_SOLVER_ERROR= 1e-2
LINEAR_SOLVER_ITER= 60
CFL_ADAPT= YES
CFL_NUMBER= 10.0
CFL_REDUCTION_TURB= 1.0
CFL_ADAPT_PARAM= ( 0.1, 1.1, 10.0, 100.0, 0.01)

The simulation starts for 500 iterations at constant CFL = 1 without NK, then I activate NK with the settings above.

The computational time should not be that different, since the number of cores and the machine on which they are running is the same (at least the first 500 iterations should be the same). Maybe it is related to the fact that the simulation in yellow has access to more cache.

Nevertheless, the simulation with the changes here proposed seems to reach low residual levels much faster, although it then plateaus on the same levels as the standard implementation, which is kind of strange.

The new implementation is able to reach the maximum CFL, although only for a few iterations. Then, they all go down to the lowest value possible (10).

The implicit discretization of the diffusion term (red vs blue curve) does not have much influence.

Should I try the same NK settings used for this branch on the standard implementation?

@pcarruscag
Copy link
Member Author

I think that case is not ideal to compare NK settings because the CFL eventually settles at the minimum value, which is very small (10). @Bot-Enigma-0 saw the same behavior. We have to figure out where and why the CFL becomes so limited.

@pcarruscag
Copy link
Member Author

In general to make NK as cost effective as possible we need loosely converged linear systems.
This seems a lot, but maybe it's necessary
LINEAR_SOLVER_ERROR= 1e-2
LINEAR_SOLVER_ITER= 60

You looked into this case a lot more than me in recent times, did you try other linear solver settings?
Sometimes it's also helpful to use a higher linear tolerance in the CFL_ADAPT so that the CFL doesn't become limited by the linear solver, e.g. CFL_ADAPT_PARAM= ( 0.1, 1.1, 10.0, 100.0, 0.1)

@pcarruscag
Copy link
Member Author

I tried it on the first HLCRM-WBHV mesh, with these settings:

NUM_METHOD_GRAD= GREEN_GAUSS
CFL_NUMBER= 10.0
CFL_ADAPT= YES
CFL_ADAPT_PARAM= ( 1.0, 1.05, 10.0, 50.0 )
%

% -------------------- FLOW NUMERICAL METHOD DEFINITION -----------------------%
%
CONV_NUM_METHOD_FLOW= ROE
ENTROPY_FIX_COEFF= 0.001
TIME_DISCRE_FLOW= EULER_IMPLICIT
%
% ----------- SLOPE LIMITER AND DISSIPATION SENSOR DEFINITION -----------------%
%
MUSCL_FLOW= YES
SLOPE_LIMITER_FLOW= VAN_ALBADA_EDGE
MUSCL_TURB= NO
SLOPE_LIMITER_TURB= VENKATAKRISHNAN_WANG
VENKAT_LIMITER_COEFF= 0.1
%
% -------------------- TURBULENT NUMERICAL METHOD DEFINITION ------------------%
%
% Convective numerical method (SCALAR_UPWIND, BOUNDED_SCALAR)
CONV_NUM_METHOD_TURB= SCALAR_UPWIND
TIME_DISCRE_TURB= EULER_IMPLICIT
CFL_REDUCTION_TURB= 1
%
% ------------------------ LINEAR SOLVER DEFINITION ---------------------------%
%
LINEAR_SOLVER= FGMRES
LINEAR_SOLVER_PREC= ILU
LINEAR_SOLVER_ERROR= 0.2
LINEAR_SOLVER_ITER= 12

NEWTON_KRYLOV= YES
NEWTON_KRYLOV_IPARAM= (400, 0, 1)
NEWTON_KRYLOV_DPARAM= (0.0, 0.0, 0.0, 1e-4, -1)

The convergence looks like this
image

@pcarruscag
Copy link
Member Author

It would be nice if #2570 could point out the source of the relatively low convergence rate, at least the coefficients are converged by 2k iterations (I know it gets a lot slower as the meshes get larger).

@rois1995
Copy link
Contributor

rois1995 commented Oct 5, 2025

I tried it on the first HLCRM-WBHV mesh, with these settings:

NUM_METHOD_GRAD= GREEN_GAUSS
CFL_NUMBER= 10.0
CFL_ADAPT= YES
CFL_ADAPT_PARAM= ( 1.0, 1.05, 10.0, 50.0 )
%

% -------------------- FLOW NUMERICAL METHOD DEFINITION -----------------------%
%
CONV_NUM_METHOD_FLOW= ROE
ENTROPY_FIX_COEFF= 0.001
TIME_DISCRE_FLOW= EULER_IMPLICIT
%
% ----------- SLOPE LIMITER AND DISSIPATION SENSOR DEFINITION -----------------%
%
MUSCL_FLOW= YES
SLOPE_LIMITER_FLOW= VAN_ALBADA_EDGE
MUSCL_TURB= NO
SLOPE_LIMITER_TURB= VENKATAKRISHNAN_WANG
VENKAT_LIMITER_COEFF= 0.1
%
% -------------------- TURBULENT NUMERICAL METHOD DEFINITION ------------------%
%
% Convective numerical method (SCALAR_UPWIND, BOUNDED_SCALAR)
CONV_NUM_METHOD_TURB= SCALAR_UPWIND
TIME_DISCRE_TURB= EULER_IMPLICIT
CFL_REDUCTION_TURB= 1
%
% ------------------------ LINEAR SOLVER DEFINITION ---------------------------%
%
LINEAR_SOLVER= FGMRES
LINEAR_SOLVER_PREC= ILU
LINEAR_SOLVER_ERROR= 0.2
LINEAR_SOLVER_ITER= 12

NEWTON_KRYLOV= YES
NEWTON_KRYLOV_IPARAM= (400, 0, 1)
NEWTON_KRYLOV_DPARAM= (0.0, 0.0, 0.0, 1e-4, -1)

The convergence looks like this image

Isn't 1e-3 the standard value for the CFL increase tolerance? Plus, what was your rationale behind these settings?

@pcarruscag
Copy link
Member Author

1e-3 is the default, but that setting only matters if it's larger than the linear solver tolerance option.
The motivation there was that sometimes you want to keep the linear solver going to a fixed number of iterations (low tolerance), but base the CFL adaptation on a different target.
When you are aiming for a high tolerance (to keep cost low) the convergence can be dominated by a few bad eigenvalues of the Jacobian, and so the linear solver hits the target tolerance quickly, but it doesn't do any smoothing on the rest of the spectrum.
Some solvers are more sensitive than others, BiCGSTAB tends to produce terrible solutions at high tolerance, compared to GMRES.

The rationale is getting away with the cheapest linear solve you can while getting a reasonable convergence rate, between approximating the Jacobian (or Jacobian-vector product in the NK case) by making it less second order and/or limiting CFL, and only converging the linear system a little.

I know this is hand-wavy, but it's a very non-linear problem... I do know the "optimum" solution time is not going to be below CFL of 25 or above ~15 linear iterations. If we need that kind of settings for some cases for stability we need to understand what's responsible for it.

@pcarruscag
Copy link
Member Author

For completeness, something I never got to try were subspace recycling techniques, each linear system starts from scratch, if we can carry some information from the previous iterations, we can potentially get to a higher CFL at the same cost.
https://personal.math.vt.edu/sturler/publications/SISC_KrylovRecycling_2006.pdf

The other known challenge of Krylov methods is scaling, with more unknowns, you also need larger subspaces because there are more "slow modes". In theory, the solution for that is a preconditioner that can take care of those modes (multigrid).

@rois1995
Copy link
Contributor

rois1995 commented Oct 6, 2025

I tried it on the first HLCRM-WBHV mesh, with these settings:

NUM_METHOD_GRAD= GREEN_GAUSS
CFL_NUMBER= 10.0
CFL_ADAPT= YES
CFL_ADAPT_PARAM= ( 1.0, 1.05, 10.0, 50.0 )
%

% -------------------- FLOW NUMERICAL METHOD DEFINITION -----------------------%
%
CONV_NUM_METHOD_FLOW= ROE
ENTROPY_FIX_COEFF= 0.001
TIME_DISCRE_FLOW= EULER_IMPLICIT
%
% ----------- SLOPE LIMITER AND DISSIPATION SENSOR DEFINITION -----------------%
%
MUSCL_FLOW= YES
SLOPE_LIMITER_FLOW= VAN_ALBADA_EDGE
MUSCL_TURB= NO
SLOPE_LIMITER_TURB= VENKATAKRISHNAN_WANG
VENKAT_LIMITER_COEFF= 0.1
%
% -------------------- TURBULENT NUMERICAL METHOD DEFINITION ------------------%
%
% Convective numerical method (SCALAR_UPWIND, BOUNDED_SCALAR)
CONV_NUM_METHOD_TURB= SCALAR_UPWIND
TIME_DISCRE_TURB= EULER_IMPLICIT
CFL_REDUCTION_TURB= 1
%
% ------------------------ LINEAR SOLVER DEFINITION ---------------------------%
%
LINEAR_SOLVER= FGMRES
LINEAR_SOLVER_PREC= ILU
LINEAR_SOLVER_ERROR= 0.2
LINEAR_SOLVER_ITER= 12

NEWTON_KRYLOV= YES
NEWTON_KRYLOV_IPARAM= (400, 0, 1)
NEWTON_KRYLOV_DPARAM= (0.0, 0.0, 0.0, 1e-4, -1)

The convergence looks like this image

With these settings I am going towards a completely different solution than the other ones, although it is still far from being converged.

RMS density
NKComp_Settings_rms_Rho_

Pitching coefficient
NKComp_Settings_CMy

Linear solver residual
NKComp_Settings_LinSolRes

Linear solver turbulence residual
NKComp_Settings_LinSolResTurb

The only other difference is that you have a slope limiter active, but it should not change the solution so much.

@rois1995
Copy link
Contributor

rois1995 commented Oct 6, 2025

Do you think that the Eisenstat-Walker strategy for choosing the tolerance of the linear solver could be useful?

@pcarruscag
Copy link
Member Author

I think those strategies require additional matrix-vector products which is what we are trying to minimize.
Also at CFL in O(10-100) we are far from a Newton method so the criteria probably doesn't apply.

@rois1995
Copy link
Contributor

rois1995 commented Oct 7, 2025

I can confirm that decreasing so much the linear solver tolerance has not a good impact on the simulation of the HLPW5 case.

Residual Density:
NKComp_Settings_rms_Rho_

Pitching moment coefficient
NKComp_Settings_CMy

Lift coefficient
NKComp_Settings_CL

Maybe at the start of the simulation, the tolerance was ok, but then as the non-linear residual dropped, it had become too large. That is why I was proposing something like the Eisenstat-Walker to relate the linear solver tolerance to the non-linear one, although I understand it might be impractical.

@pcarruscag
Copy link
Member Author

Yeah when the convergence is all spiky like that it's usually not a good sign, in any case I think I gave good-enough disclaimers 😅

When there are no major convergence issues it's better to have high limit in the adaptive CFL and this relaxation in "auto mode".
The rationale is getting away with the cheapest linear solve you can while getting a reasonable convergence rate

If tighter tolerances are necessary to achieve convergence then we can try to understand why

When you are aiming for a high tolerance (to keep cost low) the convergence can be dominated by a few bad eigenvalues of the Jacobian, and so the linear solver hits the target tolerance quickly, but it doesn't do any smoothing on the rest of the spectrum.

Some of the NK parameters allow ramping the linear solver tolerance (third IPARAM and third DPARAM).
In any case, I'm not changing the default behavior here, but some cases can converge faster and we have one more degree of freedom to tune besides CFL and linear solver.
So can I get an approval? 😄

Copy link
Contributor

@bigfooted bigfooted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your service!

@Bot-Enigma-0
Copy link
Contributor

The convergence looks good. Just a few notes on the CFL adaption based on my experimentation, especially with the HLPW cases:

  1. A slow ramping is very helpful in the early iterations, as it helps stabilise the convergence. For the finer meshes its best to use a slow ramping. I find the downfactor and upfactor in the ranges of 0.9-0.99 and 1.02-1.2 respectively work well. For small or coarse meshes this threshold can be relaxed.
  2. Using a slow ramping, setting the CFL_MAX value to a high value is fine, as I have found it will never reach this value. Instead it should settle to a stable CFL, based on the convergence. As for the CFL_MIN, setting it to a low value (0.1-1) should be fine, since it may reset a few times but should pick up rather quickly.
  3. I would recommend not to keep the down factor = 1.0, since:
  • Convergence of the linear solver is the first flag to enable the adaptive algorithm. So, as long as the CFL satisfies the convergence of the linear solver it will attempt to increase the CFL and only reset when the convergence is not met (as Pedro mentioned we saw).
  • This means the solver will discard what is happening to the non-linear convergence and simply reset when they appear to have stalled (i.e., no change in the cumulative residual sum over the last 100 iterations), instead of finding a slightly lower stable CFL.
  1. As Pedro mentioned keeping the linear solver tolerance parameter higher than the requested linear solver error is ideal, so the algorithm doesn't over rely on the linear solver's convergence. Keeping the LINEAR_SOLVER_ERROR between e-1, e-2 with 15-20 iterations should be sufficient.

@rois1995, for the last 2 results you posted, how does the CFL compare per iteration? And how do the new force values compare to the workshop results?

@bigfooted
Copy link
Contributor

A slow ramping is very helpful in the early iterations, as it helps stabilise the convergence. For the finer meshes its best to use a slow ramping. I find the downfactor and upfactor in the ranges of 0.9-0.99 and 1.02-1.2 respectively work well. For small or coarse meshes this threshold can be relaxed.
Using a slow ramping, setting the CFL_MAX value to a high value is fine, as I have found it will never reach this value. Instead it should settle to a stable CFL, based on the convergence. As for the CFL_MIN, setting it to a low value (0.1-1) should be fine, since it may reset a few times but should pick up rather quickly.

I have the same experience here, that this is a very safe setup. It might be possible that there are setup that benefit from more aggressive settings, but this always seems to work fine. I also use low linear solver convergence criteria for the CFL_UP.

@pcarruscag pcarruscag merged commit d2481dd into develop Oct 10, 2025
36 of 37 checks passed
@pcarruscag pcarruscag deleted the pedro/nk_relaxation branch October 10, 2025 04:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants