Newton Krylov improvements #2581
|
Cool! So how large are the improvements? |
|
The 30p30n vanv test converges in 1200 iterations from scratch; it used to take several thousand because we needed to limit the CFL. |
|
Are there any recommended settings or guidelines when using NK (especially with these changes)? For example, from my experience with the HLPW5, I've seen that using a fixed CFL is worse than using an adaptive one, as it leads to a much larger time to solution, even though the number of iterations is lower. |
|
I need to rerun those cases. When there are no major convergence issues, it's better to have a high limit in the adaptive CFL and this relaxation in "auto mode". |
|
Without this auto mode I've had to limit the CFL to keep the linear solver happy which slowed things down. |
|
I have compared the NK implementation on version 8.1, on version 8.3, and on version 8.3 with this branch. The comparison is made on the Level B mesh of the Pointwise family (1.r.09). Version 8.1 is included because it predates the implicit discretization of the diffusion terms. In the left figure, the functional is plotted against the number of iterations; in the right figure, it is plotted against the computational time (cumulative sum of the time per iteration).

The NK options for the SU2 versions 8.1.0 and 8.3.0 develop are the following: The NK options for the SU2 version 8.3.0 on this branch are the following: Other options common to all three simulations:

The simulation runs for 500 iterations at constant CFL = 1 without NK, then I activate NK with the settings above. The computational time should not differ much, since all runs use the same machine and the same number of cores (at least the first 500 iterations should take the same time). Maybe the difference is related to the simulation in yellow having access to more cache.

Nevertheless, the simulation with the changes proposed here seems to reach low residual levels much faster, although it then plateaus at the same levels as the standard implementation, which is somewhat strange. The new implementation is able to reach the maximum CFL, although only for a few iterations. After that, all three simulations drop to the lowest value allowed (10). The implicit discretization of the diffusion term (red vs blue curve) does not have much influence. Should I try the same NK settings used for this branch on the standard implementation? |
|
I think that case is not ideal to compare NK settings because the CFL eventually settles at the minimum value, which is very small (10). @Bot-Enigma-0 saw the same behavior. We have to figure out where and why the CFL becomes so limited. |
|
In general, to make NK as cost-effective as possible we need loosely converged linear systems. You have looked into this case a lot more than me recently; did you try other linear solver settings? |
|
I tried it on the first HLCRM-WBHV mesh, with these settings: |
|
It would be nice if #2570 could point out the source of the relatively low convergence rate. At least the coefficients are converged by 2k iterations (I know it gets a lot slower as the meshes get larger).
Isn't 1e-3 the standard value for the CFL increase tolerance? Plus, what was your rationale behind these settings? |
|
1e-3 is the default, but that setting only matters if it's larger than the linear solver tolerance option. The rationale is getting away with the cheapest linear solve you can while still getting a reasonable convergence rate. That is a trade-off between approximating the Jacobian (or the Jacobian-vector product in the NK case) by making it less second order and/or limiting the CFL, and only converging the linear system a little. I know this is hand-wavy, but it's a very non-linear problem... I do know the "optimum" solution time is not going to be below a CFL of 25 or above ~15 linear iterations. If we need those kinds of settings for stability in some cases, we need to understand what's responsible for it. |
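To illustrate why the CFL increase tolerance only matters when it is larger than the linear solver tolerance, here is a minimal sketch of an adaptive-CFL rule. It is not the SU2 implementation: the function name, the parameter names, the default values, and the assumption that the effective threshold is the looser of the two tolerances are all hypothetical and only meant to mirror the reasoning above.

```python
def adapt_cfl(cfl, lin_res, cfl_up_tol=1e-3, lin_solver_tol=1e-2,
              factor_down=0.5, factor_up=1.2, cfl_min=10.0, cfl_max=1000.0):
    """Return the next CFL from the relative residual reached by the linear solver.

    All names and defaults are illustrative assumptions, not SU2 options.
    """
    # Assumed rule: the linear solver stops once it reaches lin_solver_tol, so a
    # CFL increase tolerance tighter than that could never be met. The effective
    # threshold is therefore taken as the looser (larger) of the two values.
    threshold = max(cfl_up_tol, lin_solver_tol)
    if lin_res <= threshold:
        cfl *= factor_up      # linear system converged well enough: grow the CFL
    else:
        cfl *= factor_down    # linear solver struggled: shrink the CFL
    # Keep the CFL inside the user-specified bounds.
    return min(max(cfl, cfl_min), cfl_max)
```

With `lin_solver_tol=1e-2`, changing `cfl_up_tol` between 1e-3 and 1e-4 has no effect in this sketch, because the threshold stays at 1e-2. The simplification here is that the CFL is always either increased or decreased; a real scheme may also leave it unchanged.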
|
For completeness, something I never got to try is subspace recycling. Each linear system currently starts from scratch; if we can carry some information over from the previous iterations, we can potentially get to a higher CFL at the same cost. The other known challenge of Krylov methods is scaling: with more unknowns you also need larger subspaces, because there are more "slow modes". In theory, the solution for that is a preconditioner that can take care of those modes (multigrid). |
|
Do you think that the Eisenstat-Walker strategy for choosing the tolerance of the linear solver could be useful? |
|
I think those strategies require additional matrix-vector products, which is what we are trying to minimize. |
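For context on why matrix-vector products are the cost to minimize: in a matrix-free Newton-Krylov method, the Jacobian-vector product is typically approximated by a finite difference of residuals, so each product costs roughly one extra residual evaluation. The sketch below shows the generic technique under that assumption; the function name, the `residual` callback, and the step-size heuristic are illustrative and not taken from SU2.

```python
import numpy as np

def fd_jacobian_vector_product(residual, u, v, r_u=None, eps=1e-7):
    """Approximate J(u) @ v with a forward difference of the residual.

    residual: callable mapping a state vector to a residual vector (hypothetical).
    r_u: residual(u), if already available, to avoid recomputing it.
    """
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(u)
    if r_u is None:
        r_u = residual(u)
    # Scale the perturbation with the size of u and v (a common heuristic).
    step = eps * (1.0 + np.linalg.norm(u)) / norm_v
    # One additional residual evaluation per Jacobian-vector product.
    return (residual(u + step * v) - r_u) / step
```

For example, with `residual = lambda u: u**2` the exact product is `2 * u * v`, and the approximation agrees with it to within the finite-difference error.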
|
I can confirm that relaxing the linear solver tolerance this much does not have a good impact on the simulation of the HLPW5 case. Maybe the tolerance was fine at the start of the simulation, but as the non-linear residual dropped it became too large. That is why I was proposing something like Eisenstat-Walker to relate the linear solver tolerance to the non-linear one, although I understand it might be impractical. |
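As a rough illustration of the idea, here is a sketch of the Eisenstat-Walker "choice 2" forcing term, which sets the linear tolerance from the ratio of two consecutive non-linear residual norms. The function name, the clamping bounds `eta_min` and `eta_max`, and the default values are assumptions for this example, not SU2 settings.

```python
def eisenstat_walker_tolerance(res_norm, prev_res_norm, prev_eta,
                               gamma=0.9, alpha=2.0, eta_min=1e-4, eta_max=0.5):
    """Relative linear solver tolerance for the next Newton step.

    res_norm, prev_res_norm: non-linear residual norms of the current and
    previous iteration. prev_eta: tolerance used for the previous linear solve.
    """
    # Choice 2: eta_k = gamma * (||F_k|| / ||F_{k-1}||)^alpha
    eta = gamma * (res_norm / prev_res_norm) ** alpha
    # Safeguard: do not let the tolerance tighten too abruptly.
    safeguard = gamma * prev_eta ** alpha
    if safeguard > 0.1:
        eta = max(eta, safeguard)
    # Clamp to illustrative bounds so the linear solve is never too loose or too tight.
    return min(max(eta, eta_min), eta_max)
```

In this variant the tolerance tightens automatically as the non-linear residual drops faster, and it only needs the residual norms of two consecutive non-linear iterations.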
|
Yeah, when the convergence is all spiky like that it's usually not a good sign. In any case, I think I gave good-enough disclaimers 😅
If tighter tolerances are necessary to achieve convergence, then we can try to understand why.
Some of the NK parameters allow ramping the linear solver tolerance (third IPARAM and third DPARAM). |
bigfooted
left a comment
Thank you for your service!
|
The convergence looks good. Just a few notes on the CFL adaption based on my experimentation, especially with the HLPW cases:
@rois1995, for the last 2 results you posted, how does the CFL compare per iteration? And how do the new force values compare to the workshop results? |
I have the same experience here: this is a very safe setup. There may be setups that benefit from more aggressive settings, but this always seems to work fine. I also use low linear solver convergence criteria for the CFL_UP. |














Proposed Changes
Described in the config template and the ONERA M6 with NK example.
It was also possible to improve performance by not computing the gradients of density and enthalpy for the Roe scheme.
PR Checklist
pre-commit run --all to format old commits.