presets: refactor, allow cascade presets from different sources, add global section #18169
Conversation
Looks good! I'll deploy this on my test server tonight and report back with results!

First basic test OK. In my case, as a user, this lets me keep a complete configuration file; it is no longer necessary to modify the command line (systemd) each time I want to change a global setting:

```
./llama-server --port 8082 --models-max 1 --models-preset backend.ini --webui-config-file frontend.json
```

Minor nitpick (not in this PR): if we want `--kv-unified` in logs instead of `-kvu`, we could swap the order in `arg.cpp` to `{"-kvu", "--kv-unified"}`, since `to_args()` uses `.back()`. EDIT: #18196 (for all args + doc).

CLI with `-c 1024` overwrites the `.ini` per-model config -> wanted -> OK. Single-model-mode testing (with some args): OK! Major functionality validated, we can merge it!
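To make the test setup above concrete, here is a hypothetical sketch of what a `backend.ini` preset could look like under this PR's scheme. The section names (`[*]` global, one section per model) come from the PR itself; the model name and the specific keys (`ctx-size`, `temp`) are assumptions based on llama-server's long-form argument names, not copied from the actual test file:

```
; global section: applies to every model unless overridden
[*]
ctx-size = 4096
temp = 0.2

; per-model section: takes precedence over [*] for this model only
[ggml-org/MY-MODEL]
temp = 0.7
```

A CLI argument such as `-c 1024` would then override `ctx-size` from either section, matching the "CLI overwrites .ini per-model config" behavior reported above.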
Thanks for testing. Yes, feel free to create a new PR for fixing this. Our convention is to have the short form first, followed by the long form.
ggerganov left a comment:
I'm traveling for a few days and won't be able to do very detailed testing/review. Approving so as not to block this work, and I added @ServeurpersoCom to the write-access group for additional approvals if needed.
```cpp
    }

    // 2. local models specified via --models-dir
    common_presets cached_models = ctx_preset.load_from_cache();
    SRV_INF("Loaded %zu cached model presets\n", cached_models.size());
```
nit: for most logs we prefix with the function name:
```diff
-    SRV_INF("Loaded %zu cached model presets\n", cached_models.size());
+    SRV_INF("%s: Loaded %zu cached model presets\n", __func__, cached_models.size());
```
We can just test Windows (build OK on my side, needs some basic testing), merge, and then I'll complete my separate "special nits" PR #18196 (that way we don't have any conflicts) :)
The SRV_INF macro should already be prefixed with the function name, so I think this is not necessary:

```cpp
#define SRV_INF(fmt, ...) LOG_INF("srv %12.*s: " fmt, 12, __func__, __VA_ARGS__)
```
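To see why adding `__func__` by hand would duplicate the prefix, here is a standalone sketch (not the real llama.cpp logging backend) of the formatting the macro performs: `%12.*s` prints at most 12 characters of the calling function's name into the log line.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper mirroring what SRV_INF expands to: the function name is
// baked into the prefix ("srv %12.*s: "), so the message body never needs it.
// "%12.*s" = field width 12, at most 12 characters of the name printed.
static std::string srv_log(const char * func, size_t n_presets) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "srv %12.*s: Loaded %zu cached model presets",
                  12, func, n_presets);
    return buf;
}
```

Calling `srv_log("load_presets", 3)` yields `srv load_presets: Loaded 3 cached model presets`, i.e. the function name is already in the line, which is why the suggested `"%s: ..."` prefix would print it twice.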
Alternative to #17959
Fix #17948
Before this PR, the logic for loading models from different sources (cache / local / custom INI) was quite messy and did not allow the INI preset to take precedence over other sources.
With this PR, we unify the method for loading server models and presets:
- `preset.cpp` is responsible for collecting all model sources (cache / local) and generating a base preset for each known GGUF
- `preset.cpp` then loads the INI file and parses the global section (`[*]`)
- the resulting presets are handed to the server (`server-models.cpp`) to decide how to cascade them

The current cascading rules can be found in the server's docs:
1. CLI arguments passed to `llama-server` (highest priority)
2. per-model preset section (e.g. `[ggml-org/MY-MODEL...]`)
3. global preset section (`[*]`)
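The cascade above can be sketched as a simple layered merge, where lower-priority layers only fill in keys the higher-priority layers left unset. This is a hypothetical illustration of the rule, not llama.cpp's actual implementation; the key names in the usage note are assumptions:

```cpp
#include <map>
#include <string>

using preset_kv = std::map<std::string, std::string>;

// Merge three preset layers. std::map::insert() never overwrites an existing
// key, so starting from the CLI layer and inserting the lower-priority layers
// implements "CLI > per-model section > global [*] section".
static preset_kv cascade(const preset_kv & cli,        // highest priority
                         const preset_kv & per_model,  // [ggml-org/MY-MODEL...]
                         const preset_kv & global) {   // [*]
    preset_kv out = cli;
    out.insert(per_model.begin(), per_model.end());
    out.insert(global.begin(), global.end());
    return out;
}
```

For example, with `ctx-size` set on the CLI, `temp` set in both the per-model and global sections, and `port` only in `[*]`, the merged preset keeps the CLI `ctx-size`, the per-model `temp`, and the global `port`.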