Misc. bug: Vision and Omni models (except for Gemma3VL) crash and exit when processing images. #18311

@alan-l

Description

Name and Version

llama-server.exe running b7513; I haven't fully tested how many versions back this started.

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server.exe --model ".\gemma-3-12b-it-qat-UD-Q4_K_XL.gguf" --mmproj ".\gemma-3-12b-it-qat-UD-mmproj.F16.gguf"

Problem description & steps to reproduce

Processing an image with my request in the WebUI generates the following error, and then the server quits:

0.49.164.680 I slot launch_slot_: id 3 | task 0 | processing task
0.49.164.725 I slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 16384, n_keep = -1, task.n_tokens = 1465
0.49.164.731 I slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
0.49.164.778 I slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 28, batch.n_tokens = 28, progress = 0.019113
0.51.364.526 I slot update_slots: id 3 | task 0 | n_tokens = 28, memory_seq_rm [28, end)
0.51.364.550 I srv process_chun: processing image...
0.51.364.668 I encoding image slice...
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-vulkan\ggml-vulkan.cpp:5928: GGML_ASSERT(wg0 <= ctx->device->properties.limits.maxComputeWorkGroupCount[0] && wg1 <= ctx->device->properties.limits.maxComputeWorkGroupCount[1] && wg2 <= ctx->device->properties.limits.maxComputeWorkGroupCount[2]) failed

First Bad Commit

Currently on b7513; I haven't tested how many versions back this problem started.
