Batch Inference On 8255 device

Hi, I want to run batch inference on the 8255 device.
I noticed that qnn_llama_runner has a --num_iters parameter. Is this parameter meant for batch inference? Also, how can I reuse the KV cache, that is, load the model and system_prompt once and then run multiple inferences?
Looking forward to your reply.

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin

Batch Inference On 8255 device #16413
