Skip to content

Batch Inference On 8255 device #16413

@imjking

Description

@imjking

Hi, I want to perform batch inference on the 8255 device now.
I noticed there is a --num_iters parameter in qnn_llama_runner. Is this parameter for batch inference? Additionally, how can I use the KV cache, that is, load the model and system_prompt once and then perform multiple inferences.
Looking forward to your reply.

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin

Metadata

Metadata

Assignees

No one assigned

    Labels

    module: qnnIssues related to Qualcomm's QNN delegate and code under backends/qualcomm/partner: qualcommFor backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions