Open
Labels
module: qnn (Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/)
partner: qualcomm (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm)
Description
Hi, I want to run batch inference on the 8255 device.
I noticed that qnn_llama_runner has a --num_iters parameter. Is this parameter meant for batch inference? Also, how can I reuse the KV cache, that is, load the model and system_prompt once and then run multiple inferences?
Looking forward to your reply.
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin