Use cublas_ops for a 15-20% speedup on matmuls, with some loss of quality #389
So cublas_ops in ComfyUI is actually a Python package: https://github.com/aredden/torch-cublas-hgemm

It uses custom CUDA kernels to speed up matmuls and does wonders for FP16 weights. On my 2080 Ti 22G it actually flips from s/it to it/s when using the BF16 weights cast to FP16. It's not as dramatic here, only about 1.40 s/it down to 1.12 s/it, but in practical terms that turns a 14.x second generation (with LoRA) into a 10.3 second one.
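For anyone who hasn't clicked through to the repo, here's roughly what the drop-in looks like. This is a minimal sketch assuming the `CublasLinear` class the repo describes is a drop-in replacement for `nn.Linear`; the exact import and constructor may differ, so check its README before copying this.

```python
import torch
import torch.nn as nn
# Assumption: cublas_ops exposes CublasLinear as an nn.Linear-compatible module.
from cublas_ops import CublasLinear

ref = nn.Linear(4096, 4096).half().cuda()
fast = CublasLinear(4096, 4096).half().cuda()
fast.load_state_dict(ref.state_dict())  # same weights, different matmul path

x = torch.randn(2, 77, 4096, dtype=torch.float16, device="cuda")
# Same math as ref(x), but the matmul is routed through the custom cuBLAS hgemm kernels.
print((fast(x) - ref(x)).abs().max())
```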
Everything comes with a price, and FP16 isn't always the best quality. Then again, if you're already running sage attention, that's int8, and that isn't lossless either.
Being a shitty developer, I whipped this up with some help, for your perusal. If you want faster gens on meh hardware, it's worth looking into. I probably did some stupid things. My gens in SillyTavern aren't meant to be a masterpiece or a deliverable, so I'd rather not wait.
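The gist of the patch is just walking the loaded model and swapping its FP16 linears for the cublas-backed ones. This is a rough sketch, not the exact code in this PR; the helper name is made up and `CublasLinear` is assumed from the repo above:

```python
import torch
import torch.nn as nn

def swap_linears_for_cublas(model: nn.Module) -> nn.Module:
    """Replace FP16 nn.Linear modules with cublas-backed ones, in place (sketch)."""
    try:
        from cublas_ops import CublasLinear  # assumption: class name per the repo
    except ImportError:
        return model  # package not installed, leave the model untouched

    for parent in model.modules():
        for name, child in list(parent.named_children()):
            # Only touch plain FP16 linears; everything else keeps the normal path.
            if type(child) is nn.Linear and child.weight.dtype == torch.float16:
                new = CublasLinear(child.in_features, child.out_features,
                                   bias=child.bias is not None)
                new.load_state_dict(child.state_dict())
                new = new.to(device=child.weight.device, dtype=torch.float16)
                setattr(parent, name, new)
    return model
```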
The model I tested this with is z-image Q8 and, of course, a GGUF Qwen. We know those two will work, and dare I say it appears to run faster and with better quality than what I got out of Nunchaku (on the 2080 Ti). Of course, I am also compiling with torch.
Edit: this does not work for T5. It dequants to F32 and doesn't survive the casting. My only viable idea is to detect F32/BF16 weights and skip cublas_ops for those modules.
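Something like this guard is what I'm picturing for that (sketch only, the function name is made up):

```python
import torch
import torch.nn as nn

def can_use_cublas(module: nn.Module) -> bool:
    """Return True only for FP16 CUDA linears; F32/BF16 (e.g. dequanted T5) skip cublas_ops."""
    if not isinstance(module, nn.Linear):
        return False
    w = module.weight
    # The hgemm path only helps (and only works cleanly) for FP16 weights,
    # so skip F32 and BF16 instead of force-casting them.
    return w.is_cuda and w.dtype == torch.float16
```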