Reducing WAN VRAM requirements by pipelining the encoding -> high noise -> low noise stages #1129
rendang-github started this conversation in Ideas
Replies: 2 comments · 1 reply
- Did you use
1 reply
- Feel free to code review #1059
-
I have been perplexed by my inability to run WAN 2.2 renders on my RTX 3090 (24 GB VRAM), even when using sd-cli with the demo prompts from docs/wan.md and the matching Q8 GGUFs. So I dug into the codebase with vim and gdb to figure out what was going on, and it appears that the T5 encoder, the high noise model, and the low noise model are all loaded into VRAM concurrently.
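For a rough sense of why that can't fit, here is my own back-of-envelope arithmetic, assuming the Wan 2.2 A14B pairing of two ~14 B-parameter experts plus the umt5-xxl encoder, all at Q8_0 (roughly 8.5 bits per weight); the exact sizes will vary by file:

```
umt5-xxl encoder  : ~5.7 B params x ~8.5 bit/weight ≈  6 GB
high noise expert : ~14 B  params x ~8.5 bit/weight ≈ 15 GB
low noise expert  : ~14 B  params x ~8.5 bit/weight ≈ 15 GB
loaded together                                     ≈ 36 GB, well past 24 GB
```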
My understanding of how WAN 2.2 works is that the text encoding, the high noise pass, and the low noise pass take place in distinct stages, with no overlap in model usage between them. Is there a reason why all three are pre-loaded into VRAM, instead of loading and unloading each one sequentially in lower-VRAM environments? I get that this would be slower than holding everything in VRAM from startup, but my budget for H200s appears to have run dry (I checked for spare change under the sofa too), so I'm happy to trade slightly longer run times for swapping models in and out as they are needed.
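To make the idea concrete, here is a minimal C++ sketch of the staging I have in mind. Nothing in it is real stable-diffusion.cpp API; the Model type and the encode/denoise helpers are hypothetical stand-ins, and RAII scopes stand in for whatever explicit load/free calls the backend actually uses, so that at most one large model occupies VRAM at a time:

```cpp
// Hypothetical sketch only: none of these types or functions exist in the
// real codebase. RAII scopes model the load -> use -> free ordering so
// that at most one large model sits in VRAM at a time.
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

struct Model {                      // stand-in for one loaded GGUF model
    std::string name;
    explicit Model(std::string n) : name(std::move(n)) {
        std::printf("load   %s -> VRAM\n", name.c_str());
    }
    ~Model() { std::printf("unload %s\n", name.c_str()); }
};

using Cond   = std::vector<float>;  // text-conditioning embedding
using Latent = std::vector<float>;  // latent video tensor

// Stage helpers; real ones would build and run ggml graphs.
Cond   encode(Model&, const std::string&)                { return Cond(4096, 0.0f); }
Latent denoise(Model&, const Cond&, Latent x, int steps) { (void)steps; return x; }

int main() {
    const std::string prompt = "a cat surfing a wave at sunset";

    Cond cond;
    {   // stage 1: text encoding; the T5 encoder is freed at scope exit
        Model t5("umt5-xxl");
        cond = encode(t5, prompt);
    }

    Latent x(16 * 64 * 64, 0.0f);   // initial noise latent
    {   // stage 2: high noise expert runs the early sampler steps
        Model high("wan2.2 high-noise");
        x = denoise(high, cond, std::move(x), /*steps=*/10);
    }
    {   // stage 3: low noise expert finishes the remaining steps
        Model low("wan2.2 low-noise");
        x = denoise(low, cond, std::move(x), /*steps=*/10);
    }

    // VAE decode of x would follow here.
    return 0;
}
```

The per-render cost would be a couple of extra model loads from disk, but for my use case that seems like a fair trade against not being able to render at all.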