
CogVideoX VAE decoder consumes significantly more memory in the latest version  #10035

@ic-synth

Description


Describe the bug

The CogVideoX VAE decoder in diffusers 0.31.0 consumes significantly more memory than in 0.30.3, to the point where the model goes OOM even on 80 GB H100 GPUs with a relatively modest frame count.
I include two profiles for very small input tensors of only 5 frames, where it is visible how much larger the VAE memory consumption is.

Memory footprints for different input sizes are shown below. As you can see, with the latest version memory keeps growing with the frame count.
[Screenshot: memory footprint vs. frame count, diffusers 0.30.3]
[Screenshot: memory footprint vs. frame count, diffusers 0.31.0]

Reproduction

Run the CogVideoXDecoder3D model with diffusers 0.30.3 and 0.31.0 on inputs of the same shape and measure the memory consumption as the frame count increases.

# Requires a GPU with 50+ GB of VRAM
import torch
import diffusers
from diffusers import AutoencoderKLCogVideoX

with torch.no_grad():
    vae = AutoencoderKLCogVideoX().to(dtype=torch.bfloat16).eval()
    vae.decoder = vae.decoder.to(device="cuda:0")
    # Latent layout: (batch, channels, frames, height, width)
    input_tensor = torch.randn(1, 16, 5, 96, 170).to(device="cuda:0", dtype=torch.bfloat16)
    print("Decoding ... Input size:", input_tensor.shape, "diffusers version:", diffusers.__version__)
    vae.decode(input_tensor)
    print("Peak allocated (GiB):", torch.cuda.max_memory_allocated() / (1024 ** 3),
          "Peak reserved (GiB):", torch.cuda.max_memory_reserved() / (1024 ** 3))
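To compare the two versions side by side, the peak-memory counters should be reset before each decode so consecutive measurements do not contaminate each other. A small helper along these lines can wrap the measurement (`peak_gib` is a hypothetical name, not part of diffusers or torch; it only uses the standard `torch.cuda` memory-stats API):

```python
import torch

GIB = 1024 ** 3  # bytes per GiB


def peak_gib(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (peak allocated, peak reserved) in GiB.

    Resets the CUDA peak-memory counters first so repeated measurements are
    independent. Returns (0.0, 0.0) when CUDA is unavailable.
    """
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    fn(*args, **kwargs)
    if not torch.cuda.is_available():
        return 0.0, 0.0
    return (torch.cuda.max_memory_allocated() / GIB,
            torch.cuda.max_memory_reserved() / GIB)
```

With this, `peak_gib(vae.decode, input_tensor)` (run once per diffusers version) yields directly comparable numbers for the table above.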

Logs

No response

System Info

Python 3.11.
Diffusers 0.30.3 vs 0.31.0

Who can help?

@sayakpaul @DN6 @yiyixuxu
