
CogVideoX VAE decoder consumes significantly more memory in the latest version  #10035

@ic-synth

Description


Describe the bug

The CogVideoX VAE decoder in diffusers 0.31.0 consumes significantly more memory than in 0.30.3, to the point where the model goes OOM even on 80 GB H100 GPUs with a relatively modest frame count.
I include two profiles for very small input tensors of only 5 frames, where it is visible how much larger the VAE memory consumption is.

Memory footprints for different input sizes are shown below. As you can see, with the latest version memory keeps growing with the frame count.
[Screenshot: memory footprint vs. frame count, diffusers 0.30.3]
[Screenshot: memory footprint vs. frame count, diffusers 0.31.0]

Reproduction

Run the CogVideoXDecoder3D model with diffusers 0.30.3 and 0.31.0 on inputs of the same shape and measure the memory consumption as the frame count increases.

# Requires a GPU with 50+ GB of VRAM
import torch
import diffusers
from diffusers import AutoencoderKLCogVideoX

with torch.no_grad():
    vae = AutoencoderKLCogVideoX().to(dtype=torch.bfloat16).eval()
    vae.decoder = vae.decoder.to(device="cuda:0")
    # Latent layout: (batch, channels, frames, height, width)
    input_tensor = torch.randn(1, 16, 5, 96, 170).to(device="cuda:0", dtype=torch.bfloat16)
    print("Decoding ... Input size:", input_tensor.shape, "diffusers version:", diffusers.__version__)
    vae.decode(input_tensor)
    print("Peak allocated (GiB):", torch.cuda.max_memory_allocated() / (1024 ** 3),
          "Peak reserved (GiB):", torch.cuda.max_memory_reserved() / (1024 ** 3))
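To compare the two versions side by side, the peak-memory counters should be reset before each decode so consecutive measurements do not contaminate each other. A small helper along these lines can wrap the measurement (`peak_gib` is a hypothetical name, not part of diffusers or torch; it only uses the standard `torch.cuda` memory-stats API):

```python
import torch

GIB = 1024 ** 3  # bytes per GiB


def peak_gib(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (peak allocated, peak reserved) in GiB.

    Resets the CUDA peak-memory counters first so repeated measurements are
    independent. Returns (0.0, 0.0) when CUDA is unavailable.
    """
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    fn(*args, **kwargs)
    if not torch.cuda.is_available():
        return 0.0, 0.0
    return (torch.cuda.max_memory_allocated() / GIB,
            torch.cuda.max_memory_reserved() / GIB)
```

With this, `peak_gib(vae.decode, input_tensor)` (run once per diffusers version) yields directly comparable numbers for the table above.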

Logs

No response

System Info

Python 3.11.
Diffusers 0.30.3 vs 0.31.0

Who can help?

@sayakpaul @DN6 @yiyixuxu
