Questions about pre-training LoRA with other modules simultaneously #8472

### Describe the bug

Sorry for asking a question that is not strictly about diffusers. I am trying to jointly train an ELLA model with a UNet LoRA (rank: 64). However, I ran into a strange situation: the ELLA model works well, while the LoRA breaks down and makes the output look like random noise. Has anyone seen this behavior?

![img-0-size-512-lora-True](https://github.com/huggingface/diffusers/assets/38740075/c59f7d89-c18a-4783-b7f1-2bb45cb7ed43)


### Reproduction

I have tried multiple settings (FP16, FP32, DeepSpeed) and checked my code many times. This has confused me for a long time. I would appreciate it if the community could provide some suggestions or guidance.
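To make the joint setup easier to discuss, here is a minimal sketch of how the two sets of trainable weights could be handed to a single optimizer as separate parameter groups. It is an illustration only, not the actual training code from this report: the function name `build_param_groups`, the learning-rate values, and the choice to identify LoRA weights by the substring `lora_` in the parameter name are all assumptions.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    lora_lr: float = 1e-4,   # hypothetical value, not taken from the report
    other_lr: float = 1e-5,  # hypothetical value, not taken from the report
) -> List[Dict[str, Any]]:
    """Split trainable parameters into a LoRA group and an 'other' group.

    `named_params` is any iterable of (name, parameter) pairs, for example
    the combined output of `named_parameters()` from both modules.
    """
    lora_params, other_params = [], []
    for name, param in named_params:
        # Skip frozen weights; objects without `requires_grad` count as trainable.
        if not getattr(param, "requires_grad", True):
            continue
        # Assumption: LoRA weights carry "lora_" in their name (e.g. lora_A, lora_B).
        if "lora_" in name:
            lora_params.append(param)
        else:
            other_params.append(param)

    groups = []
    if lora_params:
        groups.append({"params": lora_params, "lr": lora_lr})
    if other_params:
        groups.append({"params": other_params, "lr": other_lr})
    return groups
```

The returned list has the per-group shape that PyTorch optimizers accept (a list of dicts with `params` and `lr`), so the LoRA weights and the other module can be given different learning rates and inspected separately when one of them diverges.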

### Logs

_No response_

### System Info

- `diffusers` version: 0.28.0.dev0
- Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.2.0+cu121 (True)
- Huggingface_hub version: 0.21.3
- Transformers version: 4.30.2
- Accelerate version: 0.21.0
- xFormers version: 0.0.24
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>


### Who can help?

_No response_
