Fine-tuning SD3 does not produce a better model (hand and face deformation)? #8748

### Describe the bug

I want to fine-tune SD3 to improve its human generation quality using a high-quality human dataset of 3 million samples (data that has already proven useful on SDXL and other models). However, hand and face deformation has not improved much after two days of training.

I am using the [train_dreambooth_sd3.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_sd3.py) script.

What I have tried so far:
1. Regular training on the 3 million samples for 2 epochs with batch size 2 x 24 (2 per GPU on 24 V100s), learning rate 5e-6, and the AdamW optimizer.
2. Training with the Prodigy optimizer and otherwise the same settings.
3. Adding q/k RMS norm to each attention layer.
4. Training only several blocks.
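
For readers unfamiliar with items 3 and 4, here is a minimal PyTorch sketch of what those two changes can look like. It is a toy illustration, not the code used in these runs and not what `train_dreambooth_sd3.py` does: `RMSNorm`, `QKNormAttention`, and `train_only_blocks` are hypothetical names, and the attention layer is a plain self-attention stand-in rather than the SD3 transformer block.

```python
import math

import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square norm over the last dimension with a learned scale."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class QKNormAttention(nn.Module):
    """Toy self-attention that RMS-normalizes q and k per head (hypothetical stand-in)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.to_qkv = nn.Linear(dim, 3 * dim)
        self.to_out = nn.Linear(dim, dim)
        self.norm_q = RMSNorm(self.head_dim)
        self.norm_k = RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        qkv = self.to_qkv(x).view(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, n, head_dim)
        # Normalize q and k before the dot product to bound the attention logits.
        q, k = self.norm_q(q), self.norm_k(k)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        out = scores.softmax(dim=-1) @ v
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)


def train_only_blocks(blocks: nn.ModuleList, trainable: set) -> int:
    """Freeze every block whose index is not in `trainable`; return trainable param count."""
    for i, block in enumerate(blocks):
        block.requires_grad_(i in trainable)
    return sum(p.numel() for p in blocks.parameters() if p.requires_grad)


# Toy usage: four blocks, only the last two are trained.
blocks = nn.ModuleList([QKNormAttention(dim=64, num_heads=4) for _ in range(4)])
num_trainable = train_only_blocks(blocks, trainable={2, 3})
optimizer = torch.optim.AdamW(
    [p for p in blocks.parameters() if p.requires_grad], lr=5e-6
)
```

Passing only the parameters with `requires_grad=True` to the optimizer keeps optimizer state from being allocated for frozen blocks, which matters on memory-constrained cards.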

All of these runs give nearly the same deformed results; the hands never look like normal human hands.

Could you share more experiments on SD3 training? There seems to be no easy way to adapt SD3 for human generation.

### Reproduction

As described in the bug section above.

### Logs

_No response_

### System Info

24 V100 GPUs, batch size 2 per card, 3 million human samples with aesthetic score > 4.5
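
To put these numbers in context, the effective batch size and the number of optimizer steps can be worked out as below. This is a rough sketch that assumes no gradient accumulation (`grad_accum=1`) and that a final partial batch is dropped; neither is stated in this report, and `optimizer_steps` is a hypothetical helper.

```python
def optimizer_steps(num_samples, per_gpu_batch, num_gpus, epochs, grad_accum=1):
    """Return (effective batch size, total optimizer steps) for a data-parallel run."""
    effective_batch = per_gpu_batch * num_gpus * grad_accum
    steps_per_epoch = num_samples // effective_batch  # drops a final partial batch
    return effective_batch, steps_per_epoch * epochs


# Settings from this report; grad_accum=1 is an assumption.
effective_batch, total_steps = optimizer_steps(
    num_samples=3_000_000, per_gpu_batch=2, num_gpus=24, epochs=2
)
print(effective_batch, total_steps)  # 48 125000
```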

### Who can help?

_No response_
