Diffusion Reward Modeling for text rendering on images ✨

🌟 Main Contributions

1) Text Rendering Dataset

🔗 Diffusion-Reward-Modeling-for-Text-Rendering-Dataset
📊 14,000 curated prompts for text-on-image generation

2) Six Alignment Pipelines

  • Supervised Fine-Tuning (SFT)
  • Reward Weighted Regression (RWR)
  • Direct Preference Optimization (DPO)
  • DRaFT
  • ReFL
  • GRPO

3) Two Text Rendering Quality Metrics

  • OCR Accuracy Metric
  • Reward Model Score

📊 Visualizations

SD3.5 Medium base

Sample prompts:

  • A superrealistic panda holding a sign that says "I Love SMILES 2025"
  • An asian dragon holding a sign with "Summer of Machine Learning by Skoltech 2025 !"
  • "I love Harbin Institute of Technology" written on a chinese office building

Per-method metric plots:

  • SFT: OCR metric, reward metric
  • RWR: OCR metric, reward metric
  • DPO: OCR metric, reward metric
  • SFT + DPO: OCR metric, reward metric
  • DRaFT: reward metric only
  • ReFL: reward metric only
  • GRPO: reward metric only

🔀 Text Rendering Quality Metrics

📊 Text Rendering Quality Metric Distributions

[distribution plots for the OCR and reward metrics]

📈 Method Comparison

[method comparison plot]

🚀 Usage

SFT:

sh run_train_sd3_sft.sh
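
For orientation, SFT here is just the standard flow-matching denoising loss on the curated dataset. Below is a minimal PyTorch sketch, assuming a generic denoiser and rectified-flow noising as in SD3; all names are hypothetical and the actual implementation lives in the script above.

    import torch
    import torch.nn.functional as F

    def sft_loss(denoiser, latents, cond):
        # Standard rectified-flow denoising loss on curated
        # text-rendering data; no reward signal is involved.
        noise = torch.randn_like(latents)
        t = torch.rand(latents.shape[0], device=latents.device)
        sigmas = t.view(-1, 1, 1, 1)
        noisy = (1.0 - sigmas) * latents + sigmas * noise  # x_t = (1 - t) x_0 + t * eps
        target = noise - latents                           # velocity target v = eps - x_0
        pred = denoiser(noisy, t, cond)
        return F.mse_loss(pred, target)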

RWR:

sh run_train_sd3_rwr.sh
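
Reward-weighted regression keeps the same denoising loss but reweights each sample by an exponentiated reward, so the model imitates its high-reward generations more strongly. A sketch under the same assumptions as the SFT snippet; rewards is a per-sample score from the reward model, and beta is an illustrative temperature rather than the repository's setting.

    import torch
    import torch.nn.functional as F

    def rwr_loss(denoiser, latents, cond, rewards, beta=1.0):
        noise = torch.randn_like(latents)
        t = torch.rand(latents.shape[0], device=latents.device)
        sigmas = t.view(-1, 1, 1, 1)
        noisy = (1.0 - sigmas) * latents + sigmas * noise
        target = noise - latents
        pred = denoiser(noisy, t, cond)
        # Per-sample denoising error, weighted by exp(reward / beta)
        # normalized over the batch.
        per_sample = F.mse_loss(pred, target, reduction="none").mean(dim=(1, 2, 3))
        weights = torch.softmax(rewards / beta, dim=0)
        return (weights * per_sample).sum()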

DPO:

sh run_train_sd3_dpo.sh
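
DPO trains on preference pairs instead of scalar rewards. A rough sketch of a Diffusion-DPO-style loss, assuming the per-sample denoising errors have already been computed for the preferred (w) and rejected (l) image of each pair, under both the trained model and a frozen reference copy; the beta value is illustrative.

    import torch.nn.functional as F

    def diffusion_dpo_loss(err_w, err_l, ref_err_w, ref_err_l, beta=1000.0):
        # Push the trained model to improve on preferred samples
        # (relative to the frozen reference) more than on rejected ones.
        margin = (err_w - ref_err_w) - (err_l - ref_err_l)
        return -F.logsigmoid(-beta * margin).mean()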

ReFL:

sh run_train_sd3_refl.sh
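
ReFL avoids backpropagating through the whole sampling chain: it denoises without gradients up to a randomly chosen late step, takes a single differentiable step, jumps straight to the predicted clean sample, and maximizes its reward. A toy sketch under the same rectified-flow assumptions; reward_fn stands in for the reward model, and sigmas runs from 1.0 down to 0.0.

    import random
    import torch

    def refl_loss(denoiser, reward_fn, noise, sigmas, cond):
        latents = noise
        n = len(sigmas) - 1
        stop = random.randint(int(0.7 * n), n - 1)  # random cutoff near the end
        with torch.no_grad():                       # early steps: no gradients
            for i in range(stop):
                v = denoiser(latents, sigmas[i], cond)
                latents = latents + (sigmas[i + 1] - sigmas[i]) * v  # Euler step
        v = denoiser(latents, sigmas[stop], cond)   # the only step kept in the graph
        x0 = latents - sigmas[stop] * v             # one-shot jump to sigma = 0
        return -reward_fn(x0, cond).mean()          # minimize negative reward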

DRaFT:

sh run_train_sd3_draft.sh
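
DRaFT instead differentiates the reward through the sampler itself; the DRaFT-K variant truncates backpropagation to the last k denoising steps to keep memory bounded. A toy sketch under the same assumptions as the ReFL snippet.

    import torch

    def draft_k_loss(denoiser, reward_fn, noise, sigmas, cond, k=1):
        latents = noise
        n = len(sigmas) - 1
        with torch.no_grad():                 # early steps: plain sampling
            for i in range(n - k):
                v = denoiser(latents, sigmas[i], cond)
                latents = latents + (sigmas[i + 1] - sigmas[i]) * v
        for i in range(n - k, n):             # last k steps carry gradients
            v = denoiser(latents, sigmas[i], cond)
            latents = latents + (sigmas[i + 1] - sigmas[i]) * v
        return -reward_fn(latents, cond).mean()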

GRPO:

sh run_train_sd3_grpo.sh
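
GRPO is critic-free RL: several images are sampled per prompt, rewards are standardized within each group to form advantages, and those advantages weight a PPO-style clipped objective over the denoising steps. A minimal sketch of the group-relative advantage computation.

    import torch

    def grpo_advantages(rewards):
        # rewards shaped (num_prompts, group_size); standardizing within
        # each group removes the need for a learned value function.
        mean = rewards.mean(dim=-1, keepdim=True)
        std = rewards.std(dim=-1, keepdim=True)
        return (rewards - mean) / (std + 1e-6)

    adv = grpo_advantages(torch.tensor([[0.2, 0.8, 0.5, 0.9]]))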

Generate Latents:

sh generate_visuals_sd3_480p.sh

Encode text prompts into embeddings:

sh generate_text_embeds_sd3.sh

Calculate OCR + Levenshtein metric:

sh calculate_levenstein_metric.sh
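
As a rough illustration of what this metric computes, a sketch using the python-Levenshtein package: the target string from the prompt is compared against whatever an OCR engine reads off the generated image. The repository's exact normalization may differ.

    import Levenshtein  # pip install python-Levenshtein

    def ocr_text_accuracy(target_text, ocr_text):
        # Normalized Levenshtein similarity: 1.0 is a perfect match,
        # 0.0 means nothing was recovered.
        target = target_text.lower().strip()
        read = ocr_text.lower().strip()
        if not target and not read:
            return 1.0
        dist = Levenshtein.distance(target, read)
        return 1.0 - dist / max(len(target), len(read))

    print(ocr_text_accuracy("I Love SMILES 2025", "I Love SM1LES 2025"))  # ~0.944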

Calculate Reward metric:

sh calculate_reward_metric.sh

⚠️ Warning

In the examples we provide, the quality of DRaFT, ReFL, and GRPO is worse than that of DPO. This is because these methods were trained with a smaller batch size than DPO; their quality could likely be improved by introducing an EMA of the model weights.

📜 Citation

@article{liu2025improving,
  title={Improving video generation with human feedback},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Yuan, Ziyang and Liu, Xiaokun and Zheng, Mingwu and Wu, Xiele and Wang, Qiulin and Qin, Wenyu and Xia, Menghan and others},
  journal={arXiv preprint arXiv:2501.13918},
  year={2025}
}
@article{clark2023directly,
  title={Directly fine-tuning diffusion models on differentiable rewards},
  author={Clark, Kevin and Vicol, Paul and Swersky, Kevin and Fleet, David J},
  journal={arXiv preprint arXiv:2309.17400},
  year={2023}
}
@article{xu2023imagereward,
  title={ImageReward: Learning and evaluating human preferences for text-to-image generation},
  author={Xu, Jiazheng and Liu, Xiao and Wu, Yuchen and Tong, Yuxuan and Li, Qinkai and Ding, Ming and Tang, Jie and Dong, Yuxiao},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={15903--15935},
  year={2023}
}
@article{liu2025flow,
  title={Flow-GRPO: Training flow matching models via online RL},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Li, Yangguang and Liu, Jiaheng and Wang, Xintao and Wan, Pengfei and Zhang, Di and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2505.05470},
  year={2025}
}
@article{gao2025seedream,
  title={Seedream 3.0 technical report},
  author={Gao, Yu and Gong, Lixue and Guo, Qiushan and Hou, Xiaoxia and Lai, Zhichao and Li, Fanshi and Li, Liang and Lian, Xiaochen and Liao, Chao and Liu, Liyang and others},
  journal={arXiv preprint arXiv:2504.11346},
  year={2025}
}

📧 Contact

Supported by:
