# Diffusion-Reward-Modeling-for-Text-Rendering-Dataset

14,000 curated prompts for text-on-image generation.
## Training Methods

The following fine-tuning approaches are supported (a sketch of the reward-weighted variant follows the list):

- Supervised Fine-Tuning (SFT)
- Reward Weighted Regression (RWR)
- Direct Preference Optimization (DPO)
- Direct Reward Fine-Tuning (DRaFT)
- Reward Feedback Learning (ReFL)
- Group Relative Policy Optimization (GRPO)
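As a rough illustration of how the reward signal enters training, here is a minimal sketch of the RWR objective for a rectified-flow model such as SD3. All names (`model`, `reward_model`, `beta`) are hypothetical placeholders, not the repo's actual API:

```python
import torch

def rwr_loss(model, reward_model, x0, cond, beta=1.0):
    """Reward-weighted regression (minimal sketch): weight each sample's
    denoising loss by a softmax of its reward, so high-reward samples
    dominate the update. `model` and `reward_model` are placeholders."""
    b = x0.shape[0]
    noise = torch.randn_like(x0)
    t = torch.rand(b, device=x0.device).view(-1, 1, 1, 1)  # flow time in [0, 1]
    xt = (1 - t) * x0 + t * noise            # rectified-flow interpolation path
    v_pred = model(xt, t.flatten(), cond)    # predicted velocity
    v_target = noise - x0                    # rectified-flow velocity target
    per_sample = ((v_pred - v_target) ** 2).mean(dim=(1, 2, 3))
    with torch.no_grad():
        weights = torch.softmax(reward_model(x0, cond) / beta, dim=0)
    return (weights * per_sample).sum()
```

The other methods differ mainly in where the reward enters: DPO compares preferred/rejected sample pairs, DRaFT and ReFL backpropagate through a differentiable reward model, and GRPO uses group-normalized advantages in an online RL loop.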
## Metrics

Generated images are evaluated with two metrics (a sketch of the first follows the list):

- OCR Accuracy Metric
- Reward Model Score
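The OCR accuracy metric compares the text an OCR engine reads off the generated image against the target string. A minimal sketch of a normalized Levenshtein similarity (the OCR call itself is outside this snippet; `ocr_accuracy` is a hypothetical name, not necessarily the repo's function):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ocr_accuracy(target: str, recognized: str) -> float:
    """Similarity in [0, 1]: 1.0 means the rendered text matches exactly."""
    denom = max(len(target), len(recognized), 1)
    return 1.0 - levenshtein(target, recognized) / denom

# e.g. ocr_accuracy("I Love SMILES 2025", "I Love SMILES 2O25") -> ~0.94
```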
## Example Prompts

- A superrealistic panda holding a sign that says "I Love SMILES 2025"
- An asian dragon holding a sign with "Summer of Machine Learning by Skoltech 2025 !"
- "I love Harbin Institute of Technology" written on a chinese office building
## Training

- SFT: `sh run_train_sd3_sft.sh`
- RWR: `sh run_train_sd3_rwr.sh`
- DPO: `sh run_train_sd3_dpo.sh`
- ReFL: `sh run_train_sd3_refl.sh`
- DRaFT: `sh run_train_sd3_draft.sh`
- GRPO: `sh run_train_sd3_grpo.sh`

## Inference and Metrics

- Generate latents: `generate_visuals_sd3_480p.sh`
- Encode text prompts into embeddings: `generate_text_embeds_sd3.sh`
- Calculate OCR + Levenshtein metric: `calculate_levenstein_metric.sh`
- Calculate reward metric: `calculate_reward_metric.sh`
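A typical end-to-end run presumably encodes the prompts first, trains, then samples and scores. A sketch of that assumed order (our reading of the script names, not documented behavior):

```python
import subprocess

# Assumed pipeline order; each script reads its own paths and configs.
steps = [
    "generate_text_embeds_sd3.sh",     # encode text prompts into embeddings once
    "run_train_sd3_dpo.sh",            # train with the chosen method (DPO here)
    "generate_visuals_sd3_480p.sh",    # sample images/latents from the tuned model
    "calculate_levenstein_metric.sh",  # OCR + Levenshtein accuracy
    "calculate_reward_metric.sh",      # reward-model score
]
for script in steps:
    subprocess.run(["sh", script], check=True)
```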
Note: in the examples we provide, the quality of DRaFT, ReFL, and GRPO is worse than that of DPO. This is due to the smaller batch size these methods were trained with compared to DPO; their quality could be improved by introducing an EMA of the model weights (see the sketch below).
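EMA here means keeping an exponential moving average of the model weights and sampling/evaluating from the averaged copy, which typically stabilizes small-batch fine-tuning. A minimal sketch, not the repo's implementation:

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    """Blend the online weights into the EMA copy after each optimizer step."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

def make_ema(model):
    """Start the EMA copy as a frozen clone of the current weights."""
    ema_model = copy.deepcopy(model).eval()
    for p in ema_model.parameters():
        p.requires_grad_(False)
    return ema_model
```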
## References

@article{liu2025improving,
  title={Improving video generation with human feedback},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Yuan, Ziyang and Liu, Xiaokun and Zheng, Mingwu and Wu, Xiele and Wang, Qiulin and Qin, Wenyu and Xia, Menghan and others},
  journal={arXiv preprint arXiv:2501.13918},
  year={2025}
}

@article{clark2023directly,
  title={Directly fine-tuning diffusion models on differentiable rewards},
  author={Clark, Kevin and Vicol, Paul and Swersky, Kevin and Fleet, David J},
  journal={arXiv preprint arXiv:2309.17400},
  year={2023}
}

@article{xu2023imagereward,
  title={ImageReward: Learning and evaluating human preferences for text-to-image generation},
  author={Xu, Jiazheng and Liu, Xiao and Wu, Yuchen and Tong, Yuxuan and Li, Qinkai and Ding, Ming and Tang, Jie and Dong, Yuxiao},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={15903--15935},
  year={2023}
}

@article{liu2025flow,
  title={Flow-GRPO: Training flow matching models via online RL},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Li, Yangguang and Liu, Jiaheng and Wang, Xintao and Wan, Pengfei and Zhang, Di and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2505.05470},
  year={2025}
}

@article{gao2025seedream,
  title={Seedream 3.0 technical report},
  author={Gao, Yu and Gong, Lixue and Guo, Qiushan and Hou, Xiaoxia and Lai, Zhichao and Li, Fanshi and Li, Liang and Lian, Xiaochen and Liao, Chao and Liu, Liyang and others},
  journal={arXiv preprint arXiv:2504.11346},
  year={2025}
}

## Team

- Lev Novitskiy: Data, SD3 pipeline, DPO, GRPO
- Maria Kovaleva: Rewards, RWR, DRaFT, ReFL
- Daniel Kniazev: DRaFT, ReFL
- Aleksandr Kutsakov: SFT, DPO
- Ilia Statsenko: Metrics pipeline, SFT
- S. Panova: Paper, Presentation
- Viacheslav Vasilev: Project Coordinator