Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion #12852
Conversation
> scheduler = FlowUniPCMultistepScheduler()
> ...
> # NOTE: using Qwen2 VL instead for tests (reason1 is based on 2.5)
> text_encoder = Qwen2VLForConditionalGeneration.from_pretrained(
Is there an internal Qwen2_5_VL model to test with?
Would something like this work?
> text_encoder = Qwen2_5_VLForConditionalGeneration(config)
sayakpaul left a comment
Thanks for the PR!
I left some comments. My major comments are on separating the pipelines from one another instead of inheriting from one another.
Let's also add docs?
> video = self.vae.decode(latents.to(self.vae.dtype), return_dict=False)[0]
> ...
> assert self.safety_checker is not None
> self.safety_checker.to(device)
We don't have to do it in this PR, but we could have a little utility like `run_safety_checker()` inside the pipelines and copy it over all the Cosmos pipelines that require it (much akin to `encode_prompt()`, for example).
But this is not merge-blocking.
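As a rough, dependency-free sketch of the suggested pattern (every name below is a hypothetical stub for illustration, not the real diffusers API), such a copied-over helper could look like:

```python
# Illustrative sketch of a run_safety_checker() utility that each Cosmos
# pipeline could carry (duplicated with a "# Copied from ..." marker, the
# way helpers like encode_prompt() are shared across diffusers pipelines).
# StubSafetyChecker and CosmosPipelineSketch are stand-ins, not real classes.

class StubSafetyChecker:
    """Stand-in safety checker; flags nothing as unsafe."""
    def check_video_safety(self, video):
        return video, [False] * len(video)

class CosmosPipelineSketch:
    def __init__(self, safety_checker=None):
        self.safety_checker = safety_checker

    # Copied from <another Cosmos pipeline>.run_safety_checker (illustrative)
    def run_safety_checker(self, video):
        # Pipelines without a configured checker skip the step entirely.
        if self.safety_checker is None:
            return video, None
        return self.safety_checker.check_video_safety(video)

pipe = CosmosPipelineSketch(safety_checker=StubSafetyChecker())
frames, nsfw_flags = pipe.run_safety_checker([[0.1, 0.2], [0.3, 0.4]])
```

This keeps each pipeline self-contained while still giving the decode/check step a single, greppable name.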
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu left a comment
Thanks for the PR! I left some feedback. Mainly:

- Can we look into supporting the scheduler from the existing `UniPCMultistepScheduler` with the flow matching options (`use_flow_sigmas`, `prediction_type="flow_prediction"`)? I'm ok adding this if it requires a lot of changes or just doesn't make sense to use the existing one, but wanted to check first.
- For the pipeline, can we combine the 3 pipelines into one `Cosmos2_5PredictPipeline` that inherits directly from `DiffusionPipeline`? The current design isn't how we typically structure pipelines in diffusers. A unified pipeline isn't ideal either, since our pipelines are normally task-based (text2image, image2video, etc.), but I have to admit it's getting increasingly difficult to keep that pattern without a huge portion of duplicated code. I think a single unified pipeline is reasonable here.
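For intuition, the flow-matching sigma schedule that flow options select in a scheduler can be sketched without any dependencies. The "shift" transform below is the common flow-matching time warp; the exact implementation details inside diffusers schedulers may differ, so treat this as a sketch rather than the library's code:

```python
# Dependency-free sketch of a flow-matching sigma schedule.
# sigma = shift * t / (1 + (shift - 1) * t) is the usual flow "shift"
# warp applied to uniform timesteps t in (0, 1]; with shift == 1 the
# schedule is simply the uniform timesteps themselves.

def flow_sigmas(num_steps: int, shift: float = 1.0):
    ts = [(i + 1) / num_steps for i in range(num_steps)]
    return [shift * t / (1 + (shift - 1) * t) for t in ts]

# With shift > 1, early sigmas are pushed up (more time spent at high noise).
sigmas = flow_sigmas(4, shift=3.0)
```

Supporting this inside the existing scheduler via flags would avoid a second UniPC implementation, at the cost of a few branches in the sigma setup.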
- New scheduler: `scheduling_flow_unipc_multistep.py`
- Changes to `TransformerCosmos` for text embeddings via `crossattn_proj`
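As a dependency-free sketch of what an optional cross-attention projection such as `crossattn_proj` does (names, shapes, and the plain-Python matmul below are illustrative, not the real `TransformerCosmos` interface):

```python
# Sketch: when the text encoder's hidden size differs from the
# transformer's cross-attention dimension, a learned projection maps
# text embeddings into the expected width; when no projection is
# configured, embeddings pass through unchanged.

def matmul(x, w):
    # x: [seq, in_dim], w: [in_dim, out_dim] -> [seq, out_dim]
    return [[sum(xi * wij for xi, wij in zip(row, col)) for col in zip(*w)]
            for row in x]

def project_text_embeds(text_embeds, crossattn_proj=None):
    if crossattn_proj is None:
        return text_embeds
    return matmul(text_embeds, crossattn_proj)

# 2 tokens of dim 3 projected to dim 2 with an assumed weight matrix.
embeds = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = project_text_embeds(embeds, w)
```

Making the projection optional keeps checkpoints that already match the cross-attention width loadable without a no-op layer.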
Updated PR to address comments. Docs should be here once they are updated: https://moon-ci-docs.huggingface.co/docs/diffusers/pr_12852/en/api/pipelines/cosmos. I can update the main example to the latest model once we have uploaded the converted checkpoint to Hugging Face.
@bot /style
Style bot fixed some files and pushed the changes.
Could you run
yiyixuxu left a comment
thanks!
What does this PR do?
This PR adds Cosmos Predict2.5 Base. It has been tested using the 2B model checkpoint; the official HF checkpoint is here. The converted checkpoints have yet to be uploaded to HF.
This change is largely based on the previous Predict1/Predict2 support done by @a-r-r-o-w.
Testing:
Additions

Pipelines:

- `Qwen2_5_VLForConditionalGeneration`
- `num_frames=1`
- `batch_size == 1`

Scheduler:

- `FlowUniPCMultistepScheduler`: a new scheduler which uses the EDM noise schedule (Karras sigmas) with the UniPC algorithm, since Predict2.5 uses flow matching. This name can be changed.
- The existing `UniPCMultistepScheduler` scheduler via supporting `use_karras_sigmas=True` and `use_flow_sigmas=True`

Model changes:

- `CosmosTransformer` now accepts an optional cross-attention projection layer (used for text embeddings from Reason1)

Scripts:

- Updated `scripts/convert_cosmos_to_diffusers.py` to support Predict2.5

Who can review?