Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion #12852
Conversation
> scheduler = FlowUniPCMultistepScheduler()
> ...
> # NOTE: using Qwen2 VL instead for tests (reason1 is based on 2.5)
> text_encoder = Qwen2VLForConditionalGeneration.from_pretrained(
Is there an internal Qwen2_5_VL model to test with?
Would something like this work?
> text_encoder = Qwen2_5_VLForConditionalGeneration(config)
sayakpaul left a comment
Thanks for the PR!
I left some comments. My major comments are on separating the pipelines from one another instead of inheriting from one another.
Let's also add docs?
> video = self.vae.decode(latents.to(self.vae.dtype), return_dict=False)[0]
> ...
> assert self.safety_checker is not None
> self.safety_checker.to(device)
We don't have to do it in this PR, but we could have a little utility like `run_safety_checker()` inside the pipelines and copy it over all the Cosmos pipelines that require it (much akin to `encode_prompt()`, for example).
But this is not merge-blocking.
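As a rough, dependency-free sketch of the suggested pattern (every name below is a hypothetical stub for illustration, not the real diffusers API), such a copied-over helper could look like:

```python
# Illustrative sketch of a run_safety_checker() utility that each Cosmos
# pipeline could carry (duplicated with a "# Copied from ..." marker, the
# way helpers like encode_prompt() are shared across diffusers pipelines).
# StubSafetyChecker and CosmosPipelineSketch are stand-ins, not real classes.

class StubSafetyChecker:
    """Stand-in safety checker; flags nothing as unsafe."""
    def check_video_safety(self, video):
        return video, [False] * len(video)

class CosmosPipelineSketch:
    def __init__(self, safety_checker=None):
        self.safety_checker = safety_checker

    # Copied from <another Cosmos pipeline>.run_safety_checker (illustrative)
    def run_safety_checker(self, video):
        # Pipelines without a configured checker skip the step entirely.
        if self.safety_checker is None:
            return video, None
        return self.safety_checker.check_video_safety(video)

pipe = CosmosPipelineSketch(safety_checker=StubSafetyChecker())
frames, nsfw_flags = pipe.run_safety_checker([[0.1, 0.2], [0.3, 0.4]])
```

This keeps each pipeline self-contained while still giving the decode/check step a single, greppable name.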
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu left a comment
Thanks for the PR! I left some feedback. Mainly:

- Can we look into supporting the scheduler from the existing `UniPCMultistepScheduler` with the flow matching options (`use_flow_sigmas`, `prediction_type="flow_prediction"`)? I'm ok adding this if it requires a lot of changes or just doesn't make sense to use the existing one, but wanted to check first.
- For the pipeline, can we combine the 3 pipelines into one `Cosmos2_5PredictPipeline` that inherits directly from `DiffusionPipeline`? The current design isn't how we typically structure pipelines in diffusers. A unified pipeline isn't ideal either, since our pipelines are normally task-based (text2image, image2video, etc.), but I have to admit it's getting increasingly difficult to keep that pattern without a huge portion of duplicated code. I think a single unified pipeline is reasonable here.
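For intuition, the flow-matching sigma schedule that flow options select in a scheduler can be sketched without any dependencies. The "shift" transform below is the common flow-matching time warp; the exact implementation details inside diffusers schedulers may differ, so treat this as a sketch rather than the library's code:

```python
# Dependency-free sketch of a flow-matching sigma schedule.
# sigma = shift * t / (1 + (shift - 1) * t) is the usual flow "shift"
# warp applied to uniform timesteps t in (0, 1]; with shift == 1 the
# schedule is simply the uniform timesteps themselves.

def flow_sigmas(num_steps: int, shift: float = 1.0):
    ts = [(i + 1) / num_steps for i in range(num_steps)]
    return [shift * t / (1 + (shift - 1) * t) for t in ts]

# With shift > 1, early sigmas are pushed up (more time spent at high noise).
sigmas = flow_sigmas(4, shift=3.0)
```

Supporting this inside the existing scheduler via flags would avoid a second UniPC implementation, at the cost of a few branches in the sigma setup.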
- New scheduler: `scheduling_flow_unipc_multistep.py`
- Changes to `TransformerCosmos` for text embeddings via `crossattn_proj`
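As a dependency-free sketch of what an optional cross-attention projection such as `crossattn_proj` does (names, shapes, and the plain-Python matmul below are illustrative, not the real `TransformerCosmos` interface):

```python
# Sketch: when the text encoder's hidden size differs from the
# transformer's cross-attention dimension, a learned projection maps
# text embeddings into the expected width; when no projection is
# configured, embeddings pass through unchanged.

def matmul(x, w):
    # x: [seq, in_dim], w: [in_dim, out_dim] -> [seq, out_dim]
    return [[sum(xi * wij for xi, wij in zip(row, col)) for col in zip(*w)]
            for row in x]

def project_text_embeds(text_embeds, crossattn_proj=None):
    if crossattn_proj is None:
        return text_embeds
    return matmul(text_embeds, crossattn_proj)

# 2 tokens of dim 3 projected to dim 2 with an assumed weight matrix.
embeds = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = project_text_embeds(embeds, w)
```

Making the projection optional keeps checkpoints that already match the cross-attention width loadable without a no-op layer.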
Updated PR to address comments. Docs should be here once they are updated: https://moon-ci-docs.huggingface.co/docs/diffusers/pr_12852/en/api/pipelines/cosmos. I can update the main example to the latest model once we have uploaded the converted checkpoint to Hugging Face.
@bot /style
Style bot fixed some files and pushed the changes.
Could you run
yiyixuxu left a comment
thanks!
What does this PR do?
This PR adds Cosmos Predict2.5 Base. It has been tested using the 2B model checkpoint; the official HF checkpoint is here. The converted checkpoints have yet to be uploaded to HF.
This change is largely based on the previous Predict1/Predict2 support done by @a-r-r-o-w.
Testing:
Additions

Pipelines:

- `Qwen2_5_VLForConditionalGeneration`
- `num_frames=1`
- `batch_size == 1`

Scheduler:

- `FlowUniPCMultistepScheduler`: a new scheduler which uses the EDM noise schedule (Karras sigmas) with the UniPC algorithm, since Predict2.5 uses flow matching. This name can be changed.
- The existing `UniPCMultistepScheduler` scheduler via supporting `use_karras_sigmas=True` and `use_flow_sigmas=True`

Model changes:

- `CosmosTransformer` now accepts an optional cross-attention projection layer (used for text embeddings from Reason1)

Scripts:

- Updated `scripts/convert_cosmos_to_diffusers.py` to support Predict2.5

Who can review?