
Commit 3e60fa7

Merge branch 'main' into z-image-omni-base
2 parents 3cbb38d + f7753b1 commit 3e60fa7


55 files changed, +8215 -904 lines changed

docs/source/en/_toctree.yml

Lines changed: 5 additions & 1 deletion
@@ -365,6 +365,8 @@
 title: HunyuanVideoTransformer3DModel
 - local: api/models/latte_transformer3d
 title: LatteTransformer3DModel
+- local: api/models/longcat_image_transformer2d
+title: LongCatImageTransformer2DModel
 - local: api/models/ltx_video_transformer3d
 title: LTXVideoTransformer3DModel
 - local: api/models/lumina2_transformer2d
@@ -402,7 +404,7 @@
 - local: api/models/wan_transformer_3d
 title: WanTransformer3DModel
 - local: api/models/z_image_transformer2d
-title: ZImageTransformer2DModel
+title: ZImageTransformer2DModel
 title: Transformers
 - sections:
 - local: api/models/stable_cascade_unet
@@ -563,6 +565,8 @@
 title: Latent Diffusion
 - local: api/pipelines/ledits_pp
 title: LEDITS++
+- local: api/pipelines/longcat_image
+title: LongCat-Image
 - local: api/pipelines/lumina2
 title: Lumina 2.0
 - local: api/pipelines/lumina

docs/source/en/api/models/controlnet.md

Lines changed: 15 additions & 0 deletions
@@ -33,6 +33,21 @@ url = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/m
pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=controlnet)
```

## Loading from Control-LoRA

Control-LoRA was introduced by Stability AI in [stabilityai/control-lora](https://huggingface.co/stabilityai/control-lora). It adds low-rank, parameter-efficient fine-tuning to ControlNet, which makes model control more compact and practical on a wider range of consumer GPUs.

```py
import torch
from diffusers import ControlNetModel, UNet2DConditionModel

lora_id = "stabilityai/control-lora"
lora_filename = "control-LoRAs-rank128/control-lora-canny-rank128.safetensors"

# Build a ControlNet from the SDXL UNet, then load the Control-LoRA weights into it
unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.bfloat16).to("cuda")
controlnet = ControlNetModel.from_unet(unet).to(device="cuda", dtype=torch.bfloat16)
controlnet.load_lora_adapter(lora_id, weight_name=lora_filename, prefix=None, controlnet_config=controlnet.config)
```

## ControlNetModel

[[autodoc]] ControlNetModel
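Control-LoRA builds on low-rank adaptation (LoRA): rather than storing a full update for a `d_out x d_in` weight matrix, LoRA stores two factors `B` (`d_out x r`) and `A` (`r x d_in`) and applies `W + B @ A`. The NumPy sketch below illustrates that general idea for a single linear layer and counts the parameters saved. The layer size of 1280 and `rank = 128` (echoing the `rank128` checkpoint name) are illustrative assumptions, and this is not how `load_lora_adapter` is implemented internally.

```python
import numpy as np


def lora_forward(x, W, A, B, scale=1.0):
    """Apply a frozen weight W plus a low-rank update B @ A to inputs x.

    Shapes: x (batch, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    """
    # (x @ A.T) @ B.T never materializes the full (d_out, d_in) update matrix.
    return x @ W.T + scale * (x @ A.T) @ B.T


def param_counts(d_out, d_in, rank):
    """Return (parameters in a full update, parameters in the low-rank factors)."""
    return d_out * d_in, rank * (d_out + d_in)


rng = np.random.default_rng(0)
d_in, d_out, rank = 1280, 1280, 128  # illustrative sizes, not read from any checkpoint
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))  # LoRA commonly initializes B to zero, so training starts from W

x = rng.standard_normal((4, d_in))
# With B = 0 the adapted layer reproduces the frozen layer exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

full, low_rank = param_counts(d_out, d_in, rank)
print(full, low_rank, f"{low_rank / full:.1%}")
```

With these sizes the low-rank factors hold 20% as many parameters as a full update; the ratio `r * (d_out + d_in) / (d_out * d_in)` shrinks further for larger layers or smaller ranks.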
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LongCatImageTransformer2DModel

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import LongCatImageTransformer2DModel

transformer = LongCatImageTransformer2DModel.from_pretrained("meituan-longcat/LongCat-Image", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## LongCatImageTransformer2DModel

[[autodoc]] LongCatImageTransformer2DModel

docs/source/en/api/pipelines/cosmos.md

Lines changed: 6 additions & 0 deletions
@@ -70,6 +70,12 @@ output.save("output.png")
- all
- __call__

## Cosmos2_5_PredictBasePipeline

[[autodoc]] Cosmos2_5_PredictBasePipeline
- all
- __call__

## CosmosPipelineOutput

[[autodoc]] pipelines.cosmos.pipeline_output.CosmosPipelineOutput
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LongCat-Image

<div class="flex flex-wrap space-x-1">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

We introduce LongCat-Image, an open-source, bilingual (Chinese-English) foundation model for image generation. It is designed to address challenges that remain common in current leading models: multilingual text rendering, photorealism, deployment efficiency, and developer accessibility.

### Key Features
- 🌟 **Exceptional Efficiency and Performance**: With only **6B parameters**, LongCat-Image outperforms many open-source models several times its size across multiple benchmarks.
- 🌟 **Superior Editing Performance**: The LongCat-Image-Edit model achieves state-of-the-art performance among open-source models, with leading instruction following and image quality and strong visual consistency.
- 🌟 **Powerful Chinese Text Rendering**: LongCat-Image renders common Chinese characters more accurately and stably than existing state-of-the-art open-source models and achieves industry-leading coverage of the Chinese dictionary.
- 🌟 **Remarkable Photorealism**: Through an innovative data strategy and training framework, LongCat-Image generates highly photorealistic images.
- 🌟 **Comprehensive Open-Source Ecosystem**: We provide a complete toolchain, from intermediate checkpoints to full training code, lowering the barrier to further research and development.
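To put the **6B parameters** figure in deployment terms, weight memory is parameter count times bytes per element. The sketch below assumes the 6B figure is the number of weights being loaded and counts nothing else (no text encoder, VAE, activations, or framework overhead), so actual GPU usage will be higher. It shows one reason the usage example loads the pipeline in `torch.bfloat16`: 2 bytes per parameter instead of 4 for float32.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


params = 6e9  # the parameter count quoted in the feature list
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(params, dtype):.1f} GiB")
```

Under those assumptions this prints about 22.4 GiB for float32 and 11.2 GiB for bfloat16.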
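
For more details, refer to the [***LongCat-Image Technical Report***](https://arxiv.org/abs/2412.11963).

## Usage Example

```py
import torch
from diffusers import LongCatImagePipeline

weight_dtype = torch.bfloat16
pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=weight_dtype)
pipe.to("cuda")
# pipe.enable_model_cpu_offload()  # alternative to pipe.to("cuda") that lowers peak GPU memory

# The prompt is in Chinese. English translation: "A young Asian woman wearing a yellow knit
# sweater with a white necklace. Her hands rest on her knees and her expression is serene.
# The background is a rough brick wall, and warm afternoon sunlight falls on her, creating a
# calm, cozy atmosphere. The shot uses a medium-distance view that highlights her demeanor and
# the details of her clothing. Soft light falls on her face, emphasizing the texture of her
# features and accessories and adding depth and warmth to the image. The composition is simple;
# the brick-wall texture and the play of sunlight complement each other, bringing out the
# subject's elegance and composure."
prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'
image = pipe(
    prompt,
    height=768,
    width=1344,
    guidance_scale=4.0,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=torch.Generator("cpu").manual_seed(43),
    enable_cfg_renorm=True,
    enable_prompt_rewrite=True,
).images[0]
image.save("./longcat_image_t2i_example.png")
```

This pipeline was contributed by the LongCat-Image Team. The original codebase is available [here](https://github.com/meituan-longcat/LongCat-Image).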

Available models:
<div style="overflow-x: auto; margin-bottom: 16px;">
<table style="border-collapse: collapse; width: 100%;">
<thead>
<tr>
<th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Models</th>
<th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Type</th>
<th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Description</th>
<th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Download Link</th>
</tr>
</thead>
<tbody>
<tr>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image</td>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Text&#8209;to&#8209;Image</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">Final Release. The standard model for out&#8209;of&#8209;the&#8209;box inference.</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">
<span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image">Hugging Face</a></span>
</td>
</tr>
<tr>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image&#8209;Dev</td>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Text&#8209;to&#8209;Image</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">Development. Mid-training checkpoint, suitable for fine-tuning.</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">
<span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image-Dev">Hugging Face</a></span>
</td>
</tr>
<tr>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image&#8209;Edit</td>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Image Editing</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">Specialized model for image editing.</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">
<span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image-Edit">Hugging Face</a></span>
</td>
</tr>
</tbody>
</table>
</div>

## LongCatImagePipeline

[[autodoc]] LongCatImagePipeline
- all
- __call__

## LongCatImagePipelineOutput

[[autodoc]] pipelines.longcat_image.pipeline_output.LongCatImagePipelineOutput
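The usage example passes `guidance_scale=4.0` and `enable_cfg_renorm=True`. Classifier-free guidance (CFG) combines an unconditional and a conditional prediction as `uncond + s * (cond - uncond)`; some renormalization variants then rescale the guided result toward the norm of the conditional prediction so that large guidance scales do not inflate it. The NumPy sketch below shows one such choice (shrink per token, never enlarge) as an assumption; the exact formula, the axis the norm is taken over, and any clamping used inside `LongCatImagePipeline` may differ, so treat it as an illustration rather than the pipeline's code.

```python
import numpy as np


def cfg(uncond: np.ndarray, cond: np.ndarray, scale: float) -> np.ndarray:
    """Standard classifier-free guidance."""
    return uncond + scale * (cond - uncond)


def cfg_renorm(uncond: np.ndarray, cond: np.ndarray, scale: float, eps: float = 1e-8) -> np.ndarray:
    """CFG followed by a per-token rescale (an assumed scheme, not LongCat's exact code).

    The last axis is treated as the feature axis. If guidance pushes a token's norm
    above the conditional prediction's norm, shrink it back; never enlarge it.
    """
    guided = cfg(uncond, cond, scale)
    cond_norm = np.linalg.norm(cond, axis=-1, keepdims=True)
    guided_norm = np.linalg.norm(guided, axis=-1, keepdims=True)
    factor = np.minimum(cond_norm / (guided_norm + eps), 1.0)
    return guided * factor


rng = np.random.default_rng(43)
uncond = rng.standard_normal((2, 16, 64))  # (batch, tokens, channels), illustrative shapes
cond = uncond + 0.5 * rng.standard_normal((2, 16, 64))

plain = cfg(uncond, cond, 4.0)
renormed = cfg_renorm(uncond, cond, 4.0)
# Mean per-token norms: plain CFG vs. renormalized CFG vs. the conditional prediction.
print(np.linalg.norm(plain, axis=-1).mean(), np.linalg.norm(renormed, axis=-1).mean(), np.linalg.norm(cond, axis=-1).mean())
```

The direction of the guided prediction is kept; only its magnitude is capped, which is the usual motivation for renormalizing at higher guidance scales.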
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Control-LoRA inference example

Control-LoRA was introduced by Stability AI in [stabilityai/control-lora](https://huggingface.co/stabilityai/control-lora). It adds low-rank, parameter-efficient fine-tuning to ControlNet, which makes model control more compact and practical on a wider range of consumer GPUs.

## Installing the dependencies

Before running the script, make sure to install the library and the example's dependencies.

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date, as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```

Then `cd` into the example folder and run:
```bash
pip install -r requirements.txt
```

Finally, initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

## Inference on SDXL

[stabilityai/control-lora](https://huggingface.co/stabilityai/control-lora) provides a set of Control-LoRA weights for SDXL. Here we use the `canny` condition: Canny edges extracted from a reference image guide generation together with a text prompt.

```bash
python control_lora.py
```

## Acknowledgements

- [stabilityai/control-lora](https://huggingface.co/stabilityai/control-lora)
- [comfyanonymous/ControlNet-v1-1_fp16_safetensors](https://huggingface.co/comfyanonymous/ControlNet-v1-1_fp16_safetensors)
- [HighCWu/control-lora-v2](https://github.com/HighCWu/control-lora-v2)
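The `canny` condition is an edge map. Canny edge detection returns a single-channel image, and `control_lora.py` stacks that channel three times before wrapping it in a PIL image, so the control image has the same RGB layout as an ordinary photo. The NumPy/Pillow sketch below isolates that conversion; the hand-drawn square outline is a hypothetical stand-in for `cv2.Canny` output, used so the example runs without OpenCV.

```python
import numpy as np
from PIL import Image


def edges_to_rgb(edges: np.ndarray) -> Image.Image:
    """Turn a single-channel uint8 edge map (H, W) into an RGB PIL image (H, W, 3).

    Mirrors the `image[:, :, None]` + `np.concatenate(..., axis=2)` steps in control_lora.py.
    """
    if edges.ndim != 2 or edges.dtype != np.uint8:
        raise ValueError("expected a 2-D uint8 edge map")
    stacked = np.concatenate([edges[:, :, None]] * 3, axis=2)
    return Image.fromarray(stacked)


# Hypothetical stand-in for cv2.Canny output: a white square outline on black.
edges = np.zeros((64, 64), dtype=np.uint8)
edges[16, 16:48] = edges[47, 16:48] = 255
edges[16:48, 16] = edges[16:48, 47] = 255

control_image = edges_to_rgb(edges)
print(control_image.mode, control_image.size)
```

`np.repeat(edges[:, :, None], 3, axis=2)` would produce the same array; the explicit concatenation simply matches the script.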
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
import cv2
import numpy as np
import torch
from PIL import Image

from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
    UNet2DConditionModel,
)
from diffusers.utils import load_image, make_image_grid


pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
lora_id = "stabilityai/control-lora"
lora_filename = "control-LoRAs-rank128/control-lora-canny-rank128.safetensors"

# Build a ControlNet from the SDXL UNet, then load the Control-LoRA (canny) weights into it
unet = UNet2DConditionModel.from_pretrained(pipe_id, subfolder="unet", torch_dtype=torch.bfloat16).to("cuda")
controlnet = ControlNetModel.from_unet(unet).to(device="cuda", dtype=torch.bfloat16)
controlnet.load_lora_adapter(lora_id, weight_name=lora_filename, prefix=None, controlnet_config=controlnet.config)

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)

controlnet_conditioning_scale = 1.0  # recommended for good generalization

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae", torch_dtype=torch.bfloat16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    pipe_id,
    unet=unet,
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.bfloat16,
    safety_checker=None,
).to("cuda")

# Extract Canny edges and expand the single-channel edge map to a 3-channel image
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    num_images_per_prompt=4,
).images

# Save the edge map next to the four generated images in a single row
final_image = [image] + images
grid = make_image_grid(final_image, 1, 5)
grid.save("hf-logo_canny.png")
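`controlnet_conditioning_scale` sets how strongly the control branch steers generation. In diffusers' ControlNet models, the residuals the ControlNet produces for the UNet's skip connections and mid block are multiplied by this scale before the UNet adds them to its own features (guess mode aside). The NumPy sketch below mimics only that weighting step with made-up shapes; it is a simplified illustration, not the library's forward pass.

```python
import numpy as np


def apply_control(unet_skips, unet_mid, control_skips, control_mid, conditioning_scale=1.0):
    """Add scaled ControlNet residuals to UNet skip features and the mid-block output.

    Simplified: a real UNet has many skip tensors at several resolutions.
    """
    scaled_skips = [c * conditioning_scale for c in control_skips]
    scaled_mid = control_mid * conditioning_scale
    new_skips = [u + c for u, c in zip(unet_skips, scaled_skips)]
    return new_skips, unet_mid + scaled_mid


rng = np.random.default_rng(0)
shapes = [(1, 320, 64, 64), (1, 640, 32, 32), (1, 1280, 16, 16)]  # illustrative only
unet_skips = [rng.standard_normal(s) for s in shapes]
control_skips = [rng.standard_normal(s) for s in shapes]
unet_mid = rng.standard_normal((1, 1280, 16, 16))
control_mid = rng.standard_normal((1, 1280, 16, 16))

for scale in (0.0, 0.5, 1.0):
    skips, mid = apply_control(unet_skips, unet_mid, control_skips, control_mid, scale)
    # How far the conditioned mid-block features moved from the unconditioned ones.
    shift = np.abs(mid - unet_mid).mean()
    print(f"scale={scale}: mean mid-block shift {shift:.3f}")
```

At `scale=0.0` the control branch has no effect, and the shift grows linearly with the scale, which is why lowering the value below the script's `1.0` weakens the influence of the edge map.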
