StableDiffusion3ControlNetPipeline Tensor Shape Mismatch #9360

@Yuhan291

Describe the bug

I want to create a class that inherits from StableDiffusion3ControlNetPipeline. When I override the `__call__` function, I run into a problem where `encoder_hidden_states` and `context_attn_output` have different shapes. The pre-trained model for StableDiffusion3ControlNetPipeline is "stabilityai/stable-diffusion-3-medium-diffusers", and the ControlNet is "InstantX/SD3-Controlnet-Pose". Where could this problem come from? Thanks for the help.
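
Roughly, the custom class looks like this (a minimal sketch; only `__call__` is overridden, and the full override is pasted under Reproduction):

import torch
from diffusers import StableDiffusion3ControlNetPipeline

class ControlnetPosePipeline(StableDiffusion3ControlNetPipeline):
    # Same components and loading logic as the base pipeline; only __call__ is replaced.
    @torch.no_grad()
    def __call__(self, latent_list, *args, **kwargs):
        ...  # see the full method under Reproduction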

Reproduction

controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Pose", torch_dtype=torch.float16, cache_dir="models/Controlnet")
pipe = ControlnetPosePipeline.from_pretrained('stabilityai/stable-diffusion-3-medium-diffusers', controlnet=controlnet, token='********', cache_dir="models/Diffusion").to(device)

def __call__(
self,
latent_list,
prompt: Union[str, List[str]] = None,
prompt_2: Optional[Union[str, List[str]]] = None,
prompt_3: Optional[Union[str, List[str]]] = None,
pose_map: Union[torch.Tensor, PIL.Image.Image, List[PIL.Image.Image]] = None,
image: Union[torch.Tensor, PIL.Image.Image, List[PIL.Image.Image]] = None,
clip_image: Union[torch.Tensor, PIL.Image.Image, List[PIL.Image.Image]] = None,
target_pose = None,
source_pose = None,
height: Optional[int] = None,
width: Optional[int] = None,
num_inference_steps: int = 50,
timesteps: List[int] = None,
sigmas: List[float] = None,
guidance_scale: float = 7.0,
control_guidance_start: Union[float, List[float]] = 0.0,
control_guidance_end: Union[float, List[float]] = 1.0,
controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
controlnet_pooled_projections: Optional[torch.FloatTensor] = None,
negative_prompt: Optional[Union[str, List[str]]] = None,
negative_prompt_2: Optional[Union[str, List[str]]] = None,
negative_prompt_3: Optional[Union[str, List[str]]] = None,
num_images_per_prompt: Optional[int] = 1,
eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.Tensor] = None,
prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.Tensor] = None,
pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
output_type: Optional[str] = "pil",
return_dict: bool = True,
joint_attention_kwargs: Optional[Dict[str, Any]] = None,
clip_skip: Optional[int] = None,
callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
callback_on_step_end_tensor_inputs: List[str] = ["latents"],
max_sequence_length: int = 256,
):
r"""
Function invoked when calling the pipeline for generation.

    Args:
        prompt (`str` or `List[str]`, *optional*):
            The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
            instead.
        pose_map (`torch.Tensor`, `PIL.Image.Image`, `List[torch.Tensor]`, `List[PIL.Image.Image]` or `List[List[PIL.Image.Image]]`):
            The ControlNet input condition. The ControlNet uses this input to generate guidance for the
            transformer. If the type is specified as `torch.Tensor`, it is passed to the ControlNet as is.
            `PIL.Image.Image` can also be accepted as an image. The control pose_map is automatically resized to
            fit the output image.
        height (`int`, *optional*, defaults to `self.default_sample_size * self.vae_scale_factor`):
            The height in pixels of the generated image.
        width (`int`, *optional*, defaults to `self.default_sample_size * self.vae_scale_factor`):
            The width in pixels of the generated image.
        num_inference_steps (`int`, *optional*, defaults to 50):
            The number of denoising steps. More denoising steps usually lead to a higher quality image at the
            expense of slower inference.
        timesteps (`List[int]`, *optional*):
            Custom timesteps to use for the denoising process with schedulers which support a `timesteps` argument
            in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
            passed will be used. Must be in descending order.
        sigmas (`List[float]`, *optional*):
            Custom sigmas to use for the denoising process with schedulers which support a `sigmas` argument in
            their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
            will be used.
        guidance_scale (`float`, *optional*, defaults to 7.0):
            Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
            `guidance_scale` is defined as `w` of equation 2. of [Imagen
            Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
            1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
            usually at the expense of lower image quality.
        negative_prompt (`str` or `List[str]`, *optional*):
            The prompt or prompts not to guide the image generation. If not defined, one has to pass
            `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale`
            is less than `1`).
        num_images_per_prompt (`int`, *optional*, defaults to 1):
            The number of images to generate per prompt.
        eta (`float`, *optional*, defaults to 0.0):
            Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
            [`schedulers.DDIMScheduler`], will be ignored for others.
        generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
            One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
            to make generation deterministic.
        latents (`torch.Tensor`, *optional*):
            Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
            generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
            tensor will be generated by sampling using the supplied random `generator`.
        prompt_embeds (`torch.Tensor`, *optional*):
            Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
            provided, text embeddings will be generated from `prompt` input argument.
        negative_prompt_embeds (`torch.Tensor`, *optional*):
            Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
            weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
            argument.
        output_type (`str`, *optional*, defaults to `"pil"`):
            The output format of the generated image. Choose between
            [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
        return_dict (`bool`, *optional*, defaults to `True`):
            Whether or not to return a [`~pipelines.stable_diffusion_3.StableDiffusion3PipelineOutput`] instead
            of a plain tuple.
        callback_on_step_end (`Callable`, *optional*):
            A function that is called at the end of each denoising step during inference. The function is called
            with the following arguments: `callback_on_step_end(self, step: int, timestep: int, callback_kwargs:
            Dict)`. `callback_kwargs` will include a list of all tensors as specified by
            `callback_on_step_end_tensor_inputs`.
        callback_on_step_end_tensor_inputs (`List[str]`, *optional*, defaults to `["latents"]`):
            The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
            will be passed as the `callback_kwargs` argument.
        joint_attention_kwargs (`dict`, *optional*):
            A kwargs dictionary that if specified is passed along to the `AttnProcessor` as defined under
            `self.processor` in
            [diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
        controlnet_conditioning_scale (`float` or `List[float]`, *optional*, defaults to 1.0):
            The outputs of the ControlNet are multiplied by `controlnet_conditioning_scale` before they are added
            to the residual in the original transformer. If multiple ControlNets are specified in init, you can
            set the corresponding scale as a list.
        clip_skip (`int`, *optional*):
            Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that
            the output of the pre-final layer will be used for computing the prompt embeddings.
    Examples:

    Returns:
        [`~pipelines.stable_diffusion_3.StableDiffusion3PipelineOutput`] or `tuple`:
        [`~pipelines.stable_diffusion_3.StableDiffusion3PipelineOutput`] if `return_dict` is True, otherwise a
        `tuple`. When returning a tuple, the first element is a list with the generated images.
    """
    height = height or self.default_sample_size * self.vae_scale_factor
    width = width or self.default_sample_size * self.vae_scale_factor

    # 0.1. Register our custom attention controller
    self.register_controller()
    self.register_attention_control(target_pose, source_pose, height, width)
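    # NOTE (assumption): register_attention_control presumably swaps custom attention processors
    # into the transformer/controlnet blocks. SD3 joint blocks keep the text tokens
    # (encoder_hidden_states, 77 CLIP + 256 T5 = 333 here) and the image tokens (4096 latent
    # patches for a 1024x1024 output) as two separate streams, so a replacement processor must
    # return the two streams separately with their original lengths; mixing them up yields
    # exactly the shape mismatch reported in the logs below.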

    # align format for control guidance
    if not isinstance(control_guidance_start, list) and isinstance(control_guidance_end, list):
        control_guidance_start = len(control_guidance_end) * [control_guidance_start]
    elif not isinstance(control_guidance_end, list) and isinstance(control_guidance_start, list):
        control_guidance_end = len(control_guidance_start) * [control_guidance_end]
    elif not isinstance(control_guidance_start, list) and not isinstance(control_guidance_end, list):
        mult = len(self.controlnet.nets) if isinstance(self.controlnet, SD3MultiControlNetModel) else 1
        control_guidance_start, control_guidance_end = (
            mult * [control_guidance_start],
            mult * [control_guidance_end],
        )

    # 1. Check inputs. Raise error if not correct
    self.check_inputs(
        prompt,
        prompt_2,
        prompt_3,
        height,
        width,
        negative_prompt=negative_prompt,
        negative_prompt_2=negative_prompt_2,
        negative_prompt_3=negative_prompt_3,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        callback_on_step_end_tensor_inputs=callback_on_step_end_tensor_inputs,
        max_sequence_length=max_sequence_length,
    )

    self._guidance_scale = guidance_scale
    self._clip_skip = clip_skip
    self._joint_attention_kwargs = joint_attention_kwargs
    self._interrupt = False

    # 2. Define call parameters
    if prompt is not None and isinstance(prompt, str):
        batch_size = 1
    elif prompt is not None and isinstance(prompt, list):
        batch_size = len(prompt)
    else:
        batch_size = prompt_embeds.shape[0]

    device = self._execution_device
    dtype = self.transformer.dtype

    # Build a per-query attention mask: for each target-pose latent position (i, j), mark its
    # source-pose neighborhood (or the whole map if the point projects outside the latent grid).
    mask = []
    h = int(height / 8)
    w = int(width / 8)
    transform = PointTransform(source_pose, target_pose, h, w)  # does not depend on (i, j), so build it once
    for i in range(h):
        for j in range(w):
            tmp = torch.zeros(batch_size * 2, h, w)
            tar = np.array([i, j, 1])
            src = transform @ tar
            if src[0] >= 0 and src[0] < h and src[1] >= 0 and src[1] < w:
                tmp[:, int(max(src[0] - h / 16, 0)):int(min(src[0] + h / 16, h)), int(max(src[1] - w / 16, 0)):int(min(src[1] + w / 16, w))] = 1
            else:
                tmp[:, :, :] = 1
            tmp = rearrange(tmp, 'b h w -> b 1 (h w)')
            mask.append(tmp)
    attention_mask = torch.cat(mask, dim=1).to(device)
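    # NOTE: in the code shown, attention_mask is not passed to the controlnet/transformer calls
    # below (e.g. via joint_attention_kwargs); presumably it is meant to be consumed by the
    # attention controller registered in step 0.1.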

    # 3. Encode input prompt
    # clip_image_embeds = self.image_encoder(
    #     clip_image.to(device, dtype=self.image_encoder.dtype)
    # ).last_hidden_state
    # clip_image_embeds = self.image_encoder.visual_projection(clip_image_embeds)
    # # image_prompt_embeds = clip_image_embeds[:,0:257,:]
    # image_prompt_embeds = clip_image_embeds.unsqueeze(1)
    # uncond_image_prompt_embeds = torch.zeros_like(image_prompt_embeds)

    # if self.do_classifier_free_guidance:
    #     image_prompt_embeds = torch.cat(
    #         [uncond_image_prompt_embeds, image_prompt_embeds], dim=0
    #     )
    (
        prompt_embeds,
        negative_prompt_embeds,
        pooled_prompt_embeds,
        negative_pooled_prompt_embeds,
    ) = self.encode_prompt(
        prompt=prompt,
        prompt_2=prompt_2,
        prompt_3=prompt_3,
        negative_prompt=negative_prompt,
        negative_prompt_2=negative_prompt_2,
        negative_prompt_3=negative_prompt_3,
        do_classifier_free_guidance=self.do_classifier_free_guidance,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        device=device,
        clip_skip=self.clip_skip,
        num_images_per_prompt=num_images_per_prompt,
        max_sequence_length=max_sequence_length,
    )
    # For classifier free guidance, we need to do two forward passes.
    # Here we concatenate the unconditional and text embeddings into a single batch
    # to avoid doing two forward passes
    # prompt_embeds = torch.cat([image_prompt_embeds[:,0:1,:], prompt_embeds[:,1:2,:]], dim=1)
    if self.do_classifier_free_guidance:
        prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
        pooled_prompt_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds], dim=0)

    # 3. Prepare control image
    if isinstance(self.controlnet, SD3ControlNetModel):
        pose_map = self.prepare_image(
            image=pose_map,
            width=width,
            height=height,
            batch_size=batch_size * num_images_per_prompt,
            num_images_per_prompt=num_images_per_prompt,
            device=device,
            dtype=dtype,
            do_classifier_free_guidance=self.do_classifier_free_guidance,
            guess_mode=False,
        )
        height, width = pose_map.shape[-2:]

        pose_map = self.vae.encode(pose_map).latent_dist.sample()
        pose_map = pose_map * self.vae.config.scaling_factor

    elif isinstance(self.controlnet, SD3MultiControlNetModel):
        control_images = []

        for control_image_ in pose_map:
            control_image_ = self.prepare_image(
                image=control_image_,
                width=width,
                height=height,
                batch_size=batch_size * num_images_per_prompt,
                num_images_per_prompt=num_images_per_prompt,
                device=device,
                dtype=dtype,
                do_classifier_free_guidance=self.do_classifier_free_guidance,
                guess_mode=False,
            )

            control_image_ = self.vae.encode(control_image_).latent_dist.sample()
            control_image_ = control_image_ * self.vae.config.scaling_factor

            control_images.append(control_image_)

        pose_map = control_images
    else:
        assert False

    if controlnet_pooled_projections is None:
        controlnet_pooled_projections = torch.zeros_like(pooled_prompt_embeds)
    else:
        controlnet_pooled_projections = controlnet_pooled_projections or pooled_prompt_embeds

    # 4. Prepare timesteps
    timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)
    num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
    self._num_timesteps = len(timesteps)

    # 5. Prepare latent variables
    num_channels_latents = self.transformer.config.in_channels
    latents = self.prepare_latents(
        batch_size * num_images_per_prompt,
        num_channels_latents,
        height,
        width,
        prompt_embeds.dtype,
        device,
        generator,
        latents,
    )
    print(latents.shape)

    # 6. Create tensor stating which controlnets to keep
    controlnet_keep = []
    for i in range(len(timesteps)):
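        # keeps[j] is 1.0 while step i falls inside [control_guidance_start[j], control_guidance_end[j]], else 0.0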
        keeps = [
            1.0 - float(i / len(timesteps) < s or (i + 1) / len(timesteps) > e)
            for s, e in zip(control_guidance_start, control_guidance_end)
        ]
        controlnet_keep.append(keeps[0] if isinstance(self.controlnet, SD3ControlNetModel) else keeps)

    # 7. Denoising loop
    latents = latent_list[0].clone()
    with self.progress_bar(total=num_inference_steps) as progress_bar:
        for i, t in enumerate(timesteps):
            if self.interrupt:
                continue
                
            # tmp = torch.cat([latent_list[len(timesteps) - 1]] * 2) if self.do_classifier_free_guidance else latent_list[len(timesteps) - 1]
            # timestep = timesteps[len(timesteps) - 1].expand(tmp.shape[0])
            # print(prompt_embeds.shape)
            # noise_pred = self.transformer(
            #     hidden_states=tmp,
            #     timestep=timestep,
            #     encoder_hidden_states=prompt_embeds,
            #     pooled_projections=controlnet_pooled_projections,
            # )[0]
            
            # expand the latents if we are doing classifier free guidance
            latent_model_input = torch.cat([latents] * 2) if self.do_classifier_free_guidance else latents
            timestep = t.expand(latent_model_input.shape[0])

            if isinstance(controlnet_keep[i], list):
                cond_scale = [c * s for c, s in zip(controlnet_conditioning_scale, controlnet_keep[i])]
            else:
                controlnet_cond_scale = controlnet_conditioning_scale
                if isinstance(controlnet_cond_scale, list):
                    controlnet_cond_scale = controlnet_cond_scale[0]
                cond_scale = controlnet_cond_scale * controlnet_keep[i]

            # controlnet(s) inference
            control_block_samples = self.controlnet(
                hidden_states=latent_model_input.to(dtype=torch.float16),
                timestep=timestep,
                encoder_hidden_states=prompt_embeds.to(dtype=torch.float16),
                pooled_projections=controlnet_pooled_projections.to(dtype=torch.float16),
                joint_attention_kwargs=self.joint_attention_kwargs,
                controlnet_cond=pose_map.to(dtype=torch.float16),
                conditioning_scale=cond_scale,
                return_dict=False,
            )[0]
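            # control_block_samples holds the per-block ControlNet residuals; the transformer call
            # below adds them to its hidden states via block_controlnet_hidden_states.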

            noise_pred = self.transformer(
                hidden_states=latent_model_input,
                timestep=timestep,
                encoder_hidden_states=prompt_embeds,
                pooled_projections=pooled_prompt_embeds,
                block_controlnet_hidden_states=control_block_samples,
                joint_attention_kwargs=self.joint_attention_kwargs,
                return_dict=False,
            )[0]

            # perform guidance
            if self.do_classifier_free_guidance:
                noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)

            # compute the previous noisy sample x_t -> x_t-1
            latents_dtype = latents.dtype
            latents = self.scheduler.step(noise_pred, t, latents, return_dict=False)[0]

            if latents.dtype != latents_dtype:
                if torch.backends.mps.is_available():
                    # some platforms (eg. apple mps) misbehave due to a pytorch bug: https://github.com/pytorch/pytorch/pull/99272
                    latents = latents.to(latents_dtype)

            if callback_on_step_end is not None:
                callback_kwargs = {}
                for k in callback_on_step_end_tensor_inputs:
                    callback_kwargs[k] = locals()[k]
                callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)

                latents = callback_outputs.pop("latents", latents)
                prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
                negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)
                negative_pooled_prompt_embeds = callback_outputs.pop(
                    "negative_pooled_prompt_embeds", negative_pooled_prompt_embeds
                )

            # call the callback, if provided
            if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
                progress_bar.update()

            if XLA_AVAILABLE:
                xm.mark_step()

    if output_type == "latent":
        image = latents

    else:
        latents = (latents / self.vae.config.scaling_factor) + self.vae.config.shift_factor

        image = self.vae.decode(latents, return_dict=False)[0]
        image = self.image_processor.postprocess(image, output_type=output_type)

    # Offload all models
    self.maybe_free_model_hooks()

    if not return_dict:
        return (image,)

    return StableDiffusion3PipelineOutput(images=image)

Logs

Traceback (most recent call last):
  File "/root/autodl-tmp/COMP5704_Pose_Driven/src/run.py", line 96, in <module>
    result = pipe(latent_list, "", None, None, pose_map, image, clip_image, target_coord, src_coord, height, width, num_inference_steps=50).images[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/COMP5704_Pose_Driven/src/pose_pipeline_controlnet.py", line 590, in __call__
    control_block_samples = self.controlnet(
                            ^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/diffusers/models/controlnet_sd3.py", line 344, in forward
    encoder_hidden_states, hidden_states = block(
                                           ^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/diffusers/models/attention.py", line 196, in forward
    encoder_hidden_states = encoder_hidden_states + context_attn_output
                            ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (333) must match the size of tensor b (4096) at non-singleton dimension 1
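
For reference, the two sizes in the error line up with the expected text and image token counts (assuming a 1024x1024 output, the CLIP sequence length of 77, and max_sequence_length=256); a quick sanity check:

# Where 333 and 4096 come from (assumes height = width = 1024, VAE factor 8, patch size 2)
clip_tokens = 77                        # CLIP prompt embeddings sequence length
t5_tokens = 256                         # max_sequence_length passed to the T5 encoder
print(clip_tokens + t5_tokens)          # 333 -> encoder_hidden_states (text stream)
print((1024 // 8 // 2) ** 2)            # 4096 -> latent patch tokens (image stream)

So `context_attn_output` seems to come back with the image-token length (4096) where the joint block expects the text-token length (333), which is what the residual add at attention.py line 196 trips over.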

System Info

absl-py==2.1.0
accelerate==0.33.0
anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work
anyio==4.4.0
archspec @ file:///croot/archspec_1709217642129/work
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.15.0
beautifulsoup4==4.12.3
bleach==6.1.0
boltons @ file:///work/perseverance-python-buildout/croot/boltons_1698851177130/work
Brotli @ file:///croot/brotli-split_1714483155106/work
certifi @ file:///croot/certifi_1707229174982/work/certifi
cffi @ file:///croot/cffi_1714483155441/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
comm==0.2.2
conda @ file:///croot/conda_1714403036266/work
conda-content-trust @ file:///croot/conda-content-trust_1714483159009/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1706733287605/work/src
conda-package-handling @ file:///croot/conda-package-handling_1714483155348/work
conda_package_streaming @ file:///work/perseverance-python-buildout/croot/conda-package-streaming_1698847176583/work
contourpy==1.2.1
controlnet-aux==0.0.9
cryptography @ file:///croot/cryptography_1714660666131/work
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
diffusers==0.30.2
distro @ file:///croot/distro_1714488253808/work
einops==0.8.0
executing==2.0.1
fastjsonschema==2.19.1
filelock==3.14.0
fonttools==4.53.0
fqdn==1.5.1
fsspec==2024.5.0
grpcio==1.64.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.24.5
idna @ file:///croot/idna_1714398848350/work
imageio==2.34.2
importlib_metadata==8.2.0
ipykernel==6.29.4
ipython==8.25.0
ipywidgets==8.1.3
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.4
json5==0.9.25
jsonpatch @ file:///croot/jsonpatch_1714483231291/work
jsonpointer==2.1
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.2
jupyter_core==5.7.2
jupyter_server==2.14.1
jupyter_server_terminals==0.5.3
jupyterlab==4.2.1
jupyterlab-language-pack-zh-CN==4.2.post1
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.2
jupyterlab_widgets==3.0.11
kiwisolver==1.4.5
lazy_loader==0.4
libmambapy @ file:///croot/mamba-split_1714483352891/work/libmambapy
Markdown==3.6
MarkupSafe==2.1.5
matplotlib==3.9.0
matplotlib-inline==0.1.7
menuinst @ file:///croot/menuinst_1714510563922/work
mistune==3.0.2
mpmath==1.3.0
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.3
notebook_shim==0.2.4
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
opencv-python-headless==4.10.0.84
overrides==7.7.0
packaging @ file:///croot/packaging_1710807400464/work
pandas==2.2.2
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pillow==10.3.0
platformdirs @ file:///work/perseverance-python-buildout/croot/platformdirs_1701732573265/work
pluggy @ file:///work/perseverance-python-buildout/croot/pluggy_1698805497733/work
prometheus_client==0.20.0
prompt_toolkit==3.0.45
protobuf==5.27.0
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pycosat @ file:///croot/pycosat_1714510623388/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
Pygments==2.18.0
pyparsing==3.1.2
PySocks @ file:///work/perseverance-python-buildout/croot/pysocks_1698845478203/work
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.3
referencing==0.35.1
regex==2024.7.24
requests @ file:///croot/requests_1707355572290/work
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.1
ruamel.yaml @ file:///work/perseverance-python-buildout/croot/ruamel.yaml_1698863605521/work
safetensors==0.4.3
scikit-image==0.24.0
scipy==1.14.0
Send2Trash==1.8.3
sentencepiece==0.2.0
setuptools==69.5.1
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
supervisor==4.2.5
sympy==1.12.1
tensorboard==2.16.2
tensorboard-data-server==0.7.2
terminado==0.18.1
tifffile==2024.7.24
timm==0.6.7
tinycss2==1.3.0
tokenizers==0.19.1
torch==2.4.0
torchaudio==2.4.0
torchvision==0.19.0
tornado==6.4
tqdm @ file:///croot/tqdm_1714567712644/work
traitlets==5.14.3
transformers==4.44.0
triton==3.0.0
truststore @ file:///work/perseverance-python-buildout/croot/truststore_1701735771625/work
types-python-dateutil==2.9.0.20240316
typing_extensions==4.12.1
tzdata==2024.1
uri-template==1.3.0
urllib3 @ file:///croot/urllib3_1707770551213/work
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.8.0
Werkzeug==3.0.3
wheel==0.43.0
widgetsnbextension==4.0.11
xformers==0.0.27.post2
zipp==3.19.2
zstandard @ file:///croot/zstandard_1714677652653/work

Who can help?

@yiyixuxu @DN6 @saya
