@badayvedat (Contributor) commented on Dec 17, 2025

What does this PR do?

The _wrapped_flash_attn_3 function unconditionally unpacks both out and lse from the return value:

out, lse, *_ = flash_attn_3_func(...)

However, it was not passing return_attn_probs=True to request the tuple return. Since Dao-AILab/flash-attention@203b9b3, flash_attn_func returns only out by default, causing:

 File "/root/flash-attention/diffusers/.venv/lib/python3.12/site-packages/torch/_compile.py", line 51, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/flash-attention/diffusers/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/flash-attention/diffusers/.venv/lib/python3.12/site-packages/torch/_library/custom_ops.py", line 367, in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/flash-attention/diffusers/src/diffusers/models/attention_dispatch.py", line 643, in _wrapped_flash_attn_3
    out, lse, *_ = flash_attn_3_func(
    ^^^^^^^^^^^^
ValueError: not enough values to unpack (expected at least 2, got 1)

How does this PR fix it?

Adds return_attn_probs=True to the flash_attn_3_func call, consistent with how _flash_attention_3_hub handles it.
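For reference, a minimal sketch of the change; keyword names other than return_attn_probs are illustrative and not the exact signature used in attention_dispatch.py:

# before: newer flash-attention builds return only `out`, so the unpacking fails
# out, lse, *_ = flash_attn_3_func(query, key, value, softmax_scale=scale, causal=is_causal)

# after: explicitly request the (out, lse, ...) tuple return
out, lse, *_ = flash_attn_3_func(
    query,
    key,
    value,
    softmax_scale=scale,
    causal=is_causal,
    return_attn_probs=True,
)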

Reproduction

# requirements.txt
git+https://github.com/huggingface/diffusers@5e48f466b9c0d257f2650e8feec378a0022f2402
torch==2.7.1
transformers
accelerate
--extra-index-url=https://download.pytorch.org/whl/cu128

and bring your own flash-attention build. To reproduce this, I built it from source at Dao-AILab/flash-attention@ac9b5f1.

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
).to("cuda")

pipe.transformer.set_attention_backend("_flash_3")

# ValueError: not enough values to unpack (expected at least 2, got 1)
pipe("a photo of a cat", num_inference_steps=1)

Alternative

The wrapper seems to exist to support FA3 as a custom op. However, FA3 now has native torch.compile support as of Dao-AILab/flash-attention@c7697bb, which may make _wrapped_flash_attn_3 redundant, though I don't know whether the custom op is the wrapper's only purpose. A rough sketch of a direct call is below.
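If the wrapper is dropped, the backend could call the kernel directly and rely on FA3's native torch.compile support. A sketch under that assumption (the actual registry wiring in attention_dispatch.py differs, and the import alias is assumed):

# Sketch only, not the diffusers implementation.
# Assumes: from flash_attn_interface import flash_attn_func as flash_attn_3_func
def _flash_attention_3_direct(query, key, value, scale=None, is_causal=False):
    result = flash_attn_3_func(
        query, key, value, softmax_scale=scale, causal=is_causal
    )
    # older builds return an (out, lse, ...) tuple, newer ones return only `out`
    return result[0] if isinstance(result, tuple) else result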

Before submitting

Who can review?

@badayvedat changed the title from "fix: flash_attn_3_func return value unpacking in _wrapped_flash_attn_3 w torch compile" to "fix: flash_attn_3_func value unpacking in _wrapped_flash_attn_3 w th compile" on Dec 17, 2025.
@badayvedat marked this pull request as ready for review on December 17, 2025 00:33.
@sayakpaul (Member) commented:

Thanks for your PR. Since torch.compile support has been merged, would you be interested in refactoring and cleaning up the FA3 backend in attention_dispatch.py?

@badayvedat changed the title from "fix: flash_attn_3_func value unpacking in _wrapped_flash_attn_3 w th compile" to "refactor: replace fa3 wrapper with original fa3 in attention backends registry" on Dec 17, 2025.
@badayvedat (Contributor, Author) commented:

Are there any downstream callers of this function that I also need to test?

@sayakpaul (Member) left a comment:

Thanks! Just a single question.

On this diff hunk:

raise


# ===== torch op registrations =====
Do you think we should version-guard this to keep it backwards-compatible?
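One low-effort alternative to a strict version guard would be to tolerate both return shapes at the unpacking site; a sketch only, not what this PR does:

# Sketch: branch on the return type so older builds (tuple return by default)
# and newer builds (single tensor unless return_attn_probs=True) both work.
result = flash_attn_3_func(
    query, key, value, softmax_scale=scale, causal=is_causal, return_attn_probs=True
)
if isinstance(result, tuple):
    out, lse = result[0], result[1]
else:
    out, lse = result, None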

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
