Add per-interpreter storage for `gil_safe_call_once_and_store` #5933

XuehaiPan · 2025-12-13T17:28:30Z

Description

For !defined(PYBIND11_HAS_SUBINTERPRETER_SUPPORT), the previous behavior is unchanged.

For defined(PYBIND11_HAS_SUBINTERPRETER_SUPPORT), the pybind11 internals is already a per-interpreter-dependent singleton stored in the interpreter's state dict. This PR adds a new map ~~to the interpreter-dependent internals~~ from the gil_safe_call_once_and_store's address to the result, where the gil_safe_call_once_and_store is a static object that has a fixed address.

UPDATE: The storage map is moved to a separate capsule in the per-interpreter state dict instead of putting it as a member of internals. Because the per-interpreter internals are leaked in the program, the storage map is never GC-ed on (sub-)interpreter shutdown.

Suggested changelog entry:

Add per-interpreter storage for gil_safe_call_once_and_store to make it safe under multi-interpreters.

📚 Documentation preview 📚: https://pybind11--5933.org.readthedocs.build/

…e_result`

XuehaiPan · 2025-12-16T08:04:34Z

include/pybind11/detail/internals.h

+        // FIXME: We cannot use `get_num_interpreters_seen() > 1` here to create a fast path for
+        // the multi-interpreter case. The singleton may be initialized by a subinterpreter not the
+        // main interpreter.
+        //
+        // For multi-interpreter support, the subinterpreters can be initialized concurrently, and
+        // the first time this function may not be called in the main interpreter.
+        // For example, a clean main interpreter that does not import any pybind11 module and then
+        // spawns multiple subinterpreters using `InterpreterPoolExecutor` that each imports a
+        // pybind11 module concurrently.
        if (get_num_interpreters_seen() > 1) {


cc @b-pass @rwgk for thoughts about this, see the code comment referenced for more details.

I test the patch in this PR in:

feat: add subinterpreters support metaopt/optree#245

Most things works fine except test_import_in_subinterpreter_before_main:

run_in_subprocess( """ with contextlib.closing(interpreters.create()) as subinterpreter: subinterpreter.exec('import optree') import optree """ )

If I remove the get_num_interpreters_seen() > 1 condition, my import test works but the cpptest in pybind11 CI breaks because internals_singleton_pp_ is never initialized. For instance, the memory address get_internals() should be a shared constant for differenet pybind11 modules in a single program.

get_num_interpreters_seen() > 1 check is meant to keep the "there are no subinterpreters" case (the most common case) as fast as possible.

Can you help me understand the problem that concurrent subinterpreter imports is causing?

I'm not understanding what exactly the problem is... doing concurrent imports of pybind11 modules each in its own subinterpreter should still result in each subinterpreter getting its own internals (as it should). The one that gets to this code first would populate the internals_singleton_pp_, but this does not have to be the main interpreter, and the next time get_pp is called from that first subinterprter it will take the other (> 1) code path. Since get_num_interpreters_seen() never decreases (it is "seen" not "currently alive"), once it goes up to 2 the inner code path will be used and what is stored in the internals_singleton_pp_ doesn't matter for the rest of the program.

What's the purpose of get_pp_for_main_interpreter() and the associated once flag?

Let me elaborate more about my failed test. I have a C++ singleton instance using gil_safe_call_once_and_store to ensure exactly one instance per-interpreter. With the patch in this PR, the call-once result is stored in the internals (*get_pp()), where the internals is stored in the interpreter's state dict.

MyClass &get_singleton() { PYBIND11_CONSTINIT static py::gil_safe_call_once_and_store<MyClass> storage; return storage .call_once_and_store_result([]() -> MyClass { MyClass instance{}; { // initialize ... } return instance; }) .get_stored(); }

OK - test_import_in_subinterpreters_concurrently: import the module in multiple subinterpreters concurrently, without any import for a pybind11 module in the main interpreter.

OK - test_import_in_subinterpreter_after_main: import the module in the main interpreter, then import in a subinterpreter.

FAIL - test_import_in_subinterpreter_before_main: import the module in a subinterpreter (the main interpreter does not import any pybind11 module yet), then import the module in the main interpreter.

The import succeeds in the subinterpreter, but the instance is ill-initialized in the main interpreter. If I remove the get_num_interpreters_seen() > 1 entirely or initialize the atomic counter for seen interpreters with 2 instead of 0, my test passes.

What's the purpose of get_pp_for_main_interpreter() and the associated once flag?

Just to ensure the singleton is initialized once. I'm not sure if the scoped GIL is still working when there are multiple interpreters running. I can revert the call_once approach back to scopded GIL approach.

include/pybind11/detail/internals.h

XuehaiPan · 2025-12-17T05:58:01Z

I have updated the approach to move the storage map as a separate capsule in the per-interpreter state dict instead of putting it in the internals. The PR description is also updated.

The CI tests are looking good so far. But I found some concurrency bugs for this patch in my downstream PR metaopt/optree#245. I'm not sure if it is coming from pybind11 or the Python core itself.

feat: add subinterpreters support metaopt/optree#245

I will do further investigation about these.

Bug 1: Duplicate native enum registration.

ImportError: pybind11::native_enum<...>("PyTreeKind") is already registered!

Test Case

def test_import_in_subinterpreter_before_main():
    # OK
    check_script_in_subprocess(
        textwrap.dedent(
            """
            import contextlib
            import gc
            from concurrent import interpreters

            subinterpreter = None
            with contextlib.closing(interpreters.create()) as subinterpreter:
                subinterpreter.exec('import optree')

            import optree

            del optree, subinterpreter
            for _ in range(10):
                gc.collect()
            """,
        ).strip(),
        output='',
        rerun=NUM_FLAKY_RERUNS,
    )

    # FAIL
    check_script_in_subprocess(
        textwrap.dedent(
            f"""
            import contextlib
            import gc
            import random
            from concurrent import interpreters

            subinterpreter = subinterpreters = stack = None
            with contextlib.ExitStack() as stack:
                subinterpreters = [
                    stack.enter_context(contextlib.closing(interpreters.create()))
                    for _ in range({NUM_FUTURES})
                ]
                random.shuffle(subinterpreters)
                for subinterpreter in subinterpreters:
                    subinterpreter.exec('import optree')

            import optree

            del optree, subinterpreter, subinterpreters, stack
            for _ in range(10):
                gc.collect()
            """,
        ).strip(),
        output='',
        rerun=NUM_FLAKY_RERUNS,
    )

    # FAIL
    check_script_in_subprocess(
        textwrap.dedent(
            f"""
            import contextlib
            import gc
            import random
            from concurrent import interpreters

            subinterpreter = subinterpreters = stack = None
            with contextlib.ExitStack() as stack:
                subinterpreters = [
                    stack.enter_context(contextlib.closing(interpreters.create()))
                    for _ in range({NUM_FUTURES})
                ]
                random.shuffle(subinterpreters)
                for subinterpreter in subinterpreters:
                    subinterpreter.exec('import optree')

                import optree

            del optree, subinterpreter, subinterpreters, stack
            for _ in range(10):
                gc.collect()
            """,
        ).strip(),
        output='',
        rerun=NUM_FLAKY_RERUNS,
    )

Traceback

__________________________________________________________________ test_import_in_subinterpreter_before_main ___________________________________________________________________

    def test_import_in_subinterpreter_before_main():
        check_script_in_subprocess(
            textwrap.dedent(
                """
                import contextlib
                import gc
                from concurrent import interpreters
    
                subinterpreter = None
                with contextlib.closing(interpreters.create()) as subinterpreter:
                    subinterpreter.exec('import optree')
    
                import optree
    
                del optree, subinterpreter
                for _ in range(10):
                    gc.collect()
                """,
            ).strip(),
            output='',
            rerun=NUM_FLAKY_RERUNS,
        )
    
>       check_script_in_subprocess(
            textwrap.dedent(
                f"""
                import contextlib
                import gc
                import random
                from concurrent import interpreters
    
                subinterpreter = subinterpreters = stack = None
                with contextlib.ExitStack() as stack:
                    subinterpreters = [
                        stack.enter_context(contextlib.closing(interpreters.create()))
                        for _ in range({NUM_FUTURES})
                    ]
                    random.shuffle(subinterpreters)
                    for subinterpreter in subinterpreters:
                        subinterpreter.exec('import optree')
    
                import optree
    
                del optree, subinterpreter, subinterpreters, stack
                for _ in range(10):
                    gc.collect()
                """,
            ).strip(),
            output='',
            rerun=NUM_FLAKY_RERUNS,
        )


concurrent/test_subinterpreters.py:267: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

script = "import contextlib\nimport gc\nimport random\nfrom concurrent import interpreters\n\nsubinterpreter = subinterpreters ...optree')\n\nimport optree\n\ndel optree, subinterpreter, subinterpreters, stack\nfor _ in range(10):\n    gc.collect()"

    def check_script_in_subprocess(script, /, *, output, env=None, cwd=TEST_ROOT, rerun=1):
        result = ''
        for _ in range(rerun):
            try:
                result = subprocess.check_output(
                    [sys.executable, '-Walways', '-Werror', '-c', script],
                    stderr=subprocess.STDOUT,
                    text=True,
                    encoding='utf-8',
                    cwd=cwd,
                    env={
                        key: value
                        for key, value in (env if env is not None else os.environ).items()
                        if not key.startswith(('PYTHON', 'PYTEST', 'COV_'))
                    },
                )
            except subprocess.CalledProcessError as ex:
>               raise CalledProcessError(ex.returncode, ex.cmd, ex.output, ex.stderr) from None
E               helpers.CalledProcessError: Command '['/home/PanXuehai/Projects/optree/venv/bin/python3', '-Walways', '-Werror', '-c', "import contextlib\nimport gc\nimport random\nfrom concurrent import interpreters\n\nsubinterpreter = subinterpreters = stack = None\nwith contextlib.ExitStack() as stack:\n    subinterpreters = [\n        stack.enter_context(contextlib.closing(interpreters.create()))\n        for _ in range(16)\n    ]\n    random.shuffle(subinterpreters)\n    for subinterpreter in subinterpreters:\n        subinterpreter.exec('import optree')\n\nimport optree\n\ndel optree, subinterpreter, subinterpreters, stack\nfor _ in range(10):\n    gc.collect()"]' returned non-zero exit status 1.
E               Output:
E               Traceback (most recent call last):
E                 File "<string>", line 16, in <module>
E                   import optree
E                 File "/home/PanXuehai/Projects/optree/optree/__init__.py", line 17, in <module>
E                   from optree import accessors, dataclasses, functools, integrations, pytree, treespec, typing
E                 File "/home/PanXuehai/Projects/optree/optree/accessors.py", line 25, in <module>
E                   import optree._C as _C
E               ImportError: pybind11::native_enum<...>("PyTreeKind") is already registered!

_          = 0
cwd        = PosixPath('/home/PanXuehai/Projects/optree/tests')
env        = None
output     = ''
rerun      = 8
result     = ''
script     = "import contextlib\nimport gc\nimport random\nfrom concurrent import interpreters\n\nsubinterpreter = subinterpreters ...optree')\n\nimport optree\n\ndel optree, subinterpreter, subinterpreters, stack\nfor _ in range(10):\n    gc.collect()"

helpers.py:187: CalledProcessError

Bug 2: Thread-safety issues for multiple-interpreters are ever created.

Case 1: No multiple interpreters, only the main interpreter. Everything works fine.

# Run tests separately
make test PYTESTOPTS='-k "subinterpreter"'       # OK, no threading concurrency tests
make test PYTESTOPTS='-k "not subinterpreter"'   # OK, the program only has the main interpreter

Case 2: Create multiple interpreters, and destroy them before doing threading concurrency tests (only the main interpreter is alive during the concurrency tests).

The C++ objects are locked with critical sections, and the GIL is enabled during the test.

# There is only the main interpreter alive during threading concurrency tests
#
#     ```python
#     with contextlib.closing(concurrent.interpreters.create()) as subintrepreter:
#         # do something
#         ...
#
#     # at this point, only the main interpreter is alive
#
#     # threading tests
#     ...
#     ```

# Run tests alltogather
make test  # subinterpreter tests all passed, but threading concurrency tests produce incorrect results

include/pybind11/gil_safe_call_once.h

…ce-and-store

XuehaiPan · 2025-12-23T14:26:51Z

The test is still flaky on some platforms after removing get_num_interpreters_seen() > 1. But the number of failing tests goes down.

I don't know why the get_num_interpreters_seen() <= 1 path does not work and why the tests pass on Linux 3.14t and macOS 3.14/3.14t but still fail on Linux 3.14.

b-pass

I don't think these internals changes are a good idea. Which things are being flaky and on which python versions? I'd like to get to the bottom of those issues, but taking out big chunks of the internals management definitely makes me nervous!

b-pass · 2025-12-24T01:18:14Z

I did some poking around in your new tests, and at least in the case of MyEnum, that name is used in another test so I think some of the errors you're getting are caused by that. MyClass is also used elsewhere, but I think the MyClass errors might be a different problem with the tests.

It's possible there's a bug with the "import into sub-interpreter first" case ... we can probably detect that case during module init and increment the get_num_interpreters_seen to at least 2 if needed. But I can't tell from everything going on in this PR whether that one is still a problem or not....

Are there tests in here not directly related to gil_safe_call_once that could be split out into a separate PR?

XuehaiPan · 2025-12-24T03:58:04Z

I did some poking around in your new tests, and at least in the case of MyEnum, that name is used in another test so I think some of the errors you're getting are caused by that. MyClass is also used elsewhere, but I think the MyClass errors might be a different problem with the tests.

The import test is in a completely new subprocess with a clean state. Modules built for other tests will not affect the correctness of the import tests for multiple-interpreters.

XuehaiPan · 2025-12-24T04:01:37Z

Are there tests in here not directly related to gil_safe_call_once that could be split out into a separate PR?

I can do that. But I want to point out that PYBIND11_MODULE(..., py::multiple_interpreters::per_interpreter_gil()) is not usable at this moment due to duplicate registrations of C++ types.

I will comment out the newly added test in this PR and let this get merged first. Then open a new PR for concurrency issues for internals.

b-pass · 2025-12-24T04:11:30Z

But I want to point out that PYBIND11_MODULE(..., py::multiple_interpreters::per_interpreter_gil()) is not usable at this moment due to duplicate registrations of C++ types.

The existing tests do define a class ("A") and import the module into multiple subinterpreters at the same time.

Do you have a test that shows this without using concurrent pools? The concurrency makes it a lot more difficult to debug, and if it doesn't work at all it should be easy to demonstrate...?

tests/test_with_catch/test_subinterpreter.cpp

rwgk · 2025-12-24T14:34:44Z

@XuehaiPan I think (strongly) we should split out the tests/test_multiple_interpreters/ move as a separate, very conservative PR. I.e. the only theme should be the move, no tests added or modified. That'll make it easier to understand and debug the new code and tests.

Could you please let me know if you prefer doing that yourself, or me doing it? I believe with Cursor it'll be very straightforward. It'll probably do almost everything automatically, including merging the updated master into this PR (effectively a rebase, but I'd definitely do it with a regular merge commit, so that we have a coherent dev history here).

XuehaiPan added 2 commits December 13, 2025 20:08

Add new argument to `gil_safe_call_once_and_store::call_once_and_stor…

729654c

…e_result`

Add per-interpreter storage for gil_safe_call_once_and_store

d2b7605

XuehaiPan force-pushed the subinterp-call-once-and-store branch 3 times, most recently from 58834ac to a46973b Compare December 14, 2025 04:07

Make ~gil_safe_call_once_and_store a no-op

e741760

XuehaiPan force-pushed the subinterp-call-once-and-store branch from a46973b to e741760 Compare December 14, 2025 04:08

XuehaiPan added 2 commits December 14, 2025 12:15

Fix C++11 compatibility

5d1d678

Improve thread-safety and add default finalizer

0bac82d

XuehaiPan marked this pull request as ready for review December 14, 2025 06:47

XuehaiPan added 4 commits December 14, 2025 14:47

Merge branch 'master' into subinterp-call-once-and-store

2a4b118

Try fix thread-safety

be97110

Try fix thread-safety

3e77ce9

Add a warning comment

d5b8813

XuehaiPan force-pushed the subinterp-call-once-and-store branch from 8e5bb6e to d5b8813 Compare December 15, 2025 06:42

rwgk mentioned this pull request Dec 15, 2025

Pin 3.14t CI jobs to 3.14.0t to work around a regression in 3.14.1t+ #5934

Closed

XuehaiPan added 5 commits December 15, 2025 18:16

Simplify PYBIND11_INTERNALS_VERSION >= 12

f6d0f88

Try fix thread-safety

7d8339e

Try fix thread-safety

1920f43

Merge branch 'master' into subinterp-call-once-and-store

900bed6

Revert get_pp()

a6754ba

XuehaiPan commented Dec 16, 2025

View reviewed changes

Update comments

1aed3ab

b-pass reviewed Dec 17, 2025

View reviewed changes

include/pybind11/detail/internals.h Outdated Show resolved Hide resolved

XuehaiPan force-pushed the subinterp-call-once-and-store branch from 3a6ec0e to 0830872 Compare December 17, 2025 04:09

XuehaiPan added 3 commits December 17, 2025 12:11

Move call-once storage out of internals

b61e902

Revert internal version bump

b72cd41

Cleanup outdated comments

ac02a32

XuehaiPan force-pushed the subinterp-call-once-and-store branch from 0830872 to ac02a32 Compare December 17, 2025 04:11

Test even harder

99a095d

oremanj approved these changes Dec 22, 2025

View reviewed changes

include/pybind11/gil_safe_call_once.h Outdated Show resolved Hide resolved

include/pybind11/gil_safe_call_once.h Outdated Show resolved Hide resolved

include/pybind11/gil_safe_call_once.h Outdated Show resolved Hide resolved

XuehaiPan added 12 commits December 23, 2025 11:24

Reorg code to avoid duplicates

7daecd7

Fix unique_ptr::reset -> unique_ptr::release

cd950dc

Merge remote-tracking branch 'upstream/master' into subinterp-call-on…

db22bb4

…ce-and-store

Extract reusable functions

96d2eaf

Fix indentation

0efd867

Appease warnings for MSVC

2e2b7fe

Appease warnings for MSVC

c07db4e

Appease warnings for MSVC

8d9a0bd

Merge remote-tracking branch 'upstream/master' into subinterp-call-on…

3886ac9

…ce-and-store

Try fix concurrency by not using get_num_interpreters_seen() > 1

0556fc6

Try fix tests

fd29471

Make Python path handling more robust

25909b0

Update comments and assertion messages

c5d8738

b-pass requested changes Dec 23, 2025

View reviewed changes

Revert changes according to code review

013720c

Disable flaky tests

fc01705

XuehaiPan requested review from b-pass and rwgk December 24, 2025 04:06

b-pass approved these changes Dec 24, 2025

View reviewed changes

tests/test_with_catch/test_subinterpreter.cpp Outdated Show resolved Hide resolved

Use @pytest.mark.xfail rather than pytest.skip

8c0b7bc

XuehaiPan added 2 commits December 24, 2025 23:42

Retrigger CI

c6a0b7b

Retrigger CI

fab140e

Add per-interpreter storage for gil_safe_call_once_and_store #5933

Are you sure you want to change the base?

Add per-interpreter storage for gil_safe_call_once_and_store #5933

Conversation

XuehaiPan commented Dec 13, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Suggested changelog entry:

Uh oh!

XuehaiPan Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

b-pass Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

XuehaiPan Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

XuehaiPan commented Dec 17, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

XuehaiPan commented Dec 23, 2025

Uh oh!

b-pass left a comment

Choose a reason for hiding this comment

Uh oh!

b-pass commented Dec 24, 2025

Uh oh!

XuehaiPan commented Dec 24, 2025

Uh oh!

XuehaiPan commented Dec 24, 2025

Uh oh!

b-pass commented Dec 24, 2025

Uh oh!

Uh oh!

rwgk commented Dec 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Add per-interpreter storage for `gil_safe_call_once_and_store` #5933

Add per-interpreter storage for `gil_safe_call_once_and_store` #5933

XuehaiPan commented Dec 13, 2025 •

edited by github-actions bot

Loading

XuehaiPan Dec 16, 2025 •

edited

Loading

XuehaiPan Dec 17, 2025 •

edited

Loading