-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add per-interpreter storage for gil_safe_call_once_and_store
#5933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
58834ac to
a46973b
Compare
a46973b to
e741760
Compare
8e5bb6e to
d5b8813
Compare
include/pybind11/detail/internals.h
Outdated
| // FIXME: We cannot use `get_num_interpreters_seen() > 1` here to create a fast path for | ||
| // the multi-interpreter case. The singleton may be initialized by a subinterpreter not the | ||
| // main interpreter. | ||
| // | ||
| // For multi-interpreter support, the subinterpreters can be initialized concurrently, and | ||
| // the first time this function may not be called in the main interpreter. | ||
| // For example, a clean main interpreter that does not import any pybind11 module and then | ||
| // spawns multiple subinterpreters using `InterpreterPoolExecutor` that each imports a | ||
| // pybind11 module concurrently. | ||
| if (get_num_interpreters_seen() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @b-pass @rwgk for thoughts about this, see the code comment referenced for more details.
I test the patch in this PR in:
Most things works fine except test_import_in_subinterpreter_before_main:
run_in_subprocess(
"""
with contextlib.closing(interpreters.create()) as subinterpreter:
subinterpreter.exec('import optree')
import optree
"""
)If I remove the get_num_interpreters_seen() > 1 condition, my import test works but the cpptest in pybind11 CI breaks because internals_singleton_pp_ is never initialized. For instance, the memory address get_internals() should be a shared constant for differenet pybind11 modules in a single program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_num_interpreters_seen() > 1 check is meant to keep the "there are no subinterpreters" case (the most common case) as fast as possible.
Can you help me understand the problem that concurrent subinterpreter imports is causing?
I'm not understanding what exactly the problem is... doing concurrent imports of pybind11 modules each in its own subinterpreter should still result in each subinterpreter getting its own internals (as it should). The one that gets to this code first would populate the internals_singleton_pp_, but this does not have to be the main interpreter, and the next time get_pp is called from that first subinterprter it will take the other (> 1) code path. Since get_num_interpreters_seen() never decreases (it is "seen" not "currently alive"), once it goes up to 2 the inner code path will be used and what is stored in the internals_singleton_pp_ doesn't matter for the rest of the program.
What's the purpose of get_pp_for_main_interpreter() and the associated once flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me elaborate more about my failed test. I have a C++ singleton instance using gil_safe_call_once_and_store to ensure exactly one instance per-interpreter. With the patch in this PR, the call-once result is stored in the internals (*get_pp()), where the internals is stored in the interpreter's state dict.
MyClass &get_singleton() {
PYBIND11_CONSTINIT static py::gil_safe_call_once_and_store<MyClass> storage;
return storage
.call_once_and_store_result([]() -> MyClass {
MyClass instance{};
{
// initialize
...
}
return instance;
})
.get_stored();
}- OK -
test_import_in_subinterpreters_concurrently: import the module in multiple subinterpreters concurrently, without any import for a pybind11 module in the main interpreter. - OK -
test_import_in_subinterpreter_after_main: import the module in the main interpreter, then import in a subinterpreter. - FAIL -
test_import_in_subinterpreter_before_main: import the module in a subinterpreter (the main interpreter does not import any pybind11 module yet), then import the module in the main interpreter.The import succeeds in the subinterpreter, but the instance is ill-initialized in the main interpreter. If I remove the
get_num_interpreters_seen() > 1entirely or initialize the atomic counter for seen interpreters with2instead of0, my test passes.
What's the purpose of
get_pp_for_main_interpreter()and the associated once flag?
Just to ensure the singleton is initialized once. I'm not sure if the scoped GIL is still working when there are multiple interpreters running. I can revert the call_once approach back to scopded GIL approach.
3a6ec0e to
0830872
Compare
0830872 to
ac02a32
Compare
|
I have updated the approach to move the storage map as a separate The CI tests are looking good so far. But I found some concurrency bugs for this patch in my downstream PR metaopt/optree#245. I'm not sure if it is coming from I will do further investigation about these. Bug 1: Duplicate native enum registration. ImportError: pybind11::native_enum<...>("PyTreeKind") is already registered!Test Casedef test_import_in_subinterpreter_before_main():
# OK
check_script_in_subprocess(
textwrap.dedent(
"""
import contextlib
import gc
from concurrent import interpreters
subinterpreter = None
with contextlib.closing(interpreters.create()) as subinterpreter:
subinterpreter.exec('import optree')
import optree
del optree, subinterpreter
for _ in range(10):
gc.collect()
""",
).strip(),
output='',
rerun=NUM_FLAKY_RERUNS,
)
# FAIL
check_script_in_subprocess(
textwrap.dedent(
f"""
import contextlib
import gc
import random
from concurrent import interpreters
subinterpreter = subinterpreters = stack = None
with contextlib.ExitStack() as stack:
subinterpreters = [
stack.enter_context(contextlib.closing(interpreters.create()))
for _ in range({NUM_FUTURES})
]
random.shuffle(subinterpreters)
for subinterpreter in subinterpreters:
subinterpreter.exec('import optree')
import optree
del optree, subinterpreter, subinterpreters, stack
for _ in range(10):
gc.collect()
""",
).strip(),
output='',
rerun=NUM_FLAKY_RERUNS,
)
# FAIL
check_script_in_subprocess(
textwrap.dedent(
f"""
import contextlib
import gc
import random
from concurrent import interpreters
subinterpreter = subinterpreters = stack = None
with contextlib.ExitStack() as stack:
subinterpreters = [
stack.enter_context(contextlib.closing(interpreters.create()))
for _ in range({NUM_FUTURES})
]
random.shuffle(subinterpreters)
for subinterpreter in subinterpreters:
subinterpreter.exec('import optree')
import optree
del optree, subinterpreter, subinterpreters, stack
for _ in range(10):
gc.collect()
""",
).strip(),
output='',
rerun=NUM_FLAKY_RERUNS,
)Traceback__________________________________________________________________ test_import_in_subinterpreter_before_main ___________________________________________________________________
def test_import_in_subinterpreter_before_main():
check_script_in_subprocess(
textwrap.dedent(
"""
import contextlib
import gc
from concurrent import interpreters
subinterpreter = None
with contextlib.closing(interpreters.create()) as subinterpreter:
subinterpreter.exec('import optree')
import optree
del optree, subinterpreter
for _ in range(10):
gc.collect()
""",
).strip(),
output='',
rerun=NUM_FLAKY_RERUNS,
)
> check_script_in_subprocess(
textwrap.dedent(
f"""
import contextlib
import gc
import random
from concurrent import interpreters
subinterpreter = subinterpreters = stack = None
with contextlib.ExitStack() as stack:
subinterpreters = [
stack.enter_context(contextlib.closing(interpreters.create()))
for _ in range({NUM_FUTURES})
]
random.shuffle(subinterpreters)
for subinterpreter in subinterpreters:
subinterpreter.exec('import optree')
import optree
del optree, subinterpreter, subinterpreters, stack
for _ in range(10):
gc.collect()
""",
).strip(),
output='',
rerun=NUM_FLAKY_RERUNS,
)
concurrent/test_subinterpreters.py:267:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
script = "import contextlib\nimport gc\nimport random\nfrom concurrent import interpreters\n\nsubinterpreter = subinterpreters ...optree')\n\nimport optree\n\ndel optree, subinterpreter, subinterpreters, stack\nfor _ in range(10):\n gc.collect()"
def check_script_in_subprocess(script, /, *, output, env=None, cwd=TEST_ROOT, rerun=1):
result = ''
for _ in range(rerun):
try:
result = subprocess.check_output(
[sys.executable, '-Walways', '-Werror', '-c', script],
stderr=subprocess.STDOUT,
text=True,
encoding='utf-8',
cwd=cwd,
env={
key: value
for key, value in (env if env is not None else os.environ).items()
if not key.startswith(('PYTHON', 'PYTEST', 'COV_'))
},
)
except subprocess.CalledProcessError as ex:
> raise CalledProcessError(ex.returncode, ex.cmd, ex.output, ex.stderr) from None
E helpers.CalledProcessError: Command '['/home/PanXuehai/Projects/optree/venv/bin/python3', '-Walways', '-Werror', '-c', "import contextlib\nimport gc\nimport random\nfrom concurrent import interpreters\n\nsubinterpreter = subinterpreters = stack = None\nwith contextlib.ExitStack() as stack:\n subinterpreters = [\n stack.enter_context(contextlib.closing(interpreters.create()))\n for _ in range(16)\n ]\n random.shuffle(subinterpreters)\n for subinterpreter in subinterpreters:\n subinterpreter.exec('import optree')\n\nimport optree\n\ndel optree, subinterpreter, subinterpreters, stack\nfor _ in range(10):\n gc.collect()"]' returned non-zero exit status 1.
E Output:
E Traceback (most recent call last):
E File "<string>", line 16, in <module>
E import optree
E File "/home/PanXuehai/Projects/optree/optree/__init__.py", line 17, in <module>
E from optree import accessors, dataclasses, functools, integrations, pytree, treespec, typing
E File "/home/PanXuehai/Projects/optree/optree/accessors.py", line 25, in <module>
E import optree._C as _C
E ImportError: pybind11::native_enum<...>("PyTreeKind") is already registered!
_ = 0
cwd = PosixPath('/home/PanXuehai/Projects/optree/tests')
env = None
output = ''
rerun = 8
result = ''
script = "import contextlib\nimport gc\nimport random\nfrom concurrent import interpreters\n\nsubinterpreter = subinterpreters ...optree')\n\nimport optree\n\ndel optree, subinterpreter, subinterpreters, stack\nfor _ in range(10):\n gc.collect()"
helpers.py:187: CalledProcessErrorBug 2: Thread-safety issues for multiple-interpreters are ever created.
|
|
The test is still flaky on some platforms after removing I don't know why the |
b-pass
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these internals changes are a good idea. Which things are being flaky and on which python versions? I'd like to get to the bottom of those issues, but taking out big chunks of the internals management definitely makes me nervous!
|
I did some poking around in your new tests, and at least in the case of It's possible there's a bug with the "import into sub-interpreter first" case ... we can probably detect that case during module init and increment the Are there tests in here not directly related to |
The import test is in a completely new subprocess with a clean state. Modules built for other tests will not affect the correctness of the import tests for multiple-interpreters. |
I can do that. But I want to point out that I will comment out the newly added test in this PR and let this get merged first. Then open a new PR for concurrency issues for |
The existing tests do define a class ("A") and import the module into multiple subinterpreters at the same time. Do you have a test that shows this without using concurrent pools? The concurrency makes it a lot more difficult to debug, and if it doesn't work at all it should be easy to demonstrate...? |
|
@XuehaiPan I think (strongly) we should split out the Could you please let me know if you prefer doing that yourself, or me doing it? I believe with Cursor it'll be very straightforward. It'll probably do almost everything automatically, including merging the updated master into this PR (effectively a rebase, but I'd definitely do it with a regular merge commit, so that we have a coherent dev history here). |
Description
Fixes #5926
For
!defined(PYBIND11_HAS_SUBINTERPRETER_SUPPORT), the previous behavior is unchanged.For
defined(PYBIND11_HAS_SUBINTERPRETER_SUPPORT), the pybind11internalsis already a per-interpreter-dependent singleton stored in the interpreter's state dict. This PR adds a new mapto the interpreter-dependentfrom theinternalsgil_safe_call_once_and_store's address to the result, where thegil_safe_call_once_and_storeis a static object that has a fixed address.UPDATE: The storage map is moved to a separate
capsulein the per-interpreter state dict instead of putting it as a member ofinternals. Because the per-interpreterinternalsare leaked in the program, the storage map is never GC-ed on (sub-)interpreter shutdown.Suggested changelog entry:
gil_safe_call_once_and_storeto make it safe under multi-interpreters.📚 Documentation preview 📚: https://pybind11--5933.org.readthedocs.build/