-
-
Notifications
You must be signed in to change notification settings - Fork 33.9k
gh-143421: Move JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow.
#143536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-143421: Move JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow.
#143536
Conversation
JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow.JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow.
JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow.JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow.
Fidget-Spinner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty close!
|
Updated! |
Fidget-Spinner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
|
Can we avoid unnecessary pointer chasing by embedding the array in the jit state struct, and allocate the entire struct together instead of having part of it embedded in the thread state, and part in a new allocation? ie. instead of typedef struct _PyJitOptState {
struct _JitOptContext *opt_context;
} _PyJitOptState;
typedef struct _PyJitTracerState {
_PyUOpInstruction *code_buffer;
_PyJitOptState opt_state;
_PyJitTracerInitialState initial_state;
_PyJitTracerPreviousState prev_state;
_PyJitTracerTranslatorState translator_state;
} _PyJitTracerState;we would have typedef struct _PyJitTracerState {
_PyJitTracerInitialState initial_state;
_PyJitTracerPreviousState prev_state;
_PyJitTracerTranslatorState translator_state;
_JitOptContext opt_context;
_PyUOpInstruction code_buffer[UOP_MAX_TRACE_LENGTH];
} _PyJitTracerState;Doing so would also decouple the thread state header from the JIT headers, as only a pointer to an opaque struct would need to be declared in the thread state header. I'm also concerned about the amount of allocation and freeing required to compile a small trace. Maybe allocate once and only free when the thread is destroyed? |
I can try implementing it this way, but I'd like to know if this should be done in a new PR/issue. Perhaps we could use benchmarks to determine potential performance issues from allocating |
Isnt that already being done? |
|
@cocolato I think we can apply Mark's changes, it means we would need no new allocations anymore as its all heap allocated with the _PyThreadStateImpl struct. |
|
I'm not sure if the current implementation of the new |
Fidget-Spinner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool, thank you!
|
This caused a 100% slowdown in bench_thread_pool benchmark https://doesjitgobrrr.com/run/2026-01-08 We need to lazily allocate it and pointer chase, otherwise we're slowing down significantly the spawning of threads. |
…o per-thread heap allocation (pythonGH-143536)" This reverts commit aeb3403.
So should we revert to the first version?4e7a918 |
I will have a PR up. |
|
Sorry, I will run a local benchmark first before making important changes in the next time. |
No it's fine. That's the point of having a benchmark runner, to catch things like this! It's hard to predict what a change will affect, so that's the point of having benchark runners to save human time. |
|
PR up at #143597 |
unique reference trackingin Tier 2 for reference count optimizations #143414 (comment)