Skip to content

Conversation

@cocolato
Copy link
Contributor

@cocolato cocolato commented Jan 8, 2026

@cocolato cocolato changed the title Move JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow. gh143421: Move JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow. Jan 8, 2026
@cocolato cocolato changed the title gh143421: Move JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow. gh-143421: Move JitOptContext from stack allocation to per-thread heap allocation to avoid stack overflow. Jan 8, 2026
Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty close!

@cocolato
Copy link
Contributor Author

cocolato commented Jan 8, 2026

Updated!

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@markshannon
Copy link
Member

Can we avoid unnecessary pointer chasing by embedding the array in the jit state struct, and allocate the entire struct together instead of having part of it embedded in the thread state, and part in a new allocation?

ie. instead of

typedef struct _PyJitOptState {
    struct _JitOptContext *opt_context;
} _PyJitOptState;

typedef struct _PyJitTracerState {
    _PyUOpInstruction *code_buffer;
    _PyJitOptState opt_state;
    _PyJitTracerInitialState initial_state;
    _PyJitTracerPreviousState prev_state;
    _PyJitTracerTranslatorState translator_state;
} _PyJitTracerState;

we would have

typedef struct _PyJitTracerState {
    _PyJitTracerInitialState initial_state;
    _PyJitTracerPreviousState prev_state;
    _PyJitTracerTranslatorState translator_state;
     _JitOptContext opt_context;
    _PyUOpInstruction code_buffer[UOP_MAX_TRACE_LENGTH];
} _PyJitTracerState;

Doing so would also decouple the thread state header from the JIT headers, as only a pointer to an opaque struct would need to be declared in the thread state header.

I'm also concerned about the amount of allocation and freeing required to compile a small trace. Maybe allocate once and only free when the thread is destroyed?

@cocolato
Copy link
Contributor Author

cocolato commented Jan 8, 2026

we would have

typedef struct _PyJitTracerState {
    _PyJitTracerInitialState initial_state;
    _PyJitTracerPreviousState prev_state;
    _PyJitTracerTranslatorState translator_state;
     _JitOptContext opt_context;
    _PyUOpInstruction code_buffer[UOP_MAX_TRACE_LENGTH];
} _PyJitTracerSt

I can try implementing it this way, but I'd like to know if this should be done in a new PR/issue. Perhaps we could use benchmarks to determine potential performance issues from allocating _PyJitTracerState once.

@Fidget-Spinner
Copy link
Member

I'm also concerned about the amount of allocation and freeing required to compile a small trace. Maybe allocate once and only free when the thread is destroyed?

Isnt that already being done?

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jan 8, 2026

@cocolato I think we can apply Mark's changes, it means we would need no new allocations anymore as its all heap allocated with the _PyThreadStateImpl struct.

@cocolato cocolato closed this Jan 8, 2026
@cocolato cocolato reopened this Jan 8, 2026
@cocolato
Copy link
Contributor Author

cocolato commented Jan 8, 2026

I'm not sure if the current implementation of the new pycore_optimizer_types.h is correct. I'd appreciate any suggestions for improvements.

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool, thank you!

@Fidget-Spinner Fidget-Spinner merged commit aeb3403 into python:main Jan 8, 2026
62 checks passed
@Fidget-Spinner
Copy link
Member

@markshannon

This caused a 100% slowdown in bench_thread_pool benchmark https://doesjitgobrrr.com/run/2026-01-08

We need to lazily allocate it and pointer chase, otherwise we're slowing down significantly the spawning of threads.

Fidget-Spinner added a commit to Fidget-Spinner/cpython that referenced this pull request Jan 9, 2026
…o per-thread heap allocation (pythonGH-143536)"

This reverts commit aeb3403.
@cocolato
Copy link
Contributor Author

cocolato commented Jan 9, 2026

@markshannon

This caused a 100% slowdown in bench_thread_pool benchmark https://doesjitgobrrr.com/run/2026-01-08

We need to lazily allocate it and pointer chase, otherwise we're slowing down significantly the spawning of threads.

So should we revert to the first version?4e7a918

@Fidget-Spinner
Copy link
Member

So should we revert to the first version?4e7a918

I will have a PR up.

@cocolato
Copy link
Contributor Author

cocolato commented Jan 9, 2026

Sorry, I will run a local benchmark first before making important changes in the next time.

@Fidget-Spinner
Copy link
Member

Sorry, I will run a local benchmark first before making important changes in the next time.

No it's fine. That's the point of having a benchmark runner, to catch things like this! It's hard to predict what a change will affect, so that's the point of having benchark runners to save human time.

@Fidget-Spinner
Copy link
Member

PR up at #143597

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants