2 changes: 1 addition & 1 deletion Doc/library/asyncio-queue.rst
@@ -107,7 +107,7 @@ Queue
       The queue can no longer grow.
       Future calls to :meth:`~Queue.put` raise :exc:`QueueShutDown`.
       Currently blocked callers of :meth:`~Queue.put` will be unblocked
-      and will raise :exc:`QueueShutDown` in the formerly blocked thread.
+      and will raise :exc:`QueueShutDown` in the formerly awaiting task.

       If *immediate* is false (the default), the queue can be wound
       down normally with :meth:`~Queue.get` calls to extract tasks
86 changes: 86 additions & 0 deletions Doc/whatsnew/3.15.rst
@@ -73,6 +73,7 @@ Summary -- Release highlights
   <whatsnew315-utf8-default>`
 * :pep:`782`: :ref:`A new PyBytesWriter C API to create a Python bytes object
   <whatsnew315-pep782>`
+* :ref:`The JIT compiler has been significantly upgraded <whatsnew315-jit>`
 * :ref:`Improved error messages <whatsnew315-improved-error-messages>`


@@ -850,6 +851,91 @@ csv
   (Contributed by Maurycy Pawłowski-Wieroński in :gh:`137628`.)


+.. _whatsnew315-jit:
+
+Upgraded JIT compiler
+=====================
+
+Results from the `pyperformance <https://github.com/python/pyperformance>`__
+benchmark suite report
+`3-4% <https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251214-3.15.0a2%2B-6cddf04-JIT/bm-20251214-vultr-x86_64-python-6cddf04344a1e8ca9df5-3.15.0a2%2B-6cddf04-vs-base.svg>`__
+geometric mean performance improvement for the JIT over the standard CPython
+interpreter built with all optimizations enabled. The speedups for JIT
+builds versus no JIT builds range from roughly 20% slowdown to over
+100% speedup (ignoring the ``unpack_sequence`` microbenchmark) on
+x86-64 Linux and AArch64 macOS systems.
+
+.. attention::
+   These results are not yet final.
+
+The major upgrades to the JIT are:
+
+* LLVM 21 build-time dependency
+* New tracing frontend
+* Basic register allocation in the JIT
+* More JIT optimizations
+* Better machine code generation
+
+.. rubric:: LLVM 21 build-time dependency
+
+The JIT compiler now uses LLVM 21 for build-time stencil generation. As
+always, LLVM is only needed when building CPython with the JIT enabled;
+end users running Python do not need LLVM installed. Instructions for
+installing LLVM can be found in the `JIT compiler documentation
+<https://github.com/python/cpython/blob/main/Tools/jit/README.md>`__
+for all supported platforms.
+
+(Contributed by Savannah Ostrowski in :gh:`140973`.)
+
+.. rubric:: A new tracing frontend
+
+The JIT compiler now supports significantly more bytecode operations and
+control flow than in Python 3.14, enabling speedups on a wider variety of
+code. For example, simple Python object creation is now understood by the
+3.15 JIT compiler. Overloaded operations and generators are also partially
+supported. This was made possible by an overhauled JIT tracing frontend
+that records actual execution paths through code, rather than estimating
+them as the previous implementation did.
+
+(Contributed by Ken Jin in :gh:`139109`. Support for Windows added by
+Mark Shannon in :gh:`141703`.)
+
+.. rubric:: Basic register allocation in the JIT
+
+A basic form of register allocation has been added to the JIT compiler's
+optimizer. This allows the JIT compiler to avoid certain stack operations
+altogether and instead operate on registers. This allows the JIT to produce
+more efficient traces by avoiding reads and writes to memory.
+
+(Contributed by Mark Shannon in :gh:`135379`.)
+
+.. rubric:: More JIT optimizations
+
+More `constant-propagation <https://en.wikipedia.org/wiki/Constant_folding>`__
+is now performed. This means when the JIT compiler detects that certain user
+code results in constants, the code can be simplified by the JIT.
+
+(Contributed by Ken Jin and Savannah Ostrowski in :gh:`132732`.)
+
+The JIT avoids :term:`reference count`\ s where possible. This generally
+reduces the cost of most operations in Python.
+
+(Contributed by Ken Jin, Donghee Na, Zheao Li, Savannah Ostrowski,
+Noam Cohen, Tomas Roun, PuQing in :gh:`134584`.)
+
+.. rubric:: Better machine code generation
+
+The JIT compiler's machine code generator now produces better machine code
+for x86-64 and AArch64 macOS and Linux targets. In general, users should
+experience lower memory usage for generated machine code and more efficient
+machine code versus the old JIT.
+
+(Contributed by Brandt Bucher in :gh:`136528` and :gh:`136528`.
+Implementation for AArch64 contributed by Mark Shannon in :gh:`139855`.
+Additional optimizations for AArch64 contributed by Mark Shannon and
+Diego Russo in :gh:`140683` and :gh:`142305`.)
+
+
 Removed
 =======

7 changes: 7 additions & 0 deletions Lib/profiling/sampling/__main__.py
@@ -46,6 +46,7 @@
 """

 from .cli import main
+from .errors import SamplingUnknownProcessError, SamplingModuleNotFoundError, SamplingScriptNotFoundError

 def handle_permission_error():
     """Handle PermissionError by displaying appropriate error message."""
@@ -64,3 +65,9 @@ def handle_permission_error():
         main()
     except PermissionError:
         handle_permission_error()
+    except SamplingUnknownProcessError as err:
+        print(f"Tachyon cannot find the process: {err}", file=sys.stderr)
+        sys.exit(1)
+    except (SamplingModuleNotFoundError, SamplingScriptNotFoundError) as err:
+        print(f"Tachyon cannot find the target: {err}", file=sys.stderr)
+        sys.exit(1)
9 changes: 6 additions & 3 deletions Lib/profiling/sampling/cli.py
@@ -10,7 +10,8 @@
 import time
 from contextlib import nullcontext

-from .sample import sample, sample_live
+from .errors import SamplingUnknownProcessError, SamplingModuleNotFoundError, SamplingScriptNotFoundError
+from .sample import sample, sample_live, _is_process_running
 from .pstats_collector import PstatsCollector
 from .stack_collector import CollapsedStackCollector, FlamegraphCollector
 from .heatmap_collector import HeatmapCollector
@@ -743,6 +744,8 @@ def main():

 def _handle_attach(args):
     """Handle the 'attach' command."""
+    if not _is_process_running(args.pid):
+        raise SamplingUnknownProcessError(args.pid)
     # Check if live mode is requested
     if args.live:
         _handle_live_attach(args, args.pid)
@@ -792,13 +795,13 @@ def _handle_run(args):
             added_cwd = True
         try:
             if importlib.util.find_spec(args.target) is None:
-                sys.exit(f"Error: Module not found: {args.target}")
+                raise SamplingModuleNotFoundError(args.target)
         finally:
             if added_cwd:
                 sys.path.remove(cwd)
     else:
         if not os.path.exists(args.target):
-            sys.exit(f"Error: Script not found: {args.target}")
+            raise SamplingScriptNotFoundError(args.target)

     # Check if live mode is requested
     if args.live:
19 changes: 19 additions & 0 deletions Lib/profiling/sampling/errors.py
@@ -0,0 +1,19 @@
+"""Custom exceptions for the sampling profiler."""
+
+class SamplingProfilerError(Exception):
+    """Base exception for sampling profiler errors."""
+
+class SamplingUnknownProcessError(SamplingProfilerError):
+    def __init__(self, pid):
+        self.pid = pid
+        super().__init__(f"Process with PID '{pid}' does not exist.")
+
+class SamplingScriptNotFoundError(SamplingProfilerError):
+    def __init__(self, script_path):
+        self.script_path = script_path
+        super().__init__(f"Script '{script_path}' not found.")
+
+class SamplingModuleNotFoundError(SamplingProfilerError):
+    def __init__(self, module_name):
+        self.module_name = module_name
+        super().__init__(f"Module '{module_name}' not found.")
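
The shared `SamplingProfilerError` base class lets calling code handle one specific failure separately and fall back to the base class for the rest. The following is a minimal sketch of that pattern, not code from this PR: two of the classes are copied from the file above so the block runs on its own, and the `describe` helper is hypothetical.

```python
class SamplingProfilerError(Exception):
    """Base exception for sampling profiler errors."""


class SamplingUnknownProcessError(SamplingProfilerError):
    def __init__(self, pid):
        self.pid = pid
        super().__init__(f"Process with PID '{pid}' does not exist.")


class SamplingScriptNotFoundError(SamplingProfilerError):
    def __init__(self, script_path):
        self.script_path = script_path
        super().__init__(f"Script '{script_path}' not found.")


def describe(exc):
    """Hypothetical helper: map a profiler error to a short category string."""
    try:
        raise exc
    except SamplingUnknownProcessError as err:
        # The specific subclass exposes the offending PID as an attribute.
        return f"process:{err.pid}"
    except SamplingProfilerError as err:
        # Any other profiler error is still caught through the base class.
        return f"target:{err}"
```

Because the more specific `except` clause comes first, a missing process is reported by PID while every other profiler error falls through to the base-class handler.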
68 changes: 40 additions & 28 deletions Lib/profiling/sampling/sample.py
@@ -34,23 +34,29 @@ def __init__(self, pid, sample_interval_usec, all_threads, *, mode=PROFILING_MOD
         self.all_threads = all_threads
         self.mode = mode # Store mode for later use
         self.collect_stats = collect_stats
+        try:
+            self.unwinder = self._new_unwinder(native, gc, opcodes, skip_non_matching_threads)
+        except RuntimeError as err:
+            raise SystemExit(err) from err
+        # Track sample intervals and total sample count
+        self.sample_intervals = deque(maxlen=100)
+        self.total_samples = 0
+        self.realtime_stats = False
+
+    def _new_unwinder(self, native, gc, opcodes, skip_non_matching_threads):
         if _FREE_THREADED_BUILD:
-            self.unwinder = _remote_debugging.RemoteUnwinder(
-                self.pid, all_threads=self.all_threads, mode=mode, native=native, gc=gc,
+            unwinder = _remote_debugging.RemoteUnwinder(
+                self.pid, all_threads=self.all_threads, mode=self.mode, native=native, gc=gc,
                 opcodes=opcodes, skip_non_matching_threads=skip_non_matching_threads,
-                cache_frames=True, stats=collect_stats
+                cache_frames=True, stats=self.collect_stats
             )
         else:
-            only_active_threads = bool(self.all_threads)
-            self.unwinder = _remote_debugging.RemoteUnwinder(
-                self.pid, only_active_thread=only_active_threads, mode=mode, native=native, gc=gc,
+            unwinder = _remote_debugging.RemoteUnwinder(
+                self.pid, only_active_thread=bool(self.all_threads), mode=self.mode, native=native, gc=gc,
                 opcodes=opcodes, skip_non_matching_threads=skip_non_matching_threads,
-                cache_frames=True, stats=collect_stats
+                cache_frames=True, stats=self.collect_stats
             )
-        # Track sample intervals and total sample count
-        self.sample_intervals = deque(maxlen=100)
-        self.total_samples = 0
-        self.realtime_stats = False
+        return unwinder

     def sample(self, collector, duration_sec=10, *, async_aware=False):
         sample_interval_sec = self.sample_interval_usec / 1_000_000
@@ -86,7 +92,7 @@ def sample(self, collector, duration_sec=10, *, async_aware=False):
                         collector.collect_failed_sample()
                         errors += 1
                     except Exception as e:
-                        if not self._is_process_running():
+                        if not _is_process_running(self.pid):
                             break
                         raise e from None

@@ -148,22 +154,6 @@ def sample(self, collector, duration_sec=10, *, async_aware=False):
                 f"({(expected_samples - num_samples) / expected_samples * 100:.2f}%)"
             )

-    def _is_process_running(self):
-        if sys.platform == "linux" or sys.platform == "darwin":
-            try:
-                os.kill(self.pid, 0)
-                return True
-            except ProcessLookupError:
-                return False
-        elif sys.platform == "win32":
-            try:
-                _remote_debugging.RemoteUnwinder(self.pid)
-            except Exception:
-                return False
-            return True
-        else:
-            raise ValueError(f"Unsupported platform: {sys.platform}")
-
     def _print_realtime_stats(self):
         """Print real-time sampling statistics."""
         if len(self.sample_intervals) < 2:
@@ -279,6 +269,28 @@ def _print_unwinder_stats(self):
             print(f" {ANSIColors.YELLOW}Stale cache invalidations: {stale_invalidations}{ANSIColors.RESET}")


+def _is_process_running(pid):
+    if pid <= 0:
+        return False
+    if os.name == "posix":
+        try:
+            os.kill(pid, 0)
+            return True
+        except ProcessLookupError:
+            return False
+        except PermissionError:
+            # EPERM means process exists but we can't signal it
+            return True
+    elif sys.platform == "win32":
+        try:
+            _remote_debugging.RemoteUnwinder(pid)
+        except Exception:
+            return False
+        return True
+    else:
+        raise ValueError(f"Unsupported platform: {sys.platform}")
+
+
 def sample(
     pid,
     collector,
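
The POSIX branch of the new module-level `_is_process_running` relies on the signal-0 probe. Below is a minimal standalone sketch of that technique, assuming a POSIX system. The public name `is_process_running` is chosen for illustration, and the Windows branch from the PR is deliberately left out.

```python
import os


def is_process_running(pid):
    """Return True if a process with this PID appears to exist (POSIX only)."""
    if pid <= 0:
        # os.kill treats 0 and negative values as process-group targets,
        # so they never identify a single process.
        return False
    try:
        # Signal 0 performs the existence and permission checks
        # without delivering any signal.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but we may not signal it
    return True
```

Treating `PermissionError` as "running" matters when the target belongs to another user: the probe fails, but only because the process is there.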
13 changes: 13 additions & 0 deletions Lib/test/_test_atexit.py
@@ -135,6 +135,19 @@ def func():
         finally:
             atexit.unregister(func)

+    def test_eq_unregister_clear(self):
+        # Issue #112127: callback's __eq__ may call unregister or _clear
+        class Evil:
+            def __eq__(self, other):
+                action(other)
+                return NotImplemented
+
+        for action in atexit.unregister, lambda o: atexit._clear():
+            with self.subTest(action=action):
+                atexit.register(lambda: None)
+                atexit.unregister(Evil())
+                atexit._clear()
+

 if __name__ == "__main__":
     unittest.main()