diff --git a/.github/workflows/mypy.yml b/.github/workflows/mypy.yml index fac0fa8aba3050..8810730e193bb6 100644 --- a/.github/workflows/mypy.yml +++ b/.github/workflows/mypy.yml @@ -26,6 +26,7 @@ on: - "Tools/build/update_file.py" - "Tools/build/verify_ensurepip_wheels.py" - "Tools/cases_generator/**" + - "Tools/check-c-api-docs/**" - "Tools/clinic/**" - "Tools/jit/**" - "Tools/peg_generator/**" @@ -58,6 +59,7 @@ jobs: "Lib/tomllib", "Tools/build", "Tools/cases_generator", + "Tools/check-c-api-docs", "Tools/clinic", "Tools/jit", "Tools/peg_generator", diff --git a/Doc/glossary.rst b/Doc/glossary.rst index 3a01df99c38caf..68035c2dfb57d4 100644 --- a/Doc/glossary.rst +++ b/Doc/glossary.rst @@ -134,6 +134,14 @@ Glossary iterator's :meth:`~object.__anext__` method until it raises a :exc:`StopAsyncIteration` exception. Introduced by :pep:`492`. + atomic operation + An operation that appears to execute as a single, indivisible step: no + other thread can observe it half-done, and its effects become visible all + at once. Python does not guarantee that high-level statements are atomic + (for example, ``x += 1`` performs multiple bytecode operations and is not + atomic). Atomicity is only guaranteed where explicitly documented. See + also :term:`race condition` and :term:`data race`. + attached thread state A :term:`thread state` that is active for the current OS thread. @@ -289,6 +297,22 @@ Glossary advanced mathematical feature. If you're not aware of a need for them, it's almost certain you can safely ignore them. + concurrency + The ability of a computer program to perform multiple tasks at the same + time. Python provides libraries for writing programs that make use of + different forms of concurrency. :mod:`asyncio` is a library for dealing + with asynchronous tasks and coroutines. :mod:`threading` provides + access to operating system threads and :mod:`multiprocessing` to + operating system processes. Multi-core processors can execute threads and + processes on different CPU cores at the same time (see + :term:`parallelism`). + + concurrent modification + When multiple threads modify shared data at the same time. Concurrent + modification without proper synchronization can cause + :term:`race conditions `, and might also trigger a + :term:`data race `, data corruption, or both. + context This term has different meanings depending on where and how it is used. Some common meanings: @@ -363,6 +387,28 @@ Glossary the :term:`cyclic garbage collector ` is to identify these groups and break the reference cycles so that the memory can be reclaimed. + data race + A situation where multiple threads access the same memory location + concurrently, at least one of the accesses is a write, and the threads + do not use any synchronization to control their access. Data races + lead to :term:`non-deterministic` behavior and can cause data corruption. + Proper use of :term:`locks ` and other :term:`synchronization primitives + ` prevents data races. Note that data races + can only happen in native code, but that :term:`native code` might be + exposed in a Python API. See also :term:`race condition` and + :term:`thread-safe`. + + deadlock + A situation in which two or more tasks (threads, processes, or coroutines) + wait indefinitely for each other to release resources or complete actions, + preventing any from making progress. For example, if thread A holds lock + 1 and waits for lock 2, while thread B holds lock 2 and waits for lock 1, + both threads will wait indefinitely. In Python this often arises from + acquiring multiple locks in conflicting orders or from circular + join/await dependencies. Deadlocks can be avoided by always acquiring + multiple :term:`locks ` in a consistent order. See also + :term:`lock` and :term:`reentrant`. + decorator A function returning another function, usually applied as a function transformation using the ``@wrapper`` syntax. Common examples for @@ -662,6 +708,14 @@ Glossary requires the GIL to be held in order to use it. This refers to having an :term:`attached thread state`. + global state + Data that is accessible throughout a program, such as module-level + variables, class variables, or C static variables in :term:`extension modules + `. In multi-threaded programs, global state shared + between threads typically requires synchronization to avoid + :term:`race conditions ` and + :term:`data races `. + hash-based pyc A bytecode cache file that uses the hash rather than the last-modified time of the corresponding source file to determine its validity. See @@ -706,7 +760,9 @@ Glossary tuples. Such an object cannot be altered. A new object has to be created if a different value has to be stored. They play an important role in places where a constant hash value is needed, for example as a key - in a dictionary. + in a dictionary. Immutable objects are inherently :term:`thread-safe` + because their state cannot be modified after creation, eliminating concerns + about improperly synchronized :term:`concurrent modification`. import path A list of locations (or :term:`path entries `) that are @@ -796,8 +852,9 @@ Glossary CPython does not consistently apply the requirement that an iterator define :meth:`~iterator.__iter__`. - And also please note that the free-threading CPython does not guarantee - the thread-safety of iterator operations. + And also please note that :term:`free-threaded ` + CPython does not guarantee :term:`thread-safe` behavior of iterator + operations. key function @@ -835,10 +892,11 @@ Glossary :keyword:`if` statements. In a multi-threaded environment, the LBYL approach can risk introducing a - race condition between "the looking" and "the leaping". For example, the - code, ``if key in mapping: return mapping[key]`` can fail if another + :term:`race condition` between "the looking" and "the leaping". For example, + the code, ``if key in mapping: return mapping[key]`` can fail if another thread removes *key* from *mapping* after the test, but before the lookup. - This issue can be solved with locks or by using the EAFP approach. + This issue can be solved with :term:`locks ` or by using the + :term:`EAFP` approach. See also :term:`thread-safe`. lexical analyzer @@ -857,6 +915,19 @@ Glossary clause is optional. If omitted, all elements in ``range(256)`` are processed. + lock + A :term:`synchronization primitive` that allows only one thread at a + time to access a shared resource. A thread must acquire a lock before + accessing the protected resource and release it afterward. If a thread + attempts to acquire a lock that is already held by another thread, it + will block until the lock becomes available. Python's :mod:`threading` + module provides :class:`~threading.Lock` (a basic lock) and + :class:`~threading.RLock` (a :term:`reentrant` lock). Locks are used + to prevent :term:`race conditions ` and ensure + :term:`thread-safe` access to shared data. Alternative design patterns + to locks exist such as queues, producer/consumer patterns, and + thread-local state. See also :term:`deadlock`, and :term:`reentrant`. + loader An object that loads a module. It must define the :meth:`!exec_module` and :meth:`!create_module` methods @@ -942,8 +1013,11 @@ Glossary See :term:`method resolution order`. mutable - Mutable objects can change their value but keep their :func:`id`. See - also :term:`immutable`. + An :term:`object` with state that is allowed to change during the course + of the program. In multi-threaded programs, mutable objects that are + shared between threads require careful synchronization to avoid + :term:`race conditions `. See also :term:`immutable`, + :term:`thread-safe`, and :term:`concurrent modification`. named tuple The term "named tuple" applies to any type or class that inherits from @@ -995,6 +1069,13 @@ Glossary See also :term:`module`. + native code + Code that is compiled to machine instructions and runs directly on the + processor, as opposed to code that is interpreted or runs in a virtual + machine. In the context of Python, native code typically refers to + C, C++, Rust or Fortran code in :term:`extension modules ` + that can be called from Python. See also :term:`extension module`. + nested scope The ability to refer to a variable in an enclosing definition. For instance, a function defined inside another function can refer to @@ -1011,6 +1092,15 @@ Glossary properties, :meth:`~object.__getattribute__`, class methods, and static methods. + non-deterministic + Behavior where the outcome of a program can vary between executions with + the same inputs. In multi-threaded programs, non-deterministic behavior + often results from :term:`race conditions ` where the + relative timing or interleaving of threads affects the result. + Proper synchronization using :term:`locks ` and other + :term:`synchronization primitives ` helps + ensure deterministic behavior. + object Any data with state (attributes or value) and defined behavior (methods). Also the ultimate base class of any :term:`new-style @@ -1041,6 +1131,16 @@ Glossary See also :term:`regular package` and :term:`namespace package`. + parallelism + Executing multiple operations at the same time (e.g. on multiple CPU + cores). In Python builds with the + :term:`global interpreter lock (GIL) `, only one + thread runs Python bytecode at a time, so taking advantage of multiple + CPU cores typically involves multiple processes + (e.g. :mod:`multiprocessing`) or native extensions that release the GIL. + In :term:`free-threaded ` Python, multiple Python threads + can run Python code simultaneously on different cores. + parameter A named entity in a :term:`function` (or method) definition that specifies an :term:`argument` (or in some cases, arguments) that the @@ -1215,6 +1315,18 @@ Glossary >>> email.mime.text.__name__ 'email.mime.text' + race condition + A condition of a program where the its behavior + depends on the relative timing or ordering of events, particularly in + multi-threaded programs. Race conditions can lead to + :term:`non-deterministic` behavior and bugs that are difficult to + reproduce. A :term:`data race` is a specific type of race condition + involving unsynchronized access to shared memory. The :term:`LBYL` + coding style is particularly susceptible to race conditions in + multi-threaded code. Using :term:`locks ` and other + :term:`synchronization primitives ` + helps prevent race conditions. + reference count The number of references to an object. When the reference count of an object drops to zero, it is deallocated. Some objects are @@ -1236,6 +1348,25 @@ Glossary See also :term:`namespace package`. + reentrant + A property of a function or :term:`lock` that allows it to be called or + acquired multiple times by the same thread without causing errors or a + :term:`deadlock`. + + For functions, reentrancy means the function can be safely called again + before a previous invocation has completed, which is important when + functions may be called recursively or from signal handlers. Thread-unsafe + functions may be :term:`non-deterministic` if they're called reentrantly in a + multithreaded program. + + For locks, Python's :class:`threading.RLock` (reentrant lock) is + reentrant, meaning a thread that already holds the lock can acquire it + again without blocking. In contrast, :class:`threading.Lock` is not + reentrant - attempting to acquire it twice from the same thread will cause + a deadlock. + + See also :term:`lock` and :term:`deadlock`. + REPL An acronym for the "read–eval–print loop", another name for the :term:`interactive` interpreter shell. @@ -1340,6 +1471,18 @@ Glossary See also :term:`borrowed reference`. + synchronization primitive + A basic building block for coordinating (synchronizing) the execution of + multiple threads to ensure :term:`thread-safe` access to shared resources. + Python's :mod:`threading` module provides several synchronization primitives + including :class:`~threading.Lock`, :class:`~threading.RLock`, + :class:`~threading.Semaphore`, :class:`~threading.Condition`, + :class:`~threading.Event`, and :class:`~threading.Barrier`. Additionally, + the :mod:`queue` module provides multi-producer, multi-consumer queues + that are especially useful in multithreaded programs. These + primitives help prevent :term:`race conditions ` and + coordinate thread execution. See also :term:`lock`. + t-string t-strings String literals prefixed with ``t`` or ``T`` are commonly called @@ -1392,6 +1535,19 @@ Glossary See :ref:`Thread State and the Global Interpreter Lock ` for more information. + thread-safe + A module, function, or class that behaves correctly when used by multiple + threads concurrently. Thread-safe code uses appropriate + :term:`synchronization primitives ` like + :term:`locks ` to protect shared mutable state, or is designed + to avoid shared mutable state entirely. In the + :term:`free-threaded ` build, built-in types like + :class:`dict`, :class:`list`, and :class:`set` use internal locking + to make many operations thread-safe, although thread safety is not + necessarily guaranteed. Code that is not thread-safe may experience + :term:`race conditions ` and :term:`data races ` + when used in multi-threaded programs. + token A small unit of source code, generated by the diff --git a/Lib/test/test_pdb.py b/Lib/test/test_pdb.py index f4c870036a7a20..5cba34ff8bad10 100644 --- a/Lib/test/test_pdb.py +++ b/Lib/test/test_pdb.py @@ -3563,10 +3563,22 @@ def _assert_find_function(self, file_content, func_name, expected): def _fd_dir_for_pipe_targets(self): """Return a directory exposing live file descriptors, if any.""" + return self._proc_fd_dir() or self._dev_fd_dir() + + def _proc_fd_dir(self): + """Return /proc-backed fd dir when it can be used for pipes.""" + # GH-142836: Opening /proc/self/fd entries for pipes raises EACCES on + # Solaris, so prefer other mechanisms there. + if sys.platform.startswith("sunos"): + return None + proc_fd = "/proc/self/fd" if os.path.isdir(proc_fd) and os.path.exists(os.path.join(proc_fd, '0')): return proc_fd + return None + def _dev_fd_dir(self): + """Return /dev-backed fd dir when usable.""" dev_fd = "/dev/fd" if os.path.isdir(dev_fd) and os.path.exists(os.path.join(dev_fd, '0')): if sys.platform.startswith("freebsd"): @@ -3576,7 +3588,6 @@ def _fd_dir_for_pipe_targets(self): except FileNotFoundError: return None return dev_fd - return None def test_find_function_empty_file(self): diff --git a/Lib/test/test_sqlite3/__init__.py b/Lib/test/test_sqlite3/__init__.py index 78a1e2078a5da0..145f3b80024829 100644 --- a/Lib/test/test_sqlite3/__init__.py +++ b/Lib/test/test_sqlite3/__init__.py @@ -6,9 +6,14 @@ import os import sqlite3 +# make sure only print once +_printed_version = False + # Implement the unittest "load tests" protocol. -def load_tests(*args): - if verbose: +def load_tests(loader, tests, pattern): + global _printed_version + if verbose and not _printed_version: print(f"test_sqlite3: testing with SQLite version {sqlite3.sqlite_version}") + _printed_version = True pkg_dir = os.path.dirname(__file__) - return load_package_tests(pkg_dir, *args) + return load_package_tests(pkg_dir, loader, tests, pattern) diff --git a/Lib/test/test_unittest/testmock/testthreadingmock.py b/Lib/test/test_unittest/testmock/testthreadingmock.py index 3603995b090a6c..dda4916434ec1b 100644 --- a/Lib/test/test_unittest/testmock/testthreadingmock.py +++ b/Lib/test/test_unittest/testmock/testthreadingmock.py @@ -219,5 +219,81 @@ def test_function(): self.assertEqual(m.call_count, LOOPS * THREADS) + def test_call_args_thread_safe(self): + m = ThreadingMock() + LOOPS = 100 + THREADS = 10 + def test_function(thread_id): + for i in range(LOOPS): + m(thread_id, i) + + oldswitchinterval = sys.getswitchinterval() + setswitchinterval(1e-6) + try: + threads = [ + threading.Thread(target=test_function, args=(thread_id,)) + for thread_id in range(THREADS) + ] + with threading_helper.start_threads(threads): + pass + finally: + sys.setswitchinterval(oldswitchinterval) + expected_calls = { + (thread_id, i) + for thread_id in range(THREADS) + for i in range(LOOPS) + } + self.assertSetEqual({call.args for call in m.call_args_list}, expected_calls) + + def test_method_calls_thread_safe(self): + m = ThreadingMock() + LOOPS = 100 + THREADS = 10 + def test_function(thread_id): + for i in range(LOOPS): + getattr(m, f"method_{thread_id}")(i) + + oldswitchinterval = sys.getswitchinterval() + setswitchinterval(1e-6) + try: + threads = [ + threading.Thread(target=test_function, args=(thread_id,)) + for thread_id in range(THREADS) + ] + with threading_helper.start_threads(threads): + pass + finally: + sys.setswitchinterval(oldswitchinterval) + for thread_id in range(THREADS): + self.assertEqual(getattr(m, f"method_{thread_id}").call_count, LOOPS) + self.assertEqual({call.args for call in getattr(m, f"method_{thread_id}").call_args_list}, + {(i,) for i in range(LOOPS)}) + + def test_mock_calls_thread_safe(self): + m = ThreadingMock() + LOOPS = 100 + THREADS = 10 + def test_function(thread_id): + for i in range(LOOPS): + m(thread_id, i) + + oldswitchinterval = sys.getswitchinterval() + setswitchinterval(1e-6) + try: + threads = [ + threading.Thread(target=test_function, args=(thread_id,)) + for thread_id in range(THREADS) + ] + with threading_helper.start_threads(threads): + pass + finally: + sys.setswitchinterval(oldswitchinterval) + expected_calls = { + (thread_id, i) + for thread_id in range(THREADS) + for i in range(LOOPS) + } + self.assertSetEqual({call.args for call in m.mock_calls}, expected_calls) + if __name__ == "__main__": unittest.main() diff --git a/Lib/test/test_zoneinfo/test_zoneinfo.py b/Lib/test/test_zoneinfo/test_zoneinfo.py index fad741e477b84a..8f3ca59c9ef5ed 100644 --- a/Lib/test/test_zoneinfo/test_zoneinfo.py +++ b/Lib/test/test_zoneinfo/test_zoneinfo.py @@ -1551,6 +1551,26 @@ def __eq__(self, other): except CustomError: pass + def test_weak_cache_descriptor_use_after_free(self): + class BombDescriptor: + def __get__(self, obj, owner): + return {} + + class EvilZoneInfo(self.klass): + pass + + # Must be set after the class creation. + EvilZoneInfo._weak_cache = BombDescriptor() + + key = "America/Los_Angeles" + zone1 = EvilZoneInfo(key) + self.assertEqual(str(zone1), key) + + EvilZoneInfo.clear_cache() + zone2 = EvilZoneInfo(key) + self.assertEqual(str(zone2), key) + self.assertIsNot(zone2, zone1) + class CZoneInfoCacheTest(ZoneInfoCacheTest): module = c_zoneinfo diff --git a/Misc/NEWS.d/next/Library/2025-12-16-14-49-19.gh-issue-142783.VPV1ig.rst b/Misc/NEWS.d/next/Library/2025-12-16-14-49-19.gh-issue-142783.VPV1ig.rst new file mode 100644 index 00000000000000..f014771ae9a146 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2025-12-16-14-49-19.gh-issue-142783.VPV1ig.rst @@ -0,0 +1 @@ +Fix zoneinfo use-after-free with descriptor _weak_cache. a descriptor as _weak_cache could cause crashes during object creation. The fix ensures proper reference counting for descriptor-provided objects. diff --git a/Misc/NEWS.d/next/Tests/2025-12-17-02-02-57.gh-issue-142836.mR-fvK.rst b/Misc/NEWS.d/next/Tests/2025-12-17-02-02-57.gh-issue-142836.mR-fvK.rst new file mode 100644 index 00000000000000..dd84ce9839ffa9 --- /dev/null +++ b/Misc/NEWS.d/next/Tests/2025-12-17-02-02-57.gh-issue-142836.mR-fvK.rst @@ -0,0 +1 @@ +Accommodated Solaris in ``test_pdb.test_script_target_anonymous_pipe``. diff --git a/Modules/_zoneinfo.c b/Modules/_zoneinfo.c index b99be073db5460..e07dfd19efa06d 100644 --- a/Modules/_zoneinfo.c +++ b/Modules/_zoneinfo.c @@ -292,16 +292,11 @@ static PyObject * get_weak_cache(zoneinfo_state *state, PyTypeObject *type) { if (type == state->ZoneInfoType) { + Py_INCREF(state->ZONEINFO_WEAK_CACHE); return state->ZONEINFO_WEAK_CACHE; } else { - PyObject *cache = - PyObject_GetAttrString((PyObject *)type, "_weak_cache"); - // We are assuming that the type lives at least as long as the function - // that calls get_weak_cache, and that it holds a reference to the - // cache, so we'll return a "borrowed reference". - Py_XDECREF(cache); - return cache; + return PyObject_GetAttrString((PyObject *)type, "_weak_cache"); } } @@ -328,6 +323,7 @@ zoneinfo_ZoneInfo_impl(PyTypeObject *type, PyObject *key) PyObject *weak_cache = get_weak_cache(state, type); instance = PyObject_CallMethod(weak_cache, "get", "O", key, Py_None); if (instance == NULL) { + Py_DECREF(weak_cache); return NULL; } @@ -335,6 +331,7 @@ zoneinfo_ZoneInfo_impl(PyTypeObject *type, PyObject *key) Py_DECREF(instance); PyObject *tmp = zoneinfo_new_instance(state, type, key); if (tmp == NULL) { + Py_DECREF(weak_cache); return NULL; } @@ -342,12 +339,14 @@ zoneinfo_ZoneInfo_impl(PyTypeObject *type, PyObject *key) PyObject_CallMethod(weak_cache, "setdefault", "OO", key, tmp); Py_DECREF(tmp); if (instance == NULL) { + Py_DECREF(weak_cache); return NULL; } ((PyZoneInfo_ZoneInfo *)instance)->source = SOURCE_CACHE; } update_strong_cache(state, type, key, instance); + Py_DECREF(weak_cache); return instance; } @@ -510,12 +509,14 @@ zoneinfo_ZoneInfo_clear_cache_impl(PyTypeObject *type, PyTypeObject *cls, PyObject *item = NULL; PyObject *pop = PyUnicode_FromString("pop"); if (pop == NULL) { + Py_DECREF(weak_cache); return NULL; } PyObject *iter = PyObject_GetIter(only_keys); if (iter == NULL) { Py_DECREF(pop); + Py_DECREF(weak_cache); return NULL; } @@ -540,6 +541,7 @@ zoneinfo_ZoneInfo_clear_cache_impl(PyTypeObject *type, PyTypeObject *cls, Py_DECREF(pop); } + Py_DECREF(weak_cache); if (PyErr_Occurred()) { return NULL; } diff --git a/Tools/check-c-api-docs/mypy.ini b/Tools/check-c-api-docs/mypy.ini new file mode 100644 index 00000000000000..f42eb2836e2fd8 --- /dev/null +++ b/Tools/check-c-api-docs/mypy.ini @@ -0,0 +1,19 @@ +[mypy] +files = Tools/check-c-api-docs/ +pretty = True + +# We need `_colorize` import: +mypy_path = $MYPY_CONFIG_FILE_DIR/../../Misc/mypy + +# Make sure Python can still be built +# using Python 3.13 for `PYTHON_FOR_REGEN`... +python_version = 3.13 + +# ...And be strict: +strict = True +extra_checks = True +enable_error_code = + ignore-without-code, + redundant-expr, + truthy-bool, + possibly-undefined,