Skip to content

Conversation

@yoney
Copy link
Contributor

@yoney yoney commented Dec 18, 2025

Make the attributes in lzma module thread-safe on the free-threading build. Attributes (check, eof, needs_input, unused_data) are now stored atomically or accessed via mutex-protected getters.

Even though the operations are protected by locks, the attributes exposed via PyMemberDef (check, eof, needs_input, unused_data) should still be updated atomically while holding the lock, or accessed via mutex‑protected getters, because they can be read without acquiring the lock.

This issue was not addressed in #140711. A small addition that repeatedly reads these attributes shows the issue under tsan build.

WARNING: ThreadSanitizer: data race (pid=2367860)
  Atomic read of size 1 at 0x7fdbc9506ee0 by thread T19:
    #0 _Py_atomic_load_char_relaxed /workspace/ft_lzma_tsan/./Include/cpython/pyatomic_gcc.h:311 (python+0x99e392) (BuildId: ce943d12d16f92ec125f1db1856a1617d3d75854)
    #1 PyMember_GetOne /workspace/ft_lzma_tsan/Python/structmember.c:37 (python+0x99e392)
    #2 member_get /workspace/ft_lzma_tsan/Objects/descrobject.c:180 (python+0x60f4d5) (BuildId: ce943d12d16f92ec125f1db1856a1617d3d75854)
...
...
...
Previous write of size 1 at 0x7fdbc9506ee0 by thread T11:
    #0 decompress /workspace/ft_lzma_tsan/./Modules/_lzmamodule.c:1067 (_lzma.cpython-315td-x86_64-linux-gnu.so+0x7256) (BuildId: 978a710f12e938f2fbc3233c01c40cf693105ef1)
    #1 _lzma_LZMADecompressor_decompress_impl /workspace/ft_lzma_tsan/./Modules/_lzmamodule.c:1139 (_lzma.cpython-315td-x86_64-linux-gnu.so+0x7256)
    #2 _lzma_LZMADecompressor_decompress /workspace/ft_lzma_tsan/./Modules/clinic/_lzmamodule.c.h:157 (_lzma.cpython-315td-x86_64-linux-gnu.so+0x7256)
...
...
...
SUMMARY: ThreadSanitizer: data race /workspace/ft_lzma_tsan/./Modules/_lzmamodule.c:1043 in decompress

cc: @mpage @colesbury

Make the attributes in lzma module thread-safe on the free-threading build.
Attributes (check, eof, needs_input, unused_data) are now stored atomically
or accessed via mutex-protected getters.
Comment on lines +1325 to +1327
if (result == NULL) {
PyErr_SetString(PyExc_AttributeError, "unused_data");
}
Copy link
Member

@emmatyping emmatyping Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused_data should be b"" if not set, so I don't think this check is needed. It's initialized in __new__ https://github.com/python/cpython/blob/6a4f10325d58deb1906b39d68dc8e84f4c2bf5a4/Modules/_lzmamodule.c#L1239C5-L1239C76

I think we should probably initialize unused_data in bz2 in __new__ as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other possibility would be to standardize on the implementation in compression.zstd https://github.com/python/cpython/blob/main/Modules/_zstd/decompressor.c#L583-L611

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @emmatyping for pointing this out.

I actually wanted to assert(self->unused_data != NULL), but I was a little bit worried about the code block below. I think PyBytes_FromStringAndSize() failure is very unlikely in L1044, and I believe it could happen only with allocation errors, but this was the main reason. With Py_T_OBJECT_EX in PyMemberDef, reading a NULL
raises AttributeError, so I was trying to keep the current behaviour.

cpython/Modules/_lzmamodule.c

Lines 1040 to 1049 in 08bc03f

if (d->eof) {
d->needs_input = 0;
if (lzs->avail_in > 0) {
Py_XSETREF(d->unused_data,
PyBytes_FromStringAndSize((char *)lzs->next_in, lzs->avail_in));
if (d->unused_data == NULL) {
goto error;
}
}
}

So, how do you think we should change it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants