-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
gh-116738: Make lzma module thread-safe #142947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Make the attributes in lzma module thread-safe on the free-threading build. Attributes (check, eof, needs_input, unused_data) are now stored atomically or accessed via mutex-protected getters.
| if (result == NULL) { | ||
| PyErr_SetString(PyExc_AttributeError, "unused_data"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused_data should be b"" if not set, so I don't think this check is needed. It's initialized in __new__ https://github.com/python/cpython/blob/6a4f10325d58deb1906b39d68dc8e84f4c2bf5a4/Modules/_lzmamodule.c#L1239C5-L1239C76
I think we should probably initialize unused_data in bz2 in __new__ as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other possibility would be to standardize on the implementation in compression.zstd https://github.com/python/cpython/blob/main/Modules/_zstd/decompressor.c#L583-L611
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @emmatyping for pointing this out.
I actually wanted to assert(self->unused_data != NULL), but I was a little bit worried about the code block below. I think PyBytes_FromStringAndSize() failure is very unlikely in L1044, and I believe it could happen only with allocation errors, but this was the main reason. With Py_T_OBJECT_EX in PyMemberDef, reading a NULL
raises AttributeError, so I was trying to keep the current behaviour.
Lines 1040 to 1049 in 08bc03f
| if (d->eof) { | |
| d->needs_input = 0; | |
| if (lzs->avail_in > 0) { | |
| Py_XSETREF(d->unused_data, | |
| PyBytes_FromStringAndSize((char *)lzs->next_in, lzs->avail_in)); | |
| if (d->unused_data == NULL) { | |
| goto error; | |
| } | |
| } | |
| } |
So, how do you think we should change it?
Make the attributes in
lzmamodule thread-safe on the free-threading build. Attributes (check,eof,needs_input,unused_data) are now stored atomically or accessed via mutex-protected getters.Even though the operations are protected by locks, the attributes exposed via
PyMemberDef (check, eof, needs_input, unused_data)should still be updated atomically while holding the lock, or accessed via mutex‑protected getters, because they can be read without acquiring the lock.This issue was not addressed in #140711. A small addition that repeatedly reads these attributes shows the issue under tsan build.
cc: @mpage @colesbury