-
-
Notifications
You must be signed in to change notification settings - Fork 33.6k
gh-116738: Make zlib module thread-safe #142432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| return -1; | ||
| } | ||
| Py_SETREF(self->unused_data, new_unused_data); | ||
| _PyObject_XSetRefDelayed(&self->unused_data, new_unused_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid using _PyObject_XSetRefDelayed for these attributes and use a lock instead (maybe self->mutex if possible).
_PyObject_XSetRefDelayed is useful when data is read-mostly, but I don't think that's the case for unused_data or unconsumed_tail. I'm also concerned that _PyObject_XSetRefDelayed() will keep potentially delay freeing large amounts of memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should avoid using
_PyObject_XSetRefDelayed
I think we have a situation similar to the one below, and if we want to avoid using _PyObject_XSetRefDelayed() in Thread A, I might move unused_data and unconsumed_tail from PyMemberDef to PyGetSetDef and lock/unlock self->mutex in Thread B. Is there an easier or better way to do this?
Thread A Thread B (read via PyMemberDef)
═══════════════════════════════════════════════|═══════════════════════════════════════════
[LOCK self->mutex] |
... |
Py_SETREF(self->unused_data, new_unused_data); | FT_ATOMIC_LOAD_RELAXED(self->unused_data)
... |
[UNLOCK self->mutex] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused_data is almost entirely read-only. It's only read (usually once) after decompression has completed, and written once after flush() is called.
unconsumed_tail on the other hand is written to by the decompressor pretty much every decompress() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might move unused_data and unconsumed_tail from PyMemberDef to PyGetSetDef and lock/unlock self->mutex in Thread B. Is there an easier or better way to do this?
This makes sense to me.
Makes the
zlibmodule thread-safe free-threading build. Even though operations are protected by locks, attributes exposed viaPyMemberDef (eof, needs_input, unused_data, unconsumed_tail)should still be stored atomically within locked sections, since they can be read without acquiring the lock.PyThread_type_lockwithPyMutex(similar to gh-116738: Use PyMutex for bz2 module #140555 and gh-116738: Use PyMutex for lzma module #140711)ENTER_ZLIB/LEAVE_ZLIBmacros in favor of directPyMutex_Lock/Unlock(similar to gh-116738: Use PyMutex for bz2 module #140555 and gh-116738: Use PyMutex for lzma module #140711)FT_ATOMIC_STORE_CHAR_RELAXEDforeofandneeds_inputflag updates inside locked sections (attributes exposed viaPyMemberDef)_PyObject_XSetRefDelayedforunused_data/unconsumed_tailinside locked sectionscc: @mpage @colesbury