Skip to content

Conversation

@yoney
Copy link
Contributor

@yoney yoney commented Dec 8, 2025

Makes the zlib module thread-safe free-threading build. Even though operations are protected by locks, attributes exposed via PyMemberDef (eof, needs_input, unused_data, unconsumed_tail) should still be stored atomically within locked sections, since they can be read without acquiring the lock.

cc: @mpage @colesbury

@yoney yoney marked this pull request as ready for review December 8, 2025 22:57
@colesbury colesbury requested review from colesbury and mpage December 9, 2025 18:01
return -1;
}
Py_SETREF(self->unused_data, new_unused_data);
_PyObject_XSetRefDelayed(&self->unused_data, new_unused_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid using _PyObject_XSetRefDelayed for these attributes and use a lock instead (maybe self->mutex if possible).

_PyObject_XSetRefDelayed is useful when data is read-mostly, but I don't think that's the case for unused_data or unconsumed_tail. I'm also concerned that _PyObject_XSetRefDelayed() will keep potentially delay freeing large amounts of memory.

Copy link
Contributor Author

@yoney yoney Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should avoid using _PyObject_XSetRefDelayed

I think we have a situation similar to the one below, and if we want to avoid using _PyObject_XSetRefDelayed() in Thread A, I might move unused_data and unconsumed_tail from PyMemberDef to PyGetSetDef and lock/unlock self->mutex in Thread B. Is there an easier or better way to do this?

Thread A                                         Thread B (read via PyMemberDef)
═══════════════════════════════════════════════|═══════════════════════════════════════════
[LOCK self->mutex]                             |            
...                                            |
Py_SETREF(self->unused_data, new_unused_data); | FT_ATOMIC_LOAD_RELAXED(self->unused_data)
...                                            |             
[UNLOCK self->mutex]                           | 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused_data is almost entirely read-only. It's only read (usually once) after decompression has completed, and written once after flush() is called.

unconsumed_tail on the other hand is written to by the decompressor pretty much every decompress() call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might move unused_data and unconsumed_tail from PyMemberDef to PyGetSetDef and lock/unlock self->mutex in Thread B. Is there an easier or better way to do this?

This makes sense to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants