[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593
[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593cyx-6 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces 'PyObjectTying,' a mechanism to link the lifetime of Python wrappers to their underlying C++ FFI objects. By prepending a PyCustomizeAllocHeader to object allocations, the implementation enables stable object identity and address preservation across wrapper finalization cycles. The changes include modifications to the C++ SimpleObjAllocator, Cython bindings to utilize tp_finalize, and updates to the reflection system. Feedback from the review highlights a critical thread-safety issue in the callback used to free cached wrapper memory, which lacks GIL protection, and a potential memory leak when installing handles if a previous phantom wrapper is not correctly released.
435caeb to
1b751b2
Compare
Make `a.x is a.x`, `id(a.x)` stable across drop+refetch, and `f(x) is x`
for FFI returns by attaching a 16-byte custom-allocator header to every
Object allocation. Implements the PyObjectTying design.
Design:
- Two-layer custom-allocator hook in core libtvm_ffi.so:
`TVMFFIObjectAllocHeader { delete_space }`,
`TVMFFICustomAllocator { allocate, context }`, plus
`TVMFFIGetCustomAllocator` / `TVMFFISetCustomAllocator`. libtvm_ffi
installs a builtin default at registry init, so every Object always
carries the 8-byte base header and `TVMFFIGetCustomAllocator` never
returns NULL. The Python Cython module overrides the global default
at module load with `TVMFFIPyAllocate`, which prepends a 16-byte
`PyCustomAllocHeader` encoding the Python wrapper binding.
`make_object` / `make_inplace_array_object` / `PyClassDeleter` and
the Rust `ObjectArc::new[_with_extra_items]` paths all funnel through
the registry, so Python-defined types and Rust-allocated objects
share the layout and lifetime semantics.
- State machine concentrated in `tvm_ffi_python_helpers.h`.
`PyCustomAllocHeader` is `{ PyObject* cached_pyobj; base; }` (16 B).
The Active vs Inactive distinction lives on the wrapper's own
`chandle` field, not on a tag bit, so the header layout stays
pointer-equality clean:
Detached: cached_pyobj == NULL (no wrapper)
Active: cached_pyobj == self AND self.chandle == chandle (wrapper owns +1)
Inactive: cached_pyobj == self AND self.chandle == NULL (preserved bytes)
Four helpers expose the transitions:
`TVMFFIPyTryGetAttachedPyObject`, `TVMFFIPyAttachPyObject`,
`TVMFFIPyDetachPyObject`, `TVMFFIPyTPFinalize`.
- Universal cache-on for `make_ret`. Every FFI return funnels through
`make_ret_object`, which returns the canonical wrapper for a chandle
when one exists. Combined with Inactive -> Active revival on the
cached path, this delivers `a.x is a.x`, stable `id(a.x)` across
drop+refetch, and `f(x) is x` when a function returns the same
chandle it received.
- Inactive state via Cython `def __del__` on `CObject`, mapping to
`tp_finalize` (PEP 442). When other C++ holders keep the chandle
alive past the wrapper's Python refcount hitting 0,
`TVMFFIPyTPFinalize` nulls `self.chandle` BEFORE DECREFing the
chandle, then `Py_IncRef(self)` to resurrect the PyObject.
`cached_pyobj` is unchanged so a future re-fetch can rebind. Cython
auto-generates `tp_dealloc`, which now enters with the
`self.chandle == NULL` invariant established by `tp_finalize`.
Dropped the Py_LIMITED_API / SABI shim and the
`CYTHON_USE_TP_FINALIZE` define.
- Shutdown guard. `TVMFFIPyMarkPythonFinalizing` is wired to atexit
from Cython module init, flipping a `std::atomic<bool>` flag read
by `TVMFFIPyDeleteSpace` (via `TVMFFIPyIsPythonAlive`) before
`PyGILState_Ensure` to avoid GIL acquire after Python finalization
has begun. Inactive wrapper bytes on chandles still alive at that
point are intentionally leaked -- process is exiting.
- Frontend-allocation detection by `delete_space` pointer comparison
(`TVMFFIPyIsCanonical`) avoids a flag bit on `TVMFFIObject`.
Pre-Python-init chandles (statically-initialized global functions in
libtvm_ffi.so) carry only the base header; the Python side detects
this and skips the binding install.
`_move()` semantics: callback args alias the caller's wrapper under
universal cache-on (one wrapper, one chandle ref) and FFI function
returns of the same chandle alias the caller's wrapper. `_move()` is
kept as an API: the rvalue setter on either caller or callback side
eager-detaches the canonical-wrapper binding before the C++
AnyViewToOwnedAny transfer nulls the source chandle.
`test_function.py::test_rvalue_ref` is refactored for the new
aliasing-aware use_count expectations.
Tests: full Python suite passes (2332 passed, 19 skipped, 2 xfailed);
Rust suite passes. New `tests/python/test_pyobject_tying.py` covers
Active/Inactive transitions, cache-on aliasing, `_move()` under
cache-on, pickle stress, threading stress, GC integration with the
Inactive state, multi-chandle isolation, and the weakref limitation.
TODO carried in `function.pxi::_get_global_func`: name-keyed dict
cache for static-init Function id-stability. Most registry-resident
Functions are allocated at C++ static init (before the Python
allocator is registered) so their chandle has only the base header --
make_ret_object's cache can't reach them. Deferred to keep this change
small.
| if self.chandle != NULL: | ||
| CHECK_CALL(TVMFFIObjectDecRef(self.chandle)) | ||
| self.chandle = NULL | ||
| def __del__(self): |
There was a problem hiding this comment.
Let's double check and test it out, making sure this __del__ method will be properly invoked. I had some really negative experience with overloading __del__ because it could possibly prevent Python GC from recycling memory in certain cases. Not 100% sure if it applies to Cython's cdef class's __del__ method though.
| cls = make_fallback_cls_for_type_index(type_index) | ||
| obj = cls.__new__(cls) | ||
| (<CObject>obj).chandle = result.v_obj | ||
| TVMFFIPyAttachPyObject(result.v_obj, <PyObject*>obj) |
There was a problem hiding this comment.
comment, this is a new class, so it must be in detached state, attach it
| dlpack.deleter(dlpack) | ||
| pass | ||
| # default return the tensor | ||
| # default return the tensor. |
There was a problem hiding this comment.
NOTE: we don't try to attach py object for tensor types as it may appear in callback and there maybe translation depending on the context
| // ---------- | ||
| // I1. When a PyObject goes out of scope (no Python var refers to it), | ||
| // its +1 on chandle is always released. | ||
| // I2. When a chandle is destroyed, its cached PyObject (if any) is |
There was a problem hiding this comment.
When a chandle is destroyed, we can only be in Detached or Inactive state, if it is in Inactive state, the PyObject is reclaimed
| // ``self.chandle`` and DecRef. | ||
| // Detached -> Detached : ``self`` is not the canonical wrapper for | ||
| // this chandle (eager-detach via move already | ||
| // cleared the binding). Just DecRef. |
There was a problem hiding this comment.
need clarification here, since Detached menas chandle == NULL
| */ | ||
| TVM_FFI_INLINE void TVMFFIPyTPFinalize(void** ptr_to_chandle, PyObject* wrapper) { | ||
| void* chandle = *ptr_to_chandle; | ||
| if (chandle == nullptr) return; |
| * | ||
| * Detached -> Detached: ``cached_pyobj`` was already cleared by an | ||
| * eager move, or chandle was not allocated through the Python custom | ||
| * allocator. Just null ``wrapper.chandle`` and DecRef. |
There was a problem hiding this comment.
no need to DecRef here? just return and pass to dealloc
Summary
Make
a.x is a.x,id(a.x)stable across drop+refetch, andf(x) is xfor FFI returns, by attaching a 16-byte custom-allocator header to every Object allocation. Implements Tianqi Chen's PyObjectTying doc.Before:
After: both hold, and identity is preserved across a wrapper finalize-and-revive cycle whenever the C++ object outlives the wrapper.
Design
A two-layer allocator hook lives in core libtvm_ffi:
TVMFFIObjectAllocHeader { delete_space }— 8-byte base header preceding every Object body.TVMFFICustomAllocator { allocate, context }— process-wide registry; libtvm_ffi installs a builtin default at registry init soTVMFFIGetCustomAllocatornever returns NULL.TVMFFIGetCustomAllocator/TVMFFISetCustomAllocator— frontends override the default at module load.The Python Cython module overrides the global default with
TVMFFIPyAllocate, which prepends a 16-bytePyCustomAllocHeader { PyObject* cached_pyobj; base; }encoding the wrapper binding. The Rust crate (ObjectArc::new[_with_extra_items]) and the Python-defined types inextra/dataclass.ccroute through the same registry, so layout and lifetime semantics are uniform across frontends.The Active vs Inactive distinction lives on the wrapper's own
chandlefield rather than on a tag bit, so the header layout stays pointer-equality clean:header.cached_pyobjwrapper.chandleNULLwrapperchandlewrapperNULLThe state machine and four transition helpers (
TVMFFIPyTryGetAttachedPyObject/TVMFFIPyAttachPyObject/TVMFFIPyDetachPyObject/TVMFFIPyTPFinalize) are concentrated inpython/tvm_ffi/cython/tvm_ffi_python_helpers.h. Three invariants pin the contract:wrapper.chandle != NULLimplies the wrapper owns +1 on that chandle.Implementation highlights
make_ret— every FFI return funnels throughmake_ret_object, which returns the canonical wrapper for a chandle when one exists. Combined with Inactive -> Active revival, this deliversa.x is a.x, stableid(a.x)across drop+refetch, andf(x) is xfor matching returns.tp_finalize— Cythondef __del__maps totp_finalize(PEP 442). When other C++ holders keep the chandle alive past Python refcount 0,TVMFFIPyTPFinalizenullsself.chandlebefore DECREFing the chandle (preserving I3), thenPy_IncRef(self)to resurrect the PyObject.cached_pyobjis left pointing at the surviving wrapper for future revival. Cython auto-generatestp_dealloc, which now enters with theself.chandle == NULLinvariant established bytp_finalize.CYTHON_USE_TP_FINALIZEshim._move()semantics preserved — the rvalue setter eager-detaches the canonical-wrapper binding before the C++ AnyViewToOwnedAny transfer nulls the source chandle, so a downstream cache lookup never sees a stale back-pointer.TVMFFIPyIsCanonicalcomparesdelete_spaceagainstTVMFFIPyDeleteSpaceto recognize Python-allocated chandles, avoiding a flag bit onTVMFFIObject. Pre-Python-init chandles (statically-initialized global functions in libtvm_ffi.so) carry only the base header and skip the binding install.TVMFFIPyMarkPythonFinalizingis wired to atexit and flips astd::atomic<bool>flag;TVMFFIPyDeleteSpacechecks it (viaTVMFFIPyIsPythonAlive) beforePyGILState_Ensureto avoid touching a teardown interpreter.Test plan
tests/python/test_pyobject_tying.pycovers Active/Inactive transitions, cache-on aliasing,_move()under cache-on, pickle stress, threading stress, GC integration with the Inactive state, multi-chandle isolation, and the weakref limitationtest_function.py::test_rvalue_refrefactored for cache-on aliasing semanticsOut-of-scope follow-ups
_get_global_functo deliver id-stability for static-init Functions (whose chandles predate the Python allocator and so carry only the base header). Tracked as a TODO infunction.pxi::_get_global_func.