Skip to content

[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593

Open
cyx-6 wants to merge 1 commit into
apache:mainfrom
cyx-6:pyobject
Open

[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593
cyx-6 wants to merge 1 commit into
apache:mainfrom
cyx-6:pyobject

Conversation

@cyx-6
Copy link
Copy Markdown
Contributor

@cyx-6 cyx-6 commented May 19, 2026

Summary

Make a.x is a.x, id(a.x) stable across drop+refetch, and f(x) is x for FFI returns, by attaching a 16-byte custom-allocator header to every Object allocation. Implements Tianqi Chen's PyObjectTying doc.

Before:

a = MyClass(Inner(...))
assert a.x is a.x          # False — fresh wrapper per attribute access
assert id(a.x) == id(a.x)  # flaky

After: both hold, and identity is preserved across a wrapper finalize-and-revive cycle whenever the C++ object outlives the wrapper.

Design

A two-layer allocator hook lives in core libtvm_ffi:

  • TVMFFIObjectAllocHeader { delete_space } — 8-byte base header preceding every Object body.
  • TVMFFICustomAllocator { allocate, context } — process-wide registry; libtvm_ffi installs a builtin default at registry init so TVMFFIGetCustomAllocator never returns NULL.
  • TVMFFIGetCustomAllocator / TVMFFISetCustomAllocator — frontends override the default at module load.

The Python Cython module overrides the global default with TVMFFIPyAllocate, which prepends a 16-byte PyCustomAllocHeader { PyObject* cached_pyobj; base; } encoding the wrapper binding. The Rust crate (ObjectArc::new[_with_extra_items]) and the Python-defined types in extra/dataclass.cc route through the same registry, so layout and lifetime semantics are uniform across frontends.

The Active vs Inactive distinction lives on the wrapper's own chandle field rather than on a tag bit, so the header layout stays pointer-equality clean:

State header.cached_pyobj wrapper.chandle Meaning
Detached NULL No wrapper bound
Active wrapper chandle Live canonical wrapper; wrapper owns +1
Inactive wrapper NULL Wrapper bytes preserved (phantom +1); chandle kept alive only by other C++ holders

The state machine and four transition helpers (TVMFFIPyTryGetAttachedPyObject / TVMFFIPyAttachPyObject / TVMFFIPyDetachPyObject / TVMFFIPyTPFinalize) are concentrated in python/tvm_ffi/cython/tvm_ffi_python_helpers.h. Three invariants pin the contract:

  • I1. When a PyObject goes out of scope, its +1 on chandle is always released.
  • I2. When a chandle is destroyed, its cached PyObject (if any) is reclaimed.
  • I3. wrapper.chandle != NULL implies the wrapper owns +1 on that chandle.

Implementation highlights

  • Universal cache-on for make_ret — every FFI return funnels through make_ret_object, which returns the canonical wrapper for a chandle when one exists. Combined with Inactive -> Active revival, this delivers a.x is a.x, stable id(a.x) across drop+refetch, and f(x) is x for matching returns.
  • Active -> Inactive via tp_finalize — Cython def __del__ maps to tp_finalize (PEP 442). When other C++ holders keep the chandle alive past Python refcount 0, TVMFFIPyTPFinalize nulls self.chandle before DECREFing the chandle (preserving I3), then Py_IncRef(self) to resurrect the PyObject. cached_pyobj is left pointing at the surviving wrapper for future revival. Cython auto-generates tp_dealloc, which now enters with the self.chandle == NULL invariant established by tp_finalize.
  • No SABI — dropped Python Limited API and the CYTHON_USE_TP_FINALIZE shim.
  • _move() semantics preserved — the rvalue setter eager-detaches the canonical-wrapper binding before the C++ AnyViewToOwnedAny transfer nulls the source chandle, so a downstream cache lookup never sees a stale back-pointer.
  • Frontend detectionTVMFFIPyIsCanonical compares delete_space against TVMFFIPyDeleteSpace to recognize Python-allocated chandles, avoiding a flag bit on TVMFFIObject. Pre-Python-init chandles (statically-initialized global functions in libtvm_ffi.so) carry only the base header and skip the binding install.
  • Shutdown guardTVMFFIPyMarkPythonFinalizing is wired to atexit and flips a std::atomic<bool> flag; TVMFFIPyDeleteSpace checks it (via TVMFFIPyIsPythonAlive) before PyGILState_Ensure to avoid touching a teardown interpreter.

Test plan

  • Full Python suite passes (2332 passed, 19 skipped, 2 xfailed)
  • Full Rust suite passes
  • New tests/python/test_pyobject_tying.py covers Active/Inactive transitions, cache-on aliasing, _move() under cache-on, pickle stress, threading stress, GC integration with the Inactive state, multi-chandle isolation, and the weakref limitation
  • test_function.py::test_rvalue_ref refactored for cache-on aliasing semantics
  • CI: lint, clang-tidy, doc, C++/Python/Rust on Linux x86_64 + aarch64, macOS arm64, Windows AMD64

Out-of-scope follow-ups

  • Name-keyed dict cache for _get_global_func to deliver id-stability for static-init Functions (whose chandles predate the Python allocator and so carry only the base header). Tracked as a TODO in function.pxi::_get_global_func.

@cyx-6 cyx-6 marked this pull request as draft May 19, 2026 10:32
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces 'PyObjectTying,' a mechanism to link the lifetime of Python wrappers to their underlying C++ FFI objects. By prepending a PyCustomizeAllocHeader to object allocations, the implementation enables stable object identity and address preservation across wrapper finalization cycles. The changes include modifications to the C++ SimpleObjAllocator, Cython bindings to utilize tp_finalize, and updates to the reflection system. Feedback from the review highlights a critical thread-safety issue in the callback used to free cached wrapper memory, which lacks GIL protection, and a potential memory leak when installing handles if a previous phantom wrapper is not correctly released.

Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/object.pxi Outdated
@cyx-6 cyx-6 force-pushed the pyobject branch 2 times, most recently from 435caeb to 1b751b2 Compare May 24, 2026 13:03
Comment thread include/tvm/ffi/c_api.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread include/tvm/ffi/c_api.h Outdated
Comment thread include/tvm/ffi/c_api.h
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread src/ffi/custom_allocator.cc Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Make `a.x is a.x`, `id(a.x)` stable across drop+refetch, and `f(x) is x`
for FFI returns by attaching a 16-byte custom-allocator header to every
Object allocation. Implements the PyObjectTying design.

Design:

- Two-layer custom-allocator hook in core libtvm_ffi.so:
  `TVMFFIObjectAllocHeader { delete_space }`,
  `TVMFFICustomAllocator { allocate, context }`, plus
  `TVMFFIGetCustomAllocator` / `TVMFFISetCustomAllocator`. libtvm_ffi
  installs a builtin default at registry init, so every Object always
  carries the 8-byte base header and `TVMFFIGetCustomAllocator` never
  returns NULL. The Python Cython module overrides the global default
  at module load with `TVMFFIPyAllocate`, which prepends a 16-byte
  `PyCustomAllocHeader` encoding the Python wrapper binding.
  `make_object` / `make_inplace_array_object` / `PyClassDeleter` and
  the Rust `ObjectArc::new[_with_extra_items]` paths all funnel through
  the registry, so Python-defined types and Rust-allocated objects
  share the layout and lifetime semantics.

- State machine concentrated in `tvm_ffi_python_helpers.h`.
  `PyCustomAllocHeader` is `{ PyObject* cached_pyobj; base; }` (16 B).
  The Active vs Inactive distinction lives on the wrapper's own
  `chandle` field, not on a tag bit, so the header layout stays
  pointer-equality clean:
    Detached:  cached_pyobj == NULL                                   (no wrapper)
    Active:    cached_pyobj == self  AND  self.chandle == chandle     (wrapper owns +1)
    Inactive:  cached_pyobj == self  AND  self.chandle == NULL        (preserved bytes)
  Four helpers expose the transitions:
  `TVMFFIPyTryGetAttachedPyObject`, `TVMFFIPyAttachPyObject`,
  `TVMFFIPyDetachPyObject`, `TVMFFIPyTPFinalize`.

- Universal cache-on for `make_ret`. Every FFI return funnels through
  `make_ret_object`, which returns the canonical wrapper for a chandle
  when one exists. Combined with Inactive -> Active revival on the
  cached path, this delivers `a.x is a.x`, stable `id(a.x)` across
  drop+refetch, and `f(x) is x` when a function returns the same
  chandle it received.

- Inactive state via Cython `def __del__` on `CObject`, mapping to
  `tp_finalize` (PEP 442). When other C++ holders keep the chandle
  alive past the wrapper's Python refcount hitting 0,
  `TVMFFIPyTPFinalize` nulls `self.chandle` BEFORE DECREFing the
  chandle, then `Py_IncRef(self)` to resurrect the PyObject.
  `cached_pyobj` is unchanged so a future re-fetch can rebind. Cython
  auto-generates `tp_dealloc`, which now enters with the
  `self.chandle == NULL` invariant established by `tp_finalize`.
  Dropped the Py_LIMITED_API / SABI shim and the
  `CYTHON_USE_TP_FINALIZE` define.

- Shutdown guard. `TVMFFIPyMarkPythonFinalizing` is wired to atexit
  from Cython module init, flipping a `std::atomic<bool>` flag read
  by `TVMFFIPyDeleteSpace` (via `TVMFFIPyIsPythonAlive`) before
  `PyGILState_Ensure` to avoid GIL acquire after Python finalization
  has begun. Inactive wrapper bytes on chandles still alive at that
  point are intentionally leaked -- process is exiting.

- Frontend-allocation detection by `delete_space` pointer comparison
  (`TVMFFIPyIsCanonical`) avoids a flag bit on `TVMFFIObject`.
  Pre-Python-init chandles (statically-initialized global functions in
  libtvm_ffi.so) carry only the base header; the Python side detects
  this and skips the binding install.

`_move()` semantics: callback args alias the caller's wrapper under
universal cache-on (one wrapper, one chandle ref) and FFI function
returns of the same chandle alias the caller's wrapper. `_move()` is
kept as an API: the rvalue setter on either caller or callback side
eager-detaches the canonical-wrapper binding before the C++
AnyViewToOwnedAny transfer nulls the source chandle.
`test_function.py::test_rvalue_ref` is refactored for the new
aliasing-aware use_count expectations.

Tests: full Python suite passes (2332 passed, 19 skipped, 2 xfailed);
Rust suite passes. New `tests/python/test_pyobject_tying.py` covers
Active/Inactive transitions, cache-on aliasing, `_move()` under
cache-on, pickle stress, threading stress, GC integration with the
Inactive state, multi-chandle isolation, and the weakref limitation.

TODO carried in `function.pxi::_get_global_func`: name-keyed dict
cache for static-init Function id-stability. Most registry-resident
Functions are allocated at C++ static init (before the Python
allocator is registered) so their chandle has only the base header --
make_ret_object's cache can't reach them. Deferred to keep this change
small.
@cyx-6 cyx-6 marked this pull request as ready for review May 28, 2026 18:10
if self.chandle != NULL:
CHECK_CALL(TVMFFIObjectDecRef(self.chandle))
self.chandle = NULL
def __del__(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's double check and test it out, making sure this __del__ method will be properly invoked. I had some really negative experience with overloading __del__ because it could possibly prevent Python GC from recycling memory in certain cases. Not 100% sure if it applies to Cython's cdef class's __del__ method though.

cls = make_fallback_cls_for_type_index(type_index)
obj = cls.__new__(cls)
(<CObject>obj).chandle = result.v_obj
TVMFFIPyAttachPyObject(result.v_obj, <PyObject*>obj)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment, this is a new class, so it must be in detached state, attach it

dlpack.deleter(dlpack)
pass
# default return the tensor
# default return the tensor.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: we don't try to attach py object for tensor types as it may appear in callback and there maybe translation depending on the context

// ----------
// I1. When a PyObject goes out of scope (no Python var refers to it),
// its +1 on chandle is always released.
// I2. When a chandle is destroyed, its cached PyObject (if any) is
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a chandle is destroyed, we can only be in Detached or Inactive state, if it is in Inactive state, the PyObject is reclaimed

// ``self.chandle`` and DecRef.
// Detached -> Detached : ``self`` is not the canonical wrapper for
// this chandle (eager-detach via move already
// cleared the binding). Just DecRef.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need clarification here, since Detached menas chandle == NULL

*/
TVM_FFI_INLINE void TVMFFIPyTPFinalize(void** ptr_to_chandle, PyObject* wrapper) {
void* chandle = *ptr_to_chandle;
if (chandle == nullptr) return;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment Detached

*
* Detached -> Detached: ``cached_pyobj`` was already cleared by an
* eager move, or chandle was not allocated through the Python custom
* allocator. Just null ``wrapper.chandle`` and DecRef.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to DecRef here? just return and pass to dealloc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants