Describe the bug
Several HData APIs return a new HData instance but retain references to mutable tensors from the original object instead of cloning them. This creates shared storage between the original and derived objects, so mutating the returned object can unexpectedly mutate the source object as well.
The highest-risk cases are with_y_to(), enrich_node_features(), enrich_node_features_from(), enrich_hyperedge_weights(), enrich_hyperedge_attr(), shuffle(), split() in transductive mode, and cat_same_node_space().
To reproduce
Steps to reproduce the behavior:
- Create an HData instance with x, hyperedge_index, and optionally global_node_ids.
- Call a method that returns a new HData, for example with_y_zeros(), split(..., node_space_setting="transductive"), or cat_same_node_space(...).
- Mutate a shared tensor on the returned object, such as returned.hyperedge_index, returned.x, or returned.global_node_ids.
- Observe that the original HData instance has changed as well.
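The steps above can be sketched without the library. This is a minimal illustration, assuming the tensors are PyTorch tensors. MiniHData is a hypothetical stand-in and not the real HData class; its with_y_zeros() only mimics the reported pattern of building a new object around the same tensor objects.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class MiniHData:
    """Hypothetical stand-in for HData; not the library's class."""

    x: torch.Tensor
    hyperedge_index: torch.Tensor
    y: Optional[torch.Tensor] = None

    def with_y_zeros(self) -> "MiniHData":
        # Mirrors the reported pattern: a new object, but the same tensor objects.
        return MiniHData(
            x=self.x,
            hyperedge_index=self.hyperedge_index,
            y=torch.zeros(self.x.shape[0]),
        )


original = MiniHData(
    x=torch.ones(3, 2),
    hyperedge_index=torch.tensor([[0, 1, 2], [0, 0, 1]]),
)
derived = original.with_y_zeros()

# Mutate the derived object's tensors in place...
derived.x[0, 0] = 42.0
derived.hyperedge_index.add_(10)

# ...and read the values back through the original object.
leaked_x = original.x[0, 0].item()
leaked_index = original.hyperedge_index[0].tolist()
print(leaked_x, leaked_index)  # 42.0 [10, 11, 12]
```

Replacing `x=self.x` with `x=self.x.clone()` (and likewise for hyperedge_index) in the stand-in makes the original keep its initial values.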
Expected behavior
Methods that return a new HData should not share mutable tensor storage with the source object unless that behavior is explicitly documented. Mutating the returned object should not mutate the original one.
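One way to guard this expectation in a test is to compare memory addresses of the tensors on the source and returned objects. The helper below is a sketch under the assumption that the fields are PyTorch tensors; shared_fields is a hypothetical name, and SimpleNamespace objects stand in for HData instances. Comparing data_ptr() catches direct reuse of the same tensor, but it would miss a view that starts at a different offset in the same storage.

```python
from types import SimpleNamespace

import torch


def shared_fields(a, b, fields):
    """Return the names in `fields` whose tensors start at the same address on both objects."""
    shared = []
    for name in fields:
        ta, tb = getattr(a, name, None), getattr(b, name, None)
        if isinstance(ta, torch.Tensor) and isinstance(tb, torch.Tensor):
            # Empty tensors are skipped: they have no meaningful address to compare.
            if ta.numel() > 0 and ta.data_ptr() == tb.data_ptr():
                shared.append(name)
    return shared


# Stand-ins for a source object and a derived object.
src = SimpleNamespace(x=torch.ones(2, 2), global_node_ids=torch.arange(2))
out = SimpleNamespace(x=src.x, global_node_ids=src.global_node_ids.clone())

result = shared_fields(src, out, ["x", "global_node_ids"])
print(result)  # ['x']
```

A regression test for each affected method could assert that this list is empty for x, hyperedge_index, global_node_ids, hyperedge_attr, and hyperedge_weights.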
Additional context
HyperedgeIndex.to_0based() is intentionally in-place, but it makes this class of bug easier to trigger when caller-owned tensors are passed into wrappers without cloning first.
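The interaction between an in-place method and an uncloned input can be shown with a small stand-in. MiniIndex below is hypothetical and is not HyperedgeIndex; its to_0based() assumes, purely for illustration, that rebasing means subtracting the smallest id in place. The clone flag is also an invented parameter used only to contrast the two cases.

```python
import torch


class MiniIndex:
    """Hypothetical stand-in for a wrapper such as HyperedgeIndex."""

    def __init__(self, index: torch.Tensor, clone: bool = False):
        # Without clone, the wrapper aliases the caller's tensor.
        self.index = index.clone() if clone else index

    def to_0based(self) -> "MiniIndex":
        # Assumed behavior for illustration: shift ids in place so the smallest is 0.
        self.index.sub_(self.index.min())
        return self


caller_ids = torch.tensor([5, 6, 7])
MiniIndex(caller_ids).to_0based()
after_alias = caller_ids.tolist()  # the caller's tensor was rebased too

caller_ids = torch.tensor([5, 6, 7])
MiniIndex(caller_ids, clone=True).to_0based()
after_clone = caller_ids.tolist()  # the caller's tensor is untouched

print(after_alias, after_clone)  # [0, 1, 2] [5, 6, 7]
```

Cloning at the wrapper boundary keeps the in-place method cheap for internal use while preventing it from reaching tensors the caller still owns.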
Concrete examples from the codebase:
- HData.with_y_to() returns a "copy" but reuses x, hyperedge_index, global_node_ids, hyperedge_attr, and hyperedge_weights.
- HData.split() clones hyperedge_index before rebasing, but in transductive mode it still reuses x and global_node_ids from the parent object.
- HData.cat_same_node_space() reuses x and global_node_ids from one of the inputs.