HData methods return a new instance but retain references to mutable tensors from the original object #172

@tizianocitro

Description

Describe the bug

Several HData APIs return a new HData instance but retain references to mutable tensors from the original object instead of cloning them. This creates shared storage between the original and derived objects, so mutating the returned object can unexpectedly mutate the source object as well.

The highest-risk cases are with_y_to(), enrich_node_features(), enrich_node_features_from(), enrich_hyperedge_weights(), enrich_hyperedge_attr(), shuffle(), split() in transductive mode, and cat_same_node_space().

To reproduce

Steps to reproduce the behavior:

  1. Create an HData instance with x, hyperedge_index, and optionally global_node_ids.
  2. Call a method that returns a new HData, for example with_y_zeros(), split(..., node_space_setting="transductive"), or cat_same_node_space(...).
  3. Mutate a shared tensor on the returned object, such as returned.hyperedge_index, returned.x, or returned.global_node_ids.
  4. Observe that the original HData instance has changed as well.
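The steps above can be reproduced without the library by using a minimal stand-in. `ToyHData` and its `with_y_to()` are hypothetical. They only mimic the pattern described in this issue, where a new object is built from the parent's tensor references. They do not reflect the real HData constructor or method signatures.

```python
import torch


class ToyHData:
    """Hypothetical stand-in for HData; it only mimics the aliasing pattern."""

    def __init__(self, x, hyperedge_index, y=None):
        self.x = x
        self.hyperedge_index = hyperedge_index
        self.y = y

    def with_y_to(self, value):
        # Buggy pattern: a "copy" that reuses the parent's tensor references.
        y = torch.full((self.x.size(0),), value, dtype=torch.float)
        return ToyHData(self.x, self.hyperedge_index, y)


original = ToyHData(
    x=torch.ones(3, 2),
    hyperedge_index=torch.tensor([[0, 1, 2], [0, 0, 1]]),
)
returned = original.with_y_to(0.0)

# Mutating the returned object in place...
returned.x[0, 0] = 42.0
returned.hyperedge_index[1, 0] = 7

# ...also changes the original, because both objects share the same storage.
print(original.x[0, 0].item())                # 42.0
print(original.hyperedge_index[1, 0].item())  # 7
```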

Expected behavior

Methods that return a new HData should not share mutable tensor storage with the source object unless that behavior is explicitly documented. Mutating the returned object should not mutate the original one.
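One way to meet this expectation is to clone every tensor that is carried over into the new object. The sketch below uses a hypothetical `ToyHData` class whose fields mirror the names in this issue. It is not the library's code, just an illustration of the cloning pattern applied to a `with_y_to()`-style method.

```python
import torch


def _clone_or_none(t):
    # Clone tensors so the new object owns its storage; pass None through.
    return None if t is None else t.clone()


class ToyHData:
    """Hypothetical stand-in for HData; field names follow the issue."""

    def __init__(self, x, hyperedge_index, global_node_ids=None,
                 hyperedge_attr=None, hyperedge_weights=None, y=None):
        self.x = x
        self.hyperedge_index = hyperedge_index
        self.global_node_ids = global_node_ids
        self.hyperedge_attr = hyperedge_attr
        self.hyperedge_weights = hyperedge_weights
        self.y = y

    def with_y_to(self, value):
        # Fixed pattern: every tensor carried over from self is cloned.
        y = torch.full((self.x.size(0),), value, dtype=torch.float)
        return ToyHData(
            x=_clone_or_none(self.x),
            hyperedge_index=_clone_or_none(self.hyperedge_index),
            global_node_ids=_clone_or_none(self.global_node_ids),
            hyperedge_attr=_clone_or_none(self.hyperedge_attr),
            hyperedge_weights=_clone_or_none(self.hyperedge_weights),
            y=y,
        )


original = ToyHData(
    x=torch.ones(3, 2),
    hyperedge_index=torch.tensor([[0, 1, 2], [0, 0, 1]]),
    global_node_ids=torch.tensor([10, 11, 12]),
)
returned = original.with_y_to(0.0)
returned.x[0, 0] = 42.0
returned.global_node_ids[0] = -1

# The original is unaffected because the returned object owns its tensors.
print(original.x[0, 0].item())             # 1.0
print(original.global_node_ids[0].item())  # 10
```

Cloning copies memory, so for large hypergraphs this trades memory for safety. The alternative mentioned above is to keep the sharing but document it explicitly.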

Additional context

HyperedgeIndex.to_0based() is intentionally in-place, which makes this class of bug easier to trigger when caller-owned tensors are passed into wrappers without being cloned first.
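To illustrate why an in-place helper amplifies the problem, the sketch below uses a hypothetical `rebase_to_0based_` function that subtracts each row's minimum in place. It is similar in spirit to, but not the actual implementation of, `HyperedgeIndex.to_0based()`. `Wrapper` is likewise hypothetical.

```python
import torch


def rebase_to_0based_(index):
    # Hypothetical in-place rebase: shift every row so its minimum becomes 0.
    index -= index.min(dim=1, keepdim=True).values
    return index


class Wrapper:
    """Hypothetical wrapper that stores the tensor it is given."""

    def __init__(self, index, copy=False):
        self.index = index.clone() if copy else index


caller_index = torch.tensor([[5, 6, 7], [3, 3, 4]])

# Without cloning, rebasing the wrapper's tensor rewrites the caller's tensor.
rebase_to_0based_(Wrapper(caller_index).index)
print(caller_index.tolist())  # [[0, 1, 2], [0, 0, 1]]

# Cloning at the wrapper boundary keeps the caller's tensor untouched.
caller_index = torch.tensor([[5, 6, 7], [3, 3, 4]])
rebase_to_0based_(Wrapper(caller_index, copy=True).index)
print(caller_index.tolist())  # [[5, 6, 7], [3, 3, 4]]
```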

Concrete examples from the codebase:

  • HData.with_y_to() returns a “copy” but reuses x, hyperedge_index, global_node_ids, hyperedge_attr, and hyperedge_weights.
  • HData.split() clones hyperedge_index before rebasing, but in transductive mode it still reuses x and global_node_ids from the parent object.
  • HData.cat_same_node_space() reuses x and global_node_ids from one of the inputs.
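A regression test for the methods listed above could assert that no tensor attribute of the returned object shares memory with the source. The sketch below is an assumption about how such a check might look. `shared_tensor_fields` is a hypothetical helper, and the field names are taken from this issue. It compares `untyped_storage().data_ptr()` (available in PyTorch 2.x) so that views into the same buffer, such as slices, are caught as well.

```python
from types import SimpleNamespace

import torch

TENSOR_FIELDS = ("x", "hyperedge_index", "global_node_ids",
                 "hyperedge_attr", "hyperedge_weights")


def shares_storage(a, b):
    # Two tensors alias if they are backed by the same underlying buffer,
    # even when one is a view (slice, transpose) of the other.
    sa, sb = a.untyped_storage(), b.untyped_storage()
    if sa.nbytes() == 0 or sb.nbytes() == 0:
        return False  # empty tensors have no buffer to share
    return sa.data_ptr() == sb.data_ptr()


def shared_tensor_fields(source, derived, fields=TENSOR_FIELDS):
    """Hypothetical helper: names of fields whose tensors alias the source's."""
    shared = []
    for name in fields:
        a = getattr(source, name, None)
        b = getattr(derived, name, None)
        if isinstance(a, torch.Tensor) and isinstance(b, torch.Tensor):
            if shares_storage(a, b):
                shared.append(name)
    return shared


# Usage with simple stand-in objects instead of real HData instances.
x = torch.ones(4, 2)
src = SimpleNamespace(x=x, hyperedge_index=torch.tensor([[0, 1], [0, 0]]))
bad = SimpleNamespace(x=x[1:], hyperedge_index=src.hyperedge_index.clone())
print(shared_tensor_fields(src, bad))  # ['x']
```

Against the real class, a test might assert that `shared_tensor_fields(data, data.with_y_zeros())` is empty once the fix lands.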
