Skip to content

Fix reset_index crash when obs index name matches existing column#1100

Open
timtreis wants to merge 3 commits intomainfrom
fix/reset-index-column-collision
Open

Fix reset_index crash when obs index name matches existing column#1100
timtreis wants to merge 3 commits intomainfrom
fix/reset-index-column-collision

Conversation

@timtreis
Copy link
Member

@timtreis timtreis commented Mar 25, 2026

Summary

  • _inner_join_spatialelement_table and _left_join_spatialelement_table call table.obs.reset_index() which raises ValueError: cannot insert <name>, already exists when the obs index name matches an existing column (e.g. EntityID in Merfish data)
  • Adds a small helper _reset_index_preserving_existing_columns that drops the index when its name already exists as a column, avoiding the collision

timtreis and others added 3 commits March 25, 2026 13:49
)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link

codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.93%. Comparing base (78f75ef) to head (ef5b015).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1100   +/-   ##
=======================================
  Coverage   91.93%   91.93%           
=======================================
  Files          51       51           
  Lines        7772     7776    +4     
=======================================
+ Hits         7145     7149    +4     
  Misses        627      627           
Files with missing lines Coverage Δ
src/spatialdata/_core/query/relational_query.py 91.70% <100.00%> (+0.07%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@timtreis timtreis requested a review from LucaMarconato March 25, 2026 13:39
name collides with an existing column (e.g. "EntityID" in Merfish data).
"""
if df.index.name is not None and df.index.name in df.columns:
return df.reset_index(drop=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we are dropping the index instead of a column. This could be a problem if the column is different from the index.

What's the origin behind this problem? Is it that for the merfish data we should have had sdata['table'].index.name = None? That would be a better fix, while still raising an exception here.


@pytest.mark.parametrize("how", ["inner", "left"])
def test_join_spatialelement_table_obs_index_name_collision(how):
"""join_spatialelement_table must not crash when obs index name matches an existing column.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should crash in general to avoid a silent branching in the behavior. The ok case would be if both the index and the column have the same values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants