@ajpotts ajpotts commented Dec 12, 2025

PR Description: Improve Index.lookup and MultiIndex.lookup semantics

This pull request refines type handling, error messages, and row/column matching
behavior for both Index.lookup and MultiIndex.lookup.
It also adds a regression test ensuring that mixed-dtype tuple keys do not
trigger incorrect scalar casting.
The changes improve correctness and readability and bring behavior closer to Pandas semantics.


Summary of Changes

1. Index.lookup

File: arkouda/pandas/index.py

  • Updated the return-value docstring to correctly indicate that the result is a
    boolean pdarray of length len(self).
  • Improved the TypeError description to reflect that the function expects a
    value convertible into an Arkouda array.
  • Removed an unused import (akint64).

Motivation:
Clarifies semantics and avoids stale / unused imports.
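
The return-value contract described above can be illustrated with a small pure-Python model. This is a sketch rather than Arkouda code: `lookup_mask` is a hypothetical stand-in for Index.lookup, and plain lists stand in for pdarray objects. It only shows that the mask has one entry per index element, regardless of how many keys are passed.

```python
from typing import Hashable, Iterable, List, Sequence


def lookup_mask(index_values: Sequence[Hashable], key: Iterable[Hashable]) -> List[bool]:
    """Hypothetical stand-in for Index.lookup: one boolean per index entry."""
    wanted = set(key)
    # The mask length follows the index, not the key.
    return [value in wanted for value in index_values]


index_values = [10, 20, 30, 40]  # illustrative data
mask = lookup_mask(index_values, [20, 40, 99])
print(mask)       # [False, True, False, True]
print(len(mask))  # 4, i.e. len(index_values)
```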


2. MultiIndex.lookup

File: arkouda/pandas/index.py

Major improvements to validation, dtype behavior, and membership logic:

Validation

  • Rejects keys that are not list or tuple.
  • Enforces that the key length matches nlevels with a clear ValueError.
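
A minimal sketch of the two validation rules, written as a standalone helper. The helper name `validate_multiindex_key` is hypothetical, and raising TypeError for a non-list, non-tuple key is an assumption; the description above only specifies ValueError for the length mismatch.

```python
from typing import Any


def validate_multiindex_key(key: Any, nlevels: int) -> None:
    """Hypothetical helper mirroring the validation rules listed above."""
    # Assumption: a key that is neither list nor tuple is rejected with TypeError.
    if not isinstance(key, (list, tuple)):
        raise TypeError(f"key must be a list or tuple, got {type(key).__name__}")
    # Stated in the PR description: length mismatch raises ValueError.
    if len(key) != nlevels:
        raise ValueError(f"key has {len(key)} entries but the MultiIndex has {nlevels} levels")


validate_multiindex_key((1, "red"), nlevels=2)  # a well-formed key passes silently
```

Note that a bare string such as "ab" has length 2 but is still rejected, because the type check runs before the length check.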

Two explicit code paths

  1. Per-level Arkouda arrays (e.g., a list of pdarray / Strings)
    These are delegated directly to in1d(self.index, key) for vectorized matching.

  2. Scalar tuple keys (e.g., (1, "red"))

    • Scalars are wrapped into length‑1 Arkouda arrays without casting dtypes.
    • Prevents accidental coercion of string scalars into numeric types.

This behavior aligns better with Pandas and eliminates subtle dtype bugs.
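
The two code paths can be modeled in plain Python as follows. This is a sketch under stated assumptions, not the implementation in arkouda/pandas/index.py: `wrap_scalars` and `match_rows` are hypothetical names, lists stand in for Arkouda arrays, and `match_rows` assumes that multi-array membership treats each row as a tuple.

```python
from typing import Any, List, Sequence, Tuple


def wrap_scalars(key: Tuple[Any, ...]) -> List[List[Any]]:
    """Wrap each scalar in its own length-1 list, keeping its original type."""
    return [[scalar] for scalar in key]


def match_rows(levels: Sequence[Sequence[Any]], key_levels: Sequence[Sequence[Any]]) -> List[bool]:
    """Row-level membership: a row matches only if every level matches the same key row."""
    key_rows = set(zip(*key_levels))
    return [row in key_rows for row in zip(*levels)]


levels = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]  # illustrative data

# Path 2: a scalar tuple is wrapped per level, with no casting between types.
key_levels = wrap_scalars((1, "red"))
print(key_levels)                      # [[1], ['red']]
print(match_rows(levels, key_levels))  # [True, False, False, False]

# Path 1: per-level arrays are matched row by row, here (1, "red") and (2, "blue").
print(match_rows(levels, [[1, 2], ["red", "blue"]]))  # [True, False, False, True]
```

Because the string scalar stays a string inside its own per-level container, it is never compared against or converted to the integer level.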


3. New Test: Mixed-dtype tuple lookup

File: tests/pandas/index_test.py

Added test test_multiindex_lookup_tuple_mixed_dtypes:

  • Ensures that a scalar mixed-type key like (1, "red"):
    • Does not cast "red" into numeric types.
    • Produces correct row-level matching.
  • Verifies the mask is [True, False, False, False] for the provided example.

Motivation:
Prevents regressions and captures a real-world bug scenario.
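
The shape of such a regression test can be sketched against a pure-Python stand-in. Everything here is illustrative: `toy_lookup` and `buggy_lookup` are hypothetical, the data is made up, and `buggy_lookup` shows only one way an illegal cast could arise (casting every scalar to the first level's type), not necessarily how the original bug occurred.

```python
from typing import Any, List, Sequence, Tuple


def toy_lookup(levels: Sequence[Sequence[Any]], key: Tuple[Any, ...]) -> List[bool]:
    """Hypothetical stand-in for MultiIndex.lookup with a scalar tuple key."""
    # Each row is compared with the key as given; no scalar is cast.
    return [row == tuple(key) for row in zip(*levels)]


def buggy_lookup(levels: Sequence[Sequence[Any]], key: Tuple[Any, ...]) -> List[bool]:
    """Models a possible failure mode: every scalar is cast to the first level's type."""
    cast = type(levels[0][0])
    return toy_lookup(levels, tuple(cast(k) for k in key))


def test_lookup_tuple_mixed_dtypes() -> None:
    levels = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]  # illustrative data
    assert toy_lookup(levels, (1, "red")) == [True, False, False, False]
    try:
        buggy_lookup(levels, (1, "red"))
    except ValueError:
        pass  # int("red") is an illegal cast
    else:
        raise AssertionError("expected the casting variant to fail")


test_lookup_tuple_mixed_dtypes()
```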


Why This Matters

  • Fixes subtle multi-dtype matching bugs in MultiIndex.lookup.
  • Moves Arkouda's Pandas-style Index API closer to Pandas semantics.
  • Improves test coverage around a previously fragile API surface.
  • Supports downstream work on joins, grouping, and Index alignment.

Backward Compatibility

  • No breaking API changes.
  • Mixed-type keys are now matched without dtype coercion, which is the behavior users would expect.

Closes #5155: Bug: MultiIndex .lookup() attempts illegal dtype cast for tuple keys

@ajpotts ajpotts force-pushed the 5155_Bug_MultiIndex.lookup branch 2 times, most recently from 0c0232a to 2303c7b Compare December 16, 2025 22:55
@ajpotts ajpotts force-pushed the 5155_Bug_MultiIndex.lookup branch from 2303c7b to 11f3fe1 Compare December 17, 2025 13:32
@ajpotts ajpotts marked this pull request as ready for review December 17, 2025 14:17