@ajpotts ajpotts commented Dec 2, 2025

PR: Comprehensive Pandas ExtensionArray Inheritance Tests + Categorical Stubs

Summary

This PR introduces a comprehensive test suite verifying that Arkouda’s
ArkoudaArray, ArkoudaStringArray, and ArkoudaCategoricalArray correctly inherit behavior from pandas.api.extensions.ExtensionArray.

Tests ensure:

1. Arkouda Does Not Override pandas EA Methods

For each method, the suite asserts:

  • the method is not defined on the Arkouda class itself
  • the MRO resolves the method to pandas’ ExtensionArray
  • the inherited behavior matches pandas where supported

Verified inherited methods include:

_fill_mask_inplace, _formatter, _from_scalars, _get_repr_footer,
_hash_pandas_object, _putmask, _rank, _repr_2d, _values_for_argsort,
delete, dropna, insert, isin, map, ravel, repeat,
searchsorted, shift, tolist, transpose, unique, and more.
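
As a rough illustration of the "not overridden, resolved via MRO" check described above, the sketch below uses two stand-in classes so that it runs without pandas or Arkouda. The names BaseEA, MyArray, defining_class, and is_inherited_from are hypothetical and are not taken from the PR's test suite; the real tests may be structured differently.

```python
def defining_class(cls, name):
    """Return the first class in cls.__mro__ whose own namespace defines `name`."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None


def is_inherited_from(cls, base, name):
    """True if `cls` does not define `name` itself and the MRO resolves it to `base`."""
    return name not in vars(cls) and defining_class(cls, name) is base


class BaseEA:
    """Hypothetical stand-in for pandas.api.extensions.ExtensionArray."""

    def repeat(self, n):
        return None

    def tolist(self):
        return []


class MyArray(BaseEA):
    """Hypothetical stand-in for an Arkouda extension array."""

    def tolist(self):  # overridden on purpose, so the check reports False
        return []


print(is_inherited_from(MyArray, BaseEA, "repeat"))  # inherited from the base
print(is_inherited_from(MyArray, BaseEA, "tolist"))  # overridden in the subclass
```

Checking the class's own namespace with vars() rather than hasattr() matters here, because hasattr() is also true for inherited attributes and cannot tell an override from an inherited method.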

2. Behavior-Level Parity With pandas

Tests check:

  • value equality
  • dtype agreement
  • NaN/NA handling
  • scalar vs list-like inputs
  • matching behavior for numeric, string, and categorical EAs
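
One way to picture the value-equality and NaN/NA checks is a small comparison helper that treats missing values in the same position as equal. This is only a sketch under stated assumptions: values_match and _is_missing are hypothetical names, and treating None and float NaN as interchangeable "missing" markers is a simplification chosen for the example, not a statement about how the PR's tests compare results.

```python
import math


def _is_missing(x):
    """Assumption for this sketch: None and float NaN both count as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def values_match(left, right):
    """Element-wise equality where missing values only match missing values."""
    left, right = list(left), list(right)
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if _is_missing(a) or _is_missing(b):
            # A missing value on one side must be mirrored on the other side.
            if not (_is_missing(a) and _is_missing(b)):
                return False
        elif a != b:
            return False
    return True


nan = float("nan")
print(values_match([1.0, nan, 3.0], [1.0, nan, 3.0]))
print(values_match([1.0, 2.0, 3.0], [1.0, nan, 3.0]))
```

A plain `==` comparison would report the first pair as unequal, since NaN never compares equal to itself, which is why parity tests usually need a NaN-aware comparison.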

3. Known Unsupported Behavior Is Explicitly Tested

Some pandas behaviors require dtype conversions Arkouda does not support yet.

Locked-in expected behaviors:

  • ArkoudaCategoricalArray.map → xfail
  • ArkoudaCategoricalArray.shift → raises NotImplementedError
  • _repr_2d on numeric EAs → raises TypeError

4. Added NotImplementedError Stubs for Categorical Internals

Stubs added for pandas Categorical-internal helper APIs:
_set_categories, _replace, _reverse_indexer, _values_for_rank,
add_categories, rename_categories, remove_unused_categories, etc.
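
The stub pattern, together with a check that locks in the NotImplementedError, might look roughly like the sketch below. It is a minimal illustration, not the PR's code: CategoricalLike, _unsupported, and raises_not_implemented are hypothetical names, and the error message text is made up for the example.

```python
def _unsupported(name):
    """Build a method stub that always raises NotImplementedError."""

    def stub(self, *args, **kwargs):
        raise NotImplementedError(f"{name} is not implemented for this array type")

    stub.__name__ = name
    return stub


class CategoricalLike:
    """Hypothetical stand-in for an array class with unsupported helpers."""

    add_categories = _unsupported("add_categories")
    rename_categories = _unsupported("rename_categories")


def raises_not_implemented(fn, *args, **kwargs):
    """Return True if calling fn raises NotImplementedError, else False."""
    try:
        fn(*args, **kwargs)
    except NotImplementedError:
        return True
    return False


arr = CategoricalLike()
print(raises_not_implemented(arr.add_categories, ["z"]))
print(raises_not_implemented(len, []))
```

Raising explicitly, rather than letting an inherited implementation run, makes the unsupported operation fail loudly, and a test asserting the exception will flag the change if the behavior is implemented later.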

5. Test Coverage Improvements

Covers:

  • unique() with NaNs
  • hashing parity
  • ranks with ties
  • deletion, insertion, ravel, repeat
  • searchsorted for strings & categoricals
  • tolist() alignment with pandas EA semantics
  • map() for strings, xfail for categoricals

Motivation

Previously:

  • Many inherited EA behaviors were untested
  • Unsupported categorical operations were undocumented
  • Upstream pandas changes could cause silent regressions

This PR establishes a correctness baseline.

Examples

# Assumes an Arkouda server is running and ak.connect() has been called.
import arkouda as ak
import pandas as pd
from arkouda.pandas.extension import (
    ArkoudaArray,
    ArkoudaCategoricalArray,
    ArkoudaStringArray,
)

# Hash a Series backed by a numeric ArkoudaArray.
arr = ArkoudaArray(ak.array([10, 20, 10, 30]))
pd.util.hash_pandas_object(pd.Series(arr))

# unique() on a string-backed array.
arr = ArkoudaStringArray(ak.array(["a", "b", "a"]))
arr.unique().tolist()
# ['a', 'b']

# tolist() on a categorical-backed array.
arr = ArkoudaCategoricalArray(ak.Categorical(ak.array(["x", "y", "x"])))
arr.tolist()
# ['x', 'y', 'x']

Test Commands

pytest tests/pandas/extension/ -q
pytest -k "Map" -q
pytest -k "Categorical" -q
pytest -k "SearchSorted" -q

Closes #5096: unit tests for inherited functions in arkouda extension array

@ajpotts ajpotts force-pushed the 5096_unit_tests_for_inherited_functions_in_arkouda_extension_array branch from ad25375 to 8e6e666 Compare December 3, 2025 00:02
@ajpotts ajpotts marked this pull request as ready for review December 3, 2025 14:00
@ajpotts ajpotts force-pushed the 5096_unit_tests_for_inherited_functions_in_arkouda_extension_array branch from fec84e2 to 1520945 Compare December 16, 2025 22:32
@ajpotts ajpotts force-pushed the 5096_unit_tests_for_inherited_functions_in_arkouda_extension_array branch from 1520945 to ff72e6b Compare December 18, 2025 20:07
@1RyanK 1RyanK left a comment

Looks good!
