@ajpotts ajpotts commented Dec 24, 2025

Improve astype semantics across Arkouda pandas ExtensionArrays

Summary

This PR implements fully-featured, pandas-compatible astype behavior for all Arkouda-backed pandas ExtensionArrays:

  • ArkoudaArray
  • ArkoudaCategoricalArray
  • ArkoudaStringArray

The implementation aligns with pandas’ ExtensionArray.astype contract, avoids unnecessary NumPy fallbacks, and consistently returns Arkouda-backed ExtensionArrays whenever possible.

In addition, this PR:

  • Adds comprehensive type hints and overloads to satisfy mypy
  • Expands string dtype normalization ("string" support)
  • Introduces extensive unit tests covering numeric, string, categorical, object, and ExtensionDtype casting
  • Makes doctests resilient to platform-dependent string-width differences

Key Changes

✅ Correct, consistent astype behavior

  • object → always returns NumPy
  • Same dtype + copy=False → returns self
  • Numeric ↔ numeric → server-side Arkouda cast
  • Categorical
    • category → stays categorical
    • string → ArkoudaStringArray
    • other dtypes → labels cast via Arkouda, returned as ExtensionArray
  • Strings
    • string targets → stay ArkoudaStringArray
    • numeric/bool targets → server-side cast, return ExtensionArray
    • invalid numeric parses raise RuntimeError (documented and tested)
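
For reviewers, the rules above can be read as a small decision table. The following is a minimal sketch of that table in plain Python, not the code in this PR: `plan_astype`, `Outcome`, the simplified dtype names, and the order in which the rules are checked are all assumptions made for illustration, and combinations the list does not mention are left unimplemented.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for what astype hands back."""

    NUMPY = "NumPy ndarray"
    SELF = "self, no copy"
    SERVER_CAST = "server-side cast wrapped in an ExtensionArray"
    CATEGORICAL = "ArkoudaCategoricalArray"
    STRINGS = "ArkoudaStringArray"


# Simplified dtype names used only by this sketch.
NUMERIC_OR_BOOL = {"int64", "uint64", "float64", "bool"}


def plan_astype(source: str, target: str, copy: bool = True) -> Outcome:
    """Map (source dtype, target dtype, copy) to the outcome listed in the PR."""
    # object targets always leave Arkouda and return NumPy.
    if target == "object":
        return Outcome.NUMPY
    # Same dtype with copy=False is a no-op.
    if source == target and not copy:
        return Outcome.SELF

    if source == "category":
        if target == "category":
            return Outcome.CATEGORICAL
        if target == "string":
            return Outcome.STRINGS
        # Any other target: the labels are cast on the server.
        return Outcome.SERVER_CAST

    if source == "string":
        if target == "string":
            return Outcome.STRINGS
        if target in NUMERIC_OR_BOOL:
            # Unparseable values are reported as RuntimeError by the real cast.
            return Outcome.SERVER_CAST

    if source in NUMERIC_OR_BOOL and target in NUMERIC_OR_BOOL:
        return Outcome.SERVER_CAST

    # Combinations not listed in the PR description are out of scope here.
    raise NotImplementedError(f"{source} -> {target} is not covered by this sketch")
```

For example, `plan_astype("category", "int64")` yields `Outcome.SERVER_CAST`, while `plan_astype("int64", "int64", copy=False)` yields `Outcome.SELF`.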

🧠 Pandas compatibility & typing

  • Adds explicit @overload signatures matching pandas ExtensionArray.astype
  • Fixes mypy override and return-type errors
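
To illustrate the overload pattern without requiring pandas or an Arkouda server, here is a self-contained sketch. `PlainDtype`, `ExtDtype`, `PlainArray`, and `ToyExtensionArray` are hypothetical stand-ins (roughly: a NumPy-style dtype, an ExtensionDtype, an ndarray, and an ExtensionArray); the actual signatures in this PR are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Union, overload


@dataclass(frozen=True)
class PlainDtype:
    """Hypothetical stand-in for a NumPy-style dtype."""

    name: str


@dataclass(frozen=True)
class ExtDtype:
    """Hypothetical stand-in for a pandas ExtensionDtype."""

    name: str


@dataclass
class PlainArray:
    """Hypothetical stand-in for a NumPy ndarray."""

    values: List[object]


class ToyExtensionArray:
    """Hypothetical stand-in for an Arkouda-backed ExtensionArray."""

    def __init__(self, values: List[object], dtype: ExtDtype) -> None:
        self.values = values
        self.dtype = dtype

    # One overload per dtype family, so a type checker can infer the
    # precise return type at each call site.
    @overload
    def astype(self, dtype: PlainDtype, copy: bool = ...) -> PlainArray: ...

    @overload
    def astype(self, dtype: ExtDtype, copy: bool = ...) -> "ToyExtensionArray": ...

    def astype(
        self, dtype: Union[PlainDtype, ExtDtype], copy: bool = True
    ) -> Union[PlainArray, "ToyExtensionArray"]:
        if isinstance(dtype, ExtDtype):
            # Same dtype with copy=False returns the array itself.
            if dtype == self.dtype and not copy:
                return self
            return ToyExtensionArray(list(self.values), dtype)
        # Plain dtypes leave the extension world.
        return PlainArray(list(self.values))
```

With these overloads, a checker such as mypy can treat `arr.astype(ExtDtype("float64"))` as a `ToyExtensionArray` and `arr.astype(PlainDtype("object"))` as a `PlainArray`, rather than a union of the two.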

🧪 Tests

New test coverage includes:

  • Numeric → numeric casts
  • ExtensionDtype targets (pd.Int64Dtype, pd.StringDtype, etc.)
  • Categorical → string / numeric
  • String → numeric / object
  • Invalid string-to-numeric casts
  • Copy vs no-copy semantics
  • Doctest validation using ellipsis to avoid brittle unicode-width assertions
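
The ellipsis technique in the last bullet can be shown with the standard library alone. This is a sketch under the assumption that the brittle part of the output is a width suffix such as `<U2`; `describe` and `run_describe_doctest` are hypothetical helpers, not functions from this PR.

```python
import doctest


def describe(values):
    """Return a NumPy-style repr whose width suffix depends on the data.

    The ELLIPSIS directive lets "..." stand in for the width, so the
    same expected output matches regardless of the longest string.

    >>> print(describe(["a", "bb"]))  # doctest: +ELLIPSIS
    array(['a', 'bb'], dtype='<U...')
    >>> print(describe(["a", "bbbb"]))  # doctest: +ELLIPSIS
    array(['a', 'bbbb'], dtype='<U...')
    """
    width = max(len(v) for v in values)
    return f"array({values!r}, dtype='<U{width}')"


def run_describe_doctest():
    """Run the doctest above and return doctest's (failed, attempted) result."""
    test = doctest.DocTestParser().get_doctest(
        describe.__doc__, {"describe": describe}, "describe", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)
```

Calling `run_describe_doctest()` runs both examples; without the `+ELLIPSIS` directive, each expected line would have to hard-code the exact width.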

Why this matters

  • Enables Arkouda ExtensionArrays to behave predictably inside pandas pipelines
  • Avoids silent NumPy fallbacks that break distributed execution
  • Clarifies and documents Arkouda’s server-side casting semantics
  • Unblocks downstream pandas operations that rely on astype

Closes #5219: improve astype in ak.pandas.extension

@ajpotts ajpotts force-pushed the 5219_improve_astype_in_ak.pandas.extension branch from f7d1dfc to 1935bed Compare December 24, 2025 22:12
