Implement ArkoudaCategoricalArray.__setitem__ (pandas-aligned) #5430

@ajpotts

Summary

Implement ArkoudaCategoricalArray.__setitem__ to support
pandas-compatible item assignment into Arkouda-backed categorical
ExtensionArrays.

This is required for common pandas workflows such as:

  • Series.loc[...] = ... / Series.iloc[...] = ...
  • boolean mask assignment
  • where/mask
  • fillna and other in-place manager paths
  • categorical value replacement without dtype loss
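For reference, here is a small sketch of those workflows on a plain pandas-backed categorical Series. It only uses stock pandas as a stand-in (no Arkouda server involved), and is meant to illustrate the behavior the Arkouda-backed array would need to match, not how it will be implemented.

```python
import pandas as pd

# Stand-in: a regular pandas Categorical, not an ArkoudaCategoricalArray.
s = pd.Series(pd.Categorical(["a", "b", None, "a"], categories=["a", "b"]))
dtype_before = s.dtype

s.iloc[0] = "b"              # positional assignment
s.loc[s == "a"] = "b"        # boolean mask assignment
filled = s.fillna("a")       # fill missing entries with an existing category
masked = s.where(s != "b")   # entries where the condition is False become missing

# None of these paths should lose the categorical dtype.
assert s.dtype == dtype_before
assert filled.dtype == dtype_before
assert masked.dtype == dtype_before
```

Each of these paths ends up in the ExtensionArray's `__setitem__` (or a closely related in-place path), which is why a missing implementation shows up as errors or silent dtype fallback.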

Currently, assignment into Arkouda categorical arrays is missing or
inconsistent, leading to TypeError/NotImplementedError or pandas
fallback behavior (often converting to object/NumPy).


Background / Why

pandas Categorical supports item assignment with strict rules:

  • Assigned values must be existing categories or missing
  • New categories are not implicitly added (unless user explicitly
    adds them via add_categories or similar higher-level API)
  • Missing values are supported and propagate through codes/mask
  • Assignment must preserve dtype
    (CategoricalDtype(categories=..., ordered=...))
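A short sketch of these rules on a stock pandas Categorical, as a reference point for the behavior we want to mirror. The exception type is caught broadly here because it may differ by pandas version and code path.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"], categories=["a", "b"], ordered=True)
dtype_before = cat.dtype

cat[0] = "b"     # existing category: allowed
cat[1] = None    # missing: allowed, stored as code -1

error = None
try:
    cat[2] = "c"  # not an existing category: rejected
except (TypeError, ValueError) as exc:
    error = exc

# A rejected or successful assignment leaves categories and ordered flag intact.
assert cat.dtype == dtype_before

# New categories have to be added explicitly before they can be assigned.
cat = cat.add_categories(["c"])
cat[2] = "c"
```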

For Arkouda-backed categoricals, we want identical semantics while
keeping operations server-side where possible.


Requirements / Expected pandas Semantics

Given categories ["a", "b"]:

  1. Assign existing category:
    • cat[0] = "b" is allowed
  2. Assign missing:
    • cat[0] = None / pd.NA is allowed and marks entry missing
  3. Assign value not in categories:
    • cat[0] = "c" should raise (typically
      TypeError/ValueError depending on path)
    • pandas message often indicates: "Cannot setitem on a Categorical
      with a new category..."
  4. Assignment via indexers should work:
    • int, slice, boolean mask, integer array indexer
  5. Broadcasting rules:
    • scalar value broadcasts to all targeted positions
    • array-like values must match number of targeted positions
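As an illustration of items 4 and 5, the sketch below runs each indexer type against a stock pandas Categorical. It is reference behavior only. In the pandas versions I am aware of, the length mismatch surfaces as a ValueError from the underlying NumPy assignment, but treat that exact type as an assumption.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a"] * 6, categories=["a", "b"])

cat[0] = "b"                                    # int position
cat[1:3] = "b"                                  # slice, scalar broadcasts
mask = np.array([False, False, False, True, False, False])
cat[mask] = "b"                                 # boolean mask of the same length
cat[np.array([4, 5])] = ["b", "a"]              # integer indexer with array-like values

mismatch_error = None
try:
    cat[0:3] = ["a", "b"]                       # 2 values for 3 targeted positions
except ValueError as exc:
    mismatch_error = exc
```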

Scope

In Scope

  • Implement ArkoudaCategoricalArray.__setitem__(key, value)
  • Support keys:
    • int position
    • slice
    • boolean mask (same length)
    • integer indexer (array-like positions)
  • Support values:
    • scalar category label
    • scalar missing (None, pd.NA, possibly np.nan)
    • array-like of labels/missing matching target selection length
    • another ArkoudaCategoricalArray (assignment by position)
  • Enforce "no new categories" rule
  • Preserve:
    • categories
    • ordered flag
    • dtype and internal representation (codes + categories + missing
      marker/mask)
  • Add unit tests
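One possible shape for the logic, sketched with NumPy codes standing in for the server-side arrays. `CodesBackedCategorical` and its helpers are hypothetical names for illustration only; the real implementation would operate on Arkouda objects and may look quite different. The idea is to translate values to codes first (rejecting unknown labels), and only then write into the codes array, so a failed assignment leaves the array untouched.

```python
import numpy as np
import pandas as pd


class CodesBackedCategorical:
    """Hypothetical stand-in: integer codes plus fixed categories; -1 marks missing."""

    def __init__(self, values, categories, ordered=False):
        self.categories = list(categories)
        self.ordered = ordered
        self._lookup = {label: i for i, label in enumerate(self.categories)}
        self.codes = np.array([self._encode(v) for v in values], dtype=np.int64)

    def _encode(self, value):
        # Missing scalars map to the sentinel code.
        if value is None or value is pd.NA:
            return -1
        if isinstance(value, float) and np.isnan(value):
            return -1
        # Unknown labels are rejected: no implicit new categories.
        if value not in self._lookup:
            raise TypeError(
                f"Cannot setitem on a Categorical with a new category ({value!r}), "
                "set the categories first"
            )
        return self._lookup[value]

    def __setitem__(self, key, value):
        if isinstance(value, CodesBackedCategorical):
            # Assignment by position; require matching categories and ordered flag.
            if value.categories != self.categories or value.ordered != self.ordered:
                raise TypeError("Cannot set a Categorical with different categories")
            new_codes = value.codes
        elif isinstance(value, (list, np.ndarray, pd.Series)):
            new_codes = np.array([self._encode(v) for v in value], dtype=np.int64)
        else:
            new_codes = self._encode(value)  # scalar, broadcasts to all targets
        # NumPy handles int, slice, boolean-mask, and integer-array keys here,
        # and raises ValueError when the value length does not match the targets.
        self.codes[key] = new_codes

    def to_list(self):
        return [None if c == -1 else self.categories[c] for c in self.codes]


cat = CodesBackedCategorical(["a", "b", "a", "b"], categories=["a", "b"])
cat[0] = "b"
cat[1:3] = [None, "b"]
cat[np.array([False, False, False, True])] = "a"
print(cat.to_list())
```

Because only `codes` is ever written to, the categories and ordered flag are preserved by construction.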

Out of Scope

  • Adding categories automatically during setitem
  • Implementing add_categories / remove_categories (if not already
    present)
  • 2D assignment (categorical EA is 1D)
  • Alignment by Index labels (handled by pandas, not EA)

Labels: enhancement (New feature or request)