Skip to content

Improve InListExpr types and expectations #20969

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

While reviewing this PR from @erratic-pattern

I found myself confused about the intended type contract for InListExpr, specifically which combinations of needle and haystack types are expected to be valid. I would have expected InListExpr to compare values of the same logical type, and ideally for the implementation comments to make that explicit.

In particular, the current logic around dictionary arrays is hard to reason about. If InListExpr intentionally works on logically equivalent types rather than identical physical Arrow types, I think that should be documented clearly. If top-level dictionary haystacks are just an implementation detail, it may be cleaner to normalize them when building the static filter so the filter always stores the underlying value representation.

Describe the solution you'd like

At a minimum, I think it would help to:

  • document the valid needle / haystack type combinations
  • explain why dictionary needles are special-cased in contains
  • make the tests reflect the intended contract

Describe alternatives you've considered

I think it is potentially worth changing InListExpr:

  1. to always flatten dictionaries (only use values in the in_list and then specially handle input DictionaryArrays
  2. Only accept the same datatype for needle and haystack

This would make the API contract easier to reason about (always the same types) but the implementation can be optimized internally as well

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions