-
Notifications
You must be signed in to change notification settings - Fork 2k
Description
Is your feature request related to a problem or challenge?
While reviewing this PR from @erratic-pattern
I found myself confused about the intended type contract for InListExpr, specifically which combinations of needle and haystack types are expected to be valid. I would have expected InListExpr to compare values of the same logical type, and ideally for the implementation comments to make that explicit.
In particular, the current logic around dictionary arrays is hard to reason about. If InListExpr intentionally works on logically equivalent types rather than identical physical Arrow types, I think that should be documented clearly. If top-level dictionary haystacks are just an implementation detail, it may be cleaner to normalize them when building the static filter so the filter always stores the underlying value representation.
Describe the solution you'd like
At a minimum, I think it would help to:
- document the valid needle / haystack type combinations
- explain why dictionary needles are special-cased in contains
- make the tests reflect the intended contract
Describe alternatives you've considered
I think it is potentially worth changing InListExpr:
- to always flatten dictionaries (only use
valuesin thein_listand then specially handle inputDictionaryArrays - Only accept the same datatype for
needleandhaystack
This would make the API contract easier to reason about (always the same types) but the implementation can be optimized internally as well
Additional context
No response