GH-1125: Read empty ListVector correctly using UnionListReader after IPC deser #1136

Open
bodduv wants to merge 1 commit into apache:main from bodduv:arrow-java/GH-1125-empty-list-reader-position

bodduv commented May 6, 2026

Arrow IPC represents variable-length vectors with an offset buffer containing valueCount + 1 offsets. An empty ListVector that has been serialized and deserialized can therefore still carry a non-empty offset buffer holding only the leading zero offset. This is correct according to the Arrow layout, but it exposes a bug in UnionListReader.setPosition and other similar places. UnionListReader.setPosition(0) used the offset-buffer capacity as its empty-vector check, which worked only when the offset buffer had zero capacity. After IPC, the empty vector has a non-zero offset-buffer capacity, so the reader could throw IndexOutOfBoundsException. UnionLargeListReader has the same logical issue and also lacked the empty-buffer guard.
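To make the failure mode concrete, here is a minimal plain-Java sketch that models the offset buffer with `java.nio.ByteBuffer` instead of Arrow's own buffer classes. The class `OffsetLayoutSketch` and its methods are hypothetical names for illustration only, and `capacityBasedRange` only approximates the capacity-based check described above; it is not the actual reader code.

```java
import java.nio.ByteBuffer;

/** Plain-Java model of a 32-bit list offset buffer; not Arrow's actual classes. */
final class OffsetLayoutSketch {
  static final int OFFSET_WIDTH = 4; // bytes per offset in a (non-large) list

  /** Builds valueCount + 1 offsets for lists with the given lengths. */
  static ByteBuffer buildOffsets(int... listLengths) {
    ByteBuffer offsets = ByteBuffer.allocate((listLengths.length + 1) * OFFSET_WIDTH);
    int end = 0;
    offsets.putInt(0, end); // leading zero offset, present even when there are no lists
    for (int i = 0; i < listLengths.length; i++) {
      end += listLengths[i];
      offsets.putInt((i + 1) * OFFSET_WIDTH, end);
    }
    return offsets;
  }

  /** Hypothetical model of a capacity-based empty check (the pattern this PR replaces). */
  static int[] capacityBasedRange(ByteBuffer offsets, int index) {
    if (offsets.capacity() == 0) {
      return new int[] {0, 0}; // treated as empty only when the buffer has no bytes at all
    }
    int start = offsets.getInt(index * OFFSET_WIDTH);
    // For a buffer holding only the leading zero offset, this read is out of bounds.
    int end = offsets.getInt((index + 1) * OFFSET_WIDTH);
    return new int[] {start, end};
  }

  public static void main(String[] args) {
    ByteBuffer emptyAfterIpc = buildOffsets(); // valueCount = 0, but still one offset
    System.out.println("offset buffer capacity: " + emptyAfterIpc.capacity());
    try {
      capacityBasedRange(emptyAfterIpc, 0);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("capacity-based check failed on an empty vector");
    }
  }
}
```

In this model, a zero-capacity buffer takes the empty path, while a buffer holding just the leading zero offset falls through to the `index + 1` read and fails.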

What's Changed

Update UnionListReader and UnionLargeListReader to validate reader positioning against valueCount instead of treating offset-buffer capacity as the logical row boundary. All out-of-range positions will throw. For valid non-empty positions, the readers also defensively verify that the offset buffer has enough capacity for both index and index + 1 before reading offsets.
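As a rough illustration of validating against valueCount rather than buffer capacity, the following plain-Java sketch shows one way such a check could look. `PositionBoundsSketch` and `hasOffsetsToRead` are hypothetical names, not the helper added in this PR. The sketch also assumes that position 0 on an empty vector is accepted as an empty range; the description above does not spell out that case, so the PR may handle it differently.

```java
/** Hypothetical valueCount-based position check; not the helper added in this PR. */
final class PositionBoundsSketch {

  /**
   * Returns true when the offsets at index and index + 1 may be read, and false
   * when the vector is empty and nothing should be read. Throws otherwise.
   */
  static boolean hasOffsetsToRead(int index, int valueCount, long offsetCapacity, int offsetWidth) {
    if (valueCount == 0 && index == 0) {
      // ASSUMPTION: position 0 on an empty vector yields an empty range instead of throwing.
      return false;
    }
    if (index < 0 || index >= valueCount) {
      throw new IndexOutOfBoundsException(
          "index " + index + " is out of range for valueCount " + valueCount);
    }
    // Defensive check: both offset[index] and offset[index + 1] must fit in the buffer.
    long requiredBytes = (index + 2L) * offsetWidth;
    if (offsetCapacity < requiredBytes) {
      throw new IndexOutOfBoundsException(
          "offset buffer has " + offsetCapacity + " bytes, needs " + requiredBytes);
    }
    return true;
  }

  public static void main(String[] args) {
    // Empty vector after IPC: one 4-byte offset, valueCount = 0.
    System.out.println(hasOffsetsToRead(0, 0, 4, 4));
    // One list with 4-byte offsets: two offsets, 8 bytes.
    System.out.println(hasOffsetsToRead(0, 1, 8, 4));
  }
}
```

Passing the offset width as a parameter lets the same check serve 4-byte offsets (list) and 8-byte offsets (large list), which mirrors the idea of sharing the bounds logic between the two readers.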

The shared bounds logic is kept in a package-private UnionListReaderBoundsChecker helper so that UnionListReader and UnionLargeListReader use the same code.

Closes #1125.

github-actions Bot commented May 6, 2026

Thank you for opening a pull request!

Please label the PR with one or more of:

  • bug-fix
  • chore
  • dependencies
  • documentation
  • enhancement

Also, add the 'breaking-change' label if appropriate.

See CONTRIBUTING.md for details.

bodduv commented May 6, 2026

I don't think this is a breaking change; it should be labeled bug-fix.

Development

Successfully merging this pull request may close these issues.

UnionListReader.setPosition throws IOOBE on a post-IPC empty List