Add FieldMinMax unified API for global numeric min/max retrieval #15752

Open
SYEDMDSAAD wants to merge 2 commits into apache:main from SYEDMDSAAD:field-minmax-api

Conversation

@SYEDMDSAAD
Contributor

This patch introduces FieldMinMax, a unified API to retrieve global
minimum and maximum numeric values for a field across an IndexReader.

Currently Lucene exposes two different mechanisms:

  • PointValues.getMinPackedValue / getMaxPackedValue
    • returns null when no values exist
  • DocValuesSkipper.globalMinValue / globalMaxValue
    • returns sentinel values when metadata is unavailable

This forces callers to understand internal storage details and manually filter invalid values.
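
To illustrate the filtering burden described above, here is a minimal, Lucene-free sketch of the checks a caller ends up writing. The class and method names are hypothetical, and the sketch assumes the skipper-style sentinel for a minimum is Long.MIN_VALUE; the description above does not name the sentinel, so treat that as an assumption. Decoding packed point bytes into a long is left out.

```java
import java.util.OptionalLong;

/** Hypothetical caller-side helper; not part of Lucene or of this patch. */
final class SentinelFilterSketch {

  private SentinelFilterSketch() {}

  /**
   * Normalizes a skipper-style global minimum.
   * ASSUMPTION: Long.MIN_VALUE means "metadata unavailable". A field whose
   * real minimum is Long.MIN_VALUE is then indistinguishable from a missing one.
   */
  static OptionalLong normalizeSkipperMin(long reported) {
    return reported == Long.MIN_VALUE ? OptionalLong.empty() : OptionalLong.of(reported);
  }

  /**
   * Normalizes a points-style minimum that has already been decoded to a Long,
   * where null means "no values exist".
   */
  static OptionalLong normalizePointsMin(Long decodedOrNull) {
    return decodedOrNull == null ? OptionalLong.empty() : OptionalLong.of(decodedOrNull);
  }
}
```

Both helpers collapse to the same "absent or present" answer, but each needs to know which convention its source follows, which is the coupling to storage details that the description objects to.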

FieldMinMax abstracts over both implementations and provides consistent behavior:

  • Uses PointValues when available (accurate index statistics)
  • Falls back to DocValuesSkipper when metadata exists
  • Falls back to scanning NumericDocValues when skipper metadata is absent
  • Returns null when no values exist

This prevents sentinel leakage and simplifies caller logic.
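
The fallback order listed above can be sketched as a small, self-contained model. This is not the patch's code: Segment and MinMax are hypothetical stand-ins for per-segment Lucene structures, and applying the fallback per segment before merging is an assumption made for illustration.

```java
import java.util.List;

/** Simplified model of the described fallback order; not the patch's implementation. */
final class FieldMinMaxSketch {

  private FieldMinMaxSketch() {}

  /** Immutable min/max pair (hypothetical type). */
  record MinMax(long min, long max) {
    MinMax merge(MinMax other) {
      return new MinMax(Math.min(min, other.min), Math.max(max, other.max));
    }
  }

  /** Hypothetical stand-in for one segment; each source is null when absent. */
  record Segment(MinMax points, MinMax skipper, long[] docValues) {}

  /** Resolves one segment: point statistics, then skipper metadata, then a scan. */
  static MinMax forSegment(Segment s) {
    if (s.points() != null) {
      return s.points();
    }
    if (s.skipper() != null) {
      return s.skipper();
    }
    if (s.docValues() == null || s.docValues().length == 0) {
      return null; // nothing stored for this field in this segment
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long v : s.docValues()) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    return new MinMax(min, max);
  }

  /** Merges per-segment results; returns null when no segment has values. */
  static MinMax global(List<Segment> segments) {
    MinMax result = null;
    for (Segment s : segments) {
      MinMax m = forSegment(s);
      if (m == null) {
        continue; // empty or missing in this segment
      }
      result = (result == null) ? m : result.merge(m);
    }
    return result;
  }
}
```

In this model the accumulators inside the scan never escape: a segment with no values contributes null rather than Long.MAX_VALUE or Long.MIN_VALUE, so the caller only has to handle a single "absent" case.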

Tests cover:

  • missing field
  • point values
  • doc values
  • mixed segments
  • empty segments

@navneet1v
Contributor

@SYEDMDSAAD I'm trying to understand the motivation for adding this API to Lucene specifically. Why can't this be added on the application side?

@SYEDMDSAAD
Contributor Author

> @SYEDMDSAAD I'm trying to understand the motivation for adding this API to Lucene specifically. Why can't this be added on the application side?

@navneet1v Thanks for the question.

The motivation is mainly consistency. Right now, Lucene exposes two different APIs for global min/max, and they behave differently when no data exists — one returns null, the other returns sentinel values. That forces callers to know internal details and handle edge cases themselves.

While this can be implemented on the application side, the logic depends on Lucene-specific behavior (like skipper availability and sentinel semantics). Centralizing it in Lucene provides a single, consistent contract and avoids duplication and potential mistakes in user code.

So the idea is to normalize existing Lucene behavior, not just add convenience.
