Add first-class logical UUID type support #18140
Open
xiangfu0 wants to merge 10 commits into apache:master
Codecov Report
❌ Patch coverage is …
Coverage Diff

| Metric | master | #18140 | +/- |
|---|---|---|---|
| Coverage | 63.13% | 63.17% | +0.03% |
| Complexity | 1610 | 1616 | +6 |
| Files | 3213 | 3222 | +9 |
| Lines | 195730 | 196333 | +603 |
| Branches | 30240 | 30338 | +98 |
| Hits | 123583 | 124041 | +458 |
| Misses | 62281 | 62354 | +73 |
| Partials | 9866 | 9938 | +72 |
94e9c8b to be53c71
Pull request overview
Adds a first-class logical UUID type to Pinot (backed by 16-byte BYTES in v1), and wires UUID semantics through schema/type handling, indexing, query planning/execution, and result formatting, with unit + integration coverage and user-facing documentation.
Changes:
- Introduce `UUID` as a logical type (`FieldSpec.DataType.UUID` / `DataSchema.ColumnDataType.UUID`) with canonical RFC 4122 lowercase string rendering via `UuidUtils`.
- Propagate UUID-aware behavior through dictionaries, bloom filters, raw-value inverted indexes, casts/literals, predicates, distinct/grouping, and query planner/runtime type mapping + (de)serialization.
- Add targeted unit tests plus offline/realtime integration tests and README documentation.
Reviewed changes
Copilot reviewed 76 out of 76 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| README.md | Document UUID logical type usage, casting, and migration notes. |
| pinot-spi/src/test/java/org/apache/pinot/spi/data/SchemaTest.java | Add schema validation tests for UUID SV-only and default handling. |
| pinot-spi/src/test/java/org/apache/pinot/spi/data/FieldSpecTest.java | Add UUID DataType storedType/size and conversion/default-null tests. |
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/UuidUtils.java | New UUID conversion utilities (string/UUID/bytes/ByteArray). |
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Add UUID null placeholders. |
| pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java | Allow UUID in schema validation; enforce UUID SV-only. |
| pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java | Add UUID DataType and UUID-aware conversions/formatting/default handling. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/creator/BloomFilterCreator.java | Add UUID string rendering when inserting into bloom filters. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/creator/inv/RawValueBitmapInvertedIndexTest.java | Extend raw inverted index tests to UUID + generic API path. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/creator/BloomFilterCreatorTest.java | Add bloom filter creator test for UUID values stored as bytes. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/column/DefaultNullValueVirtualColumnProviderTest.java | Add UUID coverage for virtual column default-null dictionary/metadata. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/realtime/impl/dictionary/MutableDictionaryTest.java | Add UUID canonical string lookup tests for mutable bytes dictionaries. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/RawValueBitmapInvertedIndexReader.java | Make bytes dictionary logical-type aware; add getDocIdsForBytes. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/OnHeapBytesDictionary.java | Add logical type to parse/format BYTES vs UUID. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/ConstantValueBytesDictionary.java | Add logical type to parse/format BYTES vs UUID. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/BytesDictionary.java | Add logical type to parse/format BYTES vs UUID. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/InvertedIndexHandler.java | Use stored type; enable raw inverted index creation for UUID (stored as BYTES). |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/columnminmaxvalue/ColumnMinMaxValueGenerator.java | Pass logical type into bytes dictionary for min/max generation. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/bloomfilter/BloomFilterHandler.java | Use DataType-aware string formatting for bloom filter population (UUID vs BYTES). |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java | Plumb logical type into bytes dictionary and mutable dictionary creation. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/column/DefaultNullValueVirtualColumnProvider.java | Build bytes dictionary with logical type to format UUID defaults correctly. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/inv/RawValueBitmapInvertedIndexCreator.java | Fix raw inverted index dictionary temp-file handling; use ByteArray keys for BYTES/UUID. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/BaseSegmentCreator.java | Allow inverted index without dictionary for UUID via raw-value inverted index creator. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/dictionary/MutableDictionaryFactory.java | Create bytes dictionaries with logical type (UUID vs BYTES) based on stored type. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/dictionary/BytesOnHeapMutableDictionary.java | Add logical type parsing/formatting for UUID vs BYTES. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/dictionary/BytesOffHeapMutableDictionary.java | Add logical type parsing/formatting for UUID vs BYTES. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestUtils.java | Emit UUID IN operands as canonical strings instead of raw bytes literals. |
| pinot-query-planner/src/test/java/org/apache/pinot/query/type/TypeFactoryTest.java | Add UUID type conversion tests and skip UUID array tests. |
| pinot-query-planner/src/test/java/org/apache/pinot/query/planner/serde/RexExpressionSerDeTest.java | Add UUID literal SerDe test and supported type list. |
| pinot-query-planner/src/test/java/org/apache/pinot/query/planner/logical/RelToPlanNodeConverterTest.java | Add UUID column type conversion tests and reject UUID arrays. |
| pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeFactory.java | Map Pinot UUID to Calcite SqlTypeName.UUID. |
| pinot-query-planner/src/main/java/org/apache/pinot/query/planner/serde/RexExpressionToProtoExpression.java | Map UUID column type to proto enum. |
| pinot-query-planner/src/main/java/org/apache/pinot/query/planner/serde/ProtoExpressionToRexExpression.java | Map proto UUID enum back to planner column type. |
| pinot-query-planner/src/main/java/org/apache/pinot/query/planner/physical/v2/PRelToPlanNodeConverter.java | Convert Calcite UUID to Pinot UUID; reject UUID arrays. |
| pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/RexExpressionUtils.java | Add UUID literal conversion to/from Rex values. |
| pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/RelToPlanNodeConverter.java | Convert Calcite UUID to Pinot UUID; reject UUID arrays. |
| pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java | Serialize UUID literals as canonical strings for SQL/parsing paths. |
| pinot-plugins/pinot-input-format/pinot-avro-base/src/test/java/org/apache/pinot/plugin/inputformat/avro/AvroUtilsTest.java | Test Avro<->Pinot schema mapping for UUID logical type. |
| pinot-plugins/pinot-input-format/pinot-avro-base/src/test/java/org/apache/pinot/plugin/inputformat/avro/AvroSchemaUtilTest.java | Test Avro schema JSON object generation for UUID fields. |
| pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroUtils.java | Enable Avro UUID logical type conversion + UUID schema handling. |
| pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroSchemaUtil.java | Map Avro logicalType: uuid to Pinot UUID + emit uuid logical type in Avro schema JSON. |
| pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroIngestionSchemaValidator.java | Fix mismatch message to use extracted Pinot type name. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/UuidTypeTest.java | Offline integration coverage for select/filter/group/distinct/order/join with UUID. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/UuidTypeRealtimeTest.java | Realtime integration coverage via subclassed UUID test. |
| pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/ClusterTest.java | Treat UUID like STRING/BYTES when extracting JSON response values in tests. |
| pinot-core/src/test/java/org/apache/pinot/core/query/selection/SelectionOperatorUtilsTest.java | Verify result formatting distinguishes UUID (canonical) vs BYTES (hex). |
| pinot-core/src/test/java/org/apache/pinot/core/query/pruner/BloomFilterSegmentPrunerTest.java | Add UUID bloom filter pruning test; allow mocking with arbitrary DataType. |
| pinot-core/src/test/java/org/apache/pinot/core/query/distinct/table/BytesDistinctTableTest.java | Test UUID vs BYTES formatting in bytes distinct table (with/without ORDER BY). |
| pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/CastTransformFunctionTest.java | Add UUID cast tests, invalid literal rejection, and MV-source rejection. |
| pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java | Treat UUID like BYTES in group key extraction (raw bytes). |
| pinot-core/src/main/java/org/apache/pinot/core/query/reduce/filter/PredicateRowMatcher.java | Convert UUID row values to bytes before applying predicate evaluator. |
| pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ValueBasedSegmentPruner.java | Hash bloom filter values using DataType-aware string formatting (UUID vs BYTES). |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/table/BytesDistinctTable.java | Preserve internal ByteArray and format at the end via schema type (UUID vs BYTES). |
| pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/InTransformFunction.java | Parse IN-list literals as UUID bytes when main function type is UUID. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/IdentifierTransformFunction.java | Provide UUID string rendering for UUID columns (from underlying bytes). |
| pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/CastTransformFunction.java | Add CAST(... AS UUID) support (STRING/BYTES -> UUID) and string rendering. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunction.java | Add UUID metadata and UUID->STRING rendering; prevent generic UUID-as-bytes fallback. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RawValueInvertedIndexFilterOperator.java | Support raw inverted index filtering for BYTES and UUID literals. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/PredicateUtils.java | Add UUID IN-predicate dictionary id set computation. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/NotInPredicateEvaluatorFactory.java | Add UUID raw predicate evaluator support (bytes-set with UUID type). |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/NotEqualsPredicateEvaluatorFactory.java | Add UUID equals/neq evaluator support for dict and raw paths. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/InPredicateEvaluatorFactory.java | Add UUID IN evaluator support (bytes-set with UUID type). |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/EqualsPredicateEvaluatorFactory.java | Add UUID equals evaluator support for dict and raw paths. |
| pinot-common/src/test/java/org/apache/pinot/common/utils/PinotDataTypeTest.java | Add UUID conversions and type inference tests. |
| pinot-common/src/test/java/org/apache/pinot/common/utils/DataSchemaTest.java | Add UUID column type coverage (compat, formatting, conversion). |
| pinot-common/src/test/java/org/apache/pinot/common/response/encoder/JsonResponseEncoderTest.java | Add UUID round-trip encoding/decoding test for result tables. |
| pinot-common/src/test/java/org/apache/pinot/common/request/context/RequestContextUtilsTest.java | Test filter conversion for UUID cast literals on RHS. |
| pinot-common/src/test/java/org/apache/pinot/common/function/FunctionUtilsTest.java | Test UUID Java type mappings to Pinot types and Calcite rel types. |
| pinot-common/src/main/proto/expressions.proto | Add UUID to proto ColumnDataType enum. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java | Add UUID PinotDataType and conversions/toInternal handling. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/DataSchema.java | Add UUID ColumnDataType, internal/external conversions, formatting and rel type mapping. |
| pinot-common/src/main/java/org/apache/pinot/common/response/encoder/JsonResponseEncoder.java | Treat UUID like STRING/BYTES when extracting JSON-encoded row values. |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java | Add literal-only CAST evaluation on predicate RHS; support UUID cast literals. |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/BaseInPredicate.java | Add UUID value parsing/cache for IN predicates. |
| pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java | Reuse UuidUtils for UUID bytes conversions. |
| pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java | Add UUID Java type mappings and Calcite rel type mapping. |
271dd9c to 64febd9
68cfdd9 to 92be21a
Summary
- Add a first-class logical `UUID` type to Pinot v1, backed by the existing fixed-width 16-byte `BYTES` representation
- … `CustomDataQueryClusterIntegrationTest`

Why
Today Pinot users typically model UUIDs as either `STRING` or `BYTES`. That works for storage, but it loses type semantics:
- `BYTES` results render as hex instead of UUID text

This PR adds a native logical UUID type while intentionally reusing Pinot's current `BYTES` storage and wire encodings in v1.

V1 design contract
- `UUID`
- `BYTES`
- … (`8-4-4-4-12`)
- `UUID` columns
- `BYTES` behavior

User guide
1. Schema definition
Single-value UUID columns can be declared directly in the Pinot schema.

```json
{
  "schemaName": "events",
  "dimensionFieldSpecs": [
    { "name": "eventId", "dataType": "UUID" },
    { "name": "traceId", "dataType": "UUID" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "ts",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```

If a schema declares a multi-value UUID column, validation now fails clearly.
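The single-value-only rule can be pictured with a small check like the one below. This is a minimal sketch under stated assumptions, not Pinot's `Schema` validation code: `UuidSchemaRuleSketch`, `FieldDef`, `validate`, and the error wording are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not Pinot's Schema or FieldSpec classes.
final class UuidSchemaRuleSketch {
  private UuidSchemaRuleSketch() {
  }

  // Minimal description of a column: name, declared data type, and whether it is single-value.
  static final class FieldDef {
    final String name;
    final String dataType;
    final boolean singleValue;

    FieldDef(String name, String dataType, boolean singleValue) {
      this.name = name;
      this.dataType = dataType;
      this.singleValue = singleValue;
    }
  }

  // Returns one message per column that declares UUID as a multi-value type.
  static List<String> validate(List<FieldDef> fields) {
    List<String> errors = new ArrayList<>();
    for (FieldDef field : fields) {
      if ("UUID".equals(field.dataType) && !field.singleValue) {
        errors.add("Column '" + field.name + "' is multi-value, but UUID supports single-value columns only");
      }
    }
    return errors;
  }
}
```

An empty result means every UUID column in the list is single-value; other data types are not checked by this sketch.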
2. Table behavior
UUID columns work with the same table features as other single-value dimension columns in v1, including:
Example table config fragment:

```json
{
  "tableName": "events",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "schemaName": "events"
  },
  "fieldConfigList": [
    {
      "name": "eventId",
      "encodingType": "RAW",
      "indexes": {
        "bloom": {},
        "inverted": {}
      }
    }
  ]
}
```

3. Ingest behavior
Pinot normalizes UUID inputs to a 16-byte internal representation.
Accepted v1 inputs:
- `550e8400-e29b-41d4-a716-446655440000`
- `java.util.UUID`
- `byte[]`

Supported ingest paths covered in this PR:
- `logicalType: "uuid"`

Invalid UUID values fail with explicit validation/conversion errors instead of silently degrading.
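The normalization described above can be sketched roughly as follows. This is an illustration only, not the PR's `UuidUtils`: the class and method names are hypothetical, and the big-endian byte order (most significant bits first) and the acceptance of upper-case hex digits are assumptions rather than documented Pinot behavior.

```java
import java.nio.ByteBuffer;
import java.util.UUID;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; this is not Pinot's UuidUtils.
final class UuidNormalizationSketch {
  // UUID.fromString tolerates short groups, so the 8-4-4-4-12 shape is checked explicitly.
  private static final Pattern CANONICAL = Pattern.compile(
      "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

  private UuidNormalizationSketch() {
  }

  // Converts a UUID string, a java.util.UUID, or a 16-byte array into 16 bytes.
  // Assumption: most significant bits are written first (big-endian).
  static byte[] toBytes(Object value) {
    if (value instanceof UUID) {
      UUID uuid = (UUID) value;
      return ByteBuffer.allocate(16)
          .putLong(uuid.getMostSignificantBits())
          .putLong(uuid.getLeastSignificantBits())
          .array();
    }
    if (value instanceof String) {
      String text = (String) value;
      if (!CANONICAL.matcher(text).matches()) {
        throw new IllegalArgumentException("Invalid UUID string: " + text);
      }
      return toBytes(UUID.fromString(text));
    }
    if (value instanceof byte[]) {
      byte[] bytes = (byte[]) value;
      if (bytes.length != 16) {
        throw new IllegalArgumentException("UUID bytes must be exactly 16 bytes, got " + bytes.length);
      }
      return bytes.clone();
    }
    throw new IllegalArgumentException("Unsupported UUID input: " + value);
  }

  // Renders 16 bytes as a lowercase 8-4-4-4-12 string.
  static String toCanonicalString(byte[] bytes) {
    if (bytes.length != 16) {
      throw new IllegalArgumentException("UUID bytes must be exactly 16 bytes, got " + bytes.length);
    }
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    long mostSignificant = buffer.getLong();
    long leastSignificant = buffer.getLong();
    return new UUID(mostSignificant, leastSignificant).toString();
  }

  // Renders the same bytes as plain hex, for contrast with the UUID text form.
  static String toHex(byte[] bytes) {
    StringBuilder hex = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      hex.append(String.format("%02x", b & 0xff));
    }
    return hex.toString();
  }
}
```

Under these assumptions, `toBytes("550E8400-E29B-41D4-A716-446655440000")` followed by `toCanonicalString` yields `550e8400-e29b-41d4-a716-446655440000`, while `toHex` on the same bytes yields `550e8400e29b41d4a716446655440000`. Malformed strings and arrays of the wrong length raise `IllegalArgumentException` rather than being passed through.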
4. Query examples
Selection
UUID columns are returned as canonical UUID strings.
CAST to UUID
IN predicate
GROUP BY
DISTINCT
ORDER BY
Equality join
These query patterns are covered for both SSE and MSE where applicable.
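As a rough illustration of the patterns listed above, the sketch below collects one query per pattern as Java string constants. The queries are hypothetical: they reuse the `events` table and the `eventId` and `traceId` columns from the schema example, the literal values are arbitrary, and the self-join exists only to show an equality join on UUID columns. They are not the examples from the PR's tests, and the exact syntax Pinot accepts may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical query strings for illustration; not taken from the PR's test suite.
final class UuidQueryExamples {
  private UuidQueryExamples() {
  }

  // Returns one illustrative SQL string per query pattern, in the order the patterns are listed.
  static Map<String, String> examples() {
    Map<String, String> queries = new LinkedHashMap<>();
    queries.put("Selection",
        "SELECT eventId, traceId FROM events LIMIT 10");
    queries.put("CAST to UUID",
        "SELECT eventId FROM events "
            + "WHERE eventId = CAST('550e8400-e29b-41d4-a716-446655440000' AS UUID)");
    queries.put("IN predicate",
        "SELECT eventId FROM events WHERE traceId IN "
            + "('550e8400-e29b-41d4-a716-446655440000', '6ba7b810-9dad-11d1-80b4-00c04fd430c8')");
    queries.put("GROUP BY",
        "SELECT traceId, COUNT(*) FROM events GROUP BY traceId");
    queries.put("DISTINCT",
        "SELECT DISTINCT traceId FROM events");
    queries.put("ORDER BY",
        "SELECT eventId FROM events ORDER BY eventId LIMIT 10");
    queries.put("Equality join",
        "SELECT a.eventId, b.eventId FROM events a JOIN events b ON a.traceId = b.traceId");
    return queries;
  }

  public static void main(String[] args) {
    // Print each pattern name followed by its example query.
    examples().forEach((pattern, sql) -> System.out.println(pattern + ": " + sql));
  }
}
```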
5. Result behavior
For columns declared as `UUID`:
- `SELECT uuidCol FROM t` returns canonical lowercase UUID strings
- `GROUP BY`, `DISTINCT`, and join outputs also render UUID strings

For columns declared as `BYTES`:
- No `BYTES` semantics are changed by this PR

6. Existing UUID byte/string helpers
This PR keeps the existing helper functions usable:
- `toUUIDBytes(...)`
- `fromUUIDBytes(...)`

The new logical UUID type is additive and does not replace plain `BYTES` workflows.

Migration notes
Schema evolution constraint
Pinot does not support changing the data type of an existing column in place. This PR adds a new logical type, but it does not change that schema-evolution rule.
From `STRING`
If a column currently stores canonical UUID strings and should behave as a typed UUID column, the practical path is to create a new `UUID` column or a new table/schema and reingest or backfill the data.

From `BYTES`
If a column already stores 16-byte UUID payloads, declaring a new column as `UUID` gives UUID-aware query behavior and rendering, but existing `BYTES` columns remain `BYTES` columns. Adopting the logical UUID type still requires a new column or table rebuild/reingest rather than an in-place type mutation.

Important distinction
A column must be declared as `UUID` to get UUID result rendering. Declaring the column as `BYTES` continues to render hex, even if the underlying bytes happen to represent UUID values.

Format compatibility
The `UUID` type itself does not require a segment or data-table wire format bump in v1. That compatibility statement is about representation, not about in-place schema migration.

Scope exclusions in v1
This PR intentionally does not include:
- … `STRING` or `BYTES` columns to `UUID`
- … `BYTES` columns as UUIDs

Validation
Targeted local validation run for this branch includes:
- `./mvnw -pl pinot-spi -Dtest=FieldSpecTest,SchemaTest test -Dcheckstyle.skip=true`
- `./mvnw -pl pinot-common,pinot-core -am -Dtest=JsonResponseEncoderTest,ArrowResponseEncoderTest,BytesDistinctTableTest,SelectionOperatorUtilsTest,CastTransformFunctionTest,ScalarTransformFunctionWrapperTest -Dsurefire.failIfNoSpecifiedTests=false -Dcheckstyle.skip=true test`
- `./mvnw -pl pinot-segment-local,pinot-segment-spi -am -Dtest=MutableDictionaryTest,DefaultNullValueVirtualColumnProviderTest,RawValueBitmapInvertedIndexTest,BloomFilterCreatorTest,BloomFilterSegmentPrunerTest -Dsurefire.failIfNoSpecifiedTests=false -Dcheckstyle.skip=true test`
- `./mvnw -pl pinot-plugins/pinot-input-format/pinot-avro-base -am -Dtest=AvroUtilsTest,AvroSchemaUtilTest -Dsurefire.failIfNoSpecifiedTests=false -Dcheckstyle.skip=true test`
- `./mvnw -pl pinot-integration-tests -am -Dtest=UuidTypeTest,UuidTypeRealtimeTest -Dsurefire.failIfNoSpecifiedTests=false -Dcheckstyle.skip=true test`
- `./mvnw -pl pinot-integration-tests -am -Ppinot-fastdev -Dtest=UuidTypeTest,UuidTypeRealtimeTest -Dsurefire.failIfNoSpecifiedTests=false -Dcheckstyle.skip=true test`
- `./mvnw -pl pinot-common checkstyle:check license:check -DskipTests`

GitHub Actions on this PR continue to run the full Pinot CI matrix.