[SPARK-56981][SQL] Add physical representation and UnsafeRow support for nanosecond timestamps by MaxGekk · Pull Request #56059 · apache/spark

MaxGekk · 2026-05-22T08:14:40Z

What changes were proposed in this pull request?

This PR implements the physical row layer for nanosecond-capable timestamp types, as defined in SPARK-56822 SPIP: Timestamps with nanosecond precision.
Logical types TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) were added in #55952; they still mapped to UninitializedPhysicalType, so values could not be stored or read from InternalRow / UnsafeRow. This change wires up the minimum physical infrastructure that downstream work (casts, Parquet, expressions) can depend on.

SPIP internal representation

Per the SPIP, a value is a composite of:

Epoch microseconds (long, 8 bytes) — same proleptic-Gregorian epoch microsecond count as existing microsecond timestamps
Nanoseconds within that microsecond (short, 0-999) — sub-micro fractional part, not a full nanosecond offset from epoch

Logical defaultSize remains 10 bytes on the types. In UnsafeRow, values use the same variable-length pattern as CalendarInterval: an 8-byte field slot (offset + size) pointing to a 16-byte aligned payload (epochMicros + nanosWithinMicro with padding), so in-place updates remain possible.

Implementation summary

Unsafe value types (distinct classes for NTZ vs LTZ semantics at the Java layer; shared byte layout):
- TimestampNTZNanos - physical value for TIMESTAMP_NTZ(p)
- TimestampLTZNanos - physical value for TIMESTAMP_LTZ(p)
Physical types: PhysicalTimestampNTZNanosType, PhysicalTimestampLTZNanosType registered in PhysicalDataType.applyDefault
Row access: specialized getters/setters on InternalRow, UnsafeRow, UnsafeArrayData, codegen (CodeGenerator, InterpretedUnsafeProjection, SpecializedGettersReader), and literal validation
Columnar: ColumnVector stubs throw SparkUnsupportedOperationException until columnar support is added

Why are the changes needed?

Without a concrete physical type and UnsafeRow accessors, any code path that materializes rows for nanosecond timestamps fails or falls through to UninitializedPhysicalType. This is the unblocker for the rest of sub-tasks.

Does this PR introduce any user-facing change?

No. Logical types exist but are not yet exposed through SQL; behavior of TimestampType, TimestampNTZType, and microsecond storage is unchanged.

How was this patch tested?

unsafe/testOnly *TimestampNanosSuite — unsafe value equality, hashCode, validation (nanosWithinMicro ∈ [0, 999])
catalyst/testOnly *TimestampNanosRowSuite — GenericInternalRow and UnsafeRow roundtrips (NTZ + LTZ, null/non-null), codegen and interpreted unsafe projection, literal validation
DataTypeSuite — PhysicalDataType is not UninitializedPhysicalType for p in {7, 8, 9}; defaultSize remains 10

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor Auto

Register physical types and UnsafeRow/InternalRow accessors for TimestampNTZNanosType and TimestampLTZNanosType so values can be stored and read using the SPIP composite layout (epoch micros + sub-micro nanos).

…anosSuite Wrap the `assertThrows` calls in `invalidNanosWithinMicroNTZ`/`LTZ` so each line stays within the 100-character limit enforced by checkstyle. Co-authored-by: Isaac

…estamps Route nullable TIMESTAMP_NTZ/LTZ nanos fields through UnsafeWriter.write instead of setNull8Bytes so the 16-byte variable-length region is reserved and in-place setTimestampNTZNanos/LTZ remains valid.

…ndian safety writePayload stores nanos via putLong; reading with getShort returned zero on big-endian. Match the write path with getLong and a 16-bit mask.

…timestamps Document unsafe projection support for TimestampNTZNanosType and TimestampLTZNanosType instead of relying only on DatetimeType extending AtomicType. Add a test to pin the contract.

…types Provide defaults in Literal.defaultDefault so planning rules that call Literal.default on TimestampNTZNanosType/TimestampLTZNanosType (e.g. outer- join null replacement, window tolerance) don't throw noDefaultForDataTypeError. Co-authored-by: Isaac

MaxGekk · 2026-05-23T06:58:34Z

@stevomitric Please, have a look at the PR.

… and UnsafeRow storage Add internal scaladoc and code comments explaining the composite value representation, 16-byte UnsafeRow payload, and deferred ordering support.

…entation Wrap scaladoc and Javadoc lines to satisfy scalastyle and checkstyle.

MaxGekk · 2026-05-25T07:15:46Z

@dongjoon-hyun @cloud-fan @felixcheung @peter-toth @mridulm @sunchao friendly ping for review when you have a moment.

This PR wires physical storage for nanosecond timestamp types (TimestampNTZNanos / TimestampLTZNanos) in InternalRow and UnsafeRow, on top of the logical types and parser work already in master.

There is no user-facing SQL change in this PR, but it unblocks the rest of the SPARK-56822 SPIP — casts, Parquet read, analysis gating, ordering/hash, and expression parity all need rows that can hold these values.

stevomitric

Overall looks good - one concern i have here is that we are not following types framework. I found a couple of places where we could integrate - there might be a few more.

stevomitric · 2026-05-25T14:09:52Z

+    assert(unsafeRow.getLong(0) >>> 32 === offset)
+  }
+
+  testBothCodegenAndInterpreted("SPARK-41535: nanos timestamps initialized as null") {


The SPARK-41535 relates to a bug for calendar interval types. I think we shouldn't reuse other bug-tickets when introducing a new type.

The purpose of reference to JIRA tickets is to provide more contexts of why the test is needed at all. But I agree that relation to the calendar interval type might confuse. Let me remove SPARK-41535 from the test title and add more comments.

stevomitric · 2026-05-25T14:24:49Z

    case TimestampNTZType => PhysicalLongType
    case CalendarIntervalType => PhysicalCalendarIntervalType
+    case _: TimestampNTZNanosType => PhysicalTimestampNTZNanosType
+    case _: TimestampLTZNanosType => PhysicalTimestampLTZNanosType


This should go through types framework.

stevomitric · 2026-05-25T14:28:25Z

+    case _: TimestampNTZNanosType =>
+      (input, v) => input.setTimestampNTZNanos(ordinal, v.asInstanceOf[TimestampNTZNanos])
+    case _: TimestampLTZNanosType =>
+      (input, v) => input.setTimestampLTZNanos(ordinal, v.asInstanceOf[TimestampLTZNanos])


We can integrate this into types as well.

MaxGekk · 2026-05-25T14:48:51Z

one concern i have here is that we are not following types framework.

I did that intentionally to: 1. minimize the changes in this PR, 2. Integration to the Type framework could be done later in parallel to other tasks that require functions/APIs introduced by this PR.

stevomitric

I did that intentionally to: 1. minimize the changes in this PR, 2. Integration to the Type framework could be done later in parallel to other tasks that require functions/APIs introduced by this PR.

I see. In that case LGTM.

…s timestamps Rename the test and document its purpose with reference to the CalendarInterval null-init regression (SPARK-41535).

MaxGekk added 6 commits May 22, 2026 09:59

[SPARK-56981][SQL] Add physical representation for nanosecond timestamps

f95e5c6

Register physical types and UnsafeRow/InternalRow accessors for TimestampNTZNanosType and TimestampLTZNanosType so values can be stored and read using the SPIP composite layout (epoch micros + sub-micro nanos).

[SPARK-56981][SQL][FOLLOWUP] Fix checkstyle line length in TimestampN…

d8092dd

…anosSuite Wrap the `assertThrows` calls in `invalidNanosWithinMicroNTZ`/`LTZ` so each line stays within the 100-character limit enforced by checkstyle. Co-authored-by: Isaac

[SPARK-56981][SQL][FOLLOWUP] Fix null unsafe projection for nanos tim…

49effdf

…estamps Route nullable TIMESTAMP_NTZ/LTZ nanos fields through UnsafeWriter.write instead of setNull8Bytes so the 16-byte variable-length region is reserved and in-place setTimestampNTZNanos/LTZ remains valid.

[SPARK-56981][SQL][FOLLOWUP] Read nanosWithinMicro with getLong for e…

f3897bc

…ndian safety writePayload stores nanos via putLong; reading with getShort returned zero on big-endian. Match the write path with getLong and a 16-bit mask.

[SPARK-56981][SQL][FOLLOWUP] Add explicit canSupport cases for nanos …

f2c048f

…timestamps Document unsafe projection support for TimestampNTZNanosType and TimestampLTZNanosType instead of relying only on DatetimeType extending AtomicType. Add a test to pin the contract.

MaxGekk changed the title ~~[WIP][SPARK-56981][SQL] Add physical representation and UnsafeRow support for nanosecond timestamps~~ [SPARK-56981][SQL] Add physical representation and UnsafeRow support for nanosecond timestamps May 22, 2026

MaxGekk added 2 commits May 24, 2026 11:52

[SPARK-56981][SQL][FOLLOWUP] Document nanos timestamp physical layout…

86ef651

… and UnsafeRow storage Add internal scaladoc and code comments explaining the composite value representation, 16-byte UnsafeRow payload, and deferred ordering support.

[SPARK-56981][SQL][FOLLOWUP] Fix line length in nanos timestamp docum…

3f18668

…entation Wrap scaladoc and Javadoc lines to satisfy scalastyle and checkstyle.

stevomitric reviewed May 25, 2026

View reviewed changes

stevomitric approved these changes May 25, 2026

View reviewed changes

[SPARK-56981][SQL][TEST] Clarify null unsafe projection test for nano…

ba83fe6

…s timestamps Rename the test and document its purpose with reference to the CalendarInterval null-init regression (SPARK-41535).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-56981][SQL] Add physical representation and UnsafeRow support for nanosecond timestamps#56059

[SPARK-56981][SQL] Add physical representation and UnsafeRow support for nanosecond timestamps#56059
MaxGekk wants to merge 9 commits into
apache:masterfrom
MaxGekk:nanos-in-rows

MaxGekk commented May 22, 2026

Uh oh!

MaxGekk commented May 23, 2026

Uh oh!

MaxGekk commented May 25, 2026

Uh oh!

stevomitric left a comment

Uh oh!

stevomitric May 25, 2026

Uh oh!

MaxGekk May 25, 2026

Uh oh!

stevomitric May 25, 2026

Uh oh!

stevomitric May 25, 2026

Uh oh!

MaxGekk commented May 25, 2026

Uh oh!

stevomitric left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MaxGekk commented May 22, 2026

What changes were proposed in this pull request?

SPIP internal representation

Implementation summary

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

MaxGekk commented May 23, 2026

Uh oh!

MaxGekk commented May 25, 2026

Uh oh!

stevomitric left a comment

Choose a reason for hiding this comment

Uh oh!

stevomitric May 25, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk May 25, 2026

Choose a reason for hiding this comment

Uh oh!

stevomitric May 25, 2026

Choose a reason for hiding this comment

Uh oh!

stevomitric May 25, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented May 25, 2026

Uh oh!

stevomitric left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants