[SPARK-56876][SQL][4.X] Add TimestampNTZNanosType and TimestampLTZNanosType#56099

Open
MaxGekk wants to merge 1 commit into apache:branch-4.x from MaxGekk:nanos-add-types-4.x

MaxGekk commented May 25, 2026

What changes were proposed in this pull request?

This is a backport of #55952 to branch-4.x.

In the PR, I propose to extend the Spark SQL type system, and add new classes to Scala/Java APIs:

  • TimestampNTZNanosType(p) represents the SQL data type TIMESTAMP_NTZ(p)
  • TimestampLTZNanosType(p) represents the SQL data type TIMESTAMP_LTZ(p)

They are public API entry points only, and have no SQL/DDL/datasource integration in this PR.

The classes align with the SQL standard’s direction for optional feature F555, “Enhanced seconds precision”: datetime types can carry fractional seconds with precision p in the SECOND field beyond the traditional six decimal places (microseconds). Here p is restricted to 7, 8, and 9, i.e. the nanosecond-capable band (up to nine fractional digits, nanoseconds in the second field).
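The precision rule above can be illustrated with a minimal, self-contained sketch. It does not call the new Spark classes: `NanosPrecisionSketch`, `requireNanosPrecision`, and `stepNanos` are hypothetical names, and the exception type is an assumption (the actual classes may report invalid precisions through a Spark error condition instead).

```java
// Hypothetical sketch of the precision band described in this PR (p in 7..9).
// It models the rule only; it is not the Spark implementation.
final class NanosPrecisionSketch {
    static final int MIN_PRECISION = 7; // first precision beyond microseconds
    static final int MAX_PRECISION = 9; // nanoseconds

    // Returns p unchanged when it is in the nanosecond-capable band,
    // otherwise rejects it. The exception type is an assumption.
    static int requireNanosPrecision(int p) {
        if (p < MIN_PRECISION || p > MAX_PRECISION) {
            throw new IllegalArgumentException(
                "precision must be between 7 and 9, got " + p);
        }
        return p;
    }

    // Smallest representable step, in nanoseconds, at precision p:
    // 10^(9 - p), i.e. 100 ns for p = 7, 10 ns for p = 8, 1 ns for p = 9.
    static int stepNanos(int p) {
        requireNanosPrecision(p);
        int step = 1;
        for (int i = p; i < MAX_PRECISION; i++) {
            step *= 10;
        }
        return step;
    }
}
```

Under this model, precision 6 and below stays with the existing microsecond types, and anything above 9 is rejected.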

The logical layout documented on the classes matches this precision story: epoch microseconds plus nanoseconds within that microsecond, with a default estimated width of 10 bytes for planning (8 + 2).
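As a rough illustration of that layout, the sketch below splits a `java.time.Instant` into epoch microseconds (8 bytes) plus the nanoseconds within that microsecond (0..999, which fits in 2 bytes) and joins them back. `NanosLayoutSketch` and its members are hypothetical; the PR only documents the logical layout and does not define a physical representation or conversion helpers.

```java
import java.time.Instant;

// Hypothetical holder mirroring the documented logical layout:
// a long of epoch microseconds plus a short of nanoseconds within that microsecond.
final class NanosLayoutSketch {
    final long epochMicros;       // 8 bytes
    final short nanosWithinMicro; // 2 bytes, valid range 0..999

    NanosLayoutSketch(long epochMicros, short nanosWithinMicro) {
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException(
                "nanosWithinMicro must be in 0..999, got " + nanosWithinMicro);
        }
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
    }

    // Instant.getNano() is always non-negative, so this also works before the epoch.
    static NanosLayoutSketch fromInstant(Instant instant) {
        long micros = Math.addExact(
            Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
            instant.getNano() / 1_000);
        short nanos = (short) (instant.getNano() % 1_000);
        return new NanosLayoutSketch(micros, nanos);
    }

    // floorDiv/floorMod keep the microsecond-of-second non-negative for negative epochs.
    Instant toInstant() {
        long seconds = Math.floorDiv(epochMicros, 1_000_000L);
        long microOfSecond = Math.floorMod(epochMicros, 1_000_000L);
        return Instant.ofEpochSecond(seconds, microOfSecond * 1_000 + nanosWithinMicro);
    }
}
```

Keeping the microsecond part as a plain epoch-microsecond long means the existing microsecond timestamp value can be recovered by dropping the 2-byte remainder, which is presumably why the estimated width is described as 8 + 2.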

Parameterless timestamp_ntz / timestamp_ltz are unchanged and remain the existing microsecond-oriented types.

Why are the changes needed?

The new timestamp types are useful for Spark SQL users because they make it possible to:

  1. Represent timestamp without time zone and timestamp with local time zone with fractional-second precision 7–9, in line with SQL optional feature F555 (Enhanced seconds precision).
  2. Describe schemas from other systems that already use nanosecond-capable timestamps, without overloading the microsecond timestamp_ntz / timestamp_ltz types.
  3. Migrate SQL and metadata that distinguish NTZ and LTZ at sub-microsecond precision toward Spark in small, reviewable steps.
  4. Prepare later work to read and write such columns from datasources and JDBC, and to apply optimizations that depend on precise timestamp types.

Does this PR introduce any user-facing change?

Yes. The public API gains two new types in org.apache.spark.sql.types; they cannot yet be used in DataFrames, schemas read from datasources, or SQL DDL.

How was this patch tested?

By extending DataTypeSuite (round-trip and precision bounds for the new types, including invalid precisions).

$ build/sbt "test:testOnly *DataTypeSuite"

Also by running SparkThrowableSuite / error-json validation if error-conditions.json is updated.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

Authored-by: Maxim Gekk max.gekk@gmail.com
(cherry picked from commit 1e59b7b)

Closes apache#55952 from MaxGekk/nanos-add-types.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 1e59b7b)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk commented May 25, 2026

@peter-toth @stevomitric Could you review this PR, please? It is a backport to 4.x of #55952, which you have already reviewed.

stevomitric left a comment

LGTM.
