Result set Timestamp handling improvements#166
Conversation
DuckDB supports both `TIMESTAMP WITHOUT TIME ZONE` and `TIMESTAMP WITH TIME ZONE`. When gettting Java `Timestamp` from a result set, user can optionally request desired time zone by specifying a `Calendar` instance. Thus we are getting the following invariants: 1. `TIMESTAMP WITH TIME ZONE`, no `Calendar`: returns `Timestamp` in a system-default (JVM) time zone that represents the same moment in time as the DB's `TIMESTAMP WITH TIME ZONE`. For example with a system-default time zone being `EET` (`UTC+2`) the timestamp with `12:00` time and `CET` (`UTC+1`) DB time zone will be returned with `13:00` time (in `EET`). 2. `TIMESTAMP WITH TIME ZONE`, user-specified Calendar: the same logic is used as in point `1.`, just the time-zone from the user-specified Calendar is used instead of the system-default time zone. 3. `TIMESTAMP WITHOUT TIME ZONE`, no `Calendar`: returns the timestamp that has the same numeric values of all components as the DB's `TIMESTAMP WITHOUT TIME ZONE`. For example the timestamp with `12:00` time will be returned with the same `12:00` time in any system-defaut time zone. 4. `TIMESTAMP WITHOUT TIME ZONE`, user-specified `Calendar`: this case is tricky because user is requesting to adjust the timestamp for the specified timezone, but DB's timestamp does not denote any time zone that can be used as a starting point for adjustment. In this case we are using the same logic as Postgres' JDBC driver: https://github.com/pgjdbc/pgjdbc/blob/d86d7ce19030230f14e573b8095f04721d1d8e35/pgjdbc/src/main/java/org/postgresql/jdbc/TimestampUtils.java#L1606 So we assume that the incoming time stamp belongs to the system-default time zone. Then we calculating the offset between the system-default and the user-requested time zones that is effective at the moment of time defined by our timestamp in a system-default time zone. Note, that such offset can be different depending on the time of year due to DST. And then shifting the timestamp by this offset - this can seen as a "reverse" shift comparing to points `1.` and `2.`. For example, with a system-default time zone being `EET` (`UTC+2`) and user-requested time zone being `CET` (`UTC+1`), the timestamp with time `12:00` will be shifted by the difference between `EET` and `CET` and returned as `13:00` (in `CET`). Additionally the logic added for getting date-only and time-only values from the `TIMESTAMP_*` fields is added. It is done by getting the Java `Timestamp` first (using the invariants described above) and then getting date-only or time-only part of this timestamp. Testing: existing tests adjusted to match the new time zone handling logic; new test is added for getting `Date` and `Time` from `TIMESTAMP`. Fixes: duckdb#146
|
Thanks for the PR! Matching the logic of the Postgres driver makes sense to me.
This part is a bit unclear to me - I think that behavior is fine and makes sense - but just figured I would mention this in case that was not clear. |
|
Thanks for the review! Yes, the wording here is ambiguous, under "DB time zone" I mean the time zone offset that is already included with the DB's |
|
Alright - thanks for the clarification! |
This change implements a number of setter methods on `PreparedStatement` that were previously throwing `SQLFeatureNotSupportedException`. `set*Stream` group of methods is implemented only for compatibility with existing tools, no actual streaming is used on DB level - full input is read to string/bytes before passing it to DB. `execute`/`executeUpdate` methods for generated keys are implemented only for cases when generated keys are not requested. `setTime/Date/Timestamp` methods with `Calendar` (time zone) support are implemented fully following the same approach used before in duckdb#166. Also the logic for `setTime` method without `Calendar` is fixed to correctly NOT use default JVM time zone. Testing: new tests added to cover implemented methods; timestamp tests are moved to separate file and time zone-specific tests are added. Fixes: duckdb#195
This change implements a number of setter methods on `PreparedStatement` that were previously throwing `SQLFeatureNotSupportedException`. `set*Stream` group of methods is implemented only for compatibility with existing tools, no actual streaming is used on DB level - full input is read to string/bytes before passing it to DB. `execute`/`executeUpdate` methods for generated keys are implemented only for cases when generated keys are not requested. `setTime/Date/Timestamp` methods with `Calendar` (time zone) support are implemented fully following the same approach used before in duckdb#166. Also the logic for `setTime` method without `Calendar` is fixed to correctly NOT use default JVM time zone. Testing: new tests added to cover implemented methods; timestamp tests are moved to separate file and time zone-specific tests are added. Fixes: duckdb#195
This change implements a number of setter methods on `PreparedStatement` that were previously throwing `SQLFeatureNotSupportedException`. `set*Stream` group of methods is implemented only for compatibility with existing tools, no actual streaming is used on DB level - full input is read to string/bytes before passing it to DB. `execute`/`executeUpdate` methods for generated keys are implemented only for cases when generated keys are not requested. `setTime/Date/Timestamp` methods with `Calendar` (time zone) support are implemented fully following the same approach used before in #166. Also the logic for `setTime` method without `Calendar` is fixed to correctly NOT use default JVM time zone. Testing: new tests added to cover implemented methods; timestamp tests are moved to separate file and time zone-specific tests are added. Fixes: #195
DuckDB supports `TIMESTAMP WITH TIME ZONE` data type that stores UTC dates (with specified time zone already applied on DB insert). ODBC does not have a notion of time zones. When client app reads such `TIMESTAMP WITH TIME ZONE` it needs to be converted into `TIMESTAMP WITHOUT TIME ZONE` with client-local time zone applied. And such converted timestamp must represent the same moment of time as stored UTC timestamp. Applying client-local time zone to timestamp is a non-trivial operation - the UTC offset depends on both the time zone (for fixed shift) and on the timestamp value (for DST shift). There are two options to get the effective offset for the specified UTC timestamp in the specified time zone: - DuckDB ICU extension, that will use DB-configured time zone - OS API, will use OS-configured time zone Variant with ICU extension appeared to be problematic, as we would like to not link neither to ICU extension (C++ API, not in DuckDB C API) nor to ICU itself (C API exists, but symbols include ICU major version that can change). We can do ICU call from ODBC by creating SQL string and executing it. But this conversion must be performed for every value in every row. And there no reasonbly-easy ways to cache the results. Thus the idea of SQL calls was rejected. Variant with OS API matches the approach used in JDBC (see duckdb/duckdb-java#166) where JVM-configured time zone is used (invariant 1 there). The problem with accessing OS API is that `<ctime>` utilities are not thread-safe until `C++20`. Thus Windows `<timezoneapi.h>` and POSIX `<time.h>` API are used directly. And `<ctime>` approach is used in tests for cross-checking. Testing: new test added that fetches `TIMESTAMP WITH TIME ZONE` value as `SQL_C_TYPE_TIMESTAMP` and checks that it is shifted correctly.
DuckDB supports `TIMESTAMP WITH TIME ZONE` data type that stores UTC dates (with specified time zone already applied on DB insert). ODBC does not have a notion of time zones. When client app reads such `TIMESTAMP WITH TIME ZONE` it needs to be converted into `TIMESTAMP WITHOUT TIME ZONE` with client-local time zone applied. And such converted timestamp must represent the same moment of time as stored UTC timestamp. Applying client-local time zone to timestamp is a non-trivial operation - the UTC offset depends on both the time zone (for fixed shift) and on the timestamp value (for DST shift). There are two options to get the effective offset for the specified UTC timestamp in the specified time zone: - DuckDB ICU extension, that will use DB-configured time zone - OS API, will use OS-configured time zone Variant with ICU extension appeared to be problematic, as we would like to not link neither to ICU extension (C++ API, not in DuckDB C API) nor to ICU itself (C API exists, but symbols include ICU major version that can change). We can do ICU call from ODBC by creating SQL string and executing it. But this conversion must be performed for every value in every row. And there no reasonbly-easy ways to cache the results. Thus the idea of SQL calls was rejected. Variant with OS API matches the approach used in JDBC (see duckdb/duckdb-java#166) where JVM-configured time zone is used (invariant 1 there). The problem with accessing OS API is that `<ctime>` utilities are not thread-safe until `C++20`. Thus Windows `<timezoneapi.h>` and POSIX `<time.h>` API are used directly. And `<ctime>` approach is used in tests for cross-checking. Testing: new test added that fetches `TIMESTAMP WITH TIME ZONE` value as `SQL_C_TYPE_TIMESTAMP` and checks that it is shifted correctly.
DuckDB supports `TIMESTAMP WITH TIME ZONE` data type that stores UTC dates (with specified time zone already applied on DB insert). ODBC does not have a notion of time zones. When client app reads such `TIMESTAMP WITH TIME ZONE` it needs to be converted into `TIMESTAMP WITHOUT TIME ZONE` with client-local time zone applied. And such converted timestamp must represent the same moment of time as stored UTC timestamp. Applying client-local time zone to timestamp is a non-trivial operation - the UTC offset depends on both the time zone (for fixed shift) and on the timestamp value (for DST shift). There are two options to get the effective offset for the specified UTC timestamp in the specified time zone: - DuckDB ICU extension, that will use DB-configured time zone - OS API, will use OS-configured time zone Variant with ICU extension appeared to be problematic, as we would like to not link neither to ICU extension (C++ API, not in DuckDB C API) nor to ICU itself (C API exists, but symbols include ICU major version that can change). We can do ICU call from ODBC by creating SQL string and executing it. But this conversion must be performed for every value in every row. And there no reasonbly-easy ways to cache the results. Thus the idea of SQL calls was rejected. Variant with OS API matches the approach used in JDBC (see duckdb/duckdb-java#166) where JVM-configured time zone is used (invariant 1 there). The problem with accessing OS API is that `<ctime>` utilities are not thread-safe until `C++20`. Thus Windows `<timezoneapi.h>` and POSIX `<time.h>` API are used directly. And `<ctime>` approach is used in tests for cross-checking. Testing: new test added that fetches `TIMESTAMP WITH TIME ZONE` value as `SQL_C_TYPE_TIMESTAMP` and checks that it is shifted correctly.
DuckDB supports `TIMESTAMP WITH TIME ZONE` data type that stores UTC dates (with specified time zone already applied on DB insert). ODBC does not have a notion of time zones. When client app reads such `TIMESTAMP WITH TIME ZONE` it needs to be converted into `TIMESTAMP WITHOUT TIME ZONE` with client-local time zone applied. And such converted timestamp must represent the same moment of time as stored UTC timestamp. Applying client-local time zone to timestamp is a non-trivial operation - the UTC offset depends on both the time zone (for fixed shift) and on the timestamp value (for DST shift). There are two options to get the effective offset for the specified UTC timestamp in the specified time zone: - DuckDB ICU extension, that will use DB-configured time zone - OS API, will use OS-configured time zone Variant with ICU extension appeared to be problematic, as we would like to not link neither to ICU extension (C++ API, not in DuckDB C API) nor to ICU itself (C API exists, but symbols include ICU major version that can change). We can do ICU call from ODBC by creating SQL string and executing it. But this conversion must be performed for every value in every row. And there no reasonbly-easy ways to cache the results. Thus the idea of SQL calls was rejected. Variant with OS API matches the approach used in JDBC (see duckdb/duckdb-java#166) where JVM-configured time zone is used (invariant 1 there). The problem with accessing OS API is that `<ctime>` utilities are not thread-safe until `C++20`. Thus Windows `<timezoneapi.h>` and POSIX `<time.h>` API are used directly. And `<ctime>` approach is used in tests for cross-checking. Testing: new test added that fetches `TIMESTAMP WITH TIME ZONE` value as `SQL_C_TYPE_TIMESTAMP` and checks that it is shifted correctly. Note: this commit is a re-do for 8dbf0ac that was added with incorrect message. Original PR for it is #127.
DuckDB supports both
TIMESTAMP WITHOUT TIME ZONEandTIMESTAMP WITH TIME ZONE. When gettting JavaTimestampfrom a result set, user can optionally request desired time zone by specifying aCalendarinstance.Thus we are getting the following invariants:
TIMESTAMP WITH TIME ZONE, noCalendar: returnsTimestampin a system-default (JVM) time zone that represents the same moment in time as the DB'sTIMESTAMP WITH TIME ZONE. For example with a system-default time zone beingEET(UTC+2) the timestamp with12:00time andCET(UTC+1) DB time zone will be returned with13:00time (inEET).TIMESTAMP WITH TIME ZONE, user-specified Calendar: the same logic is used as in point1., just the time-zone from the user-specified Calendar is used instead of the system-default time zone.TIMESTAMP WITHOUT TIME ZONE, noCalendar: returns the timestamp that has the same numeric values of all components as the DB'sTIMESTAMP WITHOUT TIME ZONE. For example the timestamp with12:00time will be returned with the same12:00time in any system-defaut time zone.TIMESTAMP WITHOUT TIME ZONE, user-specifiedCalendar: this case is tricky because user is requesting to adjust the timestamp for the specified timezone, but DB's timestamp does not denote any time zone that can be used as a starting point for adjustment. In this case we are using the same logic as Postgres' JDBC driver:https://github.com/pgjdbc/pgjdbc/blob/d86d7ce19030230f14e573b8095f04721d1d8e35/pgjdbc/src/main/java/org/postgresql/jdbc/TimestampUtils.java#L1606
So we assume that the incoming time stamp belongs to the system-default time zone. Then we calculating the offset between the system-default and the user-requested time zones that is effective at the moment of time defined by our timestamp in a system-default time zone. Note, that such offset can be different depending on the time of year due to DST. And then shifting the timestamp by this offset - this can seen as a "reverse" shift comparing to points
1.and2.. For example, with a system-default time zone beingEET(UTC+2) and user-requested time zone beingCET(UTC+1), the timestamp with time12:00will be shifted by the difference betweenEETandCETand returned as13:00(inCET).Additionally the logic added for getting date-only and time-only values from the
TIMESTAMP_*fields is added. It is done by getting the JavaTimestampfirst (using the invariants described above) and then getting date-only or time-only part of this timestamp.Testing: existing tests adjusted to match the new time zone handling logic; new test is added for getting
DateandTimefromTIMESTAMP.Fixes: #146