-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-36518: [Java] Fix ArrowFlightJdbcTimeStampVectorAccessor to return Timestamp objects with date and time that corresponds with local time instead of UTC date and time #43149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
2ecbeec
Updated all time related logic to be in line with JDBC API
jhmannok 3952103
Modified comments
jhmannok 3e756e7
- Refactored duplicate code in ArrowFlightJdbcTimeStampVectorAccessor…
jhmannok f104d0d
- renamed compareOffset to a more descriptive name
jhmannok 609074a
ArrowFlightJdbcTimeStampVectorAccessor: Made assigning to sourceTimeZ…
jhmannok 9c03e42
Refactored getLongToUTCDateTimeForVector
jhmannok ce7982d
Executed spotless after cherry picks
scruz-denodo bd31dc4
Removed unused variables
scruz-denodo df82528
Fixed millis truncation.
scruz-denodo 9f9a239
GH-36518: [Java] Fix ArrowFlightJdbcTimeStampVectorAccessor to return…
scruz-denodo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to use the system locale? It should just be UTC (even that is strictly wrong; the date is unzoned by default and only zoned when a calendar comes into play)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or does JDBC expect that it will be in the system timezone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, problem here is that
java.sql.Timestampandjava.sql.Dateonly contains a long value with milliseconds. You have to tell java how to print this long, in what timezone.So, the method has to follow the same approach done at
org.apache.arrow.driver.jdbc.accessor.impl.calendar.ArrowFlightJdbcTimeStampVectorAccessor#getLocalDateTimejava.sql.ResultSet#getDate(java.lang.String, java.util.Calendar). This is done on this line..atZone(sourceTimeZone).withZoneSameInstant(defaultTimeZone)When this method is used
java.sql.ResultSet#getDate(java.lang.String, java.util.Calendar), it is expected that the caller prints the date returned on the specified time zone. Something like:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I found a little time to look into this finally.
I still don't see why the system timezone has to come into play.
Arrow date is with reference to the UNIX epoch, i.e. UTC. Java Date is with reference to GMT. As the Java docs themselves state, they are technically different but practically the same.
(1) If there is no calendar given, I would expect there to be no datetime arithmetic done, because they are in the same time zone (effectively). It happens to work out here because we convert to/from the system timezone, so the arithmetic here should cancel itself out, but I'm not sure if that will actually be the case around daylight savings time boundaries.
(2) If there is a calendar given, JDBC is frustratingly vague about the intention of the parameter. It seems most sources interpret it as, "treat the date value as a local date (without reference to any timezone) and give me a timestamp that would display that date (at midnight) in the given timezone". In that case, again, I don't see why the system timezone should come into play.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
initially, you have a timestamp at database which does not have an associated time zone. It can be in any of them.
when you invoke the method getTimestamp(int columnIndex), you are saying that the timestamp you are getting is at the default timezone from the JVM.
when you invoke the method getTimestamp(int columnIndex, Calendar cal), you are saying that the timestamp you are getting is at the timezone specified at input
calparameter.Check the example I attached before. You have '2005-06-29 19:19:41' at database.
your JVM is in
America/Los_Angelesand you invoke the getTimestamp(int columnIndex). AtArrowFlightJdbcTimeStampVectorAccessoryou receives that exact timestamp in UTC. But, you have to translate it to the same timestamp value but atAmerica/Los_Angeles(the default JVM timezone).If you print the obtained timestamp, you will see the same value that was at database.
Something like:
2005-06-29 19:19:41.0your JVM is in
America/Los_Angelesand you invoke the getTimestamp(int columnIndex, Calendar cal) specifyingAsia/Tokyo. Same as before, you receive the timestamp in UTC atArrowFlightJdbcTimeStampVectorAccessor, but now you have to put that same timestamp inAsia/Tokyotime zone, because is the one specified at thegetTimestampinvokation.When you print the obtained timestamp, you will see this:
2005-06-29 03:19:41.0This is because the timestamp is printed following your JVM timezone (America/Los_Angeles). But note that it is the expected value because you retrieved the timestamp using
getTimestamp(index, Asia/Tokyo-calendar). So, you specified that the timestamp '2005-06-29 19:19:41' is at the time zoneAsia/Tokyo.If you check the difference, there are 16 hours less, because
America/Los_Angelesis at GMT-7 andAsia/Tokyois at UTC+9. This is why the timestamp is printed as2005-06-29 03:19:41.0,America/Los_Angelesvalue, which is '2005-06-29 19:19:41.0' atAsia/TokyoThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeesh, system timezone appears to be a convention (e.g. this StackOverflow answer) but doesn't appear to be formally specified. I still think that, given Arrow actually specifies the epoch for its values, introducing the system timezone merely to be consistent with databases that appear not to store this information is strictly wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it could depend on how you want to read this sentence...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResultSet#getDate never specifies the time zone when a calendar is not provided, so I guess the question is whether we should use the underlying value's time zone or the system time zone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I will be out for a few days. I will try to answer questions when I return.
In the meantime, I would like to comment:
It is true that documentation is not 100% clear, but it is also true that other JDBC drivers from important vendors work like that, using the default JVM timezone. So, a generic JDBC client will expect the data on that way.
I will check again the code. But the problem is the same. The
java.sql.Datehas also hours, minutes and seconds. So, the Arrow JDBC driver retrieves the value in UTC, ok, but when a newDateis created it is done on the default JVM timezone. So, it can generate a different date than the expected. This is why a similar translation than the applied withTimestampis required.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that python adbc driver is also working as proposed fix.
test_date.zip