@thomasrebele
Contributor

@thomasrebele thomasrebele commented Jan 21, 2026

HIVE-29398

What changes were proposed in this pull request?

Add a property to store the timestamp statistics in the long stats field instead of the timestamp stats field. This was the legacy behavior before HIVE-22311 was merged.

Why are the changes needed?

Other projects that use the Hive Metastore (e.g., Impala) still expect the long stats field. Adding the property makes it possible to switch back to the old behavior.

Does this PR introduce any user-facing change?

No

How was this patch tested?

I've added a unit test and manually verified that the stats of a timestamp field in Impala behave as expected:

  1. Add the property 'hive.metastore.stats.legacy.timestamp.as.long': 'true', to fe/src/test/resources/hive-site.xml.py
  2. Build Impala using Hive with this patch
  3. Start Impala and an Impala shell (./bin/impala-shell.sh)
  4. Execute the following: create table a(t timestamp); insert into a(t) values ('2026-01-02 12:34:45'), (null), ('1999-01-03 11:12:23'); compute stats a; show column stats a;
  5. The #Distinct Values is 2 and #Nulls is 1, so the patch works as expected; without the patch, or if you change the property of step 1 to false, #Distinct Values and #Nulls are both -1
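The expected numbers in step 5 can be reproduced with a small standalone sketch. It is not Hive or Impala code: the class `LongStatsSketch` is hypothetical, the conversion to epoch seconds in UTC is an assumption (the PR does not say which long encoding is used), and the distinct count is exact, whereas a real engine may use an estimator.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper that mimics "timestamp stats stored as long stats".
final class LongStatsSketch {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    final long numNulls;
    final long numDistinct;
    final long low;   // smallest non-null value, as a long
    final long high;  // largest non-null value, as a long

    private LongStatsSketch(long numNulls, long numDistinct, long low, long high) {
        this.numNulls = numNulls;
        this.numDistinct = numDistinct;
        this.low = low;
        this.high = high;
    }

    // Assumption: timestamps are encoded as epoch seconds in UTC.
    static LongStatsSketch of(List<String> values) {
        long nulls = 0;
        Set<Long> distinct = new HashSet<>();
        long low = Long.MAX_VALUE;
        long high = Long.MIN_VALUE;
        for (String v : values) {
            if (v == null) {
                nulls++;
                continue;
            }
            long secs = LocalDateTime.parse(v, FMT).toEpochSecond(ZoneOffset.UTC);
            distinct.add(secs);
            low = Math.min(low, secs);
            high = Math.max(high, secs);
        }
        return new LongStatsSketch(nulls, distinct.size(), low, high);
    }

    public static void main(String[] args) {
        // Same three rows as the INSERT statement in step 4.
        LongStatsSketch s = of(Arrays.asList(
                "2026-01-02 12:34:45", null, "1999-01-03 11:12:23"));
        System.out.println("distinct=" + s.numDistinct + " nulls=" + s.numNulls);
    }
}
```

With the three rows from step 4, the sketch counts two distinct values and one null, which matches what `show column stats` reports when the property is enabled.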

@kasakrisz
Contributor

@thomasrebele
The idea that the patch tries to implement looks good to me, but I have some architectural concerns:

I noticed that the parameter timestampAsLong is passed through several levels of the call stack. This could be avoided by replacing the static stats-conversion methods with a singleton that has more than one implementation: one that stores the timestamp as-is and another that stores it as a Long. The singleton's factory method would choose the implementation based on the new setting.
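A minimal sketch of that idea, under stated assumptions: the names `TimestampStatsConverter`, `TIMESTAMP_FIELD`, `LONG_FIELD`, and `fromConf` are hypothetical and do not exist in Hive, the configuration is modeled as a plain `Map` rather than Hive's configuration class, the default of `false` is assumed, and the epoch-seconds encoding is only illustrative. An enum is used so that each implementation is a singleton by construction.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Map;

// Hypothetical converter with two singleton implementations.
enum TimestampStatsConverter {
    // Keeps the value as a timestamp (behavior since HIVE-22311).
    TIMESTAMP_FIELD {
        @Override
        Object convert(LocalDateTime ts) {
            return ts;
        }
    },
    // Stores the value as a long (legacy behavior); encoding is an assumption.
    LONG_FIELD {
        @Override
        Object convert(LocalDateTime ts) {
            return ts.toEpochSecond(ZoneOffset.UTC);
        }
    };

    static final String LEGACY_KEY = "hive.metastore.stats.legacy.timestamp.as.long";

    abstract Object convert(LocalDateTime ts);

    // Factory method: picks the implementation from the setting.
    static TimestampStatsConverter fromConf(Map<String, String> conf) {
        boolean legacy = Boolean.parseBoolean(conf.getOrDefault(LEGACY_KEY, "false"));
        return legacy ? LONG_FIELD : TIMESTAMP_FIELD;
    }
}
```

Callers would resolve the converter once, where the configuration is available, and then call `convert` without threading a boolean through every method signature.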

Alternatively, we could ask around on the Hive user/dev mailing lists whether HIVE-22311 can be reverted.

@thomasrebele
Contributor Author

Thank you for the review, @kasakrisz! I've sent a mail to dev@hive.apache.org.

@thomasrebele
Contributor Author

thomasrebele commented Feb 11, 2026

It seems that removing the field is more difficult than expected. I tried to remove it, but that leads to a ClassCastException, and the stats do not appear when using the DESCRIBE FORMATTED command.

@kasakrisz, how would the setting be passed to the factory? Do you have a similar example in the Hive repo? Where would the factory/singleton live? Would your suggestion pass some kind of StatsFactory parameter through the call stack instead of the timestampAsLong parameter?

Alternatively, to make the approach extensible, while keeping it close to the original code, a new class ColumnStatsConf could be passed instead of the timestampAsLong parameter. If necessary, more configuration options could be added to that class.
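As a rough illustration of this alternative, the sketch below wraps the flag in an immutable `ColumnStatsConf` object. Only the class name comes from this comment; `fromProperties`, `isTimestampAsLong`, `StatsWriterSketch`, the `Map`-based configuration, the `false` default, and the returned field labels are assumptions made for the example.

```java
import java.util.Map;

// Hypothetical configuration holder; only the class name is taken from the PR discussion.
final class ColumnStatsConf {
    static final String TIMESTAMP_AS_LONG_KEY =
            "hive.metastore.stats.legacy.timestamp.as.long";

    private final boolean timestampAsLong;

    private ColumnStatsConf(boolean timestampAsLong) {
        this.timestampAsLong = timestampAsLong;
    }

    // Builds the object once from key/value properties (default assumed to be false).
    static ColumnStatsConf fromProperties(Map<String, String> props) {
        return new ColumnStatsConf(
                Boolean.parseBoolean(props.getOrDefault(TIMESTAMP_AS_LONG_KEY, "false")));
    }

    boolean isTimestampAsLong() {
        return timestampAsLong;
    }
}

// Hypothetical caller: method signatures take the conf object, not a bare boolean.
final class StatsWriterSketch {
    // Returns an illustrative label for the stats field that would be filled.
    static String fieldForTimestamp(ColumnStatsConf conf) {
        return conf.isTimestampAsLong() ? "longStats" : "timestampStats";
    }
}
```

Adding another option later would mean adding a field and getter to `ColumnStatsConf`, while the signatures along the call stack stay unchanged.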
