Skip to content

[CALCITE-7393] Support RelDataTypeDigest#4761

Open
zhuwenzhuang wants to merge 1 commit intoapache:mainfrom
zhuwenzhuang:CALCITE-6907
Open

[CALCITE-7393] Support RelDataTypeDigest#4761
zhuwenzhuang wants to merge 1 commit intoapache:mainfrom
zhuwenzhuang:CALCITE-6907

Conversation

@zhuwenzhuang
Copy link
Contributor

Use RelDataTypeDigest to reduce composite type's digest memory and digest relative operation's latency.
test case: HepPlannerTest.testLargeTypeDigest
This is the first part of latency/memory optimization of large plan.

@zhuwenzhuang zhuwenzhuang changed the title [CALCITE-6907] support RelDataTypeDigest [CALCITE-6907] Support RelDataTypeDigest Jan 22, 2026
@xiedeyantu
Copy link
Member

Thank you for your contribution! I have a few suggestions:
First, please ensure consistency between the Jira title, commit message, and PR title.
Second, please do not arbitrarily change or delete public/protected methods or variables in existing classes (they can be marked as deprecated).
Third, any errors in the CI need to be checked again.

/**
* Digest of a RelDataType.
*/
public interface RelDataTypeDigest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be two interfaces, in case Digest is useful somewhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the hashCode caching in RelDigest and RexCall already addresses some of the hashCode caching/comparison latency issues. Could you please clarify what you mean by “be two interfaces”?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interface HasDigestString {
   String getDigestString()
}

interface RelDataTypeDigest extends HasDigestString {
   RelDataType getType();
}

Other classes may implement HasDigestString.

}

@Override public boolean equals(@Nullable Object obj) {
@Override public boolean deepEquals(Object obj) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll need to leave the old implementations around too, they may break users who expect them to exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allow user to implement equals/hashCode for nested type, cache for hashCode and pointer compare for deepEquals will not work for them.
This situation is similiar to https://issues.apache.org/jira/browse/CALCITE-3786


protected final @Nullable List<RelDataTypeField> fieldList;
protected @Nullable String digest;
protected RelDataTypeDigest digest;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing protected fields is a breaking change, which has to be reflected in the semantic versioning. This would require a 2.0 release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, I don't know what else you can do to make the digest lazy. Maybe other people can weigh on this dilemma. This is why you should probably file a JIRA issue just for this topic and discuss the design there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can allow user to set String digest to define their own digest. otherwise, we use the RelDataTypeDigest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is that some protected/public variables and methods must still be guaranteed to work properly. For example, this
"String digest" might eventually call
"toString" from
"RelDataTypeDigest" to get the value? Because these variables are exposed to users, and someone might currently be using them, we need to provide a compatibility fix. If you think reassigning the original variable would go against your design intention (e.g., it might increase memory usage instead of reducing it), then you need to declare that this is a breaking change.

@zhuwenzhuang zhuwenzhuang changed the title [CALCITE-6907] Support RelDataTypeDigest [CALCITE-7393] Support RelDataTypeDigest Jan 23, 2026
@zhuwenzhuang zhuwenzhuang force-pushed the CALCITE-6907 branch 4 times, most recently from 1f17a42 to 955f537 Compare January 26, 2026 01:49
@zhuwenzhuang zhuwenzhuang marked this pull request as draft January 26, 2026 01:55
@zhuwenzhuang zhuwenzhuang reopened this Jan 26, 2026
@zhuwenzhuang zhuwenzhuang force-pushed the CALCITE-6907 branch 2 times, most recently from ac7bf74 to 078f2e4 Compare January 26, 2026 03:09
@zhuwenzhuang zhuwenzhuang marked this pull request as ready for review January 26, 2026 03:48
@mihaibudiu
Copy link
Contributor

I don't know how to review this; the change is reasonable in itself, but I don't know if this is an acceptable breaking change. I would appreciate comments from people who have dealt with similar breaking changes in the past.

Copy link
Member

@xiedeyantu xiedeyantu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change has a fairly wide impact. I'm wondering if it's possible to control the new/old logic by introducing configuration options. To be honest, I can't judge at this point how the current changes will affect future compatibility. Although your changes may be good, the community probably values ​​compatibility more.

@zhuwenzhuang
Copy link
Contributor Author

@xiedeyantu
Thanks, I think use CalciteSystemProperties to control compatibility is a good idea.
The patch is updated base on 'calcite.disable.generate.type.digest.string', and a benchmark TypeDigestBenchmark added.

Use structured innerDigest instead of string digest for composite/UDT types
to reduce memory and improve hashCode/equals latency. Controlled by
`calcite.disable.generate.type.digest.string` (default: false).

Legacy string digest is still used in hashCode/equals if explicitly set,
ensuring backward compatibility.

TestCase: TypeDigestBenchmark
@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 4, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants