Skip to content

Conversation

@xiedeyantu
Copy link
Member


private static boolean multisetLiteralEquals(RexCall left, RexCall right) {
return HashMultiset.create(left.getOperands())
.equals(HashMultiset.create(right.getOperands()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just had a conversation about comparing maps for ordering, the rule that you implemented there is not compatible with this one: this can claim equality while ordering can claim inequality.

The question is: are maps ordered or not? I think the answer depends on the dialect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we just discussed. The current goal is to implement a simplification, but it doesn’t seem very logical to skip sorting. Logically speaking, should we also add sorting in the previous PR? Or should we also avoid sorting in the current PR? If it depends on the dialect, that kind of calculation logic doesn’t seem very straightforward to implement. Would it be better to add sorting to the map in the previous PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a tricky question, some database like Trino don't allow sorting on map columns (because they don't consider them comparable), and IIRC it's the same for Spark, other might handle it, so as Mihai says, it seems to be dialect dependent, when not comparable we skip these checks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your reply. I have a question: do we need to support comparisons of complex types like MAP? I found that DuckDB and ClickHouse support it, but they don't perform sorting. Also, currently I'm using RexSimplify, which doesn't seem to have a way to handle dialect-based comparisons. Do we need to add a configuration to enable or disable comparisons of complex types? Or perhaps it would be good to have Calcite support simplified comparisons of complex types by default?

@xiedeyantu
Copy link
Member Author

xiedeyantu commented Dec 28, 2025

I think this PR can address the issues of CALCITE-5733 and CALCITE-5815. Although CALCITE-5814 is still under discussion, I believe it does not affect the progress of this PR. Excluding the simplification of MAP is a conservative and safe approach. We can proceed to resolve CALCITE-5814 at any time once the discussion reaches a clear conclusion.

@xiedeyantu
Copy link
Member Author

The semantics of ARRAY and MULTISET are currently clear. I have excluded the simplification of MAP. Should we wait until the handling of MAP is clarified before proceeding with this PR?

@mihaibudiu
Copy link
Contributor

The problem is that there is no spec about the meaning of a constructor call of a MAP, and thus the compiler cannot build a canonical value for constant maps to use in comparisons. (And I don't think there should be a single spec, different implementations have different requirements).

@xiedeyantu
Copy link
Member Author

The problem is that there is no spec about the meaning of a constructor call of a MAP, and thus the compiler cannot build a canonical value for constant maps to use in comparisons. (And I don't think there should be a single spec, different implementations have different requirements).

So, I think it's acceptable if we don't simplify MAP constants and temporarily disable the simplification capabilities for MAP? I've already removed the simplification for MAP constants. Or do we need to implement the simplification of these three complex types together?

@mihaibudiu
Copy link
Contributor

Yes, MAP simplifications will need a more extensive JIRA discussion.
It is true that MAP values that have a different key cannot be equal, but it's not obvious how this information makes it all the way to the simplification code.

@xiedeyantu
Copy link
Member Author

Yes, MAP simplifications will need a more extensive JIRA discussion. It is true that MAP values that have a different key cannot be equal, but it's not obvious how this information makes it all the way to the simplification code.

I agree with you that there shouldn't be a set standard; each system can design according to its own requirements. Based on this, could Calcite also choose a system as its default behavior, such as Spark (which doesn't allow MAP comparisons)? Of course, this might be a bit of a tangent for this PR. Returning to the current PR, the current implementation only involves ARRAY and MULTISET. Is this implementation acceptable?

Copy link
Contributor

@mihaibudiu mihaibudiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have some tests that involve nested arrays e.g., INT ARRAY ARRAY?
Will the isConstant function work for such objects correctly?

@xiedeyantu
Copy link
Member Author

could we have some tests that involve nested arrays e.g., INT ARRAY ARRAY? Will the isConstant function work for such objects correctly?

Done! Thanks!

@xiedeyantu
Copy link
Member Author

I will merge the comments in 48 hours if there are no other comments.

@sonarqubecloud
Copy link

@xiedeyantu xiedeyantu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Dec 31, 2025
@xiedeyantu xiedeyantu merged commit 92cfb30 into apache:main Jan 2, 2026
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LGTM-will-merge-soon Overall PR looks OK. Only minor things left.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants