-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-5733] Simplify "a = ARRAY[1,2] AND a = ARRAY[2,3]" to "false" #4706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| private static boolean multisetLiteralEquals(RexCall left, RexCall right) { | ||
| return HashMultiset.create(left.getOperands()) | ||
| .equals(HashMultiset.create(right.getOperands())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just had a conversation about comparing maps for ordering, the rule that you implemented there is not compatible with this one: this can claim equality while ordering can claim inequality.
The question is: are maps ordered or not? I think the answer depends on the dialect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we just discussed. The current goal is to implement a simplification, but it doesn’t seem very logical to skip sorting. Logically speaking, should we also add sorting in the previous PR? Or should we also avoid sorting in the current PR? If it depends on the dialect, that kind of calculation logic doesn’t seem very straightforward to implement. Would it be better to add sorting to the map in the previous PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a tricky question, some database like Trino don't allow sorting on map columns (because they don't consider them comparable), and IIRC it's the same for Spark, other might handle it, so as Mihai says, it seems to be dialect dependent, when not comparable we skip these checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your reply. I have a question: do we need to support comparisons of complex types like MAP? I found that DuckDB and ClickHouse support it, but they don't perform sorting. Also, currently I'm using RexSimplify, which doesn't seem to have a way to handle dialect-based comparisons. Do we need to add a configuration to enable or disable comparisons of complex types? Or perhaps it would be good to have Calcite support simplified comparisons of complex types by default?
|
I think this PR can address the issues of CALCITE-5733 and CALCITE-5815. Although CALCITE-5814 is still under discussion, I believe it does not affect the progress of this PR. Excluding the simplification of MAP is a conservative and safe approach. We can proceed to resolve CALCITE-5814 at any time once the discussion reaches a clear conclusion. |
|
The semantics of ARRAY and MULTISET are currently clear. I have excluded the simplification of MAP. Should we wait until the handling of MAP is clarified before proceeding with this PR? |
|
The problem is that there is no spec about the meaning of a constructor call of a MAP, and thus the compiler cannot build a canonical value for constant maps to use in comparisons. (And I don't think there should be a single spec, different implementations have different requirements). |
So, I think it's acceptable if we don't simplify MAP constants and temporarily disable the simplification capabilities for MAP? I've already removed the simplification for MAP constants. Or do we need to implement the simplification of these three complex types together? |
|
Yes, MAP simplifications will need a more extensive JIRA discussion. |
I agree with you that there shouldn't be a set standard; each system can design according to its own requirements. Based on this, could Calcite also choose a system as its default behavior, such as Spark (which doesn't allow MAP comparisons)? Of course, this might be a bit of a tangent for this PR. Returning to the current PR, the current implementation only involves ARRAY and MULTISET. Is this implementation acceptable? |
mihaibudiu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we have some tests that involve nested arrays e.g., INT ARRAY ARRAY?
Will the isConstant function work for such objects correctly?
Done! Thanks! |
97e1b9c to
e438767
Compare
|
I will merge the comments in 48 hours if there are no other comments. |
|



See CALCITE-5733