-
Notifications
You must be signed in to change notification settings - Fork 382
fix: Reserved sort order ID cannot contain any fields #1978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Reserved sort order ID cannot contain any fields #1978
Conversation
- Closes apache#1963. This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values. Added the unit test described in the issue and verified that the check is now enforced.
|
@CTTY: Please review this PR when you get a chance, thanks. |
| if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID) | ||
| && !sort_order.fields.is_empty() | ||
| { | ||
| return Err(Error::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to fail, or do we want to drop the fields and emit a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we don't allow non-empty fields for reserved sort order ID: https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/SortOrder.java#L269
IMO it feels safer to just report an error to the client that there's something "fishy" about their metadata. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think failing here is better and more consistent to Java's behavior
CTTY
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I'm not very familiar with the code path and was wondering why we need to use try_normalize to do the validation when parsing TableMetadata. Looking at iceberg-java's TableMetadataParser, it just delegates the validation to sub-parsers like PartitionSpecParser and SortOrderBuilder, which I think is a much cleaner pattern so we don't have to rewrite the validation logic. In rust, we probably need to implement deserializer for SortOrder, PartitionSpec and so on individually to adopt this pattern.
This concern should not block this PR tho
| && !sort_order.fields.is_empty() | ||
| { | ||
| return Err(Error::new( | ||
| ErrorKind::DataInvalid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This should be ErrorKind::Unexpected according to https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/spec/sort.rs#L157
| if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID) | ||
| && !sort_order.fields.is_empty() | ||
| { | ||
| return Err(Error::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think failing here is better and more consistent to Java's behavior
Which issue does this PR close?
What changes are included in this PR?
This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values.
Are these changes tested?
Added the unit test described in the issue and verified that the check is now enforced.