Skip to content

Conversation

@aditya-subrahmanyan
Copy link

Which issue does this PR close?

What changes are included in this PR?

This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values.

Are these changes tested?

Added the unit test described in the issue and verified that the check is now enforced.

- Closes apache#1963.

This change validates that table metadata with reserved sort order ID
(0) cannot contain fields associated with it. If this is found, we error
out instead of silently parsing arbitrary field values.

Added the unit test described in the issue and verified that the check
is now enforced.
@aditya-subrahmanyan aditya-subrahmanyan marked this pull request as ready for review January 2, 2026 19:51
@aditya-subrahmanyan
Copy link
Author

@CTTY: Please review this PR when you get a chance, thanks.

if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID)
&& !sort_order.fields.is_empty()
{
return Err(Error::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to fail, or do we want to drop the fields and emit a warning?

Copy link
Author

@aditya-subrahmanyan aditya-subrahmanyan Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we don't allow non-empty fields for reserved sort order ID: https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/SortOrder.java#L269

IMO it feels safer to just report an error to the client that there's something "fishy" about their metadata. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think failing here is better and more consistent to Java's behavior

Copy link
Collaborator

@CTTY CTTY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I'm not very familiar with the code path and was wondering why we need to use try_normalize to do the validation when parsing TableMetadata. Looking at iceberg-java's TableMetadataParser, it just delegates the validation to sub-parsers like PartitionSpecParser and SortOrderBuilder, which I think is a much cleaner pattern so we don't have to rewrite the validation logic. In rust, we probably need to implement deserializer for SortOrder, PartitionSpec and so on individually to adopt this pattern.

This concern should not block this PR tho

&& !sort_order.fields.is_empty()
{
return Err(Error::new(
ErrorKind::DataInvalid,
Copy link
Collaborator

@CTTY CTTY Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should be ErrorKind::Unexpected according to https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/spec/sort.rs#L157

if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID)
&& !sort_order.fields.is_empty()
{
return Err(Error::new(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think failing here is better and more consistent to Java's behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing validation for reserved sort order ID 0 during metadata JSON parsing

3 participants