Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Dec 26, 2025

Which issue does this PR close?

N/A

Rationale for this change

Benchmark Original (main) After reusable buffer After no-builder Total Improvement
size=1024
i32_random 15.53 µs 7.32 µs 6.41 µs -58.7%
i64_random 16.37 µs 8.01 µs 6.92 µs -57.7%
i64_large_values 15.65 µs 8.34 µs 7.06 µs -54.9%
size=4096
i32_random 57.11 µs 28.39 µs 24.66 µs -56.8%
i64_random 62.19 µs 31.10 µs 27.62 µs -55.6%
i64_large_values 61.10 µs 30.60 µs 26.98 µs -55.8%
size=8192
i32_random 118.71 µs 67.62 µs 51.45 µs -56.7%
i64_random 141.29 µs 62.20 µs 54.19 µs -61.6%
i64_large_values 123.05 µs 60.14 µs 53.45 µs -56.6%

What changes are included in this PR?

Avoid string allocations and re-use a mutable buffer.

Are these changes tested?

Are there any user-facing changes?

@andygrove andygrove added the performance Make DataFusion faster label Dec 26, 2025
@github-actions github-actions bot added the functions Changes to functions implementation label Dec 26, 2025
@andygrove andygrove changed the title perf: Improve performance of to_hex perf: Improve performance of to_hex by 2x Dec 26, 2025
);
}
result.append_value("");
let hex_str = value.to_hex(&mut buffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be made faster by handling the values separately from the nulls / avoiding the builder API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it is even faster now! I updated the benchmark results in the PR description.

@andygrove andygrove changed the title perf: Improve performance of to_hex by 2x perf: Improve performance of to_hex (> 2x) Dec 26, 2025
// Start with offset 0
offsets.push(0);

// Process all values directly (including null slots - we write empty strings for nulls)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think this code is writing empty strings for nulls, it seems to process whatever value is in the null slot normally

@Dandandan Dandandan added this pull request to the merge queue Dec 27, 2025
Merged via the queue into apache:main with commit e5ca510 Dec 27, 2025
28 of 29 checks passed
@Dandandan
Copy link
Contributor

Thanks @andygrove

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation performance Make DataFusion faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants