@andygrove (Member) commented Dec 26, 2025

Which issue does this PR close?

N/A

Rationale for this change

Use a re-usable string buffer instead of allocating a new string for each input value.
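As a rough illustration of the idea, the sketch below reuses one scratch `String` across rows instead of allocating a fresh one per input value. It is not the code in this PR: `ToyStringBuilder` and `repeat_with_scratch` are hypothetical names, and the toy builder only stands in for an Arrow-style string builder that copies each appended value into its own values buffer.

```rust
/// Hypothetical stand-in for an Arrow-style string builder: it copies each
/// appended value into one contiguous byte buffer and records end offsets.
struct ToyStringBuilder {
    values: Vec<u8>,
    offsets: Vec<usize>,
}

impl ToyStringBuilder {
    fn new() -> Self {
        // offsets always starts with 0, so row i spans offsets[i]..offsets[i + 1]
        Self { values: Vec::new(), offsets: vec![0] }
    }

    fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len());
    }

    fn value(&self, i: usize) -> &str {
        // Safe to unwrap: every slice boundary was produced from a whole &str.
        std::str::from_utf8(&self.values[self.offsets[i]..self.offsets[i + 1]]).unwrap()
    }
}

/// Repeats every input `n` times, reusing a single scratch buffer.
fn repeat_with_scratch(inputs: &[&str], n: usize) -> ToyStringBuilder {
    let mut builder = ToyStringBuilder::new();
    let mut scratch = String::new(); // allocated once, outside the loop
    for s in inputs {
        scratch.clear(); // drops the contents but keeps the capacity
        for _ in 0..n {
            scratch.push_str(s);
        }
        builder.append_value(&scratch); // the builder copies out of scratch
    }
    builder
}
```

Because `String::clear` keeps the existing capacity, the scratch buffer only reallocates when a row needs more space than any earlier row did.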

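| Configuration | Benchmark | Main (µs) | Optimized (µs) | Change |
|---|---|---|---|---|
| size=1024, repeat=3 | repeat_string_view | 76.51 | 70.14 | -8.3% |
| size=1024, repeat=3 | repeat_string | 78.63 | 71.41 | -9.2% |
| size=1024, repeat=3 | repeat_large_string | 76.40 | 71.08 | -7.0% |
| size=1024, repeat=30 | repeat_string_view | 109.02 | 93.51 | -14.2% |
| size=1024, repeat=30 | repeat_string | 108.46 | 92.12 | -15.1% |
| size=1024, repeat=30 | repeat_large_string | 105.99 | 91.66 | -13.5% |
| size=4096, repeat=3 | repeat_string_view | 139.44 | 113.95 | -18.3% |
| size=4096, repeat=3 | repeat_string | 133.62 | 112.25 | -16.0% |
| size=4096, repeat=3 | repeat_large_string | 131.94 | 108.41 | -17.8% |
| size=4096, repeat=30 | repeat_string_view | 251.77 | 193.95 | -23.0% |
| size=4096, repeat=30 | repeat_string | 250.58 | 191.86 | -23.4% |
| size=4096, repeat=30 | repeat_large_string | 248.88 | 188.43 | -24.3% |
| overflow tests | size=1024 | 58.14 | 58.02 | ~0% (no change) |
| overflow tests | size=4096 | 58.26 | 58.08 | ~0% (no change) |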

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the functions Changes to functions implementation label Dec 26, 2025
@andygrove andygrove added the performance Make DataFusion faster label Dec 26, 2025
@Jefffrey (Contributor)

We could even try creating the values/offsets/null buffers manually and copying the strings directly into the values buffer rather than through the intermediate buffer, which would skip the builder API completely. That may be getting too far into the weeds, though.
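A minimal sketch of what that suggestion could look like, using plain `Vec`s in place of Arrow buffers. The function name `repeat_direct` and the `Vec<bool>` validity representation are assumptions for illustration; a real implementation would use Arrow's buffer types and a packed null bitmap.

```rust
use std::convert::TryFrom;

/// Hypothetical sketch: writes repeated strings straight into a values
/// buffer, building offsets and validity alongside, with no scratch string.
/// Returns (values, offsets, validity).
fn repeat_direct(inputs: &[Option<&str>], n: usize) -> (Vec<u8>, Vec<i32>, Vec<bool>) {
    // Size the values buffer up front so it is allocated exactly once.
    let total: usize = inputs.iter().flatten().map(|s| s.len() * n).sum();
    let mut values: Vec<u8> = Vec::with_capacity(total);
    let mut offsets: Vec<i32> = Vec::with_capacity(inputs.len() + 1);
    let mut validity: Vec<bool> = Vec::with_capacity(inputs.len());

    offsets.push(0);
    for input in inputs {
        match input {
            Some(s) => {
                for _ in 0..n {
                    values.extend_from_slice(s.as_bytes());
                }
                validity.push(true);
            }
            // A null row contributes no bytes; its offset repeats the previous one.
            None => validity.push(false),
        }
        // i32 offsets can overflow for large outputs; this sketch panics
        // instead of returning an error.
        offsets.push(i32::try_from(values.len()).expect("offset overflow"));
    }
    (values, offsets, validity)
}
```

Compared with the scratch-buffer approach, each byte is written once instead of being written to the scratch string and then copied by the builder, at the cost of handling offsets, nulls, and overflow by hand.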
