perf: Optimize startsWith and endsWith string functions #3000
Which issue does this PR close?
Closes #2973.
Rationale for this change
The startsWith and endsWith string functions were previously delegated to DataFusion's built-in scalar functions, which introduced unnecessary overhead and did not fully leverage Comet's native execution capabilities. This PR implements optimized native expressions to improve performance.
What changes are included in this PR?
This PR introduces custom StartsWithExpr and EndsWithExpr physical expressions with the following optimizations:
startsWith: uses the compute::starts_with kernel with a pre-allocated pattern array to avoid per-batch allocations.
endsWith: accesses StringArray data directly, bypassing iterator overhead, with byte-level comparison (memcmp).
Files Changed:
How are these changes tested?
CometStringExpressionBenchmark:
startsWith: 1.1X faster than Spark (Comet 1887ms vs Spark 2028ms)
endsWith: 1.0X parity with Spark (Comet 3389ms vs Spark 3354ms)
Benchmark Results
Environment: OpenJDK 64-Bit Server VM 11.0.29+7-LTS on Linux 6.11.0-1018-azure
Processor: AMD EPYC 7763 64-Core Processor
startsWith
endsWith
Summary:
startsWith: 1.1X faster than Spark (1546ms vs 1657ms)
endsWith: 1.0X parity with Spark (1562ms vs 1625ms)
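For readers unfamiliar with the buffer-level approach mentioned for endsWith, the following is a minimal, std-only sketch of the idea, not the code in this PR. StrColumn and EndsWithSketch are hypothetical names invented for illustration. StrColumn stands in for a string array laid out as one contiguous byte buffer plus offsets, and null handling is omitted. The suffix bytes are stored once when the expression is built, so each batch evaluation allocates only its output, and each row is checked with a single slice comparison, which Rust typically lowers to a memcmp call for byte slices.

```rust
/// Hypothetical stand-in for a string array: one contiguous UTF-8 buffer
/// plus n + 1 offsets, where row i spans values[offsets[i]..offsets[i + 1]].
struct StrColumn {
    values: Vec<u8>,
    offsets: Vec<usize>,
}

impl StrColumn {
    /// Builds the column by appending each string to the shared buffer.
    fn from_strs(items: &[&str]) -> Self {
        let mut values = Vec::new();
        let mut offsets = vec![0];
        for s in items {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len());
        }
        StrColumn { values, offsets }
    }

    /// Number of rows in the column.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}

/// Hypothetical expression that keeps the suffix bytes for its whole
/// lifetime, so nothing pattern-related is allocated per batch.
struct EndsWithSketch {
    suffix: Vec<u8>,
}

impl EndsWithSketch {
    fn new(suffix: &str) -> Self {
        EndsWithSketch { suffix: suffix.as_bytes().to_vec() }
    }

    /// Evaluates one batch by indexing the raw buffer directly instead of
    /// materializing a &str per row through an iterator.
    fn evaluate(&self, col: &StrColumn) -> Vec<bool> {
        let n = self.suffix.len();
        let mut out = Vec::with_capacity(col.len());
        for i in 0..col.len() {
            let (start, end) = (col.offsets[i], col.offsets[i + 1]);
            // Rows shorter than the suffix cannot match; otherwise compare
            // the last n bytes of the row against the stored suffix.
            let matched = end - start >= n && col.values[end - n..end] == self.suffix[..];
            out.push(matched);
        }
        out
    }
}
```

Building the expression once with EndsWithSketch::new("bar") and calling evaluate on each incoming column mirrors the intent of keeping per-batch work down to the comparison itself. A real implementation would also need to propagate nulls and return an Arrow boolean array rather than a Vec<bool>.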