perf: optimize `NthValue` when `ignore_nulls` is true #19496

mzabaluev · 2025-12-25T22:28:53Z

Rationale for this change

The PartitionEvaluator implementation for NthValue in DataFusion has a few shortcomings:

When nulls are ignored (meaning the count skips over them), the evaluation collects an array of all valid indices, only to select at most one index from it according to the First/Last/Nth case.
The memoize implementation gives up in the same condition, even after performing part of the logic!

What changes are included in this PR?

Use only as much iteration over the valid indices as needed for the function case, without collecting all indices.
The memoize implementation now handles FirstValue with ignore_nulls set to true by pruning the partition at the first non-null value, and returns early for the other function cases.
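A minimal sketch of iterator-driven selection in plain Rust, not the DataFusion code itself. `validity` is a hypothetical `&[bool]` standing in for a null bitmap, and `NthKind`, `valid_indices` and `select_index` are illustrative names. The index iterator is treated as forward-only, so Last uses `Iterator::last`, which scans to the end but never allocates.

```rust
/// Hypothetical stand-in for the First/Last/Nth function cases.
#[derive(Clone, Copy, Debug)]
enum NthKind {
    First,
    Last,
    /// 1-based index counted from the start of the frame.
    Nth(usize),
}

/// Lazily yields the positions of non-null rows; `validity[i]` is
/// `true` when row `i` is non-null (a simplification of a null bitmap).
fn valid_indices(validity: &[bool]) -> impl Iterator<Item = usize> + '_ {
    validity.iter().enumerate().filter_map(|(i, &v)| v.then_some(i))
}

/// Picks at most one valid index, consuming only as much of the
/// iterator as the case requires and never collecting into a `Vec`.
fn select_index(mut indices: impl Iterator<Item = usize>, kind: NthKind) -> Option<usize> {
    match kind {
        // Stops at the first non-null row.
        NthKind::First => indices.next(),
        // Scans to the end, keeping only the most recent index.
        NthKind::Last => indices.last(),
        // Index 0 selects nothing in a 1-based scheme.
        NthKind::Nth(0) => None,
        // Skips n - 1 valid rows and returns the next one, if any.
        NthKind::Nth(n) => indices.nth(n - 1),
    }
}
```

For example, with `validity = [false, true, true, false, true]` the valid indices are 1, 2 and 4, so First selects 1, Last selects 4, `Nth(2)` selects 2, and `Nth(4)` selects nothing.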

Are these changes tested?

All existing tests pass for FirstValue/LastValue/NthValue.

Are there any user-facing changes?

No.

Instead of collecting all valid indices per batch in the PartitionEvaluator for NthValue, consume the iterator only as far as each case requires. Even in the worst case (a negative index whose magnitude is greater than 1), only a sliding window of the last N valid indices is needed.
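A forward-only iterator cannot be read from the back, so a negative index needs a bounded buffer instead. The sketch below assumes that an index of `-k` selects the k-th valid row from the end of the frame; `select_from_end` is a hypothetical name, not a DataFusion function. It keeps a `VecDeque` holding at most `k` indices.

```rust
use std::collections::VecDeque;

/// Returns the `k`-th valid index counted from the end (k = 1 is the
/// last one), reading a forward-only iterator and keeping at most `k`
/// indices in memory at any time.
fn select_from_end(indices: impl Iterator<Item = usize>, k: usize) -> Option<usize> {
    if k == 0 {
        return None;
    }
    // Sliding window over the most recent `k` valid indices.
    let mut window: VecDeque<usize> = VecDeque::with_capacity(k);
    for idx in indices {
        if window.len() == k {
            // Evict the oldest index to make room for the newest.
            window.pop_front();
        }
        window.push_back(idx);
    }
    // With fewer than `k` valid rows there is no k-th row from the end.
    if window.len() == k {
        window.front().copied()
    } else {
        None
    }
}
```

With valid indices 1, 2 and 4, `k = 2` yields 2 and `k = 4` yields `None`. Memory stays bounded by `k` however many rows the frame contains.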

Handle the case where FirstValue is called with ignore_nulls set to true: the partition can be pruned at the first non-null value. For the other function cases under the same condition, return early rather than running logic whose results are then discarded.
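The memoize decision described above can be sketched in heavily simplified form. `Memo` and `memoize_ignore_nulls` are hypothetical names, and the frame is assumed to start at the beginning of the partition, so the first non-null value never changes once it has been seen.

```rust
/// Hypothetical stand-in for the First/Last/Nth function cases.
#[derive(Clone, Copy, Debug)]
enum NthKind {
    First,
    Last,
    Nth(usize),
}

/// Hypothetical outcome of one memoize step when nulls are ignored.
#[derive(Debug, PartialEq)]
enum Memo {
    /// The result is fixed at this row index; later frames can reuse it.
    Finalized(usize),
    /// No non-null row seen yet; keep buffering rows.
    Pending,
    /// Case not handled: return early without doing any work.
    Unsupported,
}

/// `validity[i]` is `true` when buffered row `i` is non-null.
fn memoize_ignore_nulls(kind: NthKind, validity: &[bool]) -> Memo {
    match kind {
        // With the frame anchored at the partition start, the first
        // non-null row is final as soon as it appears.
        NthKind::First => match validity.iter().position(|&v| v) {
            Some(i) => Memo::Finalized(i),
            None => Memo::Pending,
        },
        // Other cases bail out before computing anything.
        NthKind::Last | NthKind::Nth(_) => Memo::Unsupported,
    }
}
```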

mzabaluev added 2 commits December 25, 2025 21:59

github-actions bot added the functions Changes to functions implementation label Dec 25, 2025

