Skip to content

Conversation

@treysp
Copy link
Collaborator

@treysp treysp commented Dec 22, 2025

CORR performs an internal calculation that has VARIANCE([column]) in the denominator. When that occurs, Snowflake returns NULL while DuckDB returns NaN.

To replicate the Snowflake's NULL in DuckDB, we must wrap the CORR call in ISNAN:

CASE WHEN ISNAN(CORR(a, b)) THEN NULL ELSE CORR(a, b) END

Naively wrapping in ISNAN generates invalid syntax for windowed and filtered function calls.

Therefore, we add special DuckDB generator handling to route exp.Window, exp.Window(exp.Filter), and exp.Filter to a corr_sql method designed to accept window and filtered expressions.

Examples:

  • Snowflake: SELECT CORR(a, b) OVER (PARTITION BY c)
    • DuckDB: SELECT CASE WHEN ISNAN(CORR(a, b) OVER (PARTITION BY c)) THEN NULL ELSE CORR(a, b) OVER (PARTITION BY c) END
  • Snowflake: SELECT CORR(a, b) FILTER (WHERE c > 0)
    • DuckDB: SELECT CASE WHEN ISNAN(CORR(a, b) FILTER(WHERE c > 0)) THEN NULL ELSE CORR(a, b) FILTER(WHERE c > 0) END
  • Snowflake: SELECT CORR(a, b) FILTER (WHERE c > 0) OVER (PARTITION BY d)
    • DuckDB: SELECT CASE WHEN ISNAN(CORR(a, b) FILTER(WHERE c > 0) OVER (PARTITION BY d)) THEN NULL ELSE CORR(a, b) FILTER(WHERE c > 0) OVER (PARTITION BY d) END

@treysp treysp requested a review from georgesittas December 22, 2025 22:44
@github-actions
Copy link
Contributor

github-actions bot commented Dec 22, 2025

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:trey/corr-sf-duckdb, sqlglot version: trey/corr-sf-duckdb)
  • baseline (main, sqlglot version: 28.5.1.dev43)

⚠️ Limited to dialects: duckdb, snowflake

By Dialect

dialect main sqlglot:trey/corr-sf-duckdb difference links
duckdb -> duckdb 4003/4003 passed (100.0%) 4003/4003 passed (100.0%) No change full result / delta
snowflake -> duckdb 626/1085 passed (57.7%) 626/1085 passed (57.7%) No change full result / delta
snowflake -> snowflake 981/1085 passed (90.4%) 981/1085 passed (90.4%) No change full result / delta

Overall

main: 6173 total, 5610 passed (pass rate: 90.9%), sqlglot version: 28.5.1.dev43

sqlglot:trey/corr-sf-duckdb: 6173 total, 5610 passed (pass rate: 90.9%), sqlglot version: trey/corr-sf-duckdb

Difference: No change

return super().window_sql(expression)

corr_no_null = corr.copy()
corr_no_null.set("null_on_zero_variance", False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set this to false here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this unwrapping and arg setting into a small helper - see what you think

@treysp treysp force-pushed the trey/corr-sf-duckdb branch from 65a57ea to 0d175bc Compare December 29, 2025 23:04
def _maybe_corr_null_to_false(
expression: t.Union[exp.Filter, exp.Window, exp.Corr],
) -> t.Optional[t.Union[exp.Filter, exp.Window, exp.Corr]]:
expr = expression.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this copy necessary ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A window/filter AST containing CORR could be modified here then passed through to other rendering methods.

I was under the impression that, when modifying an AST, we should copy if it will then be passed to unknown downstream consumers.

It sounds like that's not correct?

Copy link
Collaborator

@geooo109 geooo109 Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, I see the point, ^ this works.

I was thinking that .set doesn't break anything here, so we could skip the copy in some cases.

What do you think @VaggelisD ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants