-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Allow logical optimizer to be run without evaluating now() & refactor SimplifyInfo #19505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0d4f96c to
062b0d9
Compare
|
This is an interesting proposition; I'm not as familiar with the underlying reasons for having datafusion/datafusion/expr/src/simplify.rs Lines 25 to 31 in 10db6b3
|
As I recall, the idea was to permit projects (maybe dask?) use the datafusion parser/frontend without also having to bring in the entire execution engine (e.g. having to have all the expression machinery). I think we have examples that cover this usage, but it would be good to double check As long as we aren't messing around with the crate dependencies the basic idea sounds good |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new API makes more sense than the old one.
However, looking at the old API, it seems like you could previously override per-expression level nullability and other properties
https://docs.rs/datafusion/latest/datafusion/logical_expr/simplify/trait.SimplifyInfo.html
Is there any way to leave the old API in place for a few releases to give people time to migrate? We already have a bunch of API changes queued up for 52 I think
Like what about doing something like this?
pub struct SimplifyContext {
schema: DFSchemaRef,
query_execution_start_time: Option<DateTime<Utc>>,
config_options: Arc<ConfigOptions>,
/// Backwards compatibility API (will be removed in a future release)
info: Option<Arc<dyn SimplifyInfo>>,
}
And then have a way to create a SimplifyContext with the old simplify info 🤔 | .with_current_time(); // Sets query_execution_start_time to Utc::now() | ||
| ``` | ||
|
|
||
| **Available `SimplifyContext` methods:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part is probably better left as a link to the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to a docs link
| // 2. Type coercion with `ExprSimplifier::coerce`. | ||
| let context = SimplifyContext::new(&props).with_schema(Arc::new(df_schema.clone())); | ||
| let context = SimplifyContext::default() | ||
| .with_schema(Arc::new(df_schema.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably avoid this clone via https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html#method.inner
So like
| .with_schema(Arc::new(df_schema.clone())) | |
| .with_schema(Arc::clone(df_schema.inner())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we need a DFSchema not an (arrow) Schema
As far as I can tell the new API has similar or less requirements / dependencies. The
I did a GitHub search for any usages of |
062b0d9 to
de2ae01
Compare
Yes but I don't think that was ever used? I get the feeling (based on actual usage) that the idea was more to allow creating a context with or without a schema, with or without a config, etc. But that is better suited to this builder style API than a trait. I see that |
I don't think there's a great way to do that without quite a bit of faff for everyone. In particular for the |
In trying to fix #19418 I kept getting turned around about what was needed where. The
SimplifyInfotrait made it extra hard to understand. I ended up realizing that the main reason for the trait to exist was tests. Removing the trait and adding a builder style API toSimplifyContextmade it IMO more ergonomic for tests and other call sites, easier to track the code (no trait opaqueness) and clearer what simplification capabilities are available in each site. This got rid of e.g. some places where we were callingExecutionProps::new()just to pass that intoSimplifyContextwhich in turn would hand out references to the default query time, a defaultContigOptions, etc; or indatafusion/core/src/execution/session_state.rswhere we didlet dummy_schema = DFSchema::empty().This let me solve several problems:
Compared to #19426 this avoids adding a config option and is actually less lines of code (negative diff).
Fixes #19418, closes #19426 (replaces it).