Skip to content

Conversation

@adriangb
Copy link
Contributor

@adriangb adriangb commented Dec 27, 2025

In trying to fix #19418 I kept getting turned around about what was needed where. The SimplifyInfo trait made it extra hard to understand. I ended up realizing that the main reason for the trait to exist was tests. Removing the trait and adding a builder style API to SimplifyContext made it IMO more ergonomic for tests and other call sites, easier to track the code (no trait opaqueness) and clearer what simplification capabilities are available in each site. This got rid of e.g. some places where we were calling ExecutionProps::new() just to pass that into SimplifyContext which in turn would hand out references to the default query time, a default ContigOptions, etc; or in datafusion/core/src/execution/session_state.rs where we did let dummy_schema = DFSchema::empty().

This let me solve several problems:

  • Can store optimized logical plans for prepared statements
  • Users can optionally run an optimizer pass on logical plans without evaluating time functions

Compared to #19426 this avoids adding a config option and is actually less lines of code (negative diff).

Fixes #19418, closes #19426 (replaces it).

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate functions Changes to functions implementation spark labels Dec 27, 2025
@adriangb adriangb marked this pull request as ready for review December 27, 2025 02:31
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 28, 2025
@Jefffrey
Copy link
Contributor

This is an interesting proposition; I'm not as familiar with the underlying reasons for having SimplifyInfo/SimplifyContext, but I wonder what your thoughts are on the docstring calling out that SimplifyInfo is meant to help with extensibility?

/// Provides the information necessary to apply algebraic simplification to an
/// [Expr]. See [SimplifyContext] for one concrete implementation.
///
/// This trait exists so that other systems can plug schema
/// information in without having to create `DFSchema` objects. If you
/// have a [`DFSchemaRef`] you can use [`SimplifyContext`]
pub trait SimplifyInfo {

cc @alamb who added this docstring back in #3758

@alamb
Copy link
Contributor

alamb commented Dec 29, 2025

This is an interesting proposition; I'm not as familiar with the underlying reasons for having SimplifyInfo/SimplifyContext, but I wonder what your thoughts are on the docstring calling out that SimplifyInfo is meant to help with extensibility?

As I recall, the idea was to permit projects (maybe dask?) use the datafusion parser/frontend without also having to bring in the entire execution engine (e.g. having to have all the expression machinery).

I think we have examples that cover this usage, but it would be good to double check

As long as we aren't messing around with the crate dependencies the basic idea sounds good

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new API makes more sense than the old one.

However, looking at the old API, it seems like you could previously override per-expression level nullability and other properties

https://docs.rs/datafusion/latest/datafusion/logical_expr/simplify/trait.SimplifyInfo.html

Is there any way to leave the old API in place for a few releases to give people time to migrate? We already have a bunch of API changes queued up for 52 I think

Like what about doing something like this?

pub struct SimplifyContext {
    schema: DFSchemaRef,
    query_execution_start_time: Option<DateTime<Utc>>,
    config_options: Arc<ConfigOptions>,
    /// Backwards compatibility API (will be removed in a future release)
    info: Option<Arc<dyn SimplifyInfo>>,
}

And then have a way to create a SimplifyContext with the old simplify info 🤔 

.with_current_time(); // Sets query_execution_start_time to Utc::now()
```

**Available `SimplifyContext` methods:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part is probably better left as a link to the documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a docs link

// 2. Type coercion with `ExprSimplifier::coerce`.
let context = SimplifyContext::new(&props).with_schema(Arc::new(df_schema.clone()));
let context = SimplifyContext::default()
.with_schema(Arc::new(df_schema.clone()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably avoid this clone via https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html#method.inner

So like

Suggested change
.with_schema(Arc::new(df_schema.clone()))
.with_schema(Arc::clone(df_schema.inner()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we need a DFSchema not an (arrow) Schema

@adriangb
Copy link
Contributor Author

adriangb commented Dec 29, 2025

As I recall, the idea was to permit projects (maybe dask?) use the datafusion parser/frontend without also having to bring in the entire execution engine (e.g. having to have all the expression machinery).

As far as I can tell the new API has similar or less requirements / dependencies. The SimplifyInfo trait required:

  • ExecutionProps (which requires `ConfigOptions)
  • DataType (from arrow)
    The struct has:
  • DFSchemaRef (maybe could be just SchemaRef?)
  • ConfigOptions

I did a GitHub search for any usages of SimplifyInfo and all I could find was VegaFusion which looks like it could just use the struct.

@github-actions github-actions bot added the common Related to common crate label Dec 29, 2025
@adriangb
Copy link
Contributor Author

Looking at the old API, it seems like you could previously override per-expression level nullability and other properties

Yes but I don't think that was ever used? I get the feeling (based on actual usage) that the idea was more to allow creating a context with or without a schema, with or without a config, etc. But that is better suited to this builder style API than a trait.

I see that trait SimplifyInfo was originally added in #1774. @jimexist I know it's been years since then but do you have any thoughts on it? I don't see any discussion on why it is a trait, etc.

@adriangb
Copy link
Contributor Author

Is there any way to leave the old API in place for a few releases to give people time to migrate? We already have a bunch of API changes queued up for 52 I think

Like what about doing something like this?

I don't think there's a great way to do that without quite a bit of faff for everyone. In particular for the simplify() method we'd have to add a new method, deprecate the existing one, etc. And I don't think we can create a SimplifyContext from a SimplifyInfo since the latter doesn't expose the DFSchema directly, only ExecutionProps and methods for checking nullability, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules spark

Projects

None yet

Development

Successfully merging this pull request may close these issues.

option to disable evaluation of stable expressions in optimizer rules

3 participants