@Chowdhury-Anik Chowdhury-Anik commented Nov 8, 2025

This commit adds four data sink methods for Polars LazyFrame:

  • sink_parquet: Write LazyFrame to Parquet format
  • sink_csv: Write LazyFrame to CSV format
  • sink_ipc: Write LazyFrame to IPC/Feather format
  • sink_ndjson: Write LazyFrame to NDJSON format

These sinks let users write a LazyFrame straight to disk without calling .collect() first, so the full result does not have to be materialized in memory, which helps with large datasets.

Fixes #791

Changes

How I tested this

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Contributor

@elijahbenizzy elijahbenizzy left a comment

Looks good -- maybe add a test?

Contributor

@skrawcz skrawcz left a comment

Just need some tests please

Contributor

skrawcz commented Dec 28, 2025

@Chowdhury-Anik just bumping this PR :) would love some tests please.

Chowdhury-Anik and others added 2 commits January 1, 2026 07:29
This commit adds four data sink methods for Polars LazyFrame:
- sink_parquet: Write LazyFrame to Parquet format
- sink_csv: Write LazyFrame to CSV format
- sink_ipc: Write LazyFrame to IPC/Feather format
- sink_ndjson: Write LazyFrame to NDJSON format

These sinks allow users to write LazyFrames directly without needing to call .collect() first, improving performance for large datasets.

Fixes apache#791
@skrawcz skrawcz force-pushed the Chowdhury-Anik-patch-1 branch from a38a271 to 93a56f0 Compare December 31, 2025 20:29
Contributor

skrawcz commented Dec 31, 2025

Once #1429 is merged, we can rebase this and it should be good to go.

@skrawcz skrawcz self-requested a review December 31, 2025 20:41

Development

Successfully merging this pull request may close these issues.

Add data source sinks for Polars Lazyframe implementation

3 participants