@Hind-M Hind-M commented Dec 31, 2025

Fix #55
NOTE: This should be rebased/reworked after #73 is merged.

To be discussed:
In sparrow, the variable size binary view layout has the following buffers:

  • length_buffer: one 16-byte view structure per element
  • long_string_storage: concatenated storage for strings longer than 12 bytes
  • buffer_sizes: sizes of the variadic buffers
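
For illustration, here is a minimal sketch of how one 16-byte view could be filled, following the Arrow view layout (4-byte length, then either up to 12 inline bytes, or a 4-byte prefix, a buffer index and an offset). The names `make_view`, `view_bytes` and `long_storage` are hypothetical and are not sparrow's actual types; the sketch also assumes a little-endian host, since integers are copied in native byte order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical names for illustration only; not the sparrow API.
constexpr std::size_t view_size = 16;        // every element gets one 16-byte view
constexpr std::size_t max_inline_size = 12;  // values up to 12 bytes are stored inline

using view_bytes = std::array<std::uint8_t, view_size>;

// Builds the view for `value`. Values longer than 12 bytes are appended to
// `long_storage`, and the view records a 4-byte prefix, the buffer index and
// the offset of the value inside that buffer.
// Assumption: little-endian host (integers are copied in native byte order).
inline view_bytes make_view(std::string_view value,
                            std::string& long_storage,
                            std::int32_t buffer_index = 0)
{
    view_bytes view{};  // zero-initialized, so unused inline bytes stay zero
    const auto length = static_cast<std::int32_t>(value.size());
    std::memcpy(view.data(), &length, sizeof(length));

    if (value.size() <= max_inline_size)
    {
        if (!value.empty())
        {
            // Short value: the bytes live directly in the view.
            std::memcpy(view.data() + 4, value.data(), value.size());
        }
    }
    else
    {
        const auto offset = static_cast<std::int32_t>(long_storage.size());
        std::memcpy(view.data() + 4, value.data(), 4);  // 4-byte prefix
        std::memcpy(view.data() + 8, &buffer_index, sizeof(buffer_index));
        std::memcpy(view.data() + 12, &offset, sizeof(offset));
        long_storage.append(value);
    }
    return view;
}
```

With this sketch, `make_view("short", storage)` leaves `storage` untouched, while a 33-byte value is appended to `storage` and referenced from the view at offset 0.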

The last one exists for compatibility with the Arrow C data interface and is not supposed to be serialized.
The issue is that the current sparrow-ipc codebase serializes every buffer without distinction.
Making buffer serialization conditional could complicate things here.

One solution could be to keep the sizes of the variadic buffers as metadata in sparrow rather than as an actual buffer in the layout, and to export them only through the Arrow C data interface (see the Apache Arrow C++ implementation):
https://github.com/apache/arrow/blob/14bf84856de282fb54fcd3e7482eb7eb14cf521c/cpp/src/arrow/ipc/writer.cc#L498-L508
https://github.com/apache/arrow/blob/14bf84856de282fb54fcd3e7482eb7eb14cf521c/cpp/src/arrow/c/bridge.cc#L589-L613

NOTE/EDIT: After discussing this internally, skipping the last buffer for view types is handled in sparrow-ipc rather than in sparrow, because the ArrowArray interface has no notion of special buffers that must not be serialized.
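
A rough sketch of what skipping the last buffer for view types could look like on the serialization side. Everything here is hypothetical (`layout_kind`, `serialized_buffer_count`, `serialized_body_size`) and is not the actual sparrow-ipc code; it only assumes that the buffer-sizes buffer is the last one of the array.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not the sparrow-ipc API.
enum class layout_kind { primitive, binary_view, string_view };

using buffer = std::vector<std::uint8_t>;

constexpr bool is_view_layout(layout_kind kind)
{
    return kind == layout_kind::binary_view || kind == layout_kind::string_view;
}

// Number of leading buffers that should be written to the IPC body.
// Assumption: for view layouts the trailing buffer holds the variadic buffer
// sizes and exists only for the Arrow C data interface, so it is skipped.
constexpr std::size_t serialized_buffer_count(layout_kind kind, std::size_t buffer_count)
{
    return (is_view_layout(kind) && buffer_count > 0) ? buffer_count - 1 : buffer_count;
}

// Total number of bytes of the buffers that get serialized (padding ignored).
inline std::size_t serialized_body_size(layout_kind kind, const std::vector<buffer>& buffers)
{
    std::size_t total = 0;
    const std::size_t count = serialized_buffer_count(kind, buffers.size());
    for (std::size_t i = 0; i < count; ++i)
    {
        total += buffers[i].size();
    }
    return total;
}
```

Keeping the decision in a single helper like this means the rest of the serialization loop does not need to know about view types, which is one way to limit the extra complexity mentioned above.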

codecov-commenter commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 97.05882% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@587c242). Learn more about missing BASE report.

Files with missing lines                               Patch %   Lines
...pc/deserialize_variable_size_binary_view_array.hpp  97.29%    1 Missing ⚠️
src/array_deserializer.cpp                             92.85%    1 Missing ⚠️
src/flatbuffer_utils.cpp                               97.14%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #74   +/-   ##
=======================================
  Coverage        ?   84.44%           
=======================================
  Files           ?       45           
  Lines           ?     2077           
  Branches        ?        0           
=======================================
  Hits            ?     1754           
  Misses          ?      323           
  Partials        ?        0           
Flag        Coverage Δ
unittests   84.44% <97.05%> (?)

@Alex-PLACET Alex-PLACET left a comment

Mergeable once #73 is merged

@Hind-M Hind-M force-pushed the var_size_bin_view branch from 4768ea0 to 5cebeed Compare January 6, 2026 16:25
@Hind-M Hind-M marked this pull request as ready for review January 6, 2026 16:44
@JohanMabille JohanMabille merged commit 6de389b into QuantStack:main Jan 6, 2026
27 of 29 checks passed
@Hind-M Hind-M deleted the var_size_bin_view branch January 6, 2026 16:57

Development

Successfully merging this pull request may close these issues.

Support of variable size binary view array
