Support variable_size_binary_view_array #74
Merged
Fix #55
NOTE: This should be rebased/reworked after #73 is merged.
To be discussed:
In `sparrow`, the variable size binary view layout has the following buffers:

- `length_buffer`: 16-byte view structures for each element
- `long_string_storage`: concatenated storage for strings > 12 bytes
- `buffer_sizes`: size information for variadic buffers

The last of these is there for compatibility with the Arrow C data interface but is not supposed to be serialized.
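For reference, here is a minimal sketch of how one 16-byte view is laid out according to the Arrow columnar format: a 4-byte length, followed either by up to 12 inline bytes (zero-padded) or by a 4-byte prefix, a 4-byte data buffer index and a 4-byte offset. `view16` and `make_view` are hypothetical names used for illustration only; they are not `sparrow` APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>

// Hypothetical illustration type: one raw 16-byte view.
using view16 = std::array<std::uint8_t, 16>;

// Hypothetical helper: encodes `s` as a view. `buffer_index` and `offset`
// are only used when the value is longer than 12 bytes.
inline view16 make_view(const std::string& s, std::int32_t buffer_index, std::int32_t offset)
{
    assert(s.size() <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));

    view16 v{};  // zero-initialized, so unused inline bytes stay zero
    const std::int32_t length = static_cast<std::int32_t>(s.size());
    std::memcpy(v.data(), &length, 4);  // bytes 0-3: length

    if (s.size() <= 12)
    {
        // Short value: stored inline in bytes 4-15.
        std::memcpy(v.data() + 4, s.data(), s.size());
    }
    else
    {
        // Long value: 4-byte prefix, then where to find the full data.
        std::memcpy(v.data() + 4, s.data(), 4);        // bytes 4-7: prefix
        std::memcpy(v.data() + 8, &buffer_index, 4);   // bytes 8-11: data buffer index
        std::memcpy(v.data() + 12, &offset, 4);        // bytes 12-15: offset in that buffer
    }
    return v;
}
```

Only values longer than 12 bytes refer to a variadic data buffer, which is why the number and sizes of those buffers have to be tracked separately.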
The issue is that we serialize all buffers in the current `sparrow-ipc` codebase without any distinction. Making buffer serialization conditional could complicate things here.
One solution could be to keep the variadic buffer sizes as metadata in `sparrow` rather than as an actual buffer in the layout, so that they are exported only through the Arrow C data interface (see the Apache Arrow C++ implementation):

- https://github.com/apache/arrow/blob/14bf84856de282fb54fcd3e7482eb7eb14cf521c/cpp/src/arrow/ipc/writer.cc#L498-L508
- https://github.com/apache/arrow/blob/14bf84856de282fb54fcd3e7482eb7eb14cf521c/cpp/src/arrow/c/bridge.cc#L589-L613
NOTE/EDIT: After discussing this internally, skipping the last buffer for view types is handled in `sparrow-ipc` rather than in `sparrow`, since the ArrowArray interface has no way to flag special buffers that should not be serialized.
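A rough sketch of that approach, assuming the buffer sizes are always the last buffer of a view-type array. The `buffer` alias, `layout_kind` enum and both functions are hypothetical stand-ins, not the actual `sparrow-ipc` code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not sparrow or sparrow-ipc types.
using buffer = std::vector<std::uint8_t>;

enum class layout_kind
{
    plain,        // every buffer is serialized
    binary_view   // trailing sizes buffer exists only for the C data interface
};

// Number of leading buffers that should be written to the IPC body.
inline std::size_t serialized_buffer_count(const std::vector<buffer>& buffers, layout_kind kind)
{
    if (kind == layout_kind::binary_view && !buffers.empty())
    {
        return buffers.size() - 1;  // skip the trailing buffer-sizes buffer
    }
    return buffers.size();
}

// Total payload size in bytes of the buffers that are actually serialized
// (padding and alignment are ignored in this sketch).
inline std::size_t serialized_body_size(const std::vector<buffer>& buffers, layout_kind kind)
{
    std::size_t total = 0;
    const std::size_t count = serialized_buffer_count(buffers, kind);
    for (std::size_t i = 0; i < count; ++i)
    {
        total += buffers[i].size();
    }
    return total;
}
```

Keeping the decision in one small function means the rest of the serialization loop can stay unaware of view types.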