Skip to content

Conversation

@rynewang
Copy link

@rynewang rynewang commented Jan 6, 2026

Rationale for this change

As noted by @wgtmac in #48717 (comment), calling std::memcpy with a nullptr source is undefined behavior according to the C++ standard, even when the size is 0.

When ByteArray has len=0 and ptr=nullptr (the default-constructed state per types.h:649), the current code invokes UB.

What changes are included in this PR?

Added guards to check len > 0 before calling memcpy in:

  • TypedStatisticsImpl<ByteArrayType>::Copy in statistics.cc
  • DictDecoderImpl<ByteArrayType>::SetDict in decoder.cc
  • Test utilities in statistics_test.cc and column_writer_test.cc

Are these changes tested?

Existing tests pass. The fix prevents potential UB that sanitizers or optimizing compilers could exploit.

Are there any user-facing changes?

No.

…nullptr

When ByteArray has len=0 and ptr=nullptr (the default-constructed state),
calling std::memcpy with nullptr is undefined behavior according to the
C++ standard, even when size is 0.

This commit adds guards to check len > 0 before calling memcpy in:
- TypedStatisticsImpl<ByteArrayType>::Copy (statistics.cc)
- DictDecoderImpl<ByteArrayType>::SetDict (decoder.cc)
- Test utilities (statistics_test.cc, column_writer_test.cc)

Claude-Generated-By: Claude Code (cli/claude-opus-4-5=100%)
Claude-Steers: 1
Claude-Permission-Prompts: 8
Claude-Escapes: 1
@rynewang rynewang requested a review from wgtmac as a code owner January 6, 2026 17:58
@github-actions
Copy link

github-actions bot commented Jan 6, 2026

⚠️ GitHub issue #48744 has been automatically assigned in GitHub to PR creator.

@wgtmac
Copy link
Member

wgtmac commented Jan 8, 2026

BTW, are there other similar calls in the parquet folder? I just did a quick peek and it seems that

void ConcatenateBuffers(int64_t definition_levels_rle_size,
is a missing part?

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 8, 2026
Co-authored-by: Gang Wu <ustcwg@gmail.com>
@rynewang
Copy link
Author

rynewang commented Jan 8, 2026

this is getting scattered all the places, so I think we can just provide a SafeMemcpy that does the size>0 check and do memcpy, will update the branch

Introduce a centralized SafeMemcpy function in types.h that guards against
calling std::memcpy with nullptr, which is undefined behavior even when
size is 0.

Replace memcpy calls throughout parquet with SafeMemcpy where the size
is dynamic and could be 0 (ByteArray lengths, buffer sizes, RLE sizes).
Keep std::memcpy for static sizes (sizeof, FLBA type_length) that are
guaranteed > 0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants