[C++][Parquet] CleanStatistic for ByteArray might prune the zero-sized array #45257

Open
mapleFU opened this issue Jan 14, 2025 · 0 comments

Describe the bug, including details regarding any error messages, version, and platform.

The code in [1] cleans up the min/max statistics in Parquet. For ByteArray columns, multiple statistics may be merged when they are read from a file. The code below misbehaves when min = "":

  1. The encoded min checked in [2] is empty, so PlainDecode is not called, yet has_min_max_ is still set to true. The min ByteArray therefore stays default-constructed, which leaves ptr == nullptr [3].
  2. When TypedStatistics::Merge is called, it calls CleanStatistic [1], which discards this min/max pair, so the existing min-max statistics are left unchanged.

So when statistics with min = "" are merged, the result keeps the old min-max statistics instead of taking the empty string as the new min.

[1]

optional<std::pair<ByteArray, ByteArray>> CleanStatistic(

[2]
if (!encoded_min.empty()) {

[3]
ByteArray() : len(0), ptr(NULLPTR) {}
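
The sequence described above can be sketched with simplified stand-in types. This is only an illustration, not the Arrow/Parquet implementation: `Stats`, `Decode`, `Less`, `ToString`, and `MergeTwo` are hypothetical names, and the body of `CleanStatistic` (returning nothing when either `ptr` is null) is an assumption modeled on the behavior this issue describes.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Stand-in for the ByteArray in [3]: the default constructor leaves ptr null.
struct ByteArray {
  ByteArray() : len(0), ptr(nullptr) {}
  ByteArray(uint32_t len, const uint8_t* ptr) : len(len), ptr(ptr) {}
  uint32_t len;
  const uint8_t* ptr;
};

// Copies the viewed bytes out; a null ptr is rendered as an empty string.
std::string ToString(const ByteArray& v) {
  if (v.ptr == nullptr) return std::string();
  return std::string(reinterpret_cast<const char*>(v.ptr), v.len);
}

// Assumed behavior of [1]: a pair with a null ptr on either side is pruned.
std::optional<std::pair<ByteArray, ByteArray>> CleanStatistic(
    std::pair<ByteArray, ByteArray> min_max) {
  if (min_max.first.ptr == nullptr || min_max.second.ptr == nullptr) {
    return std::nullopt;
  }
  return min_max;
}

// Hypothetical, heavily reduced statistics holder. It only stores views,
// so the encoded strings must outlive the Stats object.
class Stats {
 public:
  Stats(const std::string& encoded_min, const std::string& encoded_max,
        bool has_min_max) {
    // Mirrors [2]: an empty encoded value is never decoded, so an empty
    // string min keeps the default ByteArray (ptr == nullptr).
    if (!encoded_min.empty()) min_ = Decode(encoded_min);
    if (!encoded_max.empty()) max_ = Decode(encoded_max);
    has_min_max_ = has_min_max;
  }

  void Merge(const Stats& other) {
    if (!other.has_min_max_) return;
    auto cleaned = CleanStatistic(std::make_pair(other.min_, other.max_));
    if (!cleaned) return;  // pruned: this object's min/max stay as they were
    if (!has_min_max_) {
      min_ = cleaned->first;
      max_ = cleaned->second;
      has_min_max_ = true;
      return;
    }
    if (Less(cleaned->first, min_)) min_ = cleaned->first;
    if (Less(max_, cleaned->second)) max_ = cleaned->second;
  }

  std::string MinString() const { return ToString(min_); }
  std::string MaxString() const { return ToString(max_); }

 private:
  static ByteArray Decode(const std::string& s) {
    return ByteArray(static_cast<uint32_t>(s.size()),
                     reinterpret_cast<const uint8_t*>(s.data()));
  }
  static bool Less(const ByteArray& a, const ByteArray& b) {
    return ToString(a) < ToString(b);
  }

  bool has_min_max_ = false;
  ByteArray min_;
  ByteArray max_;
};

// Merges stats (b_min, b_max) into stats (a_min, a_max) and returns the
// resulting (min, max) as strings.
std::pair<std::string, std::string> MergeTwo(const std::string& a_min,
                                             const std::string& a_max,
                                             const std::string& b_min,
                                             const std::string& b_max) {
  Stats merged(a_min, a_max, /*has_min_max=*/true);
  Stats other(b_min, b_max, /*has_min_max=*/true);
  merged.Merge(other);
  return {merged.MinString(), merged.MaxString()};
}
```

Under this model, `MergeTwo("b", "d", "", "c")` leaves the min at "b" even though the true minimum across both inputs is the empty string, while `MergeTwo("b", "d", "a", "c")` updates the min to "a" as expected.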

Component(s)

C++, Parquet
