PyArrow does not correctly read empty categorical columns from Parquet into Pandas DataFrames #45192

Open
narukaze132 opened this issue Jan 7, 2025 · 0 comments

Describe the bug, including details regarding any error messages, version, and platform.

As documented in pandas-dev/pandas#48883, if a DataFrame with an empty categorical column is saved to a Parquet file and subsequently loaded using pyarrow, the column's dtype reverts from category to object. This happens regardless of which engine was used to write the Parquet file, and it does not happen when fastparquet is used to load the file instead.
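A minimal reproduction sketch may help when triaging. It assumes that "empty" means a categorical column with no categories and only missing values; a zero-row frame is another possible reading and can be passed to the same helper. The names `make_frame` and `roundtrip_dtype` are hypothetical and used only for illustration, and no output is shown because the result depends on the installed pandas and pyarrow versions.

```python
import os
import tempfile

import pandas as pd


def make_frame() -> pd.DataFrame:
    # Assumption: "empty" = a categorical column with zero categories
    # whose values are all missing.
    return pd.DataFrame({"cat": pd.Categorical([None, None])})


def roundtrip_dtype(df, column="cat", write_engine="pyarrow", read_engine="pyarrow"):
    """Write df to a temporary Parquet file, read it back, and return the column's dtype."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "empty_cat.parquet")
        df.to_parquet(path, engine=write_engine)
        return pd.read_parquet(path, engine=read_engine)[column].dtype


if __name__ == "__main__":
    df = make_frame()
    print("before round trip:", df["cat"].dtype)
    print("after round trip: ", roundtrip_dtype(df))
```

Passing `read_engine="fastparquet"` (with fastparquet installed) allows a comparison against the other reader mentioned in the report.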

Component(s)

Parquet, Python
