PyArrow does not correctly read empty categorical columns from Parquet into Pandas DataFrames #45192

narukaze132 · 2025-01-07T17:12:14Z

Describe the bug, including details regarding any error messages, version, and platform.

As documented in pandas-dev/pandas#48883, if a DataFrame with an empty categorical column is saved to a Parquet file and subsequently loaded using pyarrow, the column's dtype reverts to object instead of remaining category. This occurs regardless of which engine was used to write the file, and does not occur when fastparquet is used to load the file instead.

Component(s)

Parquet, Python

narukaze132 added the Type: bug label Jan 7, 2025

github-actions bot added Component: Parquet Component: Python labels Jan 7, 2025
