Incorrectly raised SchemaError when validating multiindex DataFrame #1134
-
Given this minimal example
The expected behaviour is that validation passes, since the two index levels contain boolean values. Instead, I get the following error:
This issue seems to occur only with MultiIndex DataFrames and with boolean fields. Changing the type from bool to int resolves the issue. Does anyone know a workaround, or am I misinterpreting something about how to define the schema? Thanks in advance! I am using pandera version 0.14.4.
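For readers hitting the same error, the bool-to-int observation above can be turned into a stopgap. The following is only a minimal sketch in plain pandas, not something pandera provides: the helper name `bool_levels_to_int` is hypothetical, it assumes the DataFrame has a MultiIndex, and it assumes the schema is also changed to expect int for those levels.

```python
import numpy as np
import pandas as pd


def bool_levels_to_int(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose all-boolean MultiIndex levels are cast to int64.

    Hypothetical helper for illustration; assumes df.index is a MultiIndex.
    """
    idx = df.index
    arrays = []
    for i in range(idx.nlevels):
        values = idx.get_level_values(i)
        # Inspect the values rather than the dtype, because some pandas
        # versions report boolean index values as dtype 'object'.
        if len(values) > 0 and all(isinstance(v, (bool, np.bool_)) for v in values):
            values = values.astype("int64")
        arrays.append(values)
    out = df.copy()
    out.index = pd.MultiIndex.from_arrays(arrays, names=idx.names)
    return out


# Small usage example with two boolean index levels.
df = pd.DataFrame(
    {"x": [1, 2]},
    index=pd.MultiIndex.from_arrays(
        [[True, False], [False, True]], names=["a", "b"]
    ),
)
converted = bool_levels_to_int(df)
print(converted.index.get_level_values("a").dtype)
```

This trades the boolean meaning of the index for 0/1 integers, so it is probably only worth using while the underlying dtype problem cannot be fixed directly.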
Replies: 4 comments
-
Hi @ErikLundin98, what version of pandas and Python are you using?
-
@cosmicBboy, I'm using Python 3.10.10 and pandas 1.3.5.
-
So unfortunately pandas 1.3.5 has a bunch of issues with index data types... see this StringDtype xfail test as an example: pandera/tests/core/test_schema_components.py, lines 838 to 856 in fe83c19.
This is purely a pandas issue:
In [1]: import pandas as pd
In [2]: pd.Index([True, False])
Out[2]: Index([True, False], dtype='object')
In [3]: pd.Index([True, False], dtype=bool)
Out[3]: Index([True, False], dtype='object')  # it's still an "object"!
Any chance you can update your pandas version?
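To see whether a given environment is affected, the check from the session above can be wrapped in a small script. This is a rough sketch rather than anything from pandera or pandas itself: the function names `index_keeps_bool_dtype` and `level_dtypes` are made up for illustration, and no claim is made here about which pandas release changes the result.

```python
import pandas as pd


def index_keeps_bool_dtype() -> bool:
    """Return True if this pandas build stores a boolean Index with dtype bool."""
    return pd.Index([True, False], dtype=bool).dtype == bool


def level_dtypes(index: pd.MultiIndex) -> dict:
    """Map each level name to the dtype string of that level's values."""
    return {name: str(index.get_level_values(name).dtype) for name in index.names}


# Two boolean levels, as in the failing MultiIndex case described in the question.
mi = pd.MultiIndex.from_arrays([[True, False], [False, True]], names=["a", "b"])

print("pandas version:", pd.__version__)
print("Index keeps bool dtype:", index_keeps_bool_dtype())
print("MultiIndex level dtypes:", level_dtypes(mi))
```

If the level dtypes print as `object`, a bool check on the index levels would be expected to fail for the reason shown in the session above.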
-
Thank you for clarifying that it's a pandas issue! I will see if I can update pandas.