Validate string datatype in the column #807

Answered by cosmicBboy
dineshkumar-23 asked this question in Q&A

Hi @dineshkumar-23, this is a limitation of the pre-1.0 pandas string representation: specifying dtype=str actually uses the NumPy object dtype, rather than a logical string dtype, to represent the array.
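To make the difference concrete, here is a minimal sketch comparing the two dtypes. The column values are invented for illustration, and the object fallback for dtype=str is what I'd expect on pandas 1.x and 2.x; newer releases may behave differently.

```python
import pandas as pd

# dtype=str: on pandas 1.x/2.x this falls back to the generic NumPy object dtype.
as_str = pd.Series(["on", "off"], dtype=str)

# dtype="string": the dedicated extension dtype (pandas.StringDtype), pandas>=1.
as_string = pd.Series(["on", "off"], dtype="string")

print(as_str.dtype)
print(as_string.dtype)

# An object column is just an array of Python objects, so nothing stops
# non-string values from sitting in a column that was meant to hold strings.
mixed = pd.Series(["on", 1, 2.5], dtype=object)
non_strings = [v for v in mixed if not isinstance(v, str)]
print(non_strings)
```

Because an object column can hold anything, a dtype check against str cannot tell a column of strings apart from a column of mixed values.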

Recommendation

I'd highly recommend using pandas.StringDtype in this case (assuming you're using pandas>=1):

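import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.SchemaModel):
    device: Series[int]
    type: Series[float]
    # logical string dtype instead of the generic object dtype
    message: Series[pd.StringDtype]

    class Config:
        # cast each column to its declared dtype before validating
        coerce = True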

Edit: added coerce=True to the config.

In this case, pandera will complain even with your original dataframe:

  File "/Users/nielsbantilan/git/pandera/pandera/error_handlers.py", line 32, in collect_error
    raise schema_error fr…

jeffzi Apr 10, 2022
Collaborator

Answer selected by cosmicBboy