Performance issue with version 0.14.5 compared to 0.13.4 #1160
Unanswered
PetitLepton asked this question in Q&A
Replies: 0 comments
Hi, we are using `pandera` to validate data on the fly before serving it through an API. Yesterday, we switched from version 0.13.4 to 0.14.5, which unexpectedly led to a significant increase in the latency of our endpoints. After a bit of digging, we found that the behavior differs when using `Series[str]`. The following example, with `pandas` version 1.4.4, leads to an average computation time, `numpy.array(times).mean()`, four times larger in version 0.14.5 than in 0.13.4 (on my local machine, `x86_64`).

If we convert the series into Python strings and use `pandera.STRING`, the performance of 0.13.4 and 0.14.5 is similar.

What would be your advice? Should we switch to Python strings? Is this expected behaviour?
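For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing comparison for `Series[str]` validation. It is not the code from the original post: the helper names `time_calls` and `benchmark_series_str`, the one-column schema, the row count, and the number of repeats are all hypothetical choices made for illustration, and it averages with `statistics.mean` rather than `numpy.array(times).mean()`.

```python
import statistics
import time
from typing import Callable, List


def time_calls(func: Callable[[], object], repeats: int = 20) -> List[float]:
    """Return the wall-clock duration, in seconds, of each call to func."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return times


def benchmark_series_str(n_rows: int = 100_000, repeats: int = 20) -> float:
    """Mean time to validate one Series[str] column (hypothetical setup)."""
    # Imported here so the timing helper above stays usable without pandera.
    import pandas as pd
    import pandera as pa
    from pandera.typing import Series

    # Hypothetical schema: a single object-dtype column annotated as str.
    class Schema(pa.SchemaModel):
        name: Series[str]

    # Illustrative data; size and content are assumptions.
    df = pd.DataFrame({"name": ["some text"] * n_rows})

    times = time_calls(lambda: Schema.validate(df), repeats)
    return statistics.mean(times)
```

Running `benchmark_series_str()` once in an environment with pandera 0.13.4 and once with 0.14.5 installed, with the same `pandas` version in both, should show whether the slowdown reproduces on a given machine.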
Thanks in advance for your help!