-
Hi oniht, I am by no means an expert in Pandera, but if I understand your problem correctly, a custom check may be a solution. I am using the class-based API, but I believe you can achieve the same thing through the object-based API if needed. Please see the code block below:

import pandas as pd
import pandera as pa
import pandera.extensions as extensions

# define an example df
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 10],
        "y": [-3, 4, 5, 7],
        "label": ["b", "a", "b", "a"],
    }
)


@extensions.register_check_method(statistics=["check_key", "grouping_key"])
def check_groups_are_within_3_std(df, *, check_key: str, grouping_key: str):
    """
    A custom check testing whether each group's median is less than or equal to
    3 times that group's standard deviation.
    """
    # one boolean per group, indexed by the grouping key
    check = df.groupby([grouping_key])[check_key].apply(
        lambda x: x.median() <= x.std() * 3
    )
    return check


class Schema(pa.DataFrameModel):
    x: int
    y: int
    label: str

    class Config:
        # apply the registered custom check, passing its statistics as a dict
        check_groups_are_within_3_std = {"check_key": "x", "grouping_key": "label"}

Let me know how that goes; I'm happy to follow up.
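One caveat: the check above produces one boolean per group, while the question asks for an element-wise comparison against a per-group limit. A minimal pandas-only sketch of that element-wise variant follows. The function name `above_group_lower_limit`, the `n_std` parameter, and the sample data are hypothetical, chosen only for illustration. The function body should in principle be usable under the same `register_check_method` decorator, but that wiring is an assumption and is not shown or tested here.

```python
import pandas as pd

# Hypothetical sample data: group "a" has nine typical values and one outlier.
df = pd.DataFrame(
    {
        "x": [10] * 9 + [-90] + [1, 2, 3],
        "label": ["a"] * 10 + ["b"] * 3,
    }
)


def above_group_lower_limit(df, *, check_key, grouping_key, n_std=3.0):
    """Return one boolean per row: is the value above its group's
    (median - n_std * standard deviation)?"""
    grouped = df.groupby(grouping_key)[check_key]
    # transform() broadcasts each group's statistic back onto the original rows,
    # so the limit is aligned with the DataFrame's index.
    lower_limit = grouped.transform("median") - n_std * grouped.transform("std")
    # Groups with a single row have an undefined (NaN) sample std,
    # so their rows compare as False.
    return df[check_key] > lower_limit


result = above_group_lower_limit(df, check_key="x", grouping_key="label")
print(df.assign(passes=result))
```

Because the limit is inferred from the same data it is applied to, a single outlier also inflates its own group's standard deviation. With the sample standard deviation and a threshold of 3, a lone outlier can only fail in groups of roughly ten or more rows, which is why group "a" is that large here.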
-
I've been going through the documentation, trying to figure out whether it's possible to create an element-wise greater-than check where the limit differs for each group defined by another column. Ideally, the limit would be inferred from the data (e.g. the median minus 3 times the standard deviation for that group).
So far I haven't found a solution. I would appreciate help if such a check is indeed possible.