feat: dq_all_val_in_nutrition_are_identical #9320
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9320      +/-   ##
==========================================
+ Coverage   44.30%   48.99%   +4.69%
==========================================
  Files          64       66       +2
  Lines       20333    20417      +84
  Branches     4891     4903      +12
==========================================
+ Hits         9008    10004     +996
+ Misses      10150     9144    -1006
- Partials     1175     1269      +94
lib/ProductOpener/DataQualityFood.pm
        # $nutriments_values_occurences_max_key = $key;
    }
}
# raise error if all values are identical (this can only apply when they are 0 (because salt or sodium are automatically generated from each others))
Then what would be the difference from the already existing en:Nutrition all values zero?
I had not seen that before, actually. That existing dq error facet returns no products: https://world.openfoodfacts.org/data-quality-error/en:all-nutrition-values-are-set-to-0
There are two reasons:
1) The condition
    if ($nid !~ /fruits-vegetables-nuts-estimate-from-ingredients/) {
        if ($product_ref->{nutriments}{$nid} == 0) {
            $nid_zero++;
        }
        else {
            $nid_non_zero++;
        }
    }
does not exclude "fruits-vegetables-legumes-estimate-from-ingredients_100g".
Comment: instead of adding this exclusion, it would be better to list the main nutrients we want to check. That way, if a new fruits-something-blabla_100g field appears tomorrow, nothing breaks when we forget to update that condition.
2) There is a typo in the comparison:
    ($nid_zero == $nid_n)
where
$nid_zero: number of $nid (nutrient) entries that contain "_100g" and are equal to 0. That number is at most the count of all "_100g" entries minus fruits-vegetables-nuts-estimate-from-ingredients_100g.
$nid_n: number of all nutrient entries, with and without "_100g". Example for proteins: "proteins", "proteins_100g", "proteins_unit", "proteins_value".
Since $nid_n also counts the entries without "_100g", $nid_zero practically never reaches it.
Given the above, I am removing this dq facet and replacing it with the one implemented in this PR.
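The allow-list idea from the comment above could look roughly like the following. This is only a sketch, not the code in this PR: the sub name `all_main_nutrients_are_zero` and the contents of `@main_nutrients` are hypothetical, and the real list would be up to the maintainers. It counts only the listed `_100g` fields and compares the number of zeros against the number of fields actually counted.

```perl
use strict;
use warnings;

# Hypothetical allow-list of main nutrients (illustration only).
my @main_nutrients = qw(energy-kj energy-kcal fat saturated-fat carbohydrates sugars proteins salt);

# Returns 1 when at least one listed nutrient is present and every listed
# nutrient that is present has a per-100g value of 0; returns 0 otherwise.
sub all_main_nutrients_are_zero {
    my ($product_ref) = @_;
    my $nutriments_ref = $product_ref->{nutriments} // {};
    my ($zero, $total) = (0, 0);
    foreach my $nutrient (@main_nutrients) {
        my $value = $nutriments_ref->{$nutrient . "_100g"};
        next if not defined $value;
        $total++;
        $zero++ if $value == 0;
    }
    # Compare against the number of fields actually counted,
    # not against every key of the nutriments hash.
    return ($total > 0 and $zero == $total) ? 1 : 0;
}
```

With this shape, fields such as "proteins_unit" or any new "fruits-..." estimate are ignored without needing an explicit exclusion.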
        push @{$product_ref->{data_quality_warnings_tags}}, "en:nutrition-3-or-more-values-are-identical";
        last;
    }
}

# retrieve max number of occurences
I think you can do something simpler that would also work when people put the same non-zero value (e.g. 1) in all fields:
Just check whether you have only 1 key, or 2 keys where one of them is "salt_100g" or "sodium_100g" (and also that you have at least 3 or 4 entries in @major_nutriments_values).
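One possible reading of this suggestion, as a hedged sketch rather than the PR's actual code: group the major nutrient fields by value, then accept either a single distinct value, or two distinct values where one is held only by salt or sodium. The sub name `nutrition_values_are_all_identical`, its arguments, and the default `$min_entries` of 3 are assumptions made for illustration.

```perl
use strict;
use warnings;

# Returns 1 when all major nutrient values are identical, tolerating one
# extra distinct value that is held only by salt_100g or sodium_100g.
# $min_entries (hypothetical parameter) is the minimum number of filled fields.
sub nutrition_values_are_all_identical {
    my ($nutriments_ref, $major_nutrients_ref, $min_entries) = @_;
    $min_entries //= 3;

    my %nutrients_by_value;    # numeric value => list of fields holding it
    my $entries = 0;
    foreach my $nutrient (@{$major_nutrients_ref}) {
        my $value = $nutriments_ref->{$nutrient};
        next if not defined $value;
        # "+ 0" normalizes "1.0" and 1 to the same hash key
        push @{$nutrients_by_value{$value + 0}}, $nutrient;
        $entries++;
    }
    return 0 if $entries < $min_entries;

    my @values = keys %nutrients_by_value;
    return 1 if @values == 1;
    if (@values == 2) {
        foreach my $value (@values) {
            my @fields = @{$nutrients_by_value{$value}};
            return 1 if @fields == 1 and $fields[0] =~ /^(salt|sodium)_100g$/;
        }
    }
    return 0;
}
```

Passing 1 as `$min_entries` disables the minimum-entries condition.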
I did something like that. But because it is an error, I would not add the "and also that you have at least 3 or 4 entries" condition for now.
Note that counting salt or sodium instead of both does not add many rows (about 10 more): mirabelle
I do not understand why the test needed to be updated for that part:
Note for later (maybe @CharlesNepote could be interested, for Data Quality): if we count energy kcal or kJ instead of both, that leads to 10000 rows (mirabelle), with many false positives:
Kudos, SonarCloud Quality Gate passed!
What
New error, when all values in the nutrition table are equal.
Note that it can happen only when they are equal to 0 because otherwise salt and sodium values are automatically calculated from each other's value.
Additionally, I fixed inconsistent naming in the data quality taxonomy (Error vs errors). I also noticed that the first part of the file lists errors and the second part lists warnings, but some errors were listed in the second part, so I moved them up to the first part.
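To illustrate the note above about salt and sodium: assuming the usual conversion salt = sodium × 2.5 (the constant and helper names below are hypothetical, not taken from this PR), a derived salt value can only equal the sodium value when both are 0. A minimal sketch:

```perl
use strict;
use warnings;

# Assumed conversion factor between sodium and salt (illustration only).
my $SALT_PER_SODIUM = 2.5;

# Derives the salt value from a sodium value.
sub salt_from_sodium {
    my ($sodium) = @_;
    return $sodium * $SALT_PER_SODIUM;
}

# Returns 1 when the derived salt value equals the sodium value, else 0.
sub salt_equals_sodium {
    my ($sodium) = @_;
    return (salt_from_sodium($sodium) == $sodium) ? 1 : 0;
}
```

Under that assumption, `salt_equals_sodium(0)` is true, while any non-zero sodium value gives a different salt value.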
Screenshot
Link to deployed facet
https://world.openfoodfacts.org/data-quality-error/nutrition-values-are-all-identical
Related issue(s) and discussion