Quality error: language mismatch #10997

Open
aleene opened this issue Nov 8, 2024 · 0 comments
Labels
🧽 Data Quality - Products stored in French 🧽 Data quality https://wiki.openfoodfacts.org/Quality 🌍 Multilingual products Product name, Generic name, Ingredients, Packaging text are multilingual fields.

Comments

@aleene
Contributor

aleene commented Nov 8, 2024

Problem

The language of the product name and/or ingredients can differ from the language of the field they are stored in. For the product name this has no major consequences, but it can be annoying for users who see a language they do not understand. A language mismatch in the ingredients has a bigger consequence: the NOVA calculation is not possible.

The easiest way to detect these errors for the product name is to make a word cloud.
[Image: word cloud of English product names for walnut products]
Just by looking at this English cloud for walnut product names, one already sees German, Dutch and French texts.
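
A word cloud is essentially a picture of word frequencies, so the same check can be done as a plain count. The snippet below is only a minimal sketch: the function name `word_frequencies` and the sample names are made up for illustration and do not come from the Open Food Facts code base or data.

```python
import re
from collections import Counter


def word_frequencies(product_names):
    """Count how often each lowercased word occurs across product names."""
    counts = Counter()
    for name in product_names:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        counts.update(re.findall(r"[^\W\d_]+", name.lower()))
    return counts


# Hypothetical sample of names stored in the English product name field.
names = [
    "Walnut halves",
    "Walnüsse natur",
    "Cerneaux de noix",
    "Walnut pieces",
]

freq = word_frequencies(names)
for word, count in freq.most_common():
    print(word, count)
```

Words such as `walnüsse` or `noix` showing up in a list that should be English are the same signal the word cloud gives visually.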

For ingredients one has to go through all the ingredients in use for a specific category.

Proposed solution

Try to determine the language of the product name and ingredients based on the taxonomies. Google Translate works pretty well, but with the information we have available we should be able to build a more specific model(?).

If a mismatch is detected, flag it, so we know it exists and can repair it.
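
To make the idea concrete, here is a rough sketch of a taxonomy-based check. Everything in it is an assumption for illustration: `TAXONOMY_TERMS` is a tiny hand-written stand-in for the real ingredient taxonomy, and `guess_language` and `has_language_mismatch` are hypothetical helpers, not existing server functions. It simply counts how many words of a text are known terms in each language and flags the field when the best match differs from the field language.

```python
import re

# Hypothetical, tiny excerpt of per-language ingredient terms.
# The real taxonomies are far larger; this is illustration only.
TAXONOMY_TERMS = {
    "en": {"walnut", "walnuts", "sugar", "salt", "water", "wheat", "flour"},
    "fr": {"noix", "sucre", "sel", "eau", "farine", "blé"},
    "de": {"walnüsse", "zucker", "salz", "wasser", "weizenmehl"},
    "nl": {"walnoten", "suiker", "zout", "water", "tarwebloem"},
}


def guess_language(text, min_hits=1):
    """Return the language whose terms match the most words, or None.

    None is returned when nothing matches or when several languages tie,
    so ambiguous texts are not flagged.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    hits = {
        lang: sum(token in terms for token in tokens)
        for lang, terms in TAXONOMY_TERMS.items()
    }
    best = max(hits, key=hits.get)
    if hits[best] < min_hits:
        return None
    if sum(1 for n in hits.values() if n == hits[best]) > 1:
        return None
    return best


def has_language_mismatch(text, field_lang):
    """True when the guessed language is known and differs from the field."""
    guessed = guess_language(text)
    return guessed is not None and guessed != field_lang


# French ingredients stored in the English field would be flagged.
print(has_language_mismatch("sucre, farine de blé, sel", "en"))
```

Returning `None` on ties is a deliberate choice in this sketch: words like "water" exist in both English and Dutch, and it seems safer to leave such texts unflagged than to create false positives that someone has to review.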

Additional context

Some products use an English name but have no associated ingredients or nutritional values in that language. Usually another language is then used as the main language, but the English name is kept (Lidl, for instance).

Number of products impacted

I would not be surprised if this is around 10% of products.

Time per product

Requires an edit for each product to repair this.

@github-project-automation github-project-automation bot moved this to To discuss and validate in 🍊 Open Food Facts Server issues Nov 8, 2024
@teolemon teolemon added 🌍 Multilingual products Product name, Generic name, Ingredients, Packaging text are multilingual fields. 🧽 Data Quality - Products stored in French labels Nov 8, 2024
@CharlesNepote CharlesNepote added the 🧽 Data quality https://wiki.openfoodfacts.org/Quality label Nov 13, 2024
Projects
Status: To discuss and validate
Status: To do
Development

No branches or pull requests

3 participants