feat: Generate stores taxonomy from name-suggestion-index #9607

Open
wants to merge 18 commits into
base: main

CloCkWeRX
Contributor

@CloCkWeRX CloCkWeRX commented Dec 31, 2023

Fixes #7632 ?

What

This adds a snapshot of the BSD 3-Clause licensed name-suggestion-index records for shop/supermarket, transformed into the stores.txt taxonomy format.

There may be some mapping inconsistencies with countries.txt, as I used ISO 3166-1 two-letter (alpha-2) codes and made one or two minor adjustments.
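
To picture the country mapping, here is a minimal sketch, not the code in this PR. It assumes name-suggestion-index items carry a `locationSet.include` array whose entries may be two-letter country codes, the UN M49 world code `001`, custom region identifiers, or `[lon, lat]` points. The function name `toCountryCodes`, the `CODE_OVERRIDES` table and the `"world"` return value are hypothetical.

```javascript
// Hypothetical override table for codes that need a manual adjustment
// before they can match an entry in countries.txt (illustrative only).
const CODE_OVERRIDES = { uk: "gb" };

// Reduce a locationSet.include array to upper-case two-letter codes.
// Anything that is not a two-letter code (custom regions, points) is skipped.
function toCountryCodes(include) {
  const codes = new Set();
  for (const entry of include ?? []) {
    if (typeof entry !== "string") continue; // skip [lon, lat] points
    const lower = entry.toLowerCase();
    if (lower === "001") return ["world"]; // assumption: 001 means worldwide
    if (!/^[a-z]{2}$/.test(lower)) continue; // skip custom region identifiers
    codes.add((CODE_OVERRIDES[lower] ?? lower).toUpperCase());
  }
  return [...codes];
}
```

Entries that cannot be mapped are dropped silently here; a real generator would probably want to log them so the inconsistencies with countries.txt stay visible.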

While I added this as a snapshot in git, it would be trivial to fetch the live data from upstream.

Running/generating:

clockwerx@LAPTOP-2K4VT916:~/openfoodfacts-server/taxonomies$ node external/name-suggestion-index/shop/supermarket.js >> unused/stores.txt
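
As a rough illustration of the transformation (a sketch under stated assumptions, not the contents of supermarket.js): the sample items below are made up and only shaped like name-suggestion-index brand records, the Wikidata ID is a placeholder, and the `wikidata:en:` and `country:en:` property lines are assumptions about the target taxonomy syntax. The function names are hypothetical.

```javascript
// Hypothetical items shaped like name-suggestion-index brand records.
// Brand names and the Wikidata ID are placeholders, not real data.
const sampleItems = [
  {
    displayName: "Example Mart",
    locationSet: { include: ["au", "nz"] },
    tags: { brand: "Example Mart", "brand:wikidata": "Q00000000", shop: "supermarket" },
  },
  {
    displayName: "Corner Grocer",
    locationSet: { include: ["001"] },
    tags: { brand: "Corner Grocer", shop: "supermarket" },
  },
];

// Build one taxonomy entry: a name line followed by optional property lines.
function toTaxonomyEntry(item) {
  const lines = [`en: ${item.displayName}`];
  const wikidata = item.tags?.["brand:wikidata"];
  if (wikidata) lines.push(`wikidata:en: ${wikidata}`);
  // Keep only two-letter country codes; other locationSet entries are ignored.
  const countries = (item.locationSet?.include ?? [])
    .filter((code) => typeof code === "string" && /^[a-z]{2}$/i.test(code))
    .map((code) => code.toUpperCase());
  if (countries.length > 0) lines.push(`country:en: ${countries.join(", ")}`);
  return lines.join("\n");
}

// Entries are separated by a blank line.
function toTaxonomy(items) {
  return items.map(toTaxonomyEntry).join("\n\n") + "\n";
}

console.log(toTaxonomy(sampleItems));
```

Writing to stdout keeps the script composable with shell redirection, as in the command above.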

What should be done to make this better

  • If there is a central list of languages from an i18n package or similar, that has mappings from countries, that should be used
  • Use of countries.txt or similar
  • Evaluate how important it is to keep stores.txt as is, versus simply adopting the name-suggestion-index format.
  • Is the wikidata syntax right?
  • eslint?

@github-actions github-actions bot added 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies JavaScript labels Dec 31, 2023
@CloCkWeRX CloCkWeRX requested a review from a team as a code owner December 31, 2023 08:03
@github-actions github-actions bot added the GitHub Actions Pull requests that update Github_actions code label Dec 31, 2023
@codecov-commenter

codecov-commenter commented Dec 31, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 49.55%. Comparing base (dc04d18) to head (382bd70).
Report is 51 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #9607    +/-   ##
========================================
  Coverage   49.54%   49.55%            
========================================
  Files          67       67            
  Lines       20650    20765   +115     
  Branches     4980     4998    +18     
========================================
+ Hits        10231    10290    +59     
- Misses       9131     9185    +54     
- Partials     1288     1290     +2     



sonarqubecloud bot commented Jan 2, 2024

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@alexgarel alexgarel changed the title feat: Generate stores taxinomy from name-suggestion-index feat: Generate stores taxonomy from name-suggestion-index Feb 6, 2024
@alexgarel
Member

@teolemon can you review this PR? I personally lack the context to understand it.

@teolemon teolemon requested a review from raphael0202 February 14, 2024 17:56
@teolemon
Member

cc @raphodn

Member

@teolemon teolemon left a comment

It looks really nice to me. I just requested a review from the two Raphaels from Open Prices, since this could be really helpful for them.
What was your personal intent behind this PR?

@CloCkWeRX
Contributor Author

CloCkWeRX commented Feb 14, 2024

Bigger picture, I want to link a particular food item to "where can I get it".

Use cases:

  • https://github.com/danslimmon/oscar: throw out a package and it is added to your shopping list; potentially you can then place an online order or click and collect (because the store is known, and has a Wikidata ID and a URL)
  • Walk into an area with a geofence (https://github.com/alltheplaces/alltheplaces / OpenStreetMap) and have Home Assistant push a notification to get the three items on your list from that Wikidata:brand store
  • I need an uncommon ingredient or food; of the places where it is sold, which is closest to me?
  • I scanned an item. I'm standing inside an OSM shop/supermarket (per the geofence). It has a Wikidata:brand. Would I like to add this as the store that sells this food?
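
The geofence and "closest to me" ideas above can be sketched as a great-circle distance filter. This is a minimal illustration only: the store objects and their `lat`/`lon` fields are hypothetical, and a real app would get positions from OSM data rather than a hard-coded list.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in metres

// Great-circle distance between two {lat, lon} points (haversine formula).
function haversineMeters(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Stores inside the geofence radius, nearest first.
function storesWithinGeofence(position, stores, radiusMeters) {
  return stores
    .map((store) => ({ store, distance: haversineMeters(position, store) }))
    .filter(({ distance }) => distance <= radiusMeters)
    .sort((x, y) => x.distance - y.distance)
    .map(({ store }) => store);
}
```

Filtering the result by a brand's Wikidata ID would then answer "which store of this chain am I standing in?".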

Name Suggestion Index does a lot of the hard work of answering "what are the stores that sell food around the world?" and "what are their Wikidata IDs?". Given that it is used heavily by OpenStreetMap, you get a growing, maintained list of stores around the world.

Other use cases:

  • Provides a way, via Name Suggestion Index, to get a clear logo for a given brand if it is a store brand
  • With more data captured in Open Food Facts, provides some insight into supply chains

Finally, there was a change of legal advice in the OSM community: scraping facts from websites is now considered fair game. If this project ever decided to go in a similar direction, having more Wikidata IDs against stores, plus using some of the many, many pre-built spiders in alltheplaces, could in some circumstances yield pricing and other product information as structured data.
That is, if a local supermarket or vendor publishes a robots.txt/sitemap.xml and structured data on all of their products, then the problem of "index prices" becomes one of "is there an agreement in place, or a legal use case, to reuse the data as ODbL?" rather than a technical one.

@raphodn
Member

raphodn commented Feb 14, 2024

Also : https://github.com/openfoodfacts/open-prices
For each price, we have a link between an OFF product and an OSM location
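
A product-to-location link of this kind can be pictured as a small record. The sketch below is an assumption-laden illustration, not the Open Prices schema: the field names `productCode`, `osmType` and `osmId` and the helper names are hypothetical, and the barcode and OSM ID in the usage note are placeholders.

```javascript
// OSM elements are addressed by type plus numeric ID.
const OSM_TYPES = new Set(["node", "way", "relation"]);

// Build the openstreetmap.org URL for an element, validating the inputs.
function osmUrl(type, id) {
  const t = String(type).toLowerCase();
  if (!OSM_TYPES.has(t)) throw new Error(`unknown OSM type: ${type}`);
  if (!Number.isInteger(id) || id <= 0) throw new Error(`invalid OSM id: ${id}`);
  return `https://www.openstreetmap.org/${t}/${id}`;
}

// Resolve a (hypothetical) price record to the two things it links.
function describePriceLink(price) {
  return {
    product: `https://world.openfoodfacts.org/product/${price.productCode}`,
    location: osmUrl(price.osmType, price.osmId),
  };
}
```

For example, `describePriceLink({ productCode: "0000000000000", osmType: "NODE", osmId: 1 })` resolves the placeholder record to one product URL and one OSM URL.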


Quality Gate passed

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@alexgarel
Member

@raphodn can you validate the PR? (If it fits Open Prices' needs.)

@aleene
Contributor

aleene commented Mar 6, 2024

We should be careful here about creating a taxonomy independent of OFF Prices. I would prefer to use Prices as the starting point.

@aleene
Contributor

aleene commented Mar 6, 2024

Also integrate with the existing countries taxonomy.

@aleene
Contributor

aleene commented Mar 6, 2024

I would move this to another taxonomy: store_brands in order to distinguish from actual individual stores.

@aleene
Contributor

aleene commented Mar 6, 2024

@aleene
Contributor

aleene commented Mar 6, 2024

We already have store data entered by the users. We should integrate that in some way.

@aleene
Contributor

aleene commented Mar 7, 2024

I do not get where the data comes from.

@CloCkWeRX
Contributor Author

CloCkWeRX commented Mar 7, 2024

> I do not get where the data comes from.

So: OpenStreetMap + Wikipedia have found they have some intersecting interests.

One of those is Wikidata, so there are stable identifiers for concepts like a brand of store or a chain of stores.

OpenStreetMap has a large number of contributors who survey things on the ground, but of course getting everyone to agree that a Carrefour is a Carrefour in the exact same way is difficult.

So, to help but not replace end-user judgement, a number of the editing tools made the Name Suggestion Index.
https://nsi.guide/?t=brands

Example of it in use by an editor:
image

Most of the time, it's right, or an end user says it's not.

So, what does it have to do with OpenFoodFacts?

  • People in general have to eat
  • The OpenStreetMap folks who go to their local chain supermarket and observe the lat/lon are recording one facet of it
  • The Wikipedia/Wikidata folks who like to classify things are recording another facet of it
  • Open Food Facts contributors who fill out the "store" field are looking at the store brands ("I bought my peanut butter at a Carrefour"), but are sometimes physically standing in close proximity to a specific store when they do it

So, we are left with:

  • A community tracking whether a chain of stores exists or has gone bankrupt (aka Name Suggestion Index)
  • If a user consents, the potential to ask them, via a mobile app interaction, whether they bought product X from chain Y
  • A bunch of real-world stores/brands that exist because users have both surveyed them and felt they were common enough to put in the Name Suggestion Index.

Does that make sense?

@aleene
Contributor

aleene commented Mar 7, 2024

Complex way of explaining. Note I am a contributor to OSM and Wikidata and taxonomy maintainer/developer for OFF. So I get all the details. I did not know this tool though, interesting.

@aleene
Contributor

aleene commented Mar 7, 2024

I am all in favour of getting this taxonomy going. Check out the wiki link above. Integration with prices.openfoodfacts.org is key, however. We should introduce new taxonomies in small steps, one use case at a time. Indeed, using OSM and Wikidata will be central to this: OSM for actual shop locations with OSM identifiers, and Wikidata for shop brand information.

@github-actions github-actions bot added the 💥 Merge Conflicts 💥 Merge Conflicts label Mar 21, 2024
Labels
GitHub Actions Pull requests that update Github_actions code JavaScript 💥 Merge Conflicts 💥 Merge Conflicts Stores 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies
Development

Successfully merging this pull request may close these issues.

Taxonomize stores - Add autosuggest for stores
6 participants