Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider moving to a relational data model, like Postgres #8620

Open
6 of 12 tasks
john-gom opened this issue Jun 28, 2023 · 6 comments
Open
6 of 12 tasks

Consider moving to a relational data model, like Postgres #8620

john-gom opened this issue Jun 28, 2023 · 6 comments
Assignees
Labels

Comments

@john-gom
Copy link
Contributor

john-gom commented Jun 28, 2023

Problem

Currently the OFF data is in a lot of different places (taxonomy files, MongoDB, STO files) which makes it difficult to perform queries across the data sets.

Aggregated queries against MongoDB are also very slow and the author feels these would be considerably faster against a relational model

Proposed solution

Move to a relational model

Tasks

Preview Give feedback

Part of

@john-gom john-gom added the ✨ Feature Features or enhancements to Open Food Facts server label Jun 28, 2023
@john-gom
Copy link
Contributor Author

POC repo is https://github.com/john-gom/openfoodfacts-data

Using NestJS as a general framework with Mikro-ORM for data modelling / migrations and Postgraphile for GraphQL support

@john-gom john-gom added ♞ Epic An epic groups several tasks/issues. It should have a meaning for users. and removed ✨ Feature Features or enhancements to Open Food Facts server labels Jul 10, 2023
Copy link
Contributor

This issue has been open 90 days with no activity. Can you give it a little love by linking it to a parent issue, adding relevant labels and projets, creating a mockup if applicable, adding code pointers from https://github.com/openfoodfacts/openfoodfacts-server/blob/main/.github/labeler.yml, giving it a priority, editing the original issue to have a more comprehensive description… Thank you very much for your contribution to 🍊 Open Food Facts

@github-actions github-actions bot added the Stale label Nov 23, 2023
@teolemon teolemon added ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. and removed Stale labels Nov 24, 2023
@john-gom
Copy link
Contributor Author

john-gom commented Jan 15, 2024

I've created a script to load all product "sto" files into Postgres. Branch is issues/8620-a.

@john-gom
Copy link
Contributor Author

Some products contain \u0000 in the data which is not compatible with postgres. SQL to fix was:

update revision set data = replace(data::text,'\u0000','')::json  
where code in ('04810513','3770001905075','4779030380333','4840237001946','6909995101119','7501058623256','7702024040040','7798305866775','7801620005191','7895000467013','7898142862043','8015057004453','8412600017975','9300617296614','9557789820127')
and data::text like '%\\u0000%';

@john-gom john-gom removed the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Jan 22, 2024
@john-gom
Copy link
Contributor Author

john-gom commented Jan 22, 2024

An example database has been uploaded here: https://static.openfoodfacts.org/data/pg/products.dmp

This can be restored using pg_restore

@john-gom john-gom self-assigned this Jan 22, 2024
@damil
Copy link
Contributor

damil commented Mar 26, 2024

That's a very interesting proposal.

Although I still haven't understood how OFF is organized, I definitely have the feeling that a regular relational model could bring many benefits like :

  • understandability of the datastructure for newcomers
  • less requirements for huge trees in memory -- the database could do the work
  • strength of the relational model for having multiple entry points into the data
  • integrity rules
  • ...

From what I heard during the march 2024 hackaton, there are some recursive relations within the data -- but that's not a hindrance : most database management systems support Common Table Expressions, which is the SQL way for expressing queries on recursive data.

So thanks for your work, I'm eager to look at your postgres data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: To discuss and validate
Development

No branches or pull requests

3 participants