Proposal: serve API product GET requests from an async server #10732

Open
raphael0202 opened this issue Aug 26, 2024 · 2 comments
Labels
🪶 Apache (We use Apache as a server to run Open Food Facts), API, Refactor, infrastructure (https://wiki.openfoodfacts.org/Infrastructure), 🚅 Performance, Perl (Related to the Perl code of the ProductOpener server)

Comments

@raphael0202
Contributor

raphael0202 commented Aug 26, 2024

The performance issues we're currently experiencing led us to analyze which requests take the most processing time on the Apache server: https://docs.google.com/document/d/13rYXR0TxR2hUc0XEKzKcBT6ndcd5_L3yeP_L6UjZwzs/edit.
The analysis revealed that facet-related queries are the most costly.

We only have 50 Apache workers, so when most of them are busy waiting for MongoDB or off-query, we can't respond to basic GET /api/v*/products/{code} queries, which only require one disk access (to fetch the sto file) and a bit of RAM for the translations. These requests account for 15% of all requests handled by Product Opener.
This route is the API endpoint most used by our own mobile app and by reusers.
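To illustrate the starvation effect described above, here is a minimal sketch of a fixed worker pool serving requests in arrival order. Only the pool size of 50 comes from this issue; the request durations (10 s for a facet query, 0.05 s for a product read) and the function name `fifo_wait_times` are hypothetical values chosen for illustration, not measurements.

```python
import heapq


def fifo_wait_times(worker_count, requests):
    """Return how long each request waits for a free worker.

    `requests` is a list of (arrival_time, duration) tuples sorted by
    arrival time. Each worker handles one request at a time, which is the
    blocking model assumed here for an Apache worker pool.
    """
    # Min-heap of the times at which each worker becomes free.
    free_at = [0.0] * worker_count
    heapq.heapify(free_at)
    waits = []
    for arrival, duration in requests:
        earliest = heapq.heappop(free_at)
        start = max(arrival, earliest)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + duration)
    return waits


# Hypothetical load: 50 slow facet queries at t=0, one product read at t=1.
slow = [(0.0, 10.0)] * 50
product_read = (1.0, 0.05)
saturated = fifo_wait_times(50, slow + [product_read])
one_free = fifo_wait_times(50, slow[:49] + [product_read])
print(saturated[-1])  # 9.0: the cheap read waits behind the slow queries
print(one_free[-1])   # 0.0: a single free worker is enough
```

Under these assumed numbers, the product read itself is cheap, but its latency is dominated by the time spent waiting for a worker.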

My proposal would be to use a new asynchronous service (written with FastAPI in Python, for example), to handle read-only GET /api/v*/products/{code} requests.

Having a distinct service that takes care of read-only API queries would make sure that our own app (or third-party apps) won't fail even if ProductOpener does. Asynchronicity means that:

  • we can fully exploit the potential of the server, the bottleneck becoming I/O instead of the number of available Apache workers.
  • we limit RAM consumption compared to Apache, as translations + taxonomies are only stored once in memory.
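As a rough sketch of the two points above, the following uses only Python's standard `asyncio` rather than FastAPI, so it runs without extra dependencies. Everything in it is an assumption made for illustration: the JSON files stand in for the sto files (which are Perl Storable data and would need a real reader or a conversion step), and `TRANSLATIONS`, `get_product`, and the response shape are hypothetical names, not the Product Opener API.

```python
import asyncio
import json
import tempfile
from pathlib import Path

# Hypothetical shared state: loaded once per process and reused by every
# request, instead of once per Apache worker.
TRANSLATIONS = {"en:organic": {"en": "Organic", "fr": "Bio"}}


def read_product_file(path: Path) -> dict:
    """Blocking disk read; a stand-in for loading the stored product."""
    return json.loads(path.read_text(encoding="utf-8"))


async def get_product(data_dir: Path, code: str, lang: str) -> dict:
    """Handle one read-only product request."""
    path = data_dir / f"{code}.json"
    # Only accept numeric codes so a request cannot escape data_dir.
    if not code.isdigit() or not path.is_file():
        return {"code": code, "status": 0}
    # Run the blocking read in a thread so the event loop stays free
    # to accept other requests in the meantime.
    product = await asyncio.to_thread(read_product_file, path)
    labels = [
        TRANSLATIONS.get(tag, {}).get(lang, tag)
        for tag in product.get("labels_tags", [])
    ]
    return {"code": code, "status": 1, "product": {**product, "labels": labels}}


async def serve_many(data_dir: Path, codes: list, lang: str) -> list:
    """Serve several requests concurrently from a single process."""
    return await asyncio.gather(*(get_product(data_dir, c, lang) for c in codes))


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        sample = {"product_name": "Example spread", "labels_tags": ["en:organic"]}
        (data_dir / "3017620422003.json").write_text(
            json.dumps(sample), encoding="utf-8"
        )
        codes = ["3017620422003", "0000000000000"]
        return asyncio.run(serve_many(data_dir, codes, "fr"))


if __name__ == "__main__":
    for response in demo():
        print(response)
```

In a FastAPI service, `get_product` would roughly correspond to the body of a route handler, with the translations loaded once at startup.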

The addition of knowledge panels could also be migrated to this new service later.

I think it's a better alternative to #8934, which, while faster (served directly by nginx), is more disk-hungry, won't be available for all products, and doesn't play nicely with the translation of taxonomized fields.

This could also be a first step to tackle #5170.
Write queries are not very common (0.25% of the queries handled by Product Opener), and most of the complexity of the codebase comes from the data processing and score computation associated with write queries.

That's why I think it's better to keep POST queries out of the scope of this proposal for now.

Limits

This service wouldn't account for the 53% of queries that are product HTML pages.
Serving these pages through this async service would be much more difficult, as it would mean migrating all the HTML logic there.

@raphael0202 raphael0202 added 🚅 Performance 🪶 Apache We use Apache as a server to run Open Food Facts Perl Related to the Perl code of the ProductOpener server labels Aug 26, 2024
@github-project-automation github-project-automation bot moved this to To discuss and validate in 🍊 Open Food Facts Server issues Aug 26, 2024
@john-gom
Contributor

We could potentially start storing the full Product JSON in Postgres. I did a POC on this a while ago (#8620). The main issue is the additional database space, but if that is OK, having the data in a relational database would make it much easier to use languages other than Perl.

@openfoodfacts openfoodfacts deleted a comment Aug 26, 2024
@CharlesNepote
Member

Good idea! Could we try to compare other solutions? Note I don't have a clear opinion on what's the best solution. I have tried to be objective for both solutions, but maybe I don't have sufficient knowledge to do so.

Don't hesitate to edit this table.

| Criterion | Static JSON + nginx | Async server | Comments |
|---|---|---|---|
| RAM | Winner? (but by what order of magnitude?) | - | nginx is known to be very efficient on that side, but FastAPI + PostgreSQL seems to consume little RAM for folksonomy engine (with very low traffic, that said) |
| Disk usage | 300k products x 100 KB? = 30 GB | Winner | The difference is not so big; does it really matter? All this data could be in the nginx cache |
| Performance | Clear winner (x100?) | - | Isn't it the main issue we're facing? |
| Product coverage | 300k products | All products | 300k products represent 75% of all requests; probably more than 1 million products are never requested through the API |
| Functional coverage | What about translations? | Clear winner | This needs to be evaluated. I don't understand the impacts. This might be the clear or even mandatory bonus for the async server. How big would a JSON with all the translations be? |
| Implementation | A few days? | ? | |
| Complexity | Winner: no new services | Needs to code or deploy (and maintain) a new server | |
| Maintenance | ? | ? | Any idea? Not sure, but intuitively, maintaining a new server is more costly |
| Scalability | Better/easier scalability thanks to nginx | Scalability needs more code | I would say JSON + nginx is a clear winner, but this needs to be confirmed. E.g. couldn't JSON files be stored on another server, like images? |
| Resilience | Better/easier fallbacks thanks to nginx | Resilience needs more code | Idem |
| Sustainability | A bit more technical debt in Perl | More technical debt, but in a more widespread language | |
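As a quick sanity check on the disk usage row, the estimate can be worked out from the two figures given in the table (300k products, about 100 KB each). Both inputs are the rough guesses from the table, not measured values, and the helper name `static_json_disk_bytes` is made up for this sketch. With decimal units the total comes to about 30 GB.

```python
def static_json_disk_bytes(product_count: int, avg_size_kb: float) -> float:
    """Estimated disk footprint in bytes, using decimal units (1 KB = 1000 B)."""
    return product_count * avg_size_kb * 1000


total_bytes = static_json_disk_bytes(300_000, 100)
print(f"{total_bytes / 1e9:.0f} GB")  # prints "30 GB"
```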

@github-project-automation github-project-automation bot moved this to To be triaged in 🛠️ - Server - API Nov 6, 2024
@teolemon teolemon added the infrastructure https://wiki.openfoodfacts.org/Infrastructure label Nov 6, 2024
7 participants
@CharlesNepote @teolemon @raphael0202 @john-gom and others