Big memory leak when adding / updating / importing products #2563
I've spent some time trying to profile the product import Perl script. In general, I would recommend keeping as few variables as possible in global scope, exported and shared between packages. I noticed a lot of global scoping and exports, which could be a concern: package globals are not garbage-collected until the process exits. Prefer lexical variables declared inside subroutines, as they are garbage-collected once the subroutine finishes executing. I also found a nice module called Devel::NYTProf. It does not profile memory, so it may not be ideal for our case, but it gives a detailed and expressive analysis of code execution times, with a graphical output that is very intuitive and rich. The output revealed a couple of things that might be worth looking into further:
Here is a screenshot of part of the report from running the script on 100 products that had already been imported. Here is a tree map of the time spent in each function; it is very informative, but may not be helpful for our problems.
You may also want to make sure all of your CPAN modules are updated to the latest versions, as older versions may have bugs that contribute to the memory leaks. Make sure you don't have any circular references in your code; Perl cannot garbage-collect them. There is a good page with references to Perl best practices for memory management. I am also looking at Devel::MAT, which seems like the most robust memory profiler suite for Perl, and there is a good tutorial if you want to get started. You can also use Memory::Stats to measure memory at checkpoints you specify in your code, to see how the memory changes before and after certain blocks of code.
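To make two of these suggestions concrete, here is a minimal Perl sketch. It is only an illustration, not ProductOpener code: the data structures and checkpoint names are made up, and it assumes Memory::Stats' start/checkpoint/stop/report interface.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);
use Memory::Stats;

# Circular references: Perl's reference counting cannot reclaim a cycle,
# so one side of the cycle should be weakened (or the cycle broken by hand).
my $parent = { name => 'category' };                  # illustrative data only
my $child  = { name => 'entry', parent => $parent };
$parent->{child} = $child;
weaken($child->{parent});    # without this, neither hash would ever be freed

# Memory::Stats checkpoints: measure how much memory specific blocks add.
my $stats = Memory::Stats->new;
$stats->start;

# ... process a batch of products here ...
$stats->checkpoint("after processing batch");

# ... save the batch to disk / MongoDB here ...
$stats->checkpoint("after saving batch");

$stats->stop;
$stats->report;              # prints the memory deltas between checkpoints
```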
Hello @zigouras ! Thank you very much for taking the time to do this analysis! The reports from Devel::NYTProf are very interesting; knowing which function is called the most, and from where, is going to be very useful. I'll definitely investigate what we can do for ProductOpener::Display::remove_tags_and_quote. I read about circular references in the past, but I wasn't able to identify where I might be creating some; I don't think I create them (at least not intentionally).
Generally, according to I also tried outputting
I ran the latest import CSV script on all 1000 items, and it only used 0.5 GB of memory and was not increasing on my laptop. My Devel::MAT analysis:
Thanks to @hangy for his analysis. We may want to review how we use MongoDB in Perl; database connections are always a likely place for performance issues. Do we use connection pooling?
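Not an answer to the pooling question, but for reference, here is a minimal sketch of the usual pattern with the MongoDB Perl driver: create one client for the whole run and reuse it for every product, rather than connecting per product. The URI, namespace and data are placeholders, not the project's actual configuration.

```perl
use strict;
use warnings;
use MongoDB;

# One client for the whole import run; the driver opens sockets lazily
# and keeps them for as long as the client stays in scope.
my $client     = MongoDB->connect('mongodb://localhost:27017');   # placeholder URI
my $collection = $client->ns('off.products');                     # placeholder namespace

my @products = ();   # product hashrefs parsed from the CSV (placeholder)

for my $product_ref (@products) {
    # Upsert each product, reusing the same client and collection handle.
    $collection->replace_one(
        { code => $product_ref->{code} },
        $product_ref,
        { upsert => 1 },
    );
}
```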
It looks like certain Perl regular expressions can take up a lot of memory: if a program uses the special match variables $&, $` or $', every regex match incurs extra string copying. I am trying to get https://metacpan.org/pod/Devel::FindAmpersand installed but I am getting errors. It doesn't seem to work on Perl later than 5.18.4 per the CPAN unit tests, so I am downloading that version and trying to build it. If anyone can get it working on a newer Perl, please share.
Update: I built Perl 5.18.4 and successfully installed Devel::FindAmpersand with it, but couldn't get it to work with the other project dependencies.
Regarding the $client reference: as per https://metacpan.org/pod/MongoDB::MongoClient, the actual socket is opened lazily and should stay open until the client goes out of scope. We don't really clean up the $client (which I think is so the web app can reuse the client?), but there shouldn't be any additional instances.
Really? It looks to be constant for me. After 50 products:
After 100 products:
Maybe I did something wrong (too small a sample size might be a problem), but I only see a ~0.2 MiB increase in SCALAR.
@hangy Sorry, I read the
Here is another file to import with 100 real products, with full ingredients lists etc. They trigger much more processing involving ingredients analysis, taxonomies etc. |
Something I forgot to mention: if you run the import multiple times, the import script should detect that the products did not change, and there will be less processing (and the products won't be saved again to disk and MongoDB). For the import script, the only MongoDB use should be when adding/updating each product.
Thanks for the update @stephanegigandet. I did my memory analysis on a run that was adding 1000 new test products to the system. I deleted all the products on disk and cleared the MongoDB collection before my run. I will try with your more extensive data set.
@zigouras Indeed, as you found, I've been using $`, $& and $' in some of the regular expressions. "In general, apply /^(.*)(pattern)(.*)$/s and use $1, $2 and $3 instead."
I agree that this should be tested. However, it's probably not as bad as the 2008 Stack Overflow answer makes it sound: according to perlvar, the gravest problems with those variables were supposedly fixed in Perl 5.18 (2013) and 5.20 (2014).
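For reference, a small sketch of the two usual ways to avoid those variables (using a made-up pattern and string, not one from the codebase): capture explicitly, or use the /p modifier, which on older Perls makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available for that one match without the program-wide copying penalty.

```perl
use strict;
use warnings;

my $text = "chocolate (cocoa 70%)";    # made-up example string

# Option 1: explicit capture groups instead of $`, $& and $'
if ($text =~ /^(.*?)(\d+%)(.*)$/s) {
    my ($pre, $match, $post) = ($1, $2, $3);
    print "matched '$match'\n";
}

# Option 2 (Perl 5.10+): the /p modifier and the ${^...} variables
if ($text =~ /\d+%/p) {
    my ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "matched '$match'\n";
}
```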
Performance-wise (maybe not memory-wise), it would probably be useful to see if we can optimize get_string_id_for_lang(), which is by far the function that is called the most. E.g. in one of my import test runs:
We could try to merge most of the =~ s// calls, and turn as many as possible into tr// calls instead, which seem to be less costly.
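As a rough illustration of the s/// vs. tr/// point (the character mappings here are made up and are not the actual ones in get_string_id_for_lang()): several one-character substitutions can often be collapsed into a single tr///, which runs in one pass without invoking the regex engine.

```perl
use strict;
use warnings;

my $id = "Some Product / Name";    # made-up input

# Instead of several separate substitution passes ...
#   $id =~ s/ /-/g;
#   $id =~ s/\//-/g;
# ... one-to-one character mappings can be done in a single tr///:
$id =~ tr{ /}{--};      # spaces and slashes both become dashes
$id =~ tr/A-Z/a-z/;     # ASCII lowercasing without the regex engine

print "$id\n";          # "some-product---name"
```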
Sounds good, guys. Note that Devel::NYTProf reports on the problematic regexes. @hangy makes a good point that newer versions of Perl fixed the issue with these regex variables. However, if we do have memory issues, I still think it would be worth refactoring out these regexes to see if it improves memory usage. Question: is the dev and production Perl compiled with thread support or not?
Stale issue message |
Should we close this issue? |
This issue should be tested to see if we still have the bug. See bug description. |
There is a big memory leak when we update a product. It can be seen in particular when we launch the import_csv_file.pl script with a large number of products (e.g. 1000): the process keeps growing until it runs out of memory.
For a real product import from a big producer (e.g. 5000 products), the process grows to 10 GB of memory.
There are many different things that happen when a product is updated: the ingredients lists are processed, allergens and additives are detected, quality is measured, etc. Each of those steps involves calling many different subroutines, in particular many functions from Tags.pm related to taxonomies.
At this point, we do not know what exact part of the pre-saving processing causes the memory leak (it is very likely that several parts contribute to it).
This needs further investigation, maybe by testing each part individually (e.g. running each function tens of thousands of times and measuring memory), and/or with tools like Devel::Gladiator etc.
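A minimal sketch of that kind of isolated test, assuming Devel::Gladiator's arena_ref_counts() (the function under test is just a placeholder, not an actual ProductOpener subroutine): call a suspect function many times and compare the counts of live Perl values before and after; a type whose count keeps growing points at a leak in that function.

```perl
use strict;
use warnings;
use Devel::Gladiator qw(arena_ref_counts);

sub function_under_test {    # placeholder for e.g. a Tags.pm subroutine
    my ($input) = @_;
    return lc $input;
}

my $before = arena_ref_counts();           # counts of live values, keyed by type
function_under_test("test input $_") for 1 .. 10_000;
my $after = arena_ref_counts();

# Report which value types grew; a large positive delta suggests a leak.
for my $type (sort keys %$after) {
    my $delta = ($after->{$type} // 0) - ($before->{$type} // 0);
    print "$type: +$delta\n" if $delta > 0;
}
```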
Steps to see the memory leak:
In /srv/off/scripts/, run:
./import_csv_file.pl --user_id debug --org debug --source_id debug --source_name debug --source_url debug --define lc=en --csv_file debug-code-product_name-ingredients2.csv
debug-code-product_name-ingredients2.csv.zip
Related discussions/issues: #2053 #2054
This issue is very important to resolve, because it also happens (more gradually) in production on the web site: whenever a product is updated, the Apache mod_perl process grows, and eventually it needs to be restarted.
Part of