
Improve crawler impact #125

Closed
kyzh opened this issue May 18, 2015 · 2 comments
Labels
infrastructure https://wiki.openfoodfacts.org/Infrastructure 🚅 Performance 🕷️ SEO

Comments


kyzh commented May 18, 2015

What

  • We could limit crawler impact by updating robots.txt and creating a sitemap.
  • robots.txt could tell robots not to index some parts of the site, such as generated graphs.
  • The sitemap could put emphasis on the most relevant pages, such as the country landing pages.
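A sitemap along these lines could list the country landing pages with a high priority. A minimal sketch, assuming illustrative URLs and priority values (not the actual Open Food Facts sitemap):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- country landing pages: most relevant entry points -->
  <url>
    <loc>https://world.openfoodfacts.org/</loc>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://fr.openfoodfacts.org/</loc>
    <priority>0.8</priority>
  </url>
</urlset>
```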

Part of

@stephanegigandet
Contributor

Search queries, graphs, etc. are disallowed in the current robots.txt:

User-agent: *
Disallow: /cgi
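This rule can be checked with Python's standard-library robots.txt parser; a small sketch (the example URLs are illustrative):

```python
from urllib import robotparser

# The current robots.txt rules quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# CGI search queries and graphs are blocked for all crawlers...
print(rp.can_fetch("*", "https://world.openfoodfacts.org/cgi/search.pl"))
# ...while tag pages remain crawlable.
print(rp.can_fetch("*", "https://world.openfoodfacts.org/category/pizzas"))
```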

Our URLs are user friendly (as in readable, understandable and typable): /[tag type]/[tag value]/[other tag type]/[other tag value]

But they are not well suited for robots.txt rules: there are already too many combinations at the first level. nofollow rules at the link level are probably better suited, e.g. marking the [other tag] links nofollow so that engines index only one level.
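One possible shape of such link-level rules, as a sketch with illustrative tag URLs:

```html
<!-- first-level tag page: left crawlable -->
<a href="/category/pizzas">Pizzas</a>

<!-- second-level [other tag] combination: hint engines not to follow it -->
<a href="/category/pizzas/brand/example-brand" rel="nofollow">Example Brand</a>
```

Unlike robots.txt, rel="nofollow" does not need to enumerate URL patterns; the template emitting the links decides per link.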

@raphael0202
Contributor

I'm closing this issue, as we already have more detailed issues that describe indexing and crawling.

No branches or pull requests

4 participants