What is the maximum number of metadata files a JKAN site can handle? #290

pzwsk · 2024-12-24T09:18:39Z

Hi there,

I am considering continuing to use JKAN for the Risk Data Library catalog https://jkan.riskdatalibrary.org/datasets/ (currently an MVP).

However, we would potentially have a lot of files to handle, hopefully several thousand at some point.

I am therefore trying to understand how many metadata files a JKAN site can hold and still function properly.

Hosting

This of course depends on server size. In our case, the site is hosted on GitHub Pages for now, though we might consider an alternative. GitHub advises keeping a repository under 1 GB, so if we assume 2 KB per metadata file, the limit is about 500,000 files.
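For reference, here is the arithmetic behind that estimate as a quick sketch. It assumes decimal units (1 GB = 10^9 bytes, 2 KB = 2,000 bytes) and ignores Git history and the rest of the site, so the real ceiling would be lower:

```python
# Back-of-the-envelope estimate; both figures are rough assumptions.
REPO_LIMIT_BYTES = 1 * 10**9  # ~1 GB advised repository size
BYTES_PER_FILE = 2 * 10**3    # ~2 KB per metadata file (a guess)

max_files = REPO_LIMIT_BYTES // BYTES_PER_FILE
print(max_files)  # 500000
```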

Search and filtering

I am not sure which search engine JKAN uses, but I am guessing the file limit for search is much lower than the storage limit, since the index needs to be downloaded and processed on the client side?

Any help appreciated.

BryanQuigley · 2024-12-26T05:34:31Z

I think it should be fine with a few thousand, but I am not aware of any JKAN sites with more than 700 datasets, so I'm not sure anyone has tested it.

As I think you identified:
Hosting - really shouldn't be a problem
Search - may be an issue at that scale (I'm still guessing not), but it's something we've discussed ways to improve, and we would be open to better ideas. There is also nothing preventing you from just using an external search engine.

Happy to review any PRs you want to merge back too.

timwis · 2025-01-13T09:46:18Z

Hey @pzwsk ! 👋🏻 A few thousand would be fine. A few hundred thousand would probably be jittery. The two bottlenecks are:

The /datasets.json file is loaded into memory when viewing the Datasets page (e.g. https://jkan.riskdatalibrary.org/datasets.json)
We don't currently have pagination on the Datasets page, so it will show every item at the moment
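To get a feel for the first bottleneck, one option is to generate a synthetic index and measure it. This is only a sketch: the record fields below are hypothetical stand-ins, not the actual shape of datasets.json, so treat the numbers as an order-of-magnitude guide.

```python
import json


def estimate_index_bytes(n_datasets: int) -> int:
    """Size in bytes of a synthetic JSON index with n_datasets entries."""
    # Hypothetical record shape; the real datasets.json fields may differ.
    records = [
        {
            "title": f"Example dataset {i}",
            "organization": "Example organization",
            "notes": "x" * 200,  # stand-in for a short description
            "url": f"/datasets/example-dataset-{i}/",
        }
        for i in range(n_datasets)
    ]
    return len(json.dumps(records).encode("utf-8"))


for n in (700, 5_000, 100_000):
    print(n, "datasets ->", round(estimate_index_bytes(n) / 1e6, 2), "MB")
```

Measuring the real file from a built site would be more reliable than this estimate.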

Adding pagination to the Datasets page is pretty straightforward, but if you want search to still work, you'll need to use something like Algolia or Swiftype (or an open source, self-hosted alternative). You'd give the search engine that datasets.json file and tell it how to render the results on the Datasets page. We haven't done that because we aimed to keep the setup simple, but it wouldn't be too difficult to implement.
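To illustrate why pagination and search interact, here is a minimal sketch of the logic. It is not JKAN's code (the site runs JavaScript in the browser); Python is used only to show the idea, and the `title` field, `search`, and `paginate` are hypothetical names. The point is that the filter has to run over the full set before slicing, which is why a paginated page needs a search engine, or the full index, behind it.

```python
from math import ceil


def search(datasets: list[dict], query: str) -> list[dict]:
    """Case-insensitive substring match on the (assumed) 'title' field."""
    q = query.strip().lower()
    return [d for d in datasets if q in d.get("title", "").lower()]


def paginate(items: list, page: int, per_page: int = 20) -> tuple[list, int]:
    """Return the items on a 1-indexed page and the total page count."""
    total_pages = max(1, ceil(len(items) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range pages
    start = (page - 1) * per_page
    return items[start:start + per_page], total_pages


# 45 matching titles plus one that does not match the query.
datasets = [{"title": f"Flood hazard {i}"} for i in range(45)]
datasets.append({"title": "Earthquake exposure"})

results, pages = paginate(search(datasets, "flood"), page=3)
print(len(results), pages)  # 5 3
```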
