Blosc Compression-Performance Tips #231
Comments
Performance tuning is difficult, and what works for one dataset might not work for another. Is your data similar to the data used for the graph? I'll have a look at the blosc bug when I am back at a computer.
All my datasets are identical. Each HDF5 file consists of at least 3 datasets: one 1D ndarray and two 2D ndarrays with additional 1D ndarrays. All of them have the same size in the first dimension. I'm not entirely sure what data is plotted in the image, but I would imagine that arrays are the easiest to compress (?). I found an issue in the h5py GitHub repository about this. It seems that even if the number of threads is set, the program chooses to use serial compression if the chunk size is insufficiently large. However, even if I set the chunk size to the size of the array, I still don't see any improvement.
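To illustrate why the block size matters for threading, here is a minimal sketch. It does not use Blosc or HDF5 at all: it assumes Python's standard `zlib` and a thread pool as a stand-in, and the helper names (`split_into_blocks`, `compress_blocks`, `decompress_blocks`) are made up for this example. The point is only that a buffer treated as a single block gives the workers nothing to share, no matter how many threads are requested.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the buffer into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_blocks(data: bytes, block_size: int, nthreads: int) -> List[bytes]:
    """Compress each block independently with up to nthreads workers.

    Stand-in for a blocked compressor: zlib here, not Blosc.
    """
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(compressed: List[bytes]) -> bytes:
    """Decompress every block and join the results in order."""
    return b"".join(zlib.decompress(block) for block in compressed)


payload = bytes(range(256)) * 4096  # 1 MiB of synthetic data

# One block: only one unit of work, so extra threads stay idle.
one_block = compress_blocks(payload, block_size=len(payload), nthreads=4)

# 64 KiB blocks: 16 units of work that the 4 workers can share.
many_blocks = compress_blocks(payload, block_size=64 * 1024, nthreads=4)

print(len(one_block), len(many_blocks))  # prints: 1 16
```

Whether the real Blosc filter behaves this way for a given chunk depends on the block size it selects internally, so this is only a mental model, not a description of its implementation.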
I can trace it back to https://github.com/Blosc/c-blosc/blob/d306135aaf378ade04cd4d149058c29036335758/blosc/blosc.c#L913. One can force a block size by calling e.g.
Hey All,
I've been playing around with the `blosc` compression implemented in 0.8.0 and I have some questions regarding the performance. Basically, for my 400 MB CSV file that I convert to an HDF5 file, I see compression of ~84%, which is amazing; however, the performance doesn't seem to be affected at all.
Looking at this graph from the official HDF5 website, I should see an enormous boost in throughput.
Looking at my CPU usage, the program is only using a single thread, despite me setting the number of blosc threads with `blosc_set_nthreads`. Additionally, `blosc_get_nthreads` only returns a single thread, which makes me wonder whether there is an additional flag that needs to be set.
Overall, I wish there were some kind of performance guide on this topic. Is that something that could be included in the documentation?
Best wishes,
Dominik