MD5 SIMD #56558

ronag · 2025-01-11T08:59:15Z

What is the problem this feature will solve?

MD5 hashing is quite commonly used for file handling in HTTP. However, the algorithm is quite slow and is not able to fully utilize modern hardware.

What is the feature you are proposing to solve the problem?

While it's not possible to do much more to optimize a single hashing instance there are techniques such as https://github.com/minio/md5-simd which is able to run multiple hashing instances over SIMD which can process 8 MD5 hashes in parallel using SIMD instructions.

In a real world application such as a http file server/client (using Content-MD5) with many parallel request this would provide a 2-6x real-world performance improvement. In some of our applications the MD5 hash takes more than 50% of cpu time.

This should be possible to implement without changing our API's.

What alternatives have you considered?

Run hashing in a thread pool. One does not necessarily exclude the other. Using a thread pool would be more about avoiding latency spikes as in terms of throughput just forking the http server provides similar results.

ronag · 2025-01-11T08:59:32Z

@nodejs/performance @mcollina @anonrig @lemire

mcollina · 2025-01-11T10:04:15Z

How are you doing md5? In order support simd in this fashion, you'd need to batch the md5 requests and execute them in a separate thread anyway (maybe a libuv worker). It has to be asynchronous because you want other http requests to come in and request md5 hashing.

Literally this is worth doing only if your application is doing many md5 hashes concurrently (to the point it's mostly doing md5s).

Have you tried wrapping that module in a native addon and experimenting?

tniessen · 2025-01-11T12:50:24Z

In a real world application such as a http file server/client (using Content-MD5) with many parallel request this would provide a 2-6x real-world performance improvement. In some of our applications the MD5 hash takes more than 50% of cpu time.

Could you share the JavaScript/Node.js code that led to these numbers?

lemire · 2025-01-11T15:10:47Z

@tniessen My suspicion is that @ronag refers to credible/potential gains and not to an actual Node.js implementation.

mcollina · 2025-01-11T18:42:27Z

I think the information and discussion in fastify/fastify-etag#91 can be relevant.

tniessen · 2025-01-13T14:14:00Z

FWIW, MD5 is not a good hash function by any means and new applications should avoid it except to interact with older protocols. That being said, even for hash functions that should be used, I am not sure that this kind of optimization makes sense within Node.js. I am not particularly optimistic about OpenSSL adding this internally, but that still seems like a better place to start optimizing the hash function implementation.

ronag · 2025-01-13T15:21:34Z

FWIW, MD5 is not a good hash function by any means and new applications should avoid it except to interact with older protocols.

Indeed. However, Content-MD5 is commonly used https://datatracker.ietf.org/doc/html/rfc1864

I am not sure that this kind of optimization makes sense within Node.js.

Why?

I am not particularly optimistic about OpenSSL adding this internally,

Why does it need to live in OpenSSL?

but that still seems like a better place to start optimizing the hash function implementation.

I don't think the MD5 hash implementation can be much more optimized than it already is in terms of processing a single stream.

lemire · 2025-01-13T15:29:30Z

I don't think the MD5 hash implementation can be much more optimized than it already is in terms of processing a single stream.

On ARM hardware, Bun is measurably faster than Node at MD5 hashing.

JavaScript hashing speed comparison: MD5 versus SHA-256

So I would not rule out small gains (e.g. 20%).

tniessen · 2025-01-13T15:54:48Z

Indeed. However, Content-MD5 is commonly used https://datatracker.ietf.org/doc/html/rfc1864

Yes, this falls under the "older protocols" I mentioned above, but of course I did not mean to imply that these older protocols are not being used anymore. The unfortunate popularity of MD5 is also why OpenSSL has not removed it yet despite being fundamentally broken and slow.

Why does it need to live in OpenSSL?

We already use OpenSSL's implementation of MD5, so if that can be made faster for all downstream consumers, that'd be my first approach. Avoiding yet another dependency or even our own bespoke implementation of cryptographic primitives is a good rule of thumb in my opinion.

joyeecheung · 2025-01-14T08:51:33Z

On ARM hardware, Bun is measurably faster than Node at MD5 hashing.

If you are using crypto.createHash() the overhead primarily comes from wrapper management/V8's GC performance, instead of having something to do with actual hashing, like what we already found in nodejs/performance#136 - to eliminate these factors, use crypto.hash() (which is why it's significantly faster than crypto.createHash() for one off hashes, though this doesn't matter all that much when hashing a big stream in chunks which is what crypto.createHash() is really for). Even using cppgc-based wrappers would be enough to make crypto.createHash() 8-10% faster in a microbenchmark that misuse it for one-off hashing, which shows how little this have to do with actual hashing and more having to do with GC...

ronag added the feature request Issues that request new features to be added to Node.js. label Jan 11, 2025

github-project-automation bot added this to Node.js feature requests Jan 11, 2025

github-project-automation bot moved this to Awaiting Triage in Node.js feature requests Jan 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MD5 SIMD #56558

MD5 SIMD #56558

ronag commented Jan 11, 2025 •

edited

Loading

ronag commented Jan 11, 2025

mcollina commented Jan 11, 2025

tniessen commented Jan 11, 2025

lemire commented Jan 11, 2025

mcollina commented Jan 11, 2025

tniessen commented Jan 13, 2025

ronag commented Jan 13, 2025

lemire commented Jan 13, 2025

tniessen commented Jan 13, 2025

joyeecheung commented Jan 14, 2025 •

edited

Loading

MD5 SIMD #56558

MD5 SIMD #56558

Comments

ronag commented Jan 11, 2025 • edited Loading

What is the problem this feature will solve?

What is the feature you are proposing to solve the problem?

What alternatives have you considered?

ronag commented Jan 11, 2025

mcollina commented Jan 11, 2025

tniessen commented Jan 11, 2025

lemire commented Jan 11, 2025

mcollina commented Jan 11, 2025

tniessen commented Jan 13, 2025

ronag commented Jan 13, 2025

lemire commented Jan 13, 2025

tniessen commented Jan 13, 2025

joyeecheung commented Jan 14, 2025 • edited Loading

ronag commented Jan 11, 2025 •

edited

Loading

joyeecheung commented Jan 14, 2025 •

edited

Loading