Fast MD5 hash implementation in x86 assembly

For fun, I wanted to see how far I could optimize my x86 MD5 hash implementation for speed. I started with a fairly straightforward naïve implementation, then reordered instructions and applied equivalent logical transformations. Each successful trick added only a few MiB/s of speed, and of the almost one hundred tweaks I tried, only about 20 succeeded. Even so, the overall result was a staggering 59% increase in speed.
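To illustrate what an "equivalent logical transformation" can look like, here is a minimal sketch in C. It is not taken from the article's assembly code. It uses two well-known rewrites of the MD5 round functions F and G from RFC 1321, each of which needs three bitwise operations instead of four. The function names (`f_spec`, `f_alt`, `g_spec`, `g_alt`, `forms_agree`) are made up for this example, and whether these particular rewrites are among the roughly 20 tweaks that succeeded is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Round function F as written in RFC 1321: four bitwise operations. */
static uint32_t f_spec(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) | (~x & z);
}

/* Equivalent form of F: three bitwise operations, no NOT. */
static uint32_t f_alt(uint32_t x, uint32_t y, uint32_t z) {
    return z ^ (x & (y ^ z));
}

/* Round function G as written in RFC 1321. */
static uint32_t g_spec(uint32_t x, uint32_t y, uint32_t z) {
    return (x & z) | (y & ~z);
}

/* Equivalent form of G, by the same select-style identity. */
static uint32_t g_alt(uint32_t x, uint32_t y, uint32_t z) {
    return y ^ (z & (x ^ y));
}

/* All four functions act independently on each bit position, so testing
   the 8 combinations of all-zero and all-one words covers every case.
   Returns 1 if the rewritten forms match the specification, else 0. */
static int forms_agree(void) {
    for (unsigned bits = 0; bits < 8; bits++) {
        uint32_t x = (bits & 1u) ? UINT32_MAX : 0u;
        uint32_t y = (bits & 2u) ? UINT32_MAX : 0u;
        uint32_t z = (bits & 4u) ? UINT32_MAX : 0u;
        if (f_spec(x, y, z) != f_alt(x, y, z)) return 0;
        if (g_spec(x, y, z) != g_alt(x, y, z)) return 0;
    }
    return 1;
}

/* Hypothetical self-check helper; call it from your own test driver. */
static void check_forms(void) {
    assert(forms_agree());
}
```

Calling `check_forms()` from a small `main` confirms the identities. Whether the shorter form is actually faster depends on the CPU and on instruction scheduling, which is why each tweak has to be benchmarked individually.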

Source code

The code comes in a number of parts:

Files:

To use this code, compile it on Linux with one of these commands:

Then run the executable with ./md5test.

Licensing: This code is copyrighted and is not open source. Please contact me if you wish to use or copy the code.

Benchmark results

An informal benchmark on Intel Core 2 Quad Q6600 2.40 GHz (using a single core), Ubuntu 10.04, GCC 4.4.3 gives these numbers:

Overall, my best hand-optimized code is 10% faster than my C code compiled with the best GCC options.

x86-64 version

All the C files work correctly without modification on x86-64. I made minimal changes to the assembly code only to adapt to the calling convention and change the 32-bit constants to signed numbers. The usage instructions are exactly the same. Here are the files:

An informal benchmark on Intel Core 2 Quad Q6600 2.40 GHz (using a single core), Ubuntu 10.04, GCC 4.4.3 gives these numbers:

The C code runs at the same speed as in x86-32 mode.
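The change from 32-bit constants to signed numbers mentioned in this section can be sketched in C. The sketch assumes the reason is that x86-64 sign-extends 32-bit immediates and displacements when they are used in 64-bit operations, so a round constant with its top bit set has to be written as the negative number with the same bit pattern. That motivation is an assumption, not something the article states, and the helper names `as_signed32` and `add_sign_extended` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit constant as the signed value with the same bit
   pattern, without relying on implementation-defined conversions. */
static int32_t as_signed32(uint32_t k) {
    if (k <= (uint32_t)INT32_MAX)
        return (int32_t)k;
    return (int32_t)(k - 0x80000000u) + INT32_MIN;
}

/* Model of adding a sign-extended 32-bit constant in a 64-bit operation
   and then keeping only the low 32 bits of the result. */
static uint32_t add_sign_extended(uint32_t a, uint32_t k) {
    int64_t wide = (int64_t)a + (int64_t)as_signed32(k);
    return (uint32_t)wide; /* reduction modulo 2^32 */
}

/* Hypothetical self-check using the first MD5 round constant. */
static void check_signed_constant(void) {
    assert(as_signed32(0xD76AA478u) == -680876936);
    assert(add_sign_extended(5u, 0xD76AA478u) == 0xD76AA47Du);
}
```

The low 32 bits of the sum are the same whether the constant is treated as signed or unsigned, so this rewrite changes only how the constant is spelled in the source, not the hash that is computed.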

Notes
