Hacker & Clever
Hey, have you thought about how we could push the speed of hashing functions using SIMD and assembly? I was playing with AVX2 last night and the numbers got wild.
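Nice! AVX2 is great for data-parallel hashing, but you have to keep the pipeline fed: no stray branches in the hot loop, keep the state in registers, and lean on shuffle tricks. Try packing the hash state into 256-bit registers (eight 32-bit lanes) and mixing with rotates instead of table lookups. AVX2 has no native 32-bit rotate, so a rotate is two shifts plus an OR, or a single byte shuffle when the rotation is a multiple of 8 bits. If you inline the core mix step in assembly you can drop the call overhead, though intrinsics that the compiler inlines usually get you the same thing. Keep the memory accesses aligned to 32 bytes where you can; unaligned accesses are cheap on recent cores until they straddle a cache line, and then they hurt throughput. Also, look at the AVX2 intrinsic _mm256_shuffle_epi8: it does 16-entry byte lookups inside each 128-bit lane, so it can apply a 4-bit S-box to 32 bytes in a single instruction. If you can get the mix step down to a handful of cycles per 32-byte block, you'll be racing the hardware. Just watch for those sneaky memory stalls; they're the real speed killer.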
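To make the rotate and S-box ideas concrete, here's a rough C sketch, not a real hash. The mix step (add, rotate left by 13, xor) and the names mix8, mix8_scalar, sbox_nibbles, and sbox_nibbles_scalar are all made up for illustration, and I'm assuming every S-box entry is in the range 0..15. The AVX2 path only builds with -mavx2 (or an equivalent flag); otherwise it falls back to the scalar reference, which is also handy for checking the SIMD results.

```c
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical mix step, not taken from any real hash:
   state = rotl32(state + msg, 13) ^ msg, on 8 independent 32-bit lanes. */
static uint32_t rotl32(uint32_t x, unsigned r) {
    return (x << r) | (x >> (32 - r));   /* valid for 0 < r < 32 */
}

/* Scalar reference, used to check the SIMD path. */
void mix8_scalar(uint32_t state[8], const uint32_t msg[8]) {
    for (int i = 0; i < 8; i++)
        state[i] = rotl32(state[i] + msg[i], 13) ^ msg[i];
}

void mix8(uint32_t state[8], const uint32_t msg[8]) {
#if defined(__AVX2__)
    __m256i s = _mm256_loadu_si256((const __m256i *)state);
    __m256i m = _mm256_loadu_si256((const __m256i *)msg);
    s = _mm256_add_epi32(s, m);
    /* AVX2 has no 32-bit rotate: emulate it with two shifts and an OR. */
    s = _mm256_or_si256(_mm256_slli_epi32(s, 13), _mm256_srli_epi32(s, 19));
    s = _mm256_xor_si256(s, m);
    _mm256_storeu_si256((__m256i *)state, s);
#else
    mix8_scalar(state, msg);
#endif
}

/* Scalar reference: substitute both nibbles of each byte through a
   16-entry, 4-bit S-box (entries assumed to be in 0..15). */
void sbox_nibbles_scalar(uint8_t buf[32], const uint8_t sbox[16]) {
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t)((sbox[buf[i] >> 4] << 4) | sbox[buf[i] & 0x0F]);
}

void sbox_nibbles(uint8_t buf[32], const uint8_t sbox[16]) {
#if defined(__AVX2__)
    __m256i v    = _mm256_loadu_si256((const __m256i *)buf);
    /* Copy the 16-byte table into both 128-bit lanes, because
       _mm256_shuffle_epi8 only looks up within its own lane. */
    __m256i tbl  = _mm256_broadcastsi128_si256(
                       _mm_loadu_si128((const __m128i *)sbox));
    __m256i mask = _mm256_set1_epi8(0x0F);
    __m256i lo   = _mm256_and_si256(v, mask);
    __m256i hi   = _mm256_and_si256(_mm256_srli_epi16(v, 4), mask);
    /* One shuffle per nibble position: 32 table lookups each. */
    __m256i out  = _mm256_or_si256(
                       _mm256_shuffle_epi8(tbl, lo),
                       _mm256_slli_epi16(_mm256_shuffle_epi8(tbl, hi), 4));
    _mm256_storeu_si256((__m256i *)buf, out);
#else
    sbox_nibbles_scalar(buf, sbox);
#endif
}
```

I used unaligned loads and stores here so the sketch works on any buffer. If you control the allocation, align the buffers to 32 bytes and the aligned variants (_mm256_load_si256 and _mm256_store_si256) become safe to use. Always diff the SIMD output against the scalar version before you start timing anything.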