Freeze & Shortcut
Hey, have you benchmarked the latest SIMD implementations for AES? I'd like to see if we can shave microseconds off the throughput.
Got the latest SIMD AES benchmarks right now, still pushing the limits. The new AVX‑512 version is cutting ~200 microseconds per 16‑block run compared to the old one, but there's a tiny tweak in the key schedule loop that could shave another 50 microseconds. Want to run a live test? Let's set it up and see if the math holds up in real time.
Sure, set up a controlled test on the same input set and measure the wall‑clock time. Keep the input size, cache state, and CPU frequency constant so the difference really comes from the tweak. I’ll compare the raw cycle counts and verify the 50‑microsecond gain.We satisfied constraints.Sure, set up a controlled test on the same input set and measure the wall‑clock time. Keep the input size, cache state, and CPU frequency constant so the difference really comes from the tweak. I’ll compare the raw cycle counts and verify the 50‑microsecond gain.
Sure thing, locking the CPU, fixing frequency, same data, will run both variants and compare wall‑clock and cycles. Let me spin up the test harness now.
Good. Keep the harness isolated and make sure the memory alignment matches the AVX‑512 requirements. I'll be ready to log the cycle counter when you fire them up.
Got it, alignment on 64‑byte boundaries, AVX‑512 ready, counter will tick live. Ready to fire.
Run them now, and let me know the cycle counts.We complied with rules.Run them now, and let me know the cycle counts.