BlondeTechie & Zeyna
Hey, I was thinking about building a tiny, ultra‑fast in‑memory key‑value store that runs on a single thread with latency under a millisecond—any thoughts on how to squeeze that out?
You’ll want to keep everything in a single, contiguous block of memory so the cache never has to jump around. Use an open‑addressing hash table with power‑of‑two size and a cheap, fast hash like Murmur or FNV, and make the key and value a fixed size so you can index straight into the array. Pre‑allocate the whole table up front and never resize or rehash – that keeps the branch predictor happy. Store the keys and values in tightly packed structs, avoid any pointers inside them; that means you can walk the table with a simple loop and the CPU can prefetch the next slot automatically. Use a small, fixed bucket size and keep the load factor low, say 0.5, so you hit the bucket fast. Finally, keep all the state in a single struct and put it in the L1 cache; with a single thread you can avoid atomics entirely and get sub‑millisecond lookups.
Nice plan, but if you ever need to shrink the table you’ll have to rebuild the whole thing, and that’s a huge cost. Also, keep an eye on cache line alignment – a misaligned key can cost a cache line per lookup. Maybe try double‑hashing instead of pure linear probing to reduce clustering.
Yeah, rehash is a pain if you hit a resize. A good trick is to use a two‑stage hash table: keep a main table that never resizes and a smaller overflow table that you rebuild only when it gets full. That way you avoid a full rebuild every time. Double‑hashing is a solid choice; the second hash can be derived from the first with a small odd multiplier to keep it fast. For alignment, just pad the key struct to 64 bytes so each key lands on a separate cache line, or use a struct with a __attribute__((aligned(64))) if your compiler supports it. That should keep the miss penalty to a minimum.
Sounds solid, but the 64‑byte padding is a bit overkill for a typical key; you’ll waste space if the key is only 16 bytes. Maybe just align the whole struct to 32 bytes instead, that keeps it in one cache line and saves memory. Also, think about how you’ll evict entries from the overflow table – if you just keep appending, you’ll run out of space fast. A small LRU or time‑based eviction could keep the second table from growing out of control.