Mark & LayerCrafter
Stumbled on a race condition in our legacy API that only triggers on a specific request pattern. Think you’d be up for digging into it?
Sure, but I'm not a quick-fix person. Send me the exact request sequence, the code that handles it, and your current locking scheme. Once we can step through the shared-state access, we'll apply the proper synchronization and re-run the test. Keep the logs granular so we catch the exact moment the race slips in. And if it magically stops reproducing the moment we add logging, well, that's races for you.
Here's the minimal path that reproduces it (throwaway driver sketch below the list):
1. GET /api/status (first read)
2. POST /api/start (creates a job and stores a UUID)
3. GET /api/job/<uuid> (second read, expects the job to exist)
4. POST /api/finish/<uuid> (sets status to finished)
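In case it helps, this is the driver I use to fire that exact sequence. It assumes the API is listening on localhost:8080 and that /api/start returns the job JSON including its ID; tweak as needed.

package main

// Throwaway driver for the repro sequence above. Assumes the API is on
// localhost:8080 and that POST /api/start returns the job JSON with its ID.

import (
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    base := "http://localhost:8080" // assumed dev address

    // 1. GET /api/status (first read)
    if resp, err := http.Get(base + "/api/status"); err == nil {
        resp.Body.Close()
    }

    // 2. POST /api/start (creates a job, returns the UUID in the body)
    resp, err := http.Post(base+"/api/start", "application/json", nil)
    if err != nil {
        panic(err)
    }
    var job struct{ ID string }
    json.NewDecoder(resp.Body).Decode(&job)
    resp.Body.Close()

    // 3. GET /api/job/<uuid> (second read, where the crash shows up)
    if resp, err := http.Get(base + "/api/job/" + job.ID); err == nil {
        fmt.Println("job read:", resp.Status)
        resp.Body.Close()
    }

    // 4. POST /api/finish/<uuid> (sets status to finished)
    if resp, err := http.Post(base+"/api/finish/"+job.ID, "application/json", nil); err == nil {
        resp.Body.Close()
    }
}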
The handler is in job.go:
func handleStart(w http.ResponseWriter, r *http.Request) {
    id := uuid.New().String()
    job := &Job{ID: id, State: "running"}
    mu.Lock()
    jobs[id] = job // register the job under the write lock
    mu.Unlock()
    json.NewEncoder(w).Encode(job) // respond after the lock is released
}
func handleStatus(w http.ResponseWriter, r *http.Request) {
    mu.Lock()
    defer mu.Unlock()
    // read jobs map
}
The lock `mu` is a simple sync.Mutex. The race shows up when the GET /api/job/<uuid> arrives while handleStart is still inserting the entry: the lookup misses, returns a nil *Job, and the handler dereferences it without checking, which is your panic. The lock alone can't make the entry appear any sooner, so the real guard is a nil check on the lookup; on top of that, switch to sync.RWMutex and take RLock for the reads to cut contention between the status and job handlers. Keep your logs inside the lock and print the timestamp and goroutine ID so you can see exactly when the map is accessed. That's all you need to see the glitch and fix it.
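Concretely, something like this for the read side. It's a sketch, not a drop-in: I'm assuming jobs stays a package-level map[string]*Job, mu just changes type to sync.RWMutex, and the goid helper is a debug-only hack (imports for log, runtime, strconv, strings, and sync omitted like the rest of this thread).

var (
    mu   sync.RWMutex           // was sync.Mutex; readers now share the lock
    jobs = make(map[string]*Job)
)

func handleStatus(w http.ResponseWriter, r *http.Request) {
    mu.RLock()
    defer mu.RUnlock()
    // Log inside the lock so the timestamp marks the actual map access.
    log.Printf("g%d status read, %d jobs", goid(), len(jobs))
    // ... encode the status response from the jobs map ...
}

// goid parses the goroutine ID out of the stack header ("goroutine 42 [running]:").
// Debug-only helper; don't lean on goroutine IDs in real code.
func goid() int {
    buf := make([]byte, 64)
    n := runtime.Stack(buf, false)
    id, _ := strconv.Atoi(strings.Fields(string(buf[:n]))[1])
    return id
}

log.Printf already prefixes the timestamp, so the goroutine ID is the only extra bit the message needs.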
Use an RWMutex with RLock for the status handler and Lock for writes, then guard the job lookup with a nil check so a missing entry gets handled instead of dereferenced. Add a small log right after the map lookup to confirm the timing, and validate the job ID before writing anything to the response. That's all you need to nail the race.
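For the job read itself, roughly this shape. A sketch only, since I haven't seen that handler: I'm assuming the ID comes off the URL path and that you're on github.com/google/uuid for Parse.

func handleJob(w http.ResponseWriter, r *http.Request) {
    // Validate the ID before it touches the map or the response.
    id := strings.TrimPrefix(r.URL.Path, "/api/job/")
    if _, err := uuid.Parse(id); err != nil {
        http.Error(w, "invalid job id", http.StatusBadRequest)
        return
    }

    mu.RLock()
    job := jobs[id]
    mu.RUnlock()
    log.Printf("g%d job lookup %s found=%t", goid(), id, job != nil)

    // Nil guard: a missing entry becomes a 404 instead of a dereference.
    if job == nil {
        http.NotFound(w, r)
        return
    }
    json.NewEncoder(w).Encode(job)
}

Reads share RLock so the status poll and the job lookup don't serialize behind each other; only the start and finish handlers need the full write Lock.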
Sounds good. I’ll swap the mutex, add the nil guard, and drop the timestamp after each lookup. Let’s hit run.
Good plan. Just double-check that the nil guard runs before you dereference the job, otherwise you'll still hit the panic. And make sure every handler uses the same RWMutex; if one of them still grabs the old mutex you lose the exclusion and the race comes straight back. Once that's in place, the race should disappear. Let's see the results.
Yeah, I’ll put the nil check right before the dereference and make sure every handler uses the same RWMutex. Once that’s in place we should see the crash disappear and the logs give us the exact timeline. Let's run it and see the clean pass.
Nice, that should close the timing hole. Once you run it, the logs should show no more nil dereference and the sequence should finish without a crash. If it still stumbles, we'll dig deeper into the lock ordering, but odds are the race is gone. Let's fire it up.
Got it. I’ll fire the test, dump the timestamps, and we’ll see if the panic is still a thing. If it’s gone, we’re done; if not, we’ll hunt the lock order again. Let's run it.
Run it, and if the panic still shows up, check whether some other goroutine is sitting on the write lock while you read the status; a long-held write lock just widens the window before the job lands in the map. If the timestamps line up, the race should be gone; if not, we'll look for a hidden write that slips past the RWMutex. But most likely the nil guard and the proper lock type will do the trick. Good luck.
Alright, patch applied and test queue cleared. I’m hitting the sequence now and pulling the logs straight from the server output. If a nil dereference shows up, I’ll know a lock slipped. Otherwise it should just finish cleanly. Let's see what the timestamps say.