fix(agent-presence): upsert atomically — kill first-time-insert race #3
What
AgentPresenceService.setStatus() was a classic read-modify-write race.
Two concurrent first-time writes for the same userId:
findOne → undefined → INSERT
findOne → undefined → INSERT → Duplicate entry '<userId>' for key 'agent_presences.PRIMARY' → 500
Caught in prod (2026-05-25 23:23:35Z).
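The interleaving above can be reproduced without a database. The following is a minimal sketch, not the service's actual code: FakePresenceTable, setStatusRacy and demoRace are hypothetical names, and the in-memory Map only stands in for the agent_presences table and its primary key on userId.

```typescript
type Presence = { userId: string; status: string };

// Hypothetical in-memory stand-in for the agent_presences table.
class FakePresenceTable {
  private rows = new Map<string, Presence>();

  // Simulates a SELECT that needs one async hop before it returns.
  async findOne(userId: string): Promise<Presence | undefined> {
    await Promise.resolve();
    return this.rows.get(userId);
  }

  // Simulates an INSERT guarded by a primary key on userId.
  async insert(row: Presence): Promise<void> {
    await Promise.resolve();
    if (this.rows.has(row.userId)) {
      throw new Error(
        `Duplicate entry '${row.userId}' for key 'agent_presences.PRIMARY'`,
      );
    }
    this.rows.set(row.userId, row);
  }

  // Simulates an UPDATE of an existing row.
  async update(row: Presence): Promise<void> {
    await Promise.resolve();
    this.rows.set(row.userId, row);
  }
}

// Read-modify-write: the existence check and the write are separate steps,
// so another caller can slip in between them.
async function setStatusRacy(
  table: FakePresenceTable,
  userId: string,
  status: string,
): Promise<Presence> {
  const existing = await table.findOne(userId);
  if (existing) {
    await table.update({ userId, status });
  } else {
    await table.insert({ userId, status });
  }
  return { userId, status };
}

// Fires two first-time writes for the same userId and reports each outcome
// as either "ok" or the error message.
async function demoRace(): Promise<string[]> {
  const table = new FakePresenceTable();
  const attempt = (): Promise<string> =>
    setStatusRacy(table, "agent-1", "online").then(
      () => "ok",
      (err: Error) => err.message,
    );
  return Promise.all([attempt(), attempt()]);
}
```

In this sketch both calls complete their findOne before either insert runs, so demoRace() resolves with one "ok" and one duplicate-entry message, which mirrors the 500 seen in prod.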
The 10-ms-apart pair comes from Fabric.OpenclawPlugin's presence-sync emitting two PUTs from overlapping ticks (nav/Fabric.OpenclawPlugin#presence-sync-tick-mutex fixes the plugin side). Even with that fix landing, this race-prone code path stays exposed to any future caller (hf-plugin, manual curl, monitoring, etc.), so the backend should be the source of truth for atomicity.
Fix
repo.upsert() compiles to MySQL INSERT … ON DUPLICATE KEY UPDATE: atomic at the storage engine, no read needed, no race window. Return the synthesized entity since the controller only reads {userId, status} off it (no SELECT round-trip).
Sim test
Rebuilt fabric-backend-guild1 from this branch, fired 5 parallel PUTs to a fresh userId.

Previous setStatus() did read-modify-write: findOne → if-exists save / else create+save.

Two concurrent first-time writes for the same userId both saw no row, both INSERT'd, second hit unique-key (agent_presences.PRIMARY) and 500'd with "Duplicate entry '<userId>' for key 'agent_presences.PRIMARY'". This was visible in prod (2026-05-25 23:23:35Z) when Fabric.OpenclawPlugin's presence-sync emitted two PUTs ~10 ms apart for the same agent (its tick-overlap is being fixed separately in nav/Fabric.OpenclawPlugin).

Replace with repo.upsert(values, ['userId']), which compiles to MySQL `INSERT … ON DUPLICATE KEY UPDATE`: atomic at the storage engine, no read needed, no race window. Synthesize the returned entity from the values we just wrote rather than a SELECT round-trip; controller only reads {userId, status} off it.

Sim verified with 5 parallel PUTs to a fresh userId: all 200, no Duplicate errors in guild log (was: 1 × 200 + 4 × 500 with the old code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
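The upsert-based setStatus() and the five-way parallel check described above might look roughly like the sketch below. It is an illustration under assumptions, not the merged code: PresenceRepo is a hypothetical structural subset of a repository exposing upsert(values, conflictPaths), FakeUpsertRepo is an in-memory fake standing in for MySQL, and fireParallel is a made-up helper that replaces the real HTTP PUTs.

```typescript
type Presence = { userId: string; status: string };

// Assumed minimal shape of the repository; only upsert() is needed here.
interface PresenceRepo {
  upsert(values: Presence, conflictPaths: string[]): Promise<unknown>;
}

// Hypothetical in-memory fake: insert-or-update happens as one step, the way
// INSERT … ON DUPLICATE KEY UPDATE does inside the storage engine.
class FakeUpsertRepo implements PresenceRepo {
  readonly rows = new Map<string, Presence>();

  async upsert(values: Presence, _conflictPaths: string[]): Promise<void> {
    await Promise.resolve(); // simulated round-trip
    this.rows.set(values.userId, { ...values }); // single write, no prior read
  }
}

// No findOne: one atomic write, then return the values just written
// instead of issuing a follow-up SELECT.
async function setStatus(
  repo: PresenceRepo,
  userId: string,
  status: string,
): Promise<Presence> {
  const values: Presence = { userId, status };
  await repo.upsert(values, ["userId"]);
  return values;
}

// Fires n concurrent first-time writes for the same userId and reports each
// outcome as either "ok" or the error message.
async function fireParallel(
  repo: PresenceRepo,
  userId: string,
  n: number,
): Promise<string[]> {
  const calls: Promise<string>[] = [];
  for (let i = 0; i < n; i++) {
    calls.push(
      setStatus(repo, userId, "online").then(
        () => "ok",
        (err: Error) => err.message,
      ),
    );
  }
  return Promise.all(calls);
}
```

With the fake repository, fireParallel(new FakeUpsertRepo(), "agent-1", 5) resolves with five "ok" entries and leaves a single row behind. The fake only models the absence of a check-then-write gap; the real atomicity guarantee comes from the database, not from this code.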