fix(presence-sync): /api prefix + Bearer guildAccessToken (not x-api-key) #7
Two layered bugs in the `presence-sync` loop, both making every PUT fail forever in prod (silent log spam, busy-discard never actually applied to announce-type channels):

1. **Missing `/api` prefix.** The URL was `${guildBaseUrl}/agents/<id>/presence`, but the guild backend sets a global prefix in `main.ts`: `setGlobalPrefix('api')`. Every other REST call in this plugin (`channel.ts` channels list, `fabric-client.ts` postMessage, canvas) already prepends `/api/`; only `presence-sync` missed it. The backend returned 404 `"Cannot PUT /agents/..."`.
2. **Wrong auth scheme.** The plugin sent `x-api-key: <fabricApiKey>`, but the endpoint sits behind the global `APP_GUARD = ApiKeyGuard`, which actually expects `Authorization: Bearer <guildAccessToken>` (despite the misleading guard name on the backend side). With `/api` added, the error became 401 "missing bearer token".

Confirmed via `docker exec fabric-backend-guild grep APP_GUARD /app/dist/app.module.js` and a manual curl: Bearer guild token → 200 OK.
**Fix**

- `presence-sync.ts`: do `agent-login` on demand to obtain a fresh `guildAccessToken`, cache it per-agent for 13 min (under the 15-min JWT TTL), and use it as Bearer for the PUT. A 401 response invalidates the cache so the next tick re-logs-in.
- `inbound.ts`: `firstGuildEndpointByAgent` → `firstGuildByAgent`, storing both `endpoint` AND `nodeId` (presence-sync needs `nodeId` to pick the right token out of `guildAccessTokens[]`).
- `index.ts`: pass `FabricClient` to the `PresenceSync` constructor.

**Verified in sim**
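Before: the gateway logged a failed PUT every ~5s per agent (404 was the first symptom; 401 surfaced after the `/api` fix).

After: the gateway log shows `fabric: presence-sync recruiter → idle` (200 OK) and 0 failed pushes. Confirmed via `grep -c 'presence-sync .* →' /tmp/gw.log` = 1 and `grep -c 'PUT .* failed'` = 0.

**Prod impact**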
This restores busy-discard on announce-type channels (the whole point of presence-sync). Without it, announce broadcasts were going to busy agents too. No code change on the receiver side; this just makes the existing receiver logic actually receive accurate status.

Two layered bugs in the presence-sync loop, both causing every PUT to fail forever in prod:

1. **Missing /api prefix.** URL was `${guildBaseUrl}/agents/<id>/presence` but the guild backend sets a global prefix 'api' in main.ts `setGlobalPrefix('api')`. Every other REST call in this plugin (channel.ts channels list, fabric-client.ts postMessage, canvas) already prepends /api/; only presence-sync missed it. Returned 404 "Cannot PUT /agents/...".
2. **Wrong auth scheme.** Plugin sent `x-api-key: <fabricApiKey>`, but the endpoint sits behind the global APP_GUARD = ApiKeyGuard, which actually expects `Authorization: Bearer <guildAccessToken>` (despite its name; the naming on the backend side is confusing). With /api added, error became 401 "missing bearer token".

Confirmed by `docker exec fabric-backend-guild grep APP_GUARD /app/dist/app.module.js` and manual curl: Bearer guild token → 200 OK.

**Fix**

- presence-sync.ts: do agent-login on demand to obtain a fresh guildAccessToken, cache it per-agent for 13 min (under the 15-min JWT TTL), use it as Bearer for the PUT. 401 response invalidates the cache so the next tick re-logs-in. Pushes are gated on status changes (rare), so the login overhead is negligible.
- inbound.ts: firstGuildEndpointByAgent → firstGuildByAgent storing both endpoint and nodeId (presence-sync needs nodeId to pick the right token out of guildAccessTokens[]).
- index.ts: pass FabricClient to PresenceSync constructor.

**Verified in sim**

After restart, gateway log shows `fabric: presence-sync recruiter → idle` (200 OK), zero failed PUTs, where previously it would log a 404 every ~5s per agent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
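
As a rough illustration of the fix described above (not the code in this PR), the per-agent token cache and the corrected request shape could be sketched as follows. `GuildTokenCache`, `buildPresenceRequest`, and the `{ status }` request body are hypothetical names and shapes chosen for this example; only the 13-minute cache window, the `/api` prefix, and the Bearer header come from the description.

```typescript
// Hypothetical sketch: class, function, and field names are illustrative,
// not the plugin's actual API.

const TOKEN_TTL_MS = 13 * 60 * 1000; // cache window, kept under the 15-min JWT TTL

interface CachedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

class GuildTokenCache {
  private readonly entries = new Map<string, CachedToken>();

  // Return the cached token for an agent, or undefined if missing or expired.
  get(agentId: string, now: number = Date.now()): string | undefined {
    const entry = this.entries.get(agentId);
    if (entry === undefined || now >= entry.expiresAt) {
      this.entries.delete(agentId);
      return undefined;
    }
    return entry.token;
  }

  // Store a freshly obtained token for 13 minutes.
  set(agentId: string, token: string, now: number = Date.now()): void {
    this.entries.set(agentId, { token, expiresAt: now + TOKEN_TTL_MS });
  }

  // Drop the token, e.g. after a 401, so the next tick logs in again.
  invalidate(agentId: string): void {
    this.entries.delete(agentId);
  }
}

interface PresenceRequest {
  url: string;
  method: "PUT";
  headers: Record<string, string>;
  body: string;
}

function buildPresenceRequest(
  guildBaseUrl: string,
  agentId: string,
  guildAccessToken: string,
  status: string,
): PresenceRequest {
  const base = guildBaseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    // The /api segment matches the backend's global prefix.
    url: `${base}/api/agents/${encodeURIComponent(agentId)}/presence`,
    method: "PUT",
    headers: {
      Authorization: `Bearer ${guildAccessToken}`, // Bearer, not x-api-key
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ status }), // body shape is an assumption
  };
}
```

In a push loop, a caller would check `cache.get(agentId)` first, run the login only on a miss, and call `cache.invalidate(agentId)` when the PUT comes back with a 401. Passing `now` explicitly keeps the expiry logic easy to test without a real clock.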