fix(presence-sync): /api prefix + Bearer guildAccessToken (not x-api-key) #7

Merged
hzhang merged 1 commit from fix/presence-sync-api-prefix into main 2026-05-25 23:17:45 +00:00

Two layered bugs in the presence-sync loop, both making every PUT fail forever in prod (silent log spam, busy-discard never actually applied to announce-type channels):

1. Missing /api prefix

URL was ${guildBaseUrl}/agents/<id>/presence but the guild backend sets a global prefix in main.ts:

app.setGlobalPrefix('api');

Every other REST call in this plugin (channel.ts channels list, fabric-client.ts postMessage, canvas) already prepends /api/ — only presence-sync missed it. Returned 404 "Cannot PUT /agents/...".
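As a minimal sketch of the corrected URL shape: the helper name `presenceUrl` is hypothetical and not taken from the plugin; only the `/api` prefix and the `/agents/<id>/presence` path come from the description above.

```typescript
// Hypothetical helper: builds the presence endpoint with the global /api prefix.
// Assumption: guildBaseUrl carries no path of its own beyond optional trailing slashes.
function presenceUrl(guildBaseUrl: string, agentId: string): string {
  // Strip trailing slashes so the result never contains "//api".
  const base = guildBaseUrl.replace(/\/+$/, "");
  return `${base}/api/agents/${encodeURIComponent(agentId)}/presence`;
}
```

Centralizing the prefix in one helper like this would make it harder for a single call site to miss it again.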

2. Wrong auth scheme

Plugin sent x-api-key: <fabricApiKey>, but the endpoint sits behind the global APP_GUARD = ApiKeyGuard which actually expects Authorization: Bearer <guildAccessToken> (despite the misleading guard name on the backend side).

Confirmed via:

$ docker exec fabric-backend-guild grep APP_GUARD /app/dist/app.module.js
# ApiKeyGuard

$ curl -X PUT -H 'x-api-key: <key>'    .../api/agents/<id>/presence  # 401 missing bearer token
$ curl -X PUT -H 'Authorization: Bearer <guildAccessToken>' .../api/agents/<id>/presence  # 200 OK
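
The same request expressed as a TypeScript sketch. `buildPresenceRequest` and the `{ status }` JSON body are assumptions made for illustration; the description only confirms the PUT method and the `Authorization: Bearer <guildAccessToken>` header.

```typescript
// Hypothetical request shape; only the method and the Bearer header are confirmed above.
type PresenceRequest = {
  method: "PUT";
  headers: Record<string, string>;
  body: string;
};

function buildPresenceRequest(guildAccessToken: string, status: string): PresenceRequest {
  return {
    method: "PUT",
    headers: {
      // Bearer guild token, not x-api-key: the global guard rejects the latter with 401.
      Authorization: `Bearer ${guildAccessToken}`,
      "Content-Type": "application/json",
    },
    // Assumed payload shape; the real body is not shown in this PR.
    body: JSON.stringify({ status }),
  };
}
```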

Fix

  • presence-sync.ts: do agent-login on demand to obtain a fresh guildAccessToken, cache it per-agent for 13 min (under the 15-min JWT TTL), use it as Bearer for the PUT. 401 response invalidates the cache so the next tick re-logs-in.
    • Pushes are gated on status changes (rare), so the login overhead is negligible.
  • inbound.ts: firstGuildEndpointByAgent → firstGuildByAgent storing both endpoint AND nodeId (presence-sync needs nodeId to pick the right token out of guildAccessTokens[]).
  • index.ts: pass FabricClient to PresenceSync constructor.
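
The token caching described above could look roughly like the sketch below. All names (`GuildTokenCache`, `pushWithToken`, the `login` and `put` callbacks) are hypothetical stand-ins rather than the plugin's actual API; only the 13-minute cache window, the 15-minute JWT TTL, and the invalidate-on-401 behavior come from this description.

```typescript
// 13 min, deliberately shorter than the 15-min JWT TTL stated above.
const TOKEN_CACHE_MS = 13 * 60 * 1000;

type CachedToken = { token: string; expiresAt: number };

// Hypothetical per-agent cache; the clock is injectable so expiry can be tested.
class GuildTokenCache {
  private readonly entries = new Map<string, CachedToken>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = TOKEN_CACHE_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(agentId: string): string | undefined {
    const entry = this.entries.get(agentId);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired: drop it so the caller logs in again.
      this.entries.delete(agentId);
      return undefined;
    }
    return entry.token;
  }

  set(agentId: string, token: string): void {
    this.entries.set(agentId, { token, expiresAt: this.now() + this.ttlMs });
  }

  invalidate(agentId: string): void {
    this.entries.delete(agentId);
  }
}

// Hypothetical push wrapper: `login` stands in for agent-login and `put`
// for the presence PUT, returning the HTTP status code.
async function pushWithToken(
  cache: GuildTokenCache,
  agentId: string,
  login: (agentId: string) => Promise<string>,
  put: (token: string) => Promise<number>,
): Promise<number> {
  let token = cache.get(agentId);
  if (token === undefined) {
    token = await login(agentId);
    cache.set(agentId, token);
  }
  const status = await put(token);
  // A 401 drops the cached token so the next tick performs a fresh login.
  if (status === 401) cache.invalidate(agentId);
  return status;
}
```

This sketch does not retry within the same tick after a 401; it relies on the next status push to log in again, matching the behavior described in the first bullet.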

Verified in sim

Before:

[plugins] fabric: presence-sync PUT recruiter failed: 404 url=https://fabric-t3.../api/agents/<id>/presence body={"message":"missing bearer token","error":"Unauthorized","statusCode":401}

(404 was first symptom; 401 surfaced after the /api fix.)

After:

[plugins] fabric: presence-sync recruiter → idle

0 failed pushes. Confirmed via grep -c 'presence-sync .* →' /tmp/gw.log = 1, grep -c 'PUT .* failed' = 0.

Prod impact

This restores busy-discard on announce-type channels (the whole point of presence-sync). Without it, announce broadcasts were going to busy agents too. No code changes on the receiver side; the fix just lets the existing receiver logic see accurate status.

hzhang added 1 commit 2026-05-25 22:55:02 +00:00
Two layered bugs in the presence-sync loop, both causing every PUT to
fail forever in prod:

1. **Missing /api prefix.** URL was `${guildBaseUrl}/agents/<id>/presence`
   but the guild backend sets a global prefix in main.ts via
   `app.setGlobalPrefix('api')`. Every other REST call in this plugin
   (channel.ts channels list, fabric-client.ts postMessage, canvas)
   already prepends /api/ — only presence-sync missed it. Returned 404
   "Cannot PUT /agents/...".

2. **Wrong auth scheme.** Plugin sent `x-api-key: <fabricApiKey>`, but
   the endpoint sits behind the global APP_GUARD = ApiKeyGuard, which
   actually expects `Authorization: Bearer <guildAccessToken>` (despite
   its name — confusing naming on the backend side). With /api added,
   error became 401 "missing bearer token". Confirmed by `docker exec
   fabric-backend-guild grep APP_GUARD /app/dist/app.module.js` and
   manual curl: Bearer guild token → 200 OK.

**Fix**

- presence-sync.ts: do agent-login on demand to obtain a fresh
  guildAccessToken, cache it per-agent for 13 min (under the 15-min
  JWT TTL), use it as Bearer for the PUT. 401 response invalidates
  the cache so the next tick re-logs-in. Pushes are gated on status
  changes (rare), so the login overhead is negligible.

- inbound.ts: firstGuildEndpointByAgent → firstGuildByAgent storing
  both endpoint and nodeId (presence-sync needs nodeId to pick the
  right token out of guildAccessTokens[]).

- index.ts: pass FabricClient to PresenceSync constructor.
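
A rough sketch of why the map needs `nodeId` alongside `endpoint`. The type shapes (`GuildRef`, `GuildAccessToken`) and the helpers `rememberGuild` and `pickGuildToken` are assumptions for illustration; the PR only states that `firstGuildByAgent` stores both fields and that `nodeId` selects the token from `guildAccessTokens[]`.

```typescript
// Assumed shapes; the real types in inbound.ts and the login response may differ.
type GuildRef = { endpoint: string; nodeId: string };
type GuildAccessToken = { nodeId: string; token: string };

const firstGuildByAgent = new Map<string, GuildRef>();

// Hypothetical: keep only the first guild seen for each agent.
function rememberGuild(agentId: string, guild: GuildRef): void {
  if (!firstGuildByAgent.has(agentId)) {
    firstGuildByAgent.set(agentId, guild);
  }
}

// Select the token issued for the guild node this agent talks to.
function pickGuildToken(tokens: GuildAccessToken[], guild: GuildRef): string | undefined {
  return tokens.find((t) => t.nodeId === guild.nodeId)?.token;
}
```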

**Verified in sim**

After restart, gateway log shows `fabric: presence-sync recruiter →
idle` (200 OK), zero failed PUTs, where previously it would log a 404
every ~5s per agent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hzhang merged commit 79b29db26c into main 2026-05-25 23:17:45 +00:00

Reference: nav/Fabric.OpenclawPlugin#7