3 Commits

Author SHA1 Message Date
9419d270e5 fix(presence-sync): tick mutex so setInterval overlap can't spawn parallel ticks
The presence-sync tick iterates accounts serially with await on each
agent-login + PUT round-trip — a single tick can easily run 20+s when
there are several accounts. setInterval(intervalMs) does NOT wait for
the previous tick to finish, so on a busy gateway the next tick fires
on top of a still-running one and two parallel iterations each PUT
the same agentId within ~10 ms. That tipped the guild backend's
first-time-insert race (separate fix in nav/Fabric.Backend.Guild) into
500s on prod (caught in t2 gateway 2026-05-25 23:23:35Z; 6 of 6 agents
showed paired log lines 4-10 ms apart for the same agent → idle).

Fix: a simple `inflight` boolean. tick() returns immediately if a
previous tick is still running, and the next interval beat catches up.
Pushes are already gated on lastStatus !== bridge.get, so a status
change missed by a skipped beat is still detected and pushed on the
next one; skipping costs nothing.
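
A minimal sketch of the guard described above, not the actual
presence-sync.ts code. `makeGuardedTick` and its boolean return value are
hypothetical names introduced for illustration; only the `inflight` flag
and the skip-if-running behaviour come from this commit message.

```typescript
// Hypothetical helper: wraps an async tick so that overlapping calls are skipped.
function makeGuardedTick(tick: () => Promise<void>): () => Promise<boolean> {
  let inflight = false;
  return async (): Promise<boolean> => {
    if (inflight) return false; // previous tick still running: skip this beat
    inflight = true;
    try {
      await tick();
      return true;
    } finally {
      inflight = false; // always release, even if the tick throws
    }
  };
}
```

Under these assumptions the guarded function would be handed to the timer,
for example `setInterval(() => void guarded(), intervalMs)`. The `finally`
block matters: without it a single thrown error would leave `inflight` set
and silence every later tick.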

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 02:25:08 +01:00
a87de27cff fix(presence-sync): use /api prefix + Bearer guildAccessToken (not x-api-key)
Two layered bugs in the presence-sync loop, both causing every PUT to
fail forever in prod:

1. **Missing /api prefix.** URL was `${guildBaseUrl}/agents/<id>/presence`
   but the guild backend sets a global prefix in main.ts via
   `setGlobalPrefix('api')`. Every other REST call in this plugin
   (channel.ts channels list, fabric-client.ts postMessage, canvas)
   already prepends /api/; only presence-sync missed it. Returned 404
   "Cannot PUT /agents/...".

2. **Wrong auth scheme.** Plugin sent `x-api-key: <fabricApiKey>`, but
   the endpoint sits behind the global APP_GUARD = ApiKeyGuard, which
   actually expects `Authorization: Bearer <guildAccessToken>` (despite
   its name — confusing naming on the backend side). With /api added,
   error became 401 "missing bearer token". Confirmed by `docker exec
   fabric-backend-guild grep APP_GUARD /app/dist/app.module.js` and
   manual curl: Bearer guild token → 200 OK.
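
The two corrections can be sketched together as a request builder. This
is an illustration only: `buildPresenceRequest`, the `PresenceRequest`
shape, the trailing-slash trimming, and the `Content-Type` header are
assumptions. The /api prefix, the path shape, and the Bearer scheme are
taken from the description above.

```typescript
// Hypothetical shape for the pieces a PUT needs.
interface PresenceRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: builds the corrected URL and auth header.
function buildPresenceRequest(
  guildBaseUrl: string,
  userId: string,
  guildAccessToken: string,
): PresenceRequest {
  // Trim trailing slashes so the result never contains "//api".
  const base = guildBaseUrl.replace(/\/+$/, "");
  return {
    // The guild backend serves every route under the global "api" prefix.
    url: `${base}/api/agents/${encodeURIComponent(userId)}/presence`,
    headers: {
      // Bearer guild token, not x-api-key.
      Authorization: `Bearer ${guildAccessToken}`,
      "Content-Type": "application/json", // assumed; not stated in the commit
    },
  };
}
```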

**Fix**

- presence-sync.ts: do agent-login on demand to obtain a fresh
  guildAccessToken, cache it per-agent for 13 min (under the 15-min
  JWT TTL), use it as Bearer for the PUT. 401 response invalidates
  the cache so the next tick re-logs-in. Pushes are gated on status
  changes (rare), so the login overhead is negligible.

- inbound.ts: firstGuildEndpointByAgent → firstGuildByAgent storing
  both endpoint and nodeId (presence-sync needs nodeId to pick the
  right token out of guildAccessTokens[]).

- index.ts: pass FabricClient to PresenceSync constructor.
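
The per-agent token cache could look roughly like the sketch below. The
class name `GuildTokenCache`, its method names, and the injectable clock
are hypothetical; the 13-minute lifetime and the invalidate-on-401
behaviour are taken from the list above.

```typescript
// 13 minutes, chosen in this commit to stay under the 15-minute JWT TTL.
const TOKEN_TTL_MS = 13 * 60 * 1000;

interface CachedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical class; the real presence-sync.ts may structure this differently.
class GuildTokenCache {
  private readonly entries = new Map<string, CachedToken>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Returns the cached token, or undefined if missing or expired.
  get(agentId: string): string | undefined {
    const entry = this.entries.get(agentId);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(agentId);
      return undefined;
    }
    return entry.token;
  }

  set(agentId: string, token: string): void {
    this.entries.set(agentId, { token, expiresAt: this.now() + TOKEN_TTL_MS });
  }

  // Call on a 401 so the next tick performs a fresh agent-login.
  invalidate(agentId: string): void {
    this.entries.delete(agentId);
  }
}
```

In this sketch a caller would try `get(agentId)` first, fall back to
agent-login plus `set(...)` on a miss, and call `invalidate(agentId)`
whenever the PUT returns 401.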

**Verified in sim**

After restart, gateway log shows `fabric: presence-sync recruiter →
idle` (200 OK), zero failed PUTs, where previously it would log a 404
every ~5s per agent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 23:54:38 +01:00
a15dc880af feat(plugin): add presence-sync module (Phase 1 partial wire)
Drops the PresenceSync class file under src/. Reads each agent's HF
status from globalThis.__hfAgentStatus (exposed by
HarborForge.OpenclawPlugin) every 30s and PUTs deltas to the
Fabric.Backend.Guild endpoint /agents/:userId/presence so the backend
can do busy-discard on announce channel deliveries.

Implementation:
- Diffs against in-memory lastStatus map per agentId; PUT only on
  change. No-op when __hfAgentStatus is undefined (HF plugin not
  loaded) — degrades gracefully, backend defaults presence to
  unknown which means no busy filtering.
- Per-account context: {agentId, fabricUserId, guildBaseUrl,
  fabricApiKey}. Uses x-api-key header so it goes through the
  existing ApiKeyGuard path on the backend.
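
The change detection described above can be sketched as follows. The
shape of `__hfAgentStatus` is not given in this message, so the sketch
assumes a lookup function from agentId to a status string;
`StatusLookup`, `StatusDelta`, and `collectDeltas` are hypothetical names.

```typescript
// Assumed shape of the status bridge: agentId in, status string (or nothing) out.
type StatusLookup = (agentId: string) => string | undefined;

interface StatusDelta {
  agentId: string;
  status: string;
}

// Returns only the agents whose status differs from the last pushed value.
function collectDeltas(
  agentIds: string[],
  lookup: StatusLookup | undefined,
  lastStatus: Map<string, string>,
): StatusDelta[] {
  // Bridge missing (HF plugin not loaded): nothing to push.
  if (!lookup) return [];
  const deltas: StatusDelta[] = [];
  for (const agentId of agentIds) {
    const status = lookup(agentId);
    if (status === undefined) continue; // no status known for this agent
    if (status === lastStatus.get(agentId)) continue; // unchanged: no PUT
    deltas.push({ agentId, status });
  }
  return deltas;
}
```

The sketch leaves `lastStatus` untouched. One reasonable design is for the
caller to record the new value only after the PUT succeeds, so that a
failed push is retried on the next tick.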

NOT YET WIRED into index.ts gateway_start lifecycle. To finish
wiring, the registerFull block needs to:
  1. After FabricInbound.start() resolves, harvest each agent's
     fabric user id (introspected by Center during session login;
     available on FabricSession.user.id).
  2. Build PresenceSyncAccount[] from those + the existing accounts
     list (which already has agentId + fabricApiKey + guildBaseUrl).
  3. presence = new PresenceSync(api.logger); presence.setAccounts(...);
     presence.start();
  4. presence.stop() on gateway_stop.

Reason for splitting: wiring needs the FabricInbound public API to
expose per-account session metadata, which is a small but separate
refactor. Module ships standalone now so the dependency direction is
clear and the wire-up patch is small.

See /home/hzhang/arch/DIALECTIC-V2-DESIGN.md section 7 (resolved
push-model design).
2026-05-23 11:32:24 +01:00