fix: dynamically sync inbound channel subscriptions #1

Merged
hzhang merged 1 commit from fix/inbound-dynamic-channel-sync into main 2026-05-21 06:56:49 +00:00

The fabric inbound previously called joinAll() only once, on socket.io connect: it fetched the agent's channel list via GET /api/channels?guildId=... and emitted join_channel for each channel. Any channel the agent joined after connect (e.g. a fresh DM created by another user that includes this agent) was unreachable until the gateway restarted: the socket was never subscribed to that room, so backend message.created push events never arrived.

Real-world failure (prod server.t2): hzhang created a DM channel with agent manager shortly after a gateway restart. Manager's inbound was already connected with 0 channels subscribed, so the new DM's messages went into a backend room that this socket was not listening to. Restarting the gateway picked up the new channel; without a restart, messages were silently lost.

Fix

Backend doesn't emit a user-scoped channel.joined event we could piggy-back on (grep emit /app/dist/realtime/realtime.gateway.js only shows message.created), so this is a poll-based reconciliation:

  • New channelSyncTimers field + 60s setInterval per (agent, guild) socket
  • syncChannels(kind): re-fetches /api/channels?guildId=... using a fresh guild token (freshGuildToken, survives 15-min token TTL)
  • Diff against a local joined: Set<string>
    • current - joined → socket.emit('join_channel', {channelId}) + add
    • joined - current → socket.emit('leave_channel', {channelId}) + remove
  • On connect: joined.clear() + run syncChannels('initial') (the server forgets subscriptions on reconnect)
  • stop(): clears all channelSyncTimers alongside socket disconnect

Logs distinguish initial (joined N channel(s)) vs delta (channel resync ... +N -M).
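
The diff step above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `diffChannels`, `applyDiff`, `ChannelDiff`, and the `Emitter` interface are hypothetical names introduced here; only the `join_channel` / `leave_channel` events, the `{channelId}` payload, and the `joined` set come from the description.

```typescript
// Hypothetical helper types: illustrative only, not taken from the plugin source.
interface ChannelDiff {
  toJoin: string[];
  toLeave: string[];
}

// Compare the freshly fetched channel ids with the locally tracked `joined` set.
function diffChannels(current: Iterable<string>, joined: ReadonlySet<string>): ChannelDiff {
  const currentSet = new Set<string>(current);
  // current - joined: channels the agent is in but the socket has not joined yet
  const toJoin = Array.from(currentSet).filter((id) => !joined.has(id));
  // joined - current: channels the socket joined but the agent is no longer in
  const toLeave = Array.from(joined).filter((id) => !currentSet.has(id));
  return { toJoin, toLeave };
}

// Minimal emitter shape so the sketch does not depend on socket.io-client types.
interface Emitter {
  emit(event: string, payload: { channelId: string }): void;
}

// Apply the diff: emit join/leave events and update `joined` to match.
function applyDiff(socket: Emitter, joined: Set<string>, diff: ChannelDiff): void {
  for (const channelId of diff.toJoin) {
    socket.emit("join_channel", { channelId });
    joined.add(channelId);
  }
  for (const channelId of diff.toLeave) {
    socket.emit("leave_channel", { channelId });
    joined.delete(channelId);
  }
}
```

Keeping the diff as a pure function separate from the emit step makes the reconciliation easy to unit-test without a live socket; the `+N -M` counts in the delta log line would correspond to `toJoin.length` and `toLeave.length` under this sketch.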

Trade-off

A newly joined channel is detected with up to 60s latency (one poll interval). A user-scoped backend channel.joined event would let us do this push-based with near-zero latency, but that requires backend work (a separate change in Fabric backend-guild).

🤖 Generated with Claude Code

hzhang added 1 commit 2026-05-21 06:46:27 +00:00
The fabric inbound previously called `joinAll()` once on socket.io
`connect` — it fetched the agent's channel list via
`GET /api/channels?guildId=...` and emitted `join_channel` for each.
Any channel the agent joined *after* connect (e.g. a fresh DM created
by another user that includes this agent) was unreachable until the
gateway restarted: the socket was never subscribed to that room, so
backend `message.created` push events never arrived.

Backend doesn't emit a user-scoped `channel.joined` event we could
piggy-back on (only `message.created`), so the fix is to poll. Every
60s the agent's channel list is re-fetched and diffed against a local
`joined` set:
- new channel ids → `socket.emit('join_channel', {channelId})` + add
- ids in `joined` but absent from the fresh list → `leave_channel`
  emit + remove (best-effort; cleans subs if the agent is removed from
  a channel)

Re-uses `freshGuildToken()` so the resync fetch survives token
expiry (15-min TTL). Initial `connect` resets the local `joined`
set since the server forgets prior room subscriptions on reconnect.

Timers are tracked in `channelSyncTimers` and cleared in `stop()`
alongside socket disconnect.
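
As a rough sketch of that timer lifecycle (not the actual implementation): the `ChannelSyncTimers` class, its method names, and the string key are assumptions made for illustration; the source only states that timers live in a `channelSyncTimers` field, fire every 60s per (agent, guild) socket, and are cleared in `stop()`.

```typescript
// Hypothetical registry keyed by an "agent:guild" string; the real field's
// key and value shapes are not shown in the source.
class ChannelSyncTimers {
  private timers = new Map<string, ReturnType<typeof setInterval>>();

  // Start (or restart) the periodic resync for one socket.
  start(key: string, tick: () => void, intervalMs: number = 60000): void {
    this.clear(key); // assumption: replace any existing timer so reconnects do not stack intervals
    this.timers.set(key, setInterval(tick, intervalMs));
  }

  // Stop the resync for a single socket, if one is running.
  clear(key: string): void {
    const timer = this.timers.get(key);
    if (timer !== undefined) {
      clearInterval(timer);
      this.timers.delete(key);
    }
  }

  // Called from stop(): cancel every timer alongside socket disconnect.
  stopAll(): void {
    this.timers.forEach((timer) => clearInterval(timer));
    this.timers.clear();
  }

  get size(): number {
    return this.timers.size;
  }
}
```

Clearing the previous timer inside `start` is a design choice of this sketch, intended to keep at most one interval alive per socket.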

Verified against prod server.t2 scenario: hzhang creates DM channel
including agent 'manager' → without this fix, manager only sees the
message after a gateway restart; with this fix, manager receives the
message within 60s (by the next resync tick).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hzhang merged commit 8224975119 into main 2026-05-21 06:56:49 +00:00

Reference: nav/Fabric.OpenclawPlugin#1