The socket.io client was sending its own EIO ping frames every
pingInterval (default 25s). That's wrong for engine.io v4: in v4 the
SERVER initiates pings and the CLIENT must respond with pong inside
pingTimeout, else the server closes the connection. Client-initiated
pings get misinterpreted by Fabric's NestJS socket.io backend, which
quietly closes the connection — producing the warn-flap every ~25s:

    inbound: socket ended; reconnecting
    err="read: failed to get reader: received close frame:
    status = StatusNoStatusRcvd and reason = \"\""

Fix:
  - delete pingLoop() entirely
  - delete the pingPeriod/pingTimeout struct fields and their assignments
    in recvOpen (the server enforces both anyway; the client doesn't
    need them)
  - keep the eioPing case in handlePacket (already correct: it responds
    with pong)
  - drop the now-unused "time" import
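
For reference, the client-side heartbeat that remains can be sketched roughly as below. This is a minimal illustration, not the plugin's actual handlePacket: replyFor and the constant names are hypothetical, and only the Engine.IO v4 packet-type prefixes ('2' = ping, '3' = pong) are taken from the protocol.

```go
package main

import "fmt"

// Engine.IO v4 packet types, encoded as the first byte of a text frame.
const (
	eioOpen    = '0'
	eioClose   = '1'
	eioPing    = '2'
	eioPong    = '3'
	eioMessage = '4'
)

// replyFor returns the frame the client should write back for an inbound
// Engine.IO frame, or "" when no transport-level reply is needed.
// Hypothetical helper for illustration only.
func replyFor(frame string) string {
	if frame == "" {
		return ""
	}
	switch frame[0] {
	case eioPing:
		// In v4 the server pings; the client answers with a pong and
		// echoes back any payload. The client never sends pings itself.
		return string(eioPong) + frame[1:]
	default:
		return ""
	}
}

func main() {
	fmt.Printf("%q\n", replyFor("2"))      // server ping
	fmt.Printf("%q\n", replyFor("2probe")) // ping carrying a payload
	fmt.Printf("%q\n", replyFor("4hello")) // message frame: nothing to send
}
```

The point of the sketch is that the client has no timer at all: it only reacts to server pings, so there is nothing left for a ping loop to do.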

End-to-end verified on live Fabric:
  - Restarted Plexum at 20:17:35; watched for 90+ seconds
  - ZERO "socket ended" events (vs. ~3-4 per 90s before the fix)
  - Channel inbound still delivers: alice posted seq=20 → gem agent
    (gemini CLI) replied seq=21 "pong"

The plugin no longer flaps. Reconnect backoff machinery (1s→60s)
stays in place as a safety net for genuine network drops.
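
A backoff of that shape might look like the sketch below. Only the 1s floor and 60s ceiling come from this change; the doubling step and the names nextBackoff, minBackoff and maxBackoff are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	minBackoff = 1 * time.Second  // first retry delay
	maxBackoff = 60 * time.Second // ceiling for repeated failures
)

// nextBackoff returns the delay to use after a failed attempt, given the
// delay used last time. Doubling is an assumed growth policy.
func nextBackoff(cur time.Duration) time.Duration {
	if cur < minBackoff {
		return minBackoff
	}
	next := cur * 2
	if next > maxBackoff {
		return maxBackoff
	}
	return next
}

func main() {
	d := time.Duration(0)
	for i := 0; i < 8; i++ {
		d = nextBackoff(d)
		fmt.Println(d) // grows from 1s and then stays at the 60s cap
	}
}
```

A caller would reset the delay to zero after a connection that stayed up, so a single drop retries after 1s rather than inheriting an old 60s delay.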

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

End-to-end Fabric inbound→Plexum→Fabric outbound now works against a
live Fabric stack:

    alice posts in bt2-clean (Fabric REST)
      → guild emits message.created over socket.io
      → plugin's wakeup gate decides dispatch
      → notifications/plexum/channel/inbound to host
      → Plexum agent runs (echo provider)
      → outbound `send` tool posts via Fabric REST
      → fabrictester reply visible in channel

internal/socketio/ (~280 LOC + 2 tests):
  - Minimal Engine.IO v4 + Socket.IO protocol v5 client over websocket
  - WebSocket-only transport (skips the polling-upgrade dance)
  - AuthFunc callback re-evaluated on every (re)connect. This fixes the
    stale-JWT-on-reconnect bug that the openclaw plugin documented for
    the JS client's single-shot auth, and which the available Go
    socket.io library (zishang520) doesn't address either
  - PING/PONG per server-supplied interval
  - Caller-driven reconnect: Connect returns on close, supervisor
    re-dials with fresh token
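
The AuthFunc plus caller-driven reconnect can be pictured with the sketch below. It is an illustration under assumptions, not the package's API: the AuthFunc signature, dialFunc, supervise and demo are all hypothetical, and the real supervisor also backs off between attempts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// AuthFunc produces the auth payload for one CONNECT attempt.
// The signature is assumed for this sketch.
type AuthFunc func(ctx context.Context) (token string, err error)

// dialFunc stands in for the client's Connect: it blocks until the
// socket closes and then returns.
type dialFunc func(ctx context.Context, token string) error

// supervise dials up to maxAttempts times and asks auth for a token
// before every dial, so a reconnect never reuses a stale JWT.
// It returns the tokens it used, in order.
func supervise(ctx context.Context, auth AuthFunc, dial dialFunc, maxAttempts int) []string {
	var used []string
	for i := 0; i < maxAttempts && ctx.Err() == nil; i++ {
		tok, err := auth(ctx)
		if err != nil {
			continue // a real supervisor would back off here
		}
		used = append(used, tok)
		_ = dial(ctx, tok) // returns when the connection drops
	}
	return used
}

// demo simulates three connections that each drop immediately.
func demo() []string {
	n := 0
	auth := func(context.Context) (string, error) {
		n++
		return fmt.Sprintf("jwt-%d", n), nil
	}
	dial := func(context.Context, string) error {
		return errors.New("socket ended")
	}
	return supervise(context.Background(), auth, dial, 3)
}

func main() {
	fmt.Println(demo()) // each attempt carries a different token
}
```

Keeping reconnect in the caller means the client itself stays a plain blocking call, and token freshness is decided in one place.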

internal/tokens/ (~95 LOC + 9 tests):
  - Per-agent session cache with 8min TTL (matches openclaw's
    TOKEN_TTL_MS); guild tokens live ~15min, so 8min keeps a margin
  - Invalidate forces re-login (used by inbound when CONNECT auth fires)
  - GuildToken helper picks the per-guild JWT from the cached session;
    if the guild is missing from the cache, invalidate + retry once
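
The cache behavior described above can be sketched as follows. Type and field names (Cache, session, login, now) are assumptions for illustration; only the 8min TTL, Invalidate, and the invalidate-then-retry-once rule in GuildToken come from this change.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// session is what one login yields: a JWT per guild plus a fetch time.
type session struct {
	guildTokens map[string]string
	fetchedAt   time.Time
}

// Cache keeps one session per agent and re-logs-in once it is older than ttl.
type Cache struct {
	mu       sync.Mutex
	ttl      time.Duration
	now      func() time.Time
	login    func(agent string) (map[string]string, error) // hypothetical login hook
	sessions map[string]session
}

func (c *Cache) get(agent string) (session, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.sessions[agent]; ok && c.now().Sub(s.fetchedAt) < c.ttl {
		return s, nil // still fresh
	}
	toks, err := c.login(agent)
	if err != nil {
		return session{}, err
	}
	s := session{guildTokens: toks, fetchedAt: c.now()}
	c.sessions[agent] = s
	return s, nil
}

// Invalidate drops the cached session so the next lookup logs in again.
func (c *Cache) Invalidate(agent string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.sessions, agent)
}

// GuildToken returns the JWT for one guild, invalidating and retrying
// once if the cached session does not know the guild.
func (c *Cache) GuildToken(agent, guild string) (string, error) {
	for attempt := 0; attempt < 2; attempt++ {
		s, err := c.get(agent)
		if err != nil {
			return "", err
		}
		if tok, ok := s.guildTokens[guild]; ok {
			return tok, nil
		}
		c.Invalidate(agent)
	}
	return "", fmt.Errorf("no token for guild %q", guild)
}

// demo: the first login lacks guild g2, the second one includes it.
func demo() (string, int) {
	logins := 0
	c := &Cache{
		ttl:      8 * time.Minute,
		now:      time.Now,
		sessions: map[string]session{},
		login: func(string) (map[string]string, error) {
			logins++
			toks := map[string]string{"g1": "t1"}
			if logins > 1 {
				toks["g2"] = "t2"
			}
			return toks, nil
		},
	}
	tok, _ := c.GuildToken("alice", "g2") // miss, invalidate, retry
	_, _ = c.GuildToken("alice", "g1")    // served from the cache
	return tok, logins
}

func main() {
	fmt.Println(demo())
}
```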

internal/inbound/ (~290 LOC):
  - Supervisor: one socket.io conn per (agent, guild); reconnect with
    fresh token on drop; channel list polled every ChannelSyncInterval
    (60s), plus handlers for pushed channel.joined/channel.left events
  - Wakeup gate: dm channels deliver any non-self message; other
    x_types require wakeup=true (record-only handling of non-wake,
    non-dm messages is deferred: Plexum v1 has no history-injection
    equivalent)
  - Self-author filter on selfUserId from cached session
  - Per-(agent,msgId) dedup bounded to 5000 entries
  - Per-channel serial queue with 5s idle drain so concurrent inbounds
    on the same channel run one at a time (matches openclaw plugin)
  - Emits notifications/plexum/channel/inbound with session_id =
    "s_fab_<fabric_channel_id>" for stable per-channel session continuity

cmd/plexum-fabric-channel-plugin:
  - Wires inbound supervisor at Init; runs in a background goroutine
    for the plugin's lifetime
  - Replaces F-1's sessions map with tokens.Cache (same warm-sessions
    behavior, now backed by TTL)
  - hostLogHandler: bridges slog records from inbound supervisor to
    HostAPI.Log notifications
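
A slog bridge of this kind could look roughly like the sketch below. The HostLogger interface is a hypothetical stand-in for HostAPI.Log, whose real signature is not shown here; the slog.Handler methods are the standard library's.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
)

// HostLogger is an assumed shape for the host's log notification.
type HostLogger interface {
	Log(level, msg string, fields map[string]any)
}

// hostLogHandler forwards every slog record to the host.
type hostLogHandler struct {
	host  HostLogger
	attrs []slog.Attr
}

func (h *hostLogHandler) Enabled(context.Context, slog.Level) bool { return true }

func (h *hostLogHandler) Handle(_ context.Context, r slog.Record) error {
	fields := make(map[string]any, len(h.attrs)+r.NumAttrs())
	for _, a := range h.attrs {
		fields[a.Key] = a.Value.Any()
	}
	r.Attrs(func(a slog.Attr) bool {
		fields[a.Key] = a.Value.Any()
		return true // keep iterating
	})
	h.host.Log(r.Level.String(), r.Message, fields)
	return nil
}

func (h *hostLogHandler) WithAttrs(as []slog.Attr) slog.Handler {
	merged := append(append([]slog.Attr{}, h.attrs...), as...)
	return &hostLogHandler{host: h.host, attrs: merged}
}

// WithGroup ignores groups to keep the sketch short.
func (h *hostLogHandler) WithGroup(string) slog.Handler { return h }

// capture is a fake host that records what it was sent.
type capture struct{ lines []string }

func (c *capture) Log(level, msg string, fields map[string]any) {
	c.lines = append(c.lines, fmt.Sprintf("%s %s guild=%v", level, msg, fields["guild"]))
}

// demo logs one warning through the bridge and returns what the host saw.
func demo() string {
	c := &capture{}
	logger := slog.New(&hostLogHandler{host: c}).With("guild", "g1")
	logger.Warn("inbound: socket ended; reconnecting")
	return c.lines[0]
}

func main() {
	fmt.Println(demo())
}
```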

Deferred from F-2 to F-3+:
  - record-only history injection (Plexum v1 has no equivalent)
  - tools.ts port (15 MCP tools: channel/canvas/sub-discussion family)
  - presence-sync, command-sync, attachments, coalesce parity

Tests: 22 (5 identity + 6 config + 9 tokens + 2 socketio).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>