openclaw's `channelManager.getRuntimeSnapshot()` — called every minute
by the channel-health-monitor — runs accounts through
`applyDescribedAccountFields(next, plugin.config.describeAccount?.(...))`.
When the callback is missing it defaults to `configured: true`. Fabric
never defined it, so every health-monitor cycle:
snapshot = { enabled: true, configured: true, running: false }
For fabric's synthetic 'default' account (returned by
`listFabricAccountIds` when `channels.fabric.accounts` is empty —
the prod shape, where per-agent api-keys live in
`~/.openclaw/fabric-identity.json` and the channel framework never
runs `startAccount` so `running` stays false):
isManagedAccount({enabled:true, configured:true}) === true
-> not-running -> 'stopped' -> restart every ~10 min, logging
'[fabric:default] health-monitor: restarting (reason: stopped)'
The restart is a no-op (fabric's `gateway.startAccount` is absent so
`startChannelInternal` returns early), but the log is loud and
operators chasing real outages keep wasting time on it.
Define `describeAccount` to mirror `isConfigured`, so the snapshot
truthfully reports configured:false for any account without a
fabricApiKey. The fabric plugin still self-manages real agents via
`gateway_start` -> `FabricInbound.start()`; the framework just no
longer thinks 'default' is something it should restart.
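A minimal sketch of the before/after snapshot under simplified shapes. `FabricAccount`, `AccountSnapshot` and `buildSnapshot` are hypothetical stand-ins for the openclaw plugin-sdk types and for `applyDescribedAccountFields`, not the real API; only the "missing describeAccount means configured: true" default comes from the text above.

```typescript
// Hypothetical shapes; the real openclaw plugin-sdk types may differ.
interface FabricAccount {
  accountId: string;
  fabricApiKey?: string;
}

interface AccountSnapshot {
  enabled: boolean;
  configured: boolean;
  running: boolean;
}

// An account counts as configured only when it carries a non-empty key.
function isConfigured(account: FabricAccount): boolean {
  return typeof account.fabricApiKey === "string" && account.fabricApiKey.length > 0;
}

// describeAccount mirrors isConfigured instead of being left undefined.
function describeAccount(account: FabricAccount): Partial<AccountSnapshot> {
  return { configured: isConfigured(account) };
}

// Simplified stand-in for the framework merge: when describeAccount is
// missing (or omits a field), the snapshot falls back to configured: true.
function buildSnapshot(
  account: FabricAccount,
  describe?: (a: FabricAccount) => Partial<AccountSnapshot>,
): AccountSnapshot {
  const described: Partial<AccountSnapshot> = describe ? describe(account) : {};
  return {
    enabled: described.enabled ?? true,
    configured: described.configured ?? true,
    running: false, // startAccount never runs for fabric
  };
}

const syntheticDefault: FabricAccount = { accountId: "default" };
const before = buildSnapshot(syntheticDefault); // configured: true -> restarted
const after = buildSnapshot(syntheticDefault, describeAccount); // configured: false
```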
Verified in sim (this patch alone, no debug instrumentation):
- gateway up 8+ minutes, 0 restart events
- pre-patch sim with same config restarted at 5min mark
- evaluateChannelHealth snapshot for both 'default' and 'recruiter'
accountId reads configured:false (instrumented with temporary
console.log in channel-health-policy, since reverted)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The backend issues a short-lived guildAccessToken (TTL=900s). The previous
`auth: { token: tok }` shape captured the JWT once in connectAgent's
closure: after socket.io's auto-reconnect the backend kept getting the
same expired JWT and silently rejected the handshake at the application
layer (RealtimeGateway logs 'socket rejected: <id>'). The client's
'connect' event still fired (TCP succeeded) so the plugin happily ran
the channel-resync, emitted join_channel into the void, and logged
'joined N channel(s)' while the backend was actually broadcasting
message.created to a room with zero subscribers. End-user symptom:
DMs/group messages to agents silently dropped 15 min after gateway
start, with no error anywhere on the agent side.
Switch to the callback form, which socket.io re-evaluates on every
(re)connect — same call site we already use for the HTTP path via
freshGuildToken/tokenCache.
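socket.io-client's `auth` option accepts either an object or a function `(cb) => void`; the function form is invoked on each connection attempt. The sketch below simulates that without importing socket.io. `TokenCache`, `makeAuth` and the synchronous `mint` function are hypothetical stand-ins for freshGuildToken/tokenCache; the real login is presumably async, in which case `cb` would be called from the resolved promise.

```typescript
type AuthPayload = { token: string };
// Same shape as socket.io-client's function-form `auth` option.
type AuthFn = (cb: (data: AuthPayload) => void) => void;

// Hypothetical cache: re-mints the JWT once it is older than ttlMs.
class TokenCache {
  private token: string | null = null;
  private mintedAt = 0;
  private readonly mint: () => string;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(mint: () => string, ttlMs: number, now: () => number = Date.now) {
    this.mint = mint;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): string {
    const t = this.now();
    if (this.token !== null && t - this.mintedAt < this.ttlMs) return this.token;
    const fresh = this.mint();
    this.token = fresh;
    this.mintedAt = t;
    return fresh;
  }
}

// Evaluated per handshake, so a reconnect never replays a captured JWT.
function makeAuth(cache: TokenCache): AuthFn {
  return (cb) => cb({ token: cache.get() });
}

// Usage with socket.io-client (not run here):
//   io(guildUrl, { auth: makeAuth(cache) })  // instead of { auth: { token: tok } }
```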
Verified in sim (commit 2acb084 + this patch):
1. Connect new DM channel + post msg -> dispatch + reply ✓
2. `docker restart fabric-backend-guild` to force socket disconnect
3. Plugin reconnects automatically and logs
'fabric: agent recruiter joined 12 channel(s) on sim-guild-1' ✓
(without the fix this reconnect was silently rejected; sim used to
log 'WARN socket rejected: <id>' on the guild backend)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The presence-sync tick iterates accounts serially with await on each
agent-login + PUT round-trip — a single tick can easily run 20+s when
there are several accounts. setInterval(intervalMs) does NOT wait for
the previous tick to finish, so on a busy gateway the next tick fires
on top of a still-running one and two parallel iterations each PUT
the same agentId within ~10 ms. That tipped the guild backend's
first-time-insert race (separate fix in nav/Fabric.Backend.Guild) into
500s on prod (caught in t2 gateway 2026-05-25 23:23:35Z; 6 of 6 agents
showed paired log lines 4-10 ms apart for the same agent → idle).
Fix: a simple `inflight` boolean. tick() returns immediately if
already running; the next interval beat catches up. Pushes are already
gated on lastStatus !== bridge.get, so a status change that lands during
a skipped beat is still pushed on the next one; skipping costs nothing.
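A minimal sketch of the guard, assuming the tick body is a single async function; `makeGuardedTick` and `syncAll` are hypothetical names, not the plugin's.

```typescript
// Wraps an async tick body so overlapping setInterval beats are skipped.
function makeGuardedTick(syncAll: () => Promise<void>): () => Promise<void> {
  let inflight = false;
  return async () => {
    if (inflight) return; // previous tick still running: skip this beat
    inflight = true;
    try {
      await syncAll(); // serial agent-login + PUT per account
    } finally {
      inflight = false; // released even if syncAll throws
    }
  };
}

// Usage:
//   const tick = makeGuardedTick(syncAll);
//   setInterval(() => { tick().catch(console.error); }, intervalMs);
```

The flag is checked and set before the first `await`, so two beats firing back to back cannot both pass the check.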
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two layered bugs in the presence-sync loop, both causing every PUT to
fail forever in prod:
1. **Missing /api prefix.** URL was `${guildBaseUrl}/agents/<id>/presence`
but the guild backend sets a global prefix 'api' in main.ts
`setGlobalPrefix('api')`. Every other REST call in this plugin
(channel.ts channels list, fabric-client.ts postMessage, canvas)
already prepends /api/ — only presence-sync missed it. Returned 404
"Cannot PUT /agents/...".
2. **Wrong auth scheme.** Plugin sent `x-api-key: <fabricApiKey>`, but
the endpoint sits behind the global APP_GUARD = ApiKeyGuard, which
actually expects `Authorization: Bearer <guildAccessToken>` (despite
its name — confusing naming on the backend side). With /api added,
error became 401 "missing bearer token". Confirmed by `docker exec
fabric-backend-guild grep APP_GUARD /app/dist/app.module.js` and
manual curl: Bearer guild token → 200 OK.
**Fix**
- presence-sync.ts: do agent-login on demand to obtain a fresh
guildAccessToken, cache it per-agent for 13 min (under the 15-min
JWT TTL), use it as Bearer for the PUT. 401 response invalidates
the cache so the next tick re-logs-in. Pushes are gated on status
changes (rare), so the login overhead is negligible.
- inbound.ts: firstGuildEndpointByAgent → firstGuildByAgent storing
both endpoint and nodeId (presence-sync needs nodeId to pick the
right token out of guildAccessTokens[]).
- index.ts: pass FabricClient to PresenceSync constructor.
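A sketch of the per-agent token cache and the Bearer PUT, with login and HTTP injected so it runs without a guild. `PresenceTokens`, `pushPresence`, the `{ status }` request body and the injected `login`/`send` functions are assumptions, and the cache is keyed by agentId only (the real code also uses nodeId to pick the token). The URL shape, Bearer scheme, 13-minute TTL and invalidate-on-401 rule come from this commit.

```typescript
const TOKEN_TTL_MS = 13 * 60 * 1000; // stays under the 15-minute JWT TTL

type LoginFn = (agentId: string) => Promise<string>; // returns a guildAccessToken
type SendFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number }>;

class PresenceTokens {
  private readonly cache = new Map<string, { token: string; expiresAt: number }>();
  private readonly login: LoginFn;
  private readonly now: () => number;

  constructor(login: LoginFn, now: () => number = Date.now) {
    this.login = login;
    this.now = now;
  }

  async get(agentId: string): Promise<string> {
    const hit = this.cache.get(agentId);
    if (hit && hit.expiresAt > this.now()) return hit.token;
    const token = await this.login(agentId); // agent-login on demand
    this.cache.set(agentId, { token, expiresAt: this.now() + TOKEN_TTL_MS });
    return token;
  }

  invalidate(agentId: string): void {
    this.cache.delete(agentId);
  }
}

async function pushPresence(
  guildBaseUrl: string,
  agentId: string,
  status: string,
  tokens: PresenceTokens,
  send: SendFn,
): Promise<boolean> {
  const token = await tokens.get(agentId);
  const res = await send(`${guildBaseUrl}/api/agents/${encodeURIComponent(agentId)}/presence`, {
    method: "PUT",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ status }),
  });
  if (res.status === 401) tokens.invalidate(agentId); // next tick re-logs-in
  return res.status >= 200 && res.status < 300;
}
```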
**Verified in sim**
After restart, gateway log shows `fabric: presence-sync recruiter →
idle` (200 OK), zero failed PUTs, where previously it would log a 404
every ~5s per agent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inbound was hardcoding `peer: { kind: 'group' }` and `ChatType: 'group'`
for every fabric channel regardless of xType. As a result:
- sessionKey for a DM was `agent:<id>:fabric:group:<chan>` instead of
`agent:<id>:fabric:direct:<chan>`
- ctx.ChatType='group' caused user-prompt metadata to render
`is_group_chat: true` on a DM
- openclaw's `isDirectMessage()` check (ChatType==='direct') returned
false, so DM-specific prompt and turn behavior never engaged
Caught by recruiter test in session 40c51de2: the model's thinking trace
acknowledged "fabric DM channel" (from the ClawPrompts chat-injector
hook) but the surrounding user-prompt metadata contradicted it with
`is_group_chat: true`, and the model reasoned its way out of running
`workflow_start`.
Fix factors a small helper `fabricPeerRoutingForXType` (and a cache-
backed `fabricPeerRoutingForChannel` for outbound) in channel.ts that
maps:
- 'dm' → { peerKind: 'direct', chatType: 'direct' }
- rest → { peerKind: 'group', chatType: 'group' } (no change)
Inbound uses m.xType directly (live, authoritative). Outbound has no
xType in its call signature, so it consults the channel-meta cache
populated by inbound (same `getChannelType` already exposed via
__fabric). Cache miss falls back to 'group' — the pre-fix default, no
regression. The proactive-DM-without-prior-inbound edge case still
routes that one outbound as 'group'; the next round agrees on 'direct'.
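A sketch of the mapping. The two function names come from this commit, but their exact signatures are assumed, and `sessionKeyFor` is a hypothetical helper showing the resulting key shape.

```typescript
type PeerKind = "direct" | "group";

interface PeerRouting {
  peerKind: PeerKind;
  chatType: PeerKind;
}

function fabricPeerRoutingForXType(xType: string | null | undefined): PeerRouting {
  return xType === "dm"
    ? { peerKind: "direct", chatType: "direct" }
    : { peerKind: "group", chatType: "group" }; // pre-fix default for everything else
}

// Outbound has no xType, so it asks the channel-meta cache; a miss (null)
// falls through to 'group'.
function fabricPeerRoutingForChannel(
  channelId: string,
  getChannelType: (channelId: string) => string | null,
): PeerRouting {
  return fabricPeerRoutingForXType(getChannelType(channelId));
}

// Hypothetical helper mirroring the sessionKey shape quoted above.
function sessionKeyFor(agentId: string, channelId: string, routing: PeerRouting): string {
  return `agent:${agentId}:fabric:${routing.peerKind}:${channelId}`;
}
```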
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inbound `message.created` already carries `xType` (dm / triage / group /
broadcast / etc.) — record it in a per-channel cache so other plugins
can answer "is this channel a DM?" without poking the Center API.
New module src/channel-meta.ts:
- in-memory Map<channelId, xType>
- lazily loaded from ~/.openclaw/fabric-channel-meta.json on first
access (so first-ever DM after a fresh gateway start still hits
cache from the previous run)
- debounced 250ms flush on dirty; force-flush on gateway_stop
- recordChannelType(channelId, xType): called from inbound
- getChannelType(channelId): null if unknown — caller MUST treat null
as "don't know", NOT as "assume DM" (would re-introduce the false-
positive on group channels we're trying to eliminate)
Wiring:
- inbound.ts socket.on('message.created'): records xType BEFORE the
self-author / dedup gates (channel type is observer-agnostic)
- index.ts: installs globalThis.__fabric = { getChannelType } on
registerFull(); flushes on gateway_stop
Consumer: ClawPrompts' fabric-chat-injector will start gating its prompt
injection on getChannelType(channelId) === 'dm' (companion PR on
ClawPrompts). Removes the phase-1 "any fabric channel" false-positive.
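A minimal sketch of the cache's contract, with persistence injected through a hypothetical `MetaStore` (the real module reads and writes ~/.openclaw/fabric-channel-meta.json). `ChannelMeta` and `MetaStore` are illustrative names; `recordChannelType`/`getChannelType`, the lazy load, the 250 ms debounce and the null-means-unknown rule follow the description above.

```typescript
// Hypothetical persistence seam; the real module uses a JSON file.
interface MetaStore {
  load(): Record<string, string>;
  save(data: Record<string, string>): void;
}

class ChannelMeta {
  private types: Map<string, string> | null = null; // null until first access
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly store: MetaStore;
  private readonly debounceMs: number;

  constructor(store: MetaStore, debounceMs = 250) {
    this.store = store;
    this.debounceMs = debounceMs;
  }

  // Lazy load: the first read or write pulls in the previous run's data.
  private map(): Map<string, string> {
    if (this.types === null) {
      this.types = new Map(Object.entries(this.store.load()));
    }
    return this.types;
  }

  recordChannelType(channelId: string, xType: string): void {
    const m = this.map();
    if (m.get(channelId) === xType) return; // unchanged: nothing to flush
    m.set(channelId, xType);
    if (this.timer !== null) clearTimeout(this.timer); // debounce: restart the window
    this.timer = setTimeout(() => this.flush(), this.debounceMs);
  }

  // null means "don't know"; callers must not treat it as 'dm'.
  getChannelType(channelId: string): string | null {
    return this.map().get(channelId) ?? null;
  }

  // Also called directly on gateway_stop to force-flush.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.types !== null) this.store.save(Object.fromEntries(this.types));
  }
}
```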
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenClaw plugin-sdk's registerTool execute signature is:
execute: async (_id: string, params) => { ... }
Fabric tools were calling it as `(p) => { ... }`, so `p` held the
call id (a string) and the real args were silently dropped onto the
floor. Every tool that read a required field from `p` failed with
the field surfacing as undefined.
fabric-guild-list (just added) appeared to work because all its
properties are optional — `p.nameFilter` and `p.purposeFilter`
both being undefined produced empty filter needles, which let the
unfiltered guild list through. The real bug surfaced the moment
fabric-channel-list (required: guildNodeId) was invoked: the
ctxGuild helper saw `undefined` and reported `agent not a member
of guild undefined`.
Compare dialectic plugin's tools.ts which has always used the
correct `async (_id: string, params) => {...}` shape and worked
end-to-end. Aligning the fabric signature to match.
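A sketch of the two shapes side by side. `ToolExecute` is a hypothetical simplification of the plugin-sdk type (only the `(_id, params)` argument order comes from the text above), and `callTool` stands in for the framework invoking `execute(callId, params)`.

```typescript
interface ChannelListParams {
  guildNodeId: string;
}

// Simplified: the framework calls execute(callId, params).
type ToolExecute<P> = (id: string, params: P) => Promise<unknown>;

function callTool<P>(execute: ToolExecute<P>, callId: string, params: P): Promise<unknown> {
  return execute(callId, params);
}

// Old fabric shape: its only parameter actually receives the call id string.
// `any` mirrors how the mistake slipped past type checking.
const buggyExecute = async (p: any) => p.guildNodeId;

// Correct shape, matching dialectic's tools.ts.
const fixedExecute: ToolExecute<ChannelListParams> = async (_id, params) => params.guildNodeId;
```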
Verified end-to-end on sim:
- fabric-guild-list returns 1 guild with the purpose set via the
new `cli node set-purpose`
- fabric-channel-list returns 3 channels including a now-populated
`purpose` field on each row
- fabric-channel-set-purpose successfully patches a channel and
the subsequent fabric-channel-list shows the new purpose
Adds two agent-facing tools that close the discoverability loop:
- fabric-guild-list — enumerates guilds the agent belongs to with
name + purpose + status (no api calls beyond the existing agentLogin
response). Optional nameFilter/purposeFilter for narrowing.
- fabric-channel-set-purpose — PATCH /api/channels/:id { purpose }
so agents can backfill or update an existing channel's purpose.
Extends existing tools:
- fabric-channel-list now returns purpose on each row.
- create-{chat,work,report,discussion}-channel accept optional purpose.
FabricClient + FabricSession type changes carry the new field through.
Manifest contracts.tools updated (jiti loader needs both manifest entry
and onStartup activation to register).
Lets workflows that previously needed hardcoded channel ids instead say
'find a guild whose purpose mentions debate, then a channel of x_type
announce whose purpose covers public debate broadcasts.'
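A sketch of the optional filters, assuming case-insensitive substring matching (the matching rule is an assumption); `GuildRow` and `filterGuilds` are hypothetical names. It also shows why the params bug described earlier in this log went unnoticed for this tool: an undefined filter becomes an empty needle, and every string includes the empty string.

```typescript
interface GuildRow {
  nodeId: string;
  name: string;
  purpose: string | null;
  status: string;
}

interface GuildListParams {
  nameFilter?: string;
  purposeFilter?: string;
}

function filterGuilds(guilds: GuildRow[], params: GuildListParams): GuildRow[] {
  // Undefined filters become "", which matches every row.
  const nameNeedle = (params.nameFilter ?? "").toLowerCase();
  const purposeNeedle = (params.purposeFilter ?? "").toLowerCase();
  return guilds.filter(
    (g) =>
      g.name.toLowerCase().includes(nameNeedle) &&
      (g.purpose ?? "").toLowerCase().includes(purposeNeedle),
  );
}
```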
Companion to nav/Fabric.Backend.Guild#<TBD> which adds the server-side
emitToUser broadcast on channel membership changes. Before, the inbound
only learned about new channels via the 60s polling resync (worst-case
60s lag). Now the backend tells us directly so sub/unsub is realtime.
socket.on('channel.joined', evt) → join the socket.io room for evt.channelId
and add to the local 'joined' set.
socket.on('channel.left', evt) → leave + remove from 'joined'.
Both events are idempotent (`if (joined.has(id))` / `if (!joined.has(id))`)
so duplicate emits from server are safe. Polling resync still runs every
60s as a safety net for transient socket drops between emit and
reconnect, partial server failures, etc.
When backend lacks this support (older deployments), nothing breaks —
the event simply never fires and polling carries the load as before.
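A sketch of the two handlers against a minimal socket interface. `SocketLike` is a hypothetical narrowing of the socket.io client (`on`/`emit`), and joining or leaving a room is assumed to mean emitting `join_channel`/`leave_channel`, as the polling resync does.

```typescript
interface ChannelEvent {
  channelId: string;
}

// Hypothetical narrowing of a socket.io client socket.
interface SocketLike {
  on(event: string, handler: (evt: ChannelEvent) => void): void;
  emit(event: string, payload: ChannelEvent): void;
}

function wireMembershipEvents(socket: SocketLike, joined: Set<string>): void {
  socket.on("channel.joined", (evt) => {
    if (joined.has(evt.channelId)) return; // duplicate emit: already subscribed
    socket.emit("join_channel", { channelId: evt.channelId });
    joined.add(evt.channelId);
  });
  socket.on("channel.left", (evt) => {
    if (!joined.has(evt.channelId)) return; // duplicate emit: already gone
    socket.emit("leave_channel", { channelId: evt.channelId });
    joined.delete(evt.channelId);
  });
}
```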
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fabric inbound previously called `joinAll()` once on socket.io
`connect` — it fetched the agent's channel list via
`GET /api/channels?guildId=...` and emitted `join_channel` for each.
Any channel the agent joined *after* connect (e.g. a fresh DM created
by another user that includes this agent) was unreachable until the
gateway restarted: the socket was never subscribed to that room, so
backend `message.created` push events never arrived.
Backend doesn't emit a user-scoped `channel.joined` event we could
piggy-back on (only `message.created`), so the fix is to poll. Every
60s the agent's channel list is re-fetched and diffed against a local
`joined` set:
- new channel ids → `socket.emit('join_channel', {channelId})` + add
- ids in `joined` but absent from the fresh list → `leave_channel`
emit + remove (best-effort; cleans subs if the agent is removed from
a channel)
Re-uses `freshGuildToken()` so the resync fetch survives token
expiry (15-min TTL). Initial `connect` resets the local `joined`
set since the server forgets prior room subscriptions on reconnect.
Timers are tracked in `channelSyncTimers` and cleared in `stop()`
alongside socket disconnect.
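The diff step can be sketched as a pure function plus an apply step; `diffChannels` and `applyChannelDiff` are hypothetical names, and `emit` is narrowed to the two events named above.

```typescript
interface ChannelDiff {
  toJoin: string[];
  toLeave: string[];
}

function diffChannels(joined: ReadonlySet<string>, fresh: readonly string[]): ChannelDiff {
  const freshSet = new Set(fresh); // also drops duplicate ids from the API
  return {
    toJoin: Array.from(freshSet).filter((id) => !joined.has(id)),
    toLeave: Array.from(joined).filter((id) => !freshSet.has(id)),
  };
}

function applyChannelDiff(
  diff: ChannelDiff,
  joined: Set<string>,
  emit: (event: "join_channel" | "leave_channel", payload: { channelId: string }) => void,
): void {
  for (const channelId of diff.toJoin) {
    emit("join_channel", { channelId });
    joined.add(channelId);
  }
  for (const channelId of diff.toLeave) {
    emit("leave_channel", { channelId }); // best-effort cleanup
    joined.delete(channelId);
  }
}
```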
Verified against prod server.t2 scenario: hzhang creates DM channel
including agent 'manager' → without this fix, manager only sees the
message after a gateway restart; with this fix, manager receives the
message within at most 60s (next resync tick).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
inbound: FabricMessage gains xType; the wakeup gate is bypassed when
xType==='dm' (self messages are already filtered upstream), so a 1:1
dm always reaches the model regardless of wakeup metadata.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenClaw delivers an agent turn whose blocks are text -> thinking/tool
-> text via multiple inbound deliver() calls (a non-text block is a
delivery boundary), so one turn became N Fabric messages.
Fix: buffer deliver() segments per channel (src/coalesce.ts) and flush
them as ONE postMessage at a deterministic boundary — the finally after
dispatchInboundReplyWithBase() resolves, which provably runs only after
every deliver() of the turn (verified: deliver,deliver -> dispatch
returned -> flush). No hooks, no timers, no idle guessing. The
agent_end hook was rejected: it fires BEFORE deliver(). gateway_stop
flushes any leftover; a long safety timeout is a leak-guard only.
channels.fabric.coalesce=false restores raw per-segment posting.
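A sketch of the buffer-then-flush-in-finally pattern. `ReplyCoalescer`, `runTurn`, the injected `post` function and the blank-line separator used to merge segments are assumptions; in the plugin the awaited call is dispatchInboundReplyWithBase().

```typescript
type PostFn = (channelId: string, text: string) => void;

class ReplyCoalescer {
  private readonly buffers = new Map<string, string[]>();
  private readonly post: PostFn;

  constructor(post: PostFn) {
    this.post = post;
  }

  // Called from deliver(): buffer instead of posting.
  push(channelId: string, segment: string): void {
    const buf = this.buffers.get(channelId) ?? [];
    buf.push(segment);
    this.buffers.set(channelId, buf);
  }

  // One post for everything buffered on this channel.
  flush(channelId: string): void {
    const buf = this.buffers.get(channelId);
    this.buffers.delete(channelId);
    if (buf && buf.length > 0) this.post(channelId, buf.join("\n\n"));
  }

  // gateway_stop: flush leftovers on every channel.
  flushAll(): void {
    for (const channelId of Array.from(this.buffers.keys())) this.flush(channelId);
  }
}

// `dispatch` stands in for dispatchInboundReplyWithBase(): it resolves only
// after every deliver() of the turn, so the finally is a safe boundary.
async function runTurn(
  coalescer: ReplyCoalescer,
  channelId: string,
  dispatch: (deliver: (text: string) => void) => Promise<void>,
): Promise<void> {
  try {
    await dispatch((text) => coalescer.push(channelId, text));
  } finally {
    coalescer.flush(channelId);
  }
}
```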
Verified on local openclaw + Fabric with a fake text/thinking/text
model: single trigger -> exactly one merged message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The slash-command sync secret now comes from
channels.fabric.commandsSyncKey (configSchema marks it required) and
is no longer read from FABRIC_COMMANDS_SYNC_KEY env. command-sync
resolves it from config and threads it into client.syncCommands;
when absent, sync is skipped with a clear warning. README updated.
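A sketch of the config lookup, assuming a plain object config. `resolveCommandsSyncKey`, `OpenClawConfigLike` and the exact warning text are hypothetical; only the `channels.fabric.commandsSyncKey` path and the skip-with-warning behavior come from this commit.

```typescript
interface OpenClawConfigLike {
  channels?: { fabric?: { commandsSyncKey?: string } };
}

// Returns the key, or null (after warning) so the caller skips the sync.
// FABRIC_COMMANDS_SYNC_KEY is deliberately not consulted.
function resolveCommandsSyncKey(
  cfg: OpenClawConfigLike,
  warn: (msg: string) => void,
): string | null {
  const key = cfg.channels?.fabric?.commandsSyncKey;
  if (!key) {
    warn("fabric: channels.fabric.commandsSyncKey is not set; skipping slash-command sync");
    return null;
  }
  return key;
}
```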
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
syncCommands attaches the FABRIC_COMMANDS_SYNC_KEY header when the
operator sets it, so the guild can restrict slash-command catalog
writes to this plugin. No-op / backward compatible when unset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- command-sync.ts: buildFabricCommandSpecs(cfg) reads OpenClaw native
command specs via openclaw/plugin-sdk/native-command-registry
(listNativeCommandSpecsForConfig + findCommandByNativeName), resolves
dynamic arg choices to a static snapshot (resolveCommandArgChoices) —
same data Discord registers as slash commands.
- syncFabricCommands(): on gateway_start, after inbound starts, PUT the
catalog to each connected guild (FabricClient.syncCommands ->
PUT /api/commands; idempotent, one per guild).
- Fabric stays a TEXT-command surface (no nativeCommands capability):
execution still flows as a /<cmd> message into OpenClaw's command
system; this catalog only drives frontend autocomplete.
Verified: 41 specs built (args/choices incl. dynamic), synced to
test-guild1, GET /api/commands round-trips count=41.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One tool, three actions backed by FabricClient channelMembers (GET
/channels/:id/members -> [{userId,bypass}]), joinChannel, and new
leaveChannel (POST /channels/:id/leave).
Verified: client-level smoke against the running guild — members
initial=[tester], after join echo2 present, after leave echo2 gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- bin/fabric-register.mjs: only AGENT_ID is read from the environment;
--api-key is flag-only (no FABRIC_API_KEY); dropped FABRIC_CENTER_API_BASE
/ FABRIC_IDENTITY_FILE / OPENCLAW_PATH env fallbacks (flags + sensible
defaults; --center still falls back to openclaw.json).
- New fabric-canvas tool (one tool, four actions): read / share / update /
close the channel's single pinned canvas. Backed by FabricClient
get/share/update/removeCanvas (GET/PUT/PATCH/DELETE; empty 2xx body ->
null). update/close are sharer-only server-side.
- README updated.
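The "empty 2xx body -> null" rule might look like this. `readCanvasBody` and the `Canvas` fields are hypothetical, and the caller is assumed to have already read the response text (for example via fetch's `res.text()`).

```typescript
// Hypothetical canvas shape; the real payload may carry more fields.
interface Canvas {
  version: number;
  content: string;
}

// Maps a canvas response to a value: empty 2xx body -> null, JSON otherwise.
function readCanvasBody(status: number, bodyText: string): Canvas | null {
  if (status < 200 || status >= 300) {
    throw new Error(`canvas request failed: HTTP ${status}`);
  }
  if (bodyText.trim() === "") return null; // e.g. nothing pinned, or after close
  return JSON.parse(bodyText) as Canvas;
}
```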
Verified: client-level smoke against the running guild —
read(empty→null) → share(v1) → read → update(v2) → close(→null) all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Binding an agent's Fabric API key was an OpenClaw tool; make it a
self-contained Node script installed to ~/.openclaw/bin/fabric-register
instead.
- bin/fabric-register.mjs: no plugin deps; AGENT_ID env wins, else
--agent-id required; --api-key validated via POST /auth/agent/login;
on success upserts ~/.openclaw/fabric-identity.json (format matches
IdentityRegistry). Flags/env for center, identity-file, openclaw-path.
- install.mjs: copy the script to ~/.openclaw/bin (chmod 0755) on
install, remove on uninstall; Next-steps updated.
- tools.ts: drop the fabric-register tool; ctxGuild error now points to
the script / static accounts config.
- README updated.
Verified: missing-id -> exit 2; --agent-id and AGENT_ID both bind and
write a valid identity file; bad key -> 401, no write.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously a non-wakeup message returned immediately and was fully
discarded — the agent kept zero record of it, so when later woken in a
discuss/work channel it replied without the conversation context.
Now non-wakeup messages are ingested into the agent's OpenClaw session
via recordInboundSession (createIfMissing) WITHOUT dispatch: the real
model is not invoked and nothing is sent back to Fabric. This is
correct for the turn engine — only the woken speaker emits a normal
message or /no-reply; non-woken agents stay silent — while still
giving the agent full channel context whenever it IS woken.
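The resulting per-message decision, including the dm bypass described earlier in this log, can be sketched as a pure function. `InboundAction`, `decideInbound` and the message fields shown are assumptions about shape; calling recordInboundSession or dispatching is left to the caller, and self messages are assumed to be filtered upstream.

```typescript
type InboundAction = "dispatch" | "record";

interface InboundMessage {
  xType: string; // dm / triage / group / broadcast / ...
  wakeup: boolean; // this agent's per-recipient wakeup flag
}

// dispatch: run the model and deliver the reply back to Fabric.
// record: recordInboundSession(createIfMissing) only; no model call, nothing posted.
function decideInbound(msg: InboundMessage): InboundAction {
  if (msg.xType === "dm") return "dispatch"; // 1:1 dm always reaches the model
  return msg.wakeup ? "dispatch" : "record";
}
```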
Verified live: report-channel (all recipients wakeup=false) message
logs 'recorded (no wakeup, history only)' with 0 dispatch/deliver/
posted; wakeup path unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenClaw defaults group-chat replies to sourceReplyDeliveryMode
'message_tool_only', which suppresses auto-delivery of the agent's
text reply (it expects the agent to call a message tool). With
ChatType 'group', the Fabric plugin's deliver callback was therefore
NEVER invoked — the agent ran but no reply ever returned to Fabric.
Fabric already gates *when* an agent speaks via the per-recipient
wakeup flag, so once a turn is dispatched the reply must always flow
back. Pass replyOptions.sourceReplyDeliveryMode='automatic' so
OpenClaw delivers the agent's reply through regardless of
the group default (source-reply-delivery-mode honors a truthy
requested mode).
Verified live end-to-end: human posts -> wakeup -> agent runs ->
'fabric: deliver' + 'fabric: posted reply' -> agent message appears
in the Fabric channel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Guild access tokens are short-lived (~15 min); the inbound socket
survives via socket.io reconnect but the token captured at connect
time goes stale, so attachment downloads (and reply posts) start
401ing on long-lived agents. Re-login with the agent's Fabric API key
on a short TTL and use the fresh token for fetch + post.
Verified live: 'fabric: fetched 1 attachment(s)' now succeeds where it
previously logged 'attachment fetch 401'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live round-trip test showed openclaw's SSRF guard blocking the
localhost guild file URL passed via MediaUrls. We already download the
bytes with the agent's guild token, so MediaUrls is redundant and
noisy — provide only local MediaPaths/MediaTypes. Verified: plugin
logs 'fetched N attachment(s)' and the SSRF WARN is gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unlike Discord, Fabric has no message-length cap. Single-chunk chunker
(text -> [text]), textChunkLimit=MAX_SAFE_INTEGER, capabilities
blockStreaming=false, replyOptions.disableBlockStreaming=true -> every
agent reply delivered as exactly one Fabric message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real channel-turn dispatch (resolveAgentRoute + finalizeInboundContext +
dispatchInboundReplyWithBase), wakeup->drop/dispatch, messaging target
grammar (fabric:<id>) + outbound.sendText, tools use execute/parameters.
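A sketch of parsing the `fabric:<id>` target grammar. `parseFabricTarget` is a hypothetical name, and rejecting an empty id is an assumption beyond what this commit states.

```typescript
// Accepts "fabric:<channelId>" and returns the channel id, or null.
function parseFabricTarget(target: string): string | null {
  const prefix = "fabric:";
  if (!target.startsWith(prefix)) return null;
  const id = target.slice(prefix.length);
  return id.length > 0 ? id : null;
}
```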
Verified live: human msg in Fabric -> wakeup -> openclaw agent runs ->
reply posted back into the Fabric channel as the agent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>