OpenClaw plugins return tool results in one of two shapes:
(a) AgentToolResult — { content: [{type:'text', text:'...'}] }
used when the plugin wraps via asContent() helper. Every
Dialectic.OpenclawPlugin tool follows this pattern.
(b) raw JSON-able object — { ok:true, ...domain fields }
used when the plugin returns data directly. Every
Fabric.OpenclawPlugin tool follows this pattern
(fabric-channel-list, fabric-guild-list, fabric-send-message,
fabric-channel-set-purpose, etc).
The bridge's /mcp/execute handler only handled shape (a). When a
contractor agent (developer / contractor-test) called any fabric
tool through Claude Code, the bridge ran the tool successfully but
fell back to the literal string '(no result)' because
toolResult.content was undefined. Claude Code then dutifully
rendered '(no result)' as the tool result.
Reproduced on prod:
openclaw agent --agent developer -m 'Call fabric-channel-list ...'
→ claude code session called mcp__openclaw__fabric-channel-list
→ bridge logged: mcp/execute tool=fabric-channel-list ...
→ bridge replied: { result: '(no result)' }
→ claude code rendered: '(no result)'
Fix: normalize the result in the bridge. If toolResult is null →
empty string; if it has a .content array → join the text segments
(shape a); if it's a string → use directly; else → JSON.stringify
the whole thing (shape b). Falls back to '(no result)' only when
the normalized result is an empty string.
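
A minimal sketch of the normalization order described above. The
function name normalizeToolResult and the "\n" join separator are
assumptions for illustration; the actual handler code is not shown in
this message.

```typescript
// Hypothetical helper: mirrors the order null -> .content array -> string -> JSON.
type TextSegment = { type: string; text?: string };

function normalizeToolResult(toolResult: unknown): string {
  let text = "";
  if (toolResult === null || toolResult === undefined) {
    text = "";
  } else if (
    typeof toolResult === "object" &&
    Array.isArray((toolResult as { content?: unknown }).content)
  ) {
    // Shape (a): AgentToolResult. Join the text segments.
    // The "\n" separator is an assumption; the source only says "join".
    const segments = (toolResult as { content: TextSegment[] }).content;
    text = segments
      .filter((s) => s && s.type === "text" && typeof s.text === "string")
      .map((s) => s.text as string)
      .join("\n");
  } else if (typeof toolResult === "string") {
    text = toolResult;
  } else {
    // Shape (b): raw JSON-able object. Serialize the whole thing.
    text = JSON.stringify(toolResult) ?? "";
  }
  // Last-resort placeholder only when every branch produced nothing.
  return text !== "" ? text : "(no result)";
}
```

With this order, a Fabric-style `{ ok: true, ... }` object reaches the
agent as its JSON text instead of the placeholder.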
Verified on prod after fix:
agent receives real {"ok":true,"count":1,"channels":[...]}
JSON payload (one real prod-push-test channel) in the response.
OpenClaw's LLM idle watchdog (default 120s) fires on lack of *model
progress*, not lack of bytes — an SSE comment frame (": keepalive\n\n")
keeps the TCP socket alive but isn't recognized as progress, so a long
quiet tool-call phase still idles out. When that happens OpenClaw falls
back to re-sending the prior turn's assistant text (pi-embedded:1308
fallbackAnswerText), producing duplicate-Discord-message symptoms.
Heartbeat now emits a real chat.completion.chunk with an empty content
delta every 30s. Clients drop empty deltas; the upstream idle watchdog
should count it as model progress because it's a real event on the
canonical streaming channel.
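
A rough sketch of what such a heartbeat frame could look like. The
helper names buildHeartbeatFrame and startHeartbeat, and the way id and
model are supplied, are assumptions; only the empty content delta and
the 30s interval come from the description above.

```typescript
// Hypothetical helper: one SSE data frame carrying an empty-delta chunk.
function buildHeartbeatFrame(id: string, model: string, nowMs: number): string {
  const chunk = {
    id,
    object: "chat.completion.chunk",
    created: Math.floor(nowMs / 1000),
    model,
    // Empty content delta: a real streaming event with nothing to display.
    choices: [{ index: 0, delta: { content: "" }, finish_reason: null }],
  };
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Hypothetical wiring: write a frame every intervalMs until stopped.
function startHeartbeat(
  write: (frame: string) => void,
  id: string,
  model: string,
  intervalMs = 30_000,
): () => void {
  const timer = setInterval(
    () => write(buildHeartbeatFrame(id, model, Date.now())),
    intervalMs,
  );
  return () => clearInterval(timer);
}
```

The returned function stops the timer; a caller would invoke it when
the turn finishes or the request closes.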
scripts/install.mjs now spreads the existing provider entry before
overriding script-managed fields, so user-added fields like
timeoutSeconds survive reinstall.
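
The merge can be sketched as follows. It is written in TypeScript for
consistency although install.mjs is plain JavaScript; the name
mergeProviderEntry and the example field baseUrl are hypothetical, and
only timeoutSeconds is named in this message.

```typescript
type ProviderEntry = Record<string, unknown>;

// Spread the existing entry first, then the script-managed fields, so
// managed fields win and everything else the user added is preserved.
function mergeProviderEntry(
  existing: ProviderEntry | undefined,
  managed: ProviderEntry,
): ProviderEntry {
  return { ...(existing ?? {}), ...managed };
}

// Example: timeoutSeconds survives, baseUrl (hypothetical) is overwritten.
const merged = mergeProviderEntry(
  { baseUrl: "http://old.invalid", timeoutSeconds: 600 },
  { baseUrl: "http://new.invalid" },
);
```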
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The May 7 fix made the bridge detect /new turns by scanning messages
for the bare-reset marker ("A new session was started via /new or
/reset"). That handles the case where /new is the body of the
current user turn, but misses a very common path: the user types
`/new` as a standalone slash command. OpenClaw processes those in a
side lane (e.g. agent:<id>:discord:slash:<chat>) that doesn't go
through the bridge — it just renames the old session file aside.
The follow-up real message then lands on a brand-new OpenClaw
session, but as a normal turn: `softResetTriggered=false`, a
non-empty body, and no bare /new. isBareSessionReset is therefore
false in OpenClaw (get-reply isBareSessionReset condition) and the
marker is never injected, so the bridge keeps resuming the long-stale
claudeSessionId from before the reset.
OpenClaw always sends the full conversation history each turn
(system + user/assistant pairs + latest user). A request with zero
assistant turns in messages[] is therefore a positive signal that
the OpenClaw session is brand-new and any prior claudeSessionId we
hold belongs to an abandoned OpenClaw session.
Treat "no assistant history" as equivalent to bareSessionReset:
removeSession + existingEntry = null, so dispatchToClaude is called
without --resume and claude starts a fresh CLI session whose id we
then store. Also covers any future OpenClaw reset path that resets
the session without injecting the marker (idle timeout new-session,
admin tooling, etc.).
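
A minimal sketch of the check, assuming OpenAI-style messages with a
role field. The function names here are illustrative and are not the
bridge's actual identifiers.

```typescript
type ChatMessage = { role: string; content?: unknown };

// True when the request carries at least one prior assistant turn.
function hasAssistantHistory(messages: ChatMessage[]): boolean {
  return messages.some((m) => m.role === "assistant");
}

// A turn starts a fresh CLI session when the reset marker was seen OR
// the history contains no assistant turns at all.
function shouldStartFreshSession(
  messages: ChatMessage[],
  bareSessionReset: boolean,
): boolean {
  return bareSessionReset || !hasAssistantHistory(messages);
}
```

When this returns true, the caller would drop the stored entry and
dispatch without --resume.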
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coordinated fixes for the duplicate-Discord-message bug where the
same prompt would be answered by two different claude subprocesses
running in parallel.
Root cause: handleChatCompletions had no concurrency control and no
way to detect when OpenClaw closed the upstream HTTP connection. When
OpenClaw's idle watchdog tripped (default 120s of stream silence), it
would close the socket and retry the prompt — but the original claude
subprocess kept running, and the bridge spawned a second one alongside
it. Both eventually streamed back, both got delivered to Discord.
Native (non-bridge) flow doesn't hit this because OpenClaw's fetch is
abort-aware end-to-end: attempt timeout fires AbortSignal, fetch closes
the socket, the model provider sees it, work stops. Bridge broke the
chain at "spawn subprocess" — this restores it.
Changes:
* SSE heartbeat (server.ts): write a `: keepalive\n\n` SSE comment
every 30s while a turn is in flight. Counts as bytes on the wire so
upstream idle timer resets, but is a spec-mandated no-op for the
OpenAI stream parser. Eliminates the 120s-silence trigger that was
causing OpenClaw to give up on long tool-call sequences in the first
place.
* Abort propagation (server.ts + both adapters): hook req.on('close')
to an AbortController and pass signal: through to dispatchToClaude /
dispatchToGemini. Adapters listen on signal abort and call markDone
→ scheduleCleanup which SIGTERMs the child process group (3s grace
for claude, 5s for gemini) then SIGKILLs. Mirrors what native fetch
does when its caller aborts.
* Per-sessionKey FIFO queue (server.ts): same-session turns serialize
via a Map<sessionKey, Promise<void>> chain so a user firing multiple
Discord messages back-to-back gets them processed in order rather
than spawning concurrent subprocesses (which would corrupt the shared
--resume session file). Cross-session requests live on independent
chains and run in parallel.
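
The abort-propagation bullet can be sketched roughly as below. The
interfaces and helper names are assumptions made to keep the example
self-contained: the real adapters go through markDone and
scheduleCleanup and signal the child's process group, which is
abstracted here as a kill callback.

```typescript
// Minimal stand-in for the part of the HTTP request object we need.
interface Closable {
  once(event: "close", listener: () => void): unknown;
}

// Turn "client closed the connection" into an AbortSignal.
// Note: a real server would also check that the response had not already
// finished, since 'close' can fire after normal completion too.
function abortSignalFromRequest(req: Closable): AbortSignal {
  const controller = new AbortController();
  req.once("close", () => controller.abort());
  return controller.signal;
}

// On abort: SIGTERM immediately, SIGKILL after the grace period if still alive.
function terminateOnAbort(
  signal: AbortSignal,
  kill: (sig: "SIGTERM" | "SIGKILL") => void,
  isAlive: () => boolean,
  graceMs: number,
): void {
  const onAbort = () => {
    kill("SIGTERM");
    setTimeout(() => {
      if (isAlive()) kill("SIGKILL");
    }, graceMs);
  };
  if (signal.aborted) onAbort();
  else signal.addEventListener("abort", onAbort, { once: true });
}
```

Under the description above, graceMs would be 3000 for claude and 5000
for gemini.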
Subtle correctness points:
* getSession() moved to head-of-queue so we resume into the latest
claudeSessionId from the just-finished prior turn instead of a stale
request-arrival snapshot.
* Aborted turns skip session-map persistence — the subprocess may have
already updated its own session file on disk, so the next retry
resumes from there.
* Queue chain GC uses Map identity check so we don't delete an entry
that a later request has already chained onto.
* prev.then(() => mySlot, () => mySlot) tolerates a crashed prior turn
so the chain doesn't poison forever.
* writeHead(200) before queue wait so OpenClaw sees response status
immediately; heartbeat covers the queue-wait quiet period.
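
A sketch of the per-sessionKey chain, including the identity-checked GC
and the crash-tolerant `prev.then(() => mySlot, () => mySlot)` link
mentioned above. The names enqueue and chains are illustrative and not
necessarily those used in server.ts.

```typescript
const chains = new Map<string, Promise<void>>();

async function enqueue<T>(sessionKey: string, task: () => Promise<T>): Promise<T> {
  const prev = chains.get(sessionKey) ?? Promise.resolve();

  // mySlot resolves when this turn is done, whatever the outcome.
  let release!: () => void;
  const mySlot = new Promise<void>((resolve) => {
    release = resolve;
  });

  // Later requests wait on this tail. Both handlers return mySlot, so a
  // rejected predecessor cannot poison the chain.
  const tail = prev.then(() => mySlot, () => mySlot);
  chains.set(sessionKey, tail);

  // Wait for the prior turn; swallow its failure defensively.
  await prev.catch(() => undefined);
  try {
    // Head of queue: session lookups belong here, not at request arrival.
    return await task();
  } finally {
    release();
    // Identity check: delete only if nobody chained onto us in the meantime.
    if (chains.get(sessionKey) === tail) chains.delete(sessionKey);
  }
}
```

Turns for different session keys never share a chain, so they run in
parallel.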
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bridge was keying claudeSessionId by agentId alone, so every Discord
channel, DM, and cron run for a single agent shared one Claude CLI
session. Two consequences in the wild:
- Cross-channel context bleed: 8.7MB session for `developer` mixed
references from channels 1474327736242798612 and 1498579994044010566
plus the operator DM all in one --resume thread.
- `/new` had no effect on the CLI side. OpenClaw rotated its session
file but the bridge kept --resume-ing the same long-lived
claudeSessionId, eventually exceeding the model's 1M-token context
limit (debug log showed
`prompt is too long: 1179616 tokens > 1000000 maximum`).
Changes:
* input-filter: extract `chat_id` from the Conversation-info
untrusted-metadata block (scanning all messages, since runtimeOnly
turns put it in the system prompt) and detect bare `/new`/`/reset`
via the BARE_SESSION_RESET_PROMPT_BASE marker. Add buildSessionKey
`${agentId}::${chatId}` and resolveDispatchPrompt fallback for the
empty user message that OpenClaw sends on bare resets.
* server: use the composite session key for getSession/putSession;
on bareSessionReset, removeSession before dispatching so the CLI
starts a fresh session; on a CLI result_error (typically
prompt_too_long) drop the entry too so the next turn doesn't
re-resume into the poisoned context.
* claude/sdk-adapter: surface CLI terminal errors via a new
`result_error` event (carries reason + sessionId) so the bridge
can react instead of just streaming the synthetic
"Prompt is too long" assistant text and silently re-using the
same session.
* index: convert register() to synchronous (OpenClaw rejects async
register with "plugin register must be synchronous"); replace the
pre-bind port probe with a server-level EADDRINUSE handler.
* .gitignore: ignore node_modules/ and dist/.
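
A simplified sketch of how the composite key and the two
"drop the entry" paths fit together. buildSessionKey follows the
format given above; the in-memory Map and the helpers resumeIdForTurn
and onTurnFinished are stand-ins for getSession/putSession/removeSession,
whose real signatures are not shown here.

```typescript
// Key format from the change description: `${agentId}::${chatId}`.
function buildSessionKey(agentId: string, chatId: string): string {
  return `${agentId}::${chatId}`;
}

// Stand-in store: sessionKey -> claudeSessionId.
const sessions = new Map<string, string>();

type TurnOutcome =
  | { kind: "ok"; sessionId: string }
  | { kind: "result_error"; reason: string; sessionId?: string };

// Before dispatch: a bare /new or /reset drops the entry so no --resume is used.
function resumeIdForTurn(key: string, bareSessionReset: boolean): string | undefined {
  if (bareSessionReset) sessions.delete(key);
  return sessions.get(key);
}

// After dispatch: a CLI terminal error (e.g. prompt too long) also drops
// the entry, so the next turn does not resume the poisoned context.
function onTurnFinished(key: string, outcome: TurnOutcome): void {
  if (outcome.kind === "result_error") sessions.delete(key);
  else sessions.set(key, outcome.sessionId);
}
```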
- Replace --dangerously-skip-permissions with --allowedTools whitelist
to support running Claude Code as root (root blocks the former flag)
- Fix /mcp/execute tool lookup for plugins that register tools via
factory functions (e.g. padded-cell pcexec) where the global registry
names array is empty — now falls back to instantiating factories and
matching by returned tool name
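
The factory fallback might look roughly like this. The registry shape
(Registration with names and factory) is entirely hypothetical, since
OpenClaw's real registry types are not shown in this message; the
sketch only illustrates "if the names array is empty, instantiate and
match by returned tool name".

```typescript
// Hypothetical shapes for illustration only.
type Tool = { name: string; execute: (args: unknown) => unknown };
type Registration = { names: string[]; factory: () => Tool | Tool[] };

function findTool(registrations: Registration[], toolName: string): Tool | undefined {
  for (const reg of registrations) {
    // Fast path: a populated names array that lacks the tool can be skipped.
    if (reg.names.length > 0 && !reg.names.includes(toolName)) continue;

    // Fallback: names is empty (factory-registered), so instantiate the
    // factory and match on the name of whatever it returns.
    const produced = reg.factory();
    const tools = Array.isArray(produced) ? produced : [produced];
    const match = tools.find((t) => t.name === toolName);
    if (match) return match;
  }
  return undefined;
}
```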
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Migrate src/ → plugin/ (plugin/core/, plugin/web/, plugin/commands/)
and src/mcp/ → services/ per OpenClaw plugin dev spec
- Add Gemini CLI backend (plugin/core/gemini/sdk-adapter.ts) with GEMINI.md
system-prompt injection
- Inject bootstrap as stateless system prompt on every turn instead of
first turn only: Claude via --system-prompt, Gemini via workspace/GEMINI.md;
eliminates isFirstTurn branch, keeps skills in sync with OpenClaw snapshots
- Fix session-map-store defensive parsing (sessions ?? []) to handle bare {}
reset files without crashing on .find()
- Add docs/TEST_FLOW.md with E2E test scenarios and expected outcomes
- Add docs/claude/BRIDGE_MODEL_FINDINGS.md with contractor-probe results
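
The defensive parse mentioned for session-map-store can be illustrated
as below. The entry fields and the name parseSessionMap are assumptions;
only the `sessions ?? []` fallback comes from the change description.

```typescript
// Hypothetical entry shape for illustration.
type SessionEntry = { sessionKey: string; claudeSessionId: string };

function parseSessionMap(raw: string): SessionEntry[] {
  const parsed = JSON.parse(raw) as { sessions?: SessionEntry[] } | null;
  // A bare {} reset file has no sessions key; fall back to an empty
  // list so callers can use .find() safely.
  return parsed?.sessions ?? [];
}
```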
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>