fix(bridge): recover inline-prefixed metadata in user message body

OpenClaw's canonical convention is to emit metadata envelopes (chat_id, sender, reply target, …) as SEPARATE user-role messages folded into the openai-completions request right after the real one — `extractLatestUserMessage` skips those whole. Fabric.OpenclawPlugin's dispatch does not split: it passes metadata blocks and the real user content as ONE merged user-message body, separated by blank lines. With the prior filter that meant the entire turn was dropped with "no user message found" (HTTP 400) because the first line matched a sentinel — the actual prompt sitting after the metadata blocks never reached the bridge. When the whole-body check fails for a single-message body, walk past leading sentinel-prefixed blocks (sentinel header + optional ```json code fence + blank-line separator) and use whatever non-metadata block follows. Falls back to the previous "skip entirely" semantics when the body is metadata-only. End-user symptom that surfaced this: every contractor agent (Claude / Gemini) subscribed to a Fabric channel silently failed to reply to sub-discussion messages during recruitment — fabric dispatch said "completed" in 1.6s but trajectory had `assistantTexts: []`, `terminalError: non_deliverable_terminal_turn`, `errorMessage: "400 \"no user message found\""`. Surfaced recruiting developer1 on prod-t2 2026-05-31.
fix(bridge): /mcp/execute handles raw-object tool results (not just AgentToolResult)
2026-05-31 20:52:15 +01:00 · 2026-05-24 09:33:12 +01:00 · 2026-05-14 08:53:22 +00:00 · 2026-05-13 17:03:02 +00:00 · 2026-05-13 16:40:16 +00:00 · 2026-05-08 07:54:11 +00:00
8 changed files with 529 additions and 123 deletions
--- a/plugin/commands/register-cli.ts
+++ b/plugin/commands/register-cli.ts
@@ -1,4 +1,4 @@
-import type { OpenClawPluginApi } from "openclaw/plugin-sdk";
+import type { OpenClawPluginApi } from "openclaw/plugin-sdk/core";
 import { runContractorAgentsAdd } from "./contractor-agents-add.js";

 export function registerCli(api: OpenClawPluginApi): void {
--- a/plugin/core/claude/sdk-adapter.ts
+++ b/plugin/core/claude/sdk-adapter.ts
@@ -37,6 +37,15 @@ export type ClaudeDispatchOptions = {
  bridgePort?: number;
  /** Bridge API key for MCP proxy callbacks */
  bridgeApiKey?: string;
+  /**
+   * Abort signal from the bridge. When fired (typically because the upstream
+   * HTTP client closed the socket — OpenClaw's attempt-level retry / cancel),
+   * we kill the claude subprocess group and break out of the iterator
+   * promptly so a stale subprocess doesn't keep streaming into a dead socket
+   * (or worse, get its output multiplexed with a fresh subprocess started by
+   * a retry).
+   */
+  signal?: AbortSignal;
 };

 // Resolve the MCP server script path relative to this file.
@@ -109,6 +118,7 @@ export async function* dispatchToClaude(
    openclawTools,
    bridgePort = 18800,
    bridgeApiKey = "",
+    signal,
  } = opts;

  // NOTE: put prompt right after -p, before --mcp-config.
@@ -202,6 +212,18 @@ export async function* dispatchToClaude(
    }
  };

+  // Hook the upstream abort signal: when the bridge's HTTP client (OpenClaw)
+  // closes the socket, propagate that into our process tree by SIGTERM/SIGKILL
+  // (via scheduleCleanup) and break out of the iterator (via markDone). This
+  // prevents stale subprocesses from outliving the request that started them.
+  if (signal) {
+    if (signal.aborted) {
+      markDone();
+    } else {
+      signal.addEventListener("abort", () => markDone(), { once: true });
+    }
+  }
+
  rl.on("line", (line: string) => {
    if (!line.trim()) return;
    let event: Record<string, unknown>;
--- a/plugin/core/gemini/sdk-adapter.ts
+++ b/plugin/core/gemini/sdk-adapter.ts
@@ -29,6 +29,13 @@ export type GeminiDispatchOptions = {
  openclawTools?: OpenAITool[];
  bridgePort?: number;
  bridgeApiKey?: string;
+  /**
+   * Abort signal from the bridge. Mirror of dispatchToClaude's `signal` —
+   * see that file for rationale. When fired, we kill the gemini subprocess
+   * and break the iterator promptly so a stale process doesn't outlive
+   * the upstream request.
+   */
+  signal?: AbortSignal;
 };

 /**
@@ -116,6 +123,7 @@ export async function* dispatchToGemini(
    openclawTools,
    bridgePort = 18800,
    bridgeApiKey = "",
+    signal,
  } = opts;

  // Write system-level instructions to workspace/GEMINI.md every turn.
@@ -166,6 +174,43 @@ export async function* dispatchToGemini(
  let done = false;
  let resolveNext: (() => void) | null = null;

+  // Cleanup helper: SIGTERM the child first, then SIGKILL after a grace
+  // period if it hasn't exited. Idempotent — safe to call multiple times.
+  let cleanupScheduled = false;
+  const scheduleCleanup = (): void => {
+    if (cleanupScheduled) return;
+    cleanupScheduled = true;
+    if (child.exitCode !== null) return;
+    try { child.kill("SIGTERM"); } catch { /* already gone */ }
+    const killTimer = setTimeout(() => {
+      try { child.kill("SIGKILL"); } catch { /* already gone */ }
+    }, 5000);
+    killTimer.unref?.();
+    child.once("close", () => clearTimeout(killTimer));
+  };
+
+  const markDone = (): void => {
+    if (done) return;
+    done = true;
+    scheduleCleanup();
+    if (resolveNext) {
+      const r = resolveNext;
+      resolveNext = null;
+      r();
+    }
+  };
+
+  // Hook the upstream abort signal: when the bridge's HTTP client (OpenClaw)
+  // closes the socket, kill the gemini subprocess and break the iterator.
+  // See dispatchToClaude for the full rationale.
+  if (signal) {
+    if (signal.aborted) {
+      markDone();
+    } else {
+      signal.addEventListener("abort", () => markDone(), { once: true });
+    }
+  }
+
  rl.on("line", (line: string) => {
    if (!line.trim()) return;
    let event: Record<string, unknown>;
--- a/plugin/index.ts
+++ b/plugin/index.ts
@@ -1,6 +1,7 @@
 import fs from "node:fs";
 import path from "node:path";
-import type { OpenClawPluginApi } from "openclaw/plugin-sdk";
+import { definePluginEntry } from "openclaw/plugin-sdk/plugin-entry";
+import type { OpenClawPluginApi } from "openclaw/plugin-sdk/core";
 import { normalizePluginConfig } from "./core/types/contractor.js";
 import { resolveContractorAgentMetadata } from "./core/contractor/metadata-resolver.js";
 import { createBridgeServer } from "./web/server.js";
@@ -19,9 +20,10 @@ const OPENCLAW_CONFIG_KEY = "_contractorOpenClawConfig";

 // ── Plugin entry ─────────────────────────────────────────────────────────────

-export default {
+export default definePluginEntry({
  id: "contractor-agent",
  name: "Contractor Agent",
+  description: "Turns Claude Code into an OpenClaw-managed contractor agent",
  // OpenClaw requires register() to be synchronous — returning a Promise
  // surfaces as `Error: plugin register must be synchronous` and the plugin
  // ends up in `error` state. We avoid `await` here and instead let the
@@ -60,6 +62,11 @@ export default {
    if (!_G[LIFECYCLE_KEY]) {
      _G[LIFECYCLE_KEY] = true;

+      // Bind the bridge server only when the gateway boots, NOT eagerly at
+      // register-time. register() also runs in one-shot CLI subprocesses
+      // (e.g. `openclaw completion`, `openclaw doctor`); spawning a long-
+      // lived listener there would prevent those commands from exiting.
+      api.on("gateway_start", () => {
        const server = createBridgeServer({
          port: config.bridgePort,
          apiKey: config.bridgeApiKey,
@@ -68,9 +75,7 @@ export default {
          logger: api.logger,
        });

-      // EADDRINUSE → another gateway/CLI process already owns the port; that's
-      // fine, we just don't double-bind. Any other error is logged but does
-      // not crash registration.
+        // EADDRINUSE → another gateway already owns the port; fine, skip bind.
        server.on("error", (err: NodeJS.ErrnoException) => {
          if (err.code === "EADDRINUSE") {
            api.logger.info(
@@ -81,7 +86,13 @@ export default {
          api.logger.warn(`[contractor-agent] bridge server error: ${err.message ?? String(err)}`);
        });

+        // Defense in depth: even if this code path is somehow reached outside
+        // the gateway, .unref() prevents the listener from pinning the host's
+        // event loop and blocking process exit.
+        server.unref();
+
        _G[SERVER_KEY] = server;
+      });

      api.on("gateway_stop", () => {
        const s = _G[SERVER_KEY] as http.Server | undefined;
@@ -95,4 +106,4 @@ export default {

    api.logger.info(`[contractor-agent] plugin registered (bridge port: ${config.bridgePort})`);
  },
-};
+});
--- a/plugin/openclaw.plugin.json
+++ b/plugin/openclaw.plugin.json
@@ -1,9 +1,13 @@
 {
  "id": "contractor-agent",
  "name": "Contractor Agent",
-  "version": "0.1.0",
  "description": "Turns Claude Code into an OpenClaw-managed contractor agent",
-  "main": "index.ts",
+  "activation": {
+    "onStartup": true
+  },
+  "commandAliases": [
+    { "name": "contractor-agents" }
+  ],
  "configSchema": {
    "type": "object",
    "additionalProperties": false,
--- a/plugin/web/input-filter.ts
+++ b/plugin/web/input-filter.ts
@@ -16,7 +16,98 @@ function stripOpenClawTimestampPrefix(raw: string): string {
 }

 /**
- * Extract the latest user message from the OpenClaw request.
+ * Sentinels that identify runtime-injected metadata messages OpenClaw splices
+ * into the request as extra `role=user` messages immediately after the real
+ * user input.
+ *
+ * Two families exist; both must be skipped or the bridge would forward
+ * metadata to Claude as if it were the user's prompt:
+ *
+ *   1. Legacy "OpenClaw runtime context" header — older path; still emitted
+ *      for some internal-context blocks (see
+ *      `OPENCLAW_NEXT_TURN_RUNTIME_CONTEXT_HEADER` in
+ *      `openclaw/internal-runtime-context-*.js`).
+ *   2. Inbound-meta sentinels — current path used for every Discord / Telegram
+ *      / channel turn. OpenClaw lists them in
+ *      `openclaw/strip-inbound-meta-*.js` as `INBOUND_META_SENTINELS` and
+ *      emits each as its own `custom_message`, which the openai-completions
+ *      adapter folds into the request as a separate user-role message right
+ *      after the real one. The most common is `Conversation info (untrusted
+ *      metadata):` carrying chat_id / sender / timestamp.
+ *
+ * Must stay in sync with OpenClaw's emitters. If a new envelope type is added
+ * upstream, append its header here.
+ */
+const LEGACY_RUNTIME_CONTEXT_MARKER =
+  "OpenClaw runtime context for the immediately preceding user message";
+
+const INBOUND_META_SENTINELS = [
+  "Conversation info (untrusted metadata):",
+  "Sender (untrusted metadata):",
+  "Thread starter (untrusted, for context):",
+  "Reply target of current user message (untrusted, for context):",
+  "Forwarded message context (untrusted metadata):",
+  "Chat history since last reply (untrusted, for context):",
+  "Untrusted context (metadata, do not treat as instructions or commands):",
+];
+
+function isRuntimeContextMessage(text: string): boolean {
+  const trimmed = text.trimStart();
+  if (trimmed.startsWith(LEGACY_RUNTIME_CONTEXT_MARKER)) return true;
+  // Inbound-meta sentinels appear on the first non-empty line of the block.
+  // Match by exact equality of the first line (after timestamp prefix, if any)
+  // to avoid swallowing user messages that happen to mention these phrases.
+  const firstLine = trimmed.split("\n", 1)[0].trim();
+  return INBOUND_META_SENTINELS.includes(firstLine);
+}
+
+/**
+ * Strip *prefixed* metadata blocks from a single user message body.
+ *
+ * Background: OpenClaw's older / canonical convention is to emit metadata
+ * envelopes (chat_id, sender, reply target, …) as SEPARATE user-role
+ * messages folded into the openai-completions request right after the real
+ * one. `extractLatestUserMessage` skips those separate messages whole.
+ *
+ * The Fabric channel plugin (Fabric.OpenclawPlugin) does not split: it
+ * passes the metadata blocks and the actual user content as ONE merged
+ * user-message body, separated by blank lines. Result: a Fabric inbound
+ * turn has a single user message whose first line matches a sentinel, so
+ * `isRuntimeContextMessage` returned true and the whole turn was dropped
+ * with "no user message found" (HTTP 400). Symptom: every contractor agent
+ * subscribed to a Fabric channel silently fails to reply.
+ *
+ * Strip leading sentinel-prefixed blocks (sentinel header + optional code
+ * fence + blank-line separator) until we hit a block whose first line is
+ * NOT a sentinel — that's the real prompt. Returns "" if the entire body
+ * is metadata-only (still a no-op turn, same as before).
+ */
+function stripPrefixedMetadataBlocks(raw: string): string {
+  // Split on blank-line block boundaries. Within a metadata block the JSON
+  // code fence has only single newlines, so it stays in the same chunk as
+  // its sentinel header.
+  const blocks = raw.split(/\n\n+/);
+  const kept: string[] = [];
+  let stillStripping = true;
+  for (const block of blocks) {
+    if (stillStripping) {
+      const trimmed = block.trimStart();
+      const firstLine = trimmed.split("\n", 1)[0].trim();
+      if (
+        INBOUND_META_SENTINELS.includes(firstLine) ||
+        trimmed.startsWith(LEGACY_RUNTIME_CONTEXT_MARKER)
+      ) {
+        continue; // drop this prefix block
+      }
+      stillStripping = false;
+    }
+    kept.push(block);
+  }
+  return kept.join("\n\n").trim();
+}
+
+/**
+ * Extract the latest user-authored message from the OpenClaw request.
 *
 * OpenClaw accumulates all user messages and sends the full array every turn,
 * but assistant messages may be missing if the previous response wasn't streamed
@@ -26,16 +117,35 @@ function stripOpenClawTimestampPrefix(raw: string): string {
 * OpenClaw prefixes user messages with a timestamp: "[Day YYYY-MM-DD HH:MM TZ] text"
 * We strip the timestamp prefix before forwarding.
 *
- * Returns "" if no user messages exist or the latest user message is empty
- * (e.g. a bare /new turn — see also extractRequestContext.bareSessionReset).
+ * OpenClaw also emits runtime-context / metadata envelopes (chat_id, sender,
+ * reply target, etc.) as extra `role=user` messages after each real user
+ * message. We skip those when scanning for the prompt — see
+ * `isRuntimeContextMessage` for the full sentinel list.
+ *
+ * Returns "" if no user-authored messages exist (e.g. a bare /new turn — see
+ * also extractRequestContext.bareSessionReset).
 */
 export function extractLatestUserMessage(req: BridgeInboundRequest): string {
  const userMessages = req.messages.filter((m) => m.role === "user");
-  if (userMessages.length === 0) return "";
-
-  const raw = messageText(userMessages[userMessages.length - 1]);
+  for (let i = userMessages.length - 1; i >= 0; i -= 1) {
+    const raw = messageText(userMessages[i]);
+    if (!raw) continue;
+    // First try the separate-message convention (Discord/Telegram path): a
+    // user message whose entire body is one metadata envelope — skip whole.
+    if (isRuntimeContextMessage(raw)) {
+      // ...but also try inline-prefixed-metadata recovery (Fabric path):
+      // some channels splice metadata + real content into one body. Walk
+      // past leading sentinel blocks; if any non-metadata block follows,
+      // that's the prompt. If nothing follows, this turn really is pure
+      // metadata and we keep skipping.
+      const stripped = stripPrefixedMetadataBlocks(raw);
+      if (stripped) return stripOpenClawTimestampPrefix(stripped);
+      continue;
+    }
    return stripOpenClawTimestampPrefix(raw);
  }
+  return "";
+}

 export type RequestContext = {
  agentId: string;
--- a/plugin/web/server.ts
+++ b/plugin/web/server.ts
@@ -88,6 +88,28 @@ function parseBody(req: http.IncomingMessage): Promise<BridgeInboundRequest> {
  return parseBodyRaw(req) as Promise<BridgeInboundRequest>;
 }

+/**
+ * Per-sessionKey FIFO queue: same-session turns serialize so a user firing
+ * multiple Discord messages back-to-back gets them processed in order rather
+ * than spawning concurrent claude subprocesses. Cross-session requests bypass
+ * each other's chains and run in parallel.
+ *
+ * The chain is `prev.then(() => mySlot)` per request — `mySlot` is released
+ * in the request's finally block, so the next request waits until our work
+ * completes (success or failure) before starting.
+ */
+const queueBySession = new Map<string, Promise<void>>();
+
+/**
+ * SSE heartbeat cadence. Bridge writes an empty-content `chat.completion.chunk`
+ * (a no-op for OpenAI stream parsers, but a real model-progress event on the
+ * canonical streaming channel) on this interval while a turn is in flight or
+ * queued. This keeps OpenClaw's LLM idle watchdog (default 120s) from firing
+ * during long quiet tool-call phases or while we're waiting our turn in the
+ * per-session queue. See the heartbeat block in handleChatCompletions for
+ * details on why an SSE comment frame is insufficient.
+ */
+const HEARTBEAT_MS = 30_000;

 const _G = globalThis as Record<string, unknown>;

@@ -156,22 +178,151 @@ export function createBridgeServer(config: BridgeServerConfig): http.Server {
    // Detect backend from body.model: "contractor-gemini-bridge" → Gemini, else → Claude
    const isGemini = typeof body.model === "string" && body.model.includes("gemini");

-    // Look up existing session (shared structure for both Claude and Gemini).
-    // On a bare /new or /reset turn we deliberately drop the existing entry so
-    // the CLI starts a fresh session — otherwise --resume would bring back the
-    // very history the user just asked to abandon.
-    let existingEntry = sessionKey ? getSession(workspace, sessionKey) : null;
-    if (bareSessionReset && existingEntry && sessionKey) {
+    // ── Abort propagation ────────────────────────────────────────────────────
+    // OpenClaw's LLM client cancels in-flight HTTP requests via AbortSignal
+    // when an attempt fails or the user cancels. On the wire this manifests
+    // as the client closing the socket, surfaced here as `req.on('close')`.
+    // We mirror that signal into our own AbortController and propagate it
+    // down to the dispatchTo{Claude,Gemini} adapter, which kills the
+    // subprocess. Without this, OpenClaw silently drops the response while
+    // the subprocess keeps running, and a retry spawns another subprocess
+    // alongside the orphan — the root cause of the duplicate-Discord-message
+    // bug we hit in May 2026.
+    const abortController = new AbortController();
+    let clientDisconnected = false;
+    const onClose = (): void => {
+      if (clientDisconnected) return;
+      clientDisconnected = true;
      logger.info(
-        `[contractor-bridge] bare /new detected — dropping prior CLI session sessionKey=${sessionKey} prevClaudeSessionId=${existingEntry.claudeSessionId}`,
+        `[contractor-bridge] client disconnected sessionKey=${sessionKey} — aborting in-flight work`,
+      );
+      abortController.abort();
+    };
+    req.on("close", onClose);
+
+    // Start SSE response immediately so OpenClaw sees the response status
+    // quickly. We may still spend time waiting in the per-session queue
+    // before any real chunk goes out — heartbeat (below) keeps that quiet
+    // window from tripping the upstream idle watchdog.
+    res.writeHead(200, {
+      "Content-Type": "text/event-stream",
+      "Cache-Control": "no-cache",
+      "Connection": "keep-alive",
+      "Transfer-Encoding": "chunked",
+    });
+
+    const completionId = `chatcmpl-bridge-${randomUUID().slice(0, 8)}`;
+
+    // ── SSE heartbeat ────────────────────────────────────────────────────────
+    // OpenClaw's LLM idle watchdog (default 120s) fires on lack of *model
+    // progress*, not lack of bytes — concretely "no content delta through
+    // SSE for 120s". An SSE comment frame (`: keepalive\n\n`) keeps the TCP
+    // socket alive but does NOT register as model progress, so a long quiet
+    // tool-call phase still idles out. When that happens OpenClaw falls back
+    // to re-sending the prior turn's assistant text (see
+    // pi-embedded-Bcz04p2i.js:1308 `fallbackAnswerText`), producing the
+    // duplicate-Discord-message symptom observed 2026-05-14.
+    //
+    // We emit a real `chat.completion.chunk` with an empty content delta
+    // every HEARTBEAT_MS. Clients drop empty deltas, but the upstream idle
+    // watchdog should count it as model progress because it's a real event
+    // on the canonical streaming channel. If empty content turns out to be
+    // filtered, the next step is a zero-width-space "".
+    const heartbeat = setInterval(() => {
+      if (clientDisconnected || res.writableEnded) return;
+      try {
+        sseWrite(res, buildChunk(completionId, ""));
+      } catch {
+        /* socket dead, ignore */
+      }
+    }, HEARTBEAT_MS);
+    heartbeat.unref?.();
+
+    // ── Per-sessionKey FIFO queue ────────────────────────────────────────────
+    // Same-sessionKey turns serialize: if a user fires multiple Discord
+    // messages quickly, we process them in order rather than starting
+    // concurrent claude subprocesses (which would corrupt the shared session
+    // file and produce overlapping replies). Cross-sessionKey requests run
+    // in parallel — they live on independent chains.
+    let releaseSlot: () => void = () => {};
+    const mySlot = new Promise<void>((r) => {
+      releaseSlot = r;
+    });
+    const prev = sessionKey
+      ? queueBySession.get(sessionKey) ?? Promise.resolve()
+      : Promise.resolve();
+    // `myChainTail` advances only after `releaseSlot()` (called in our
+    // finally block). Tolerate prev rejecting so a crashed earlier turn
+    // doesn't poison the chain forever.
+    const myChainTail = prev.then(() => mySlot, () => mySlot);
+    if (sessionKey) queueBySession.set(sessionKey, myChainTail);
+    let newSessionId = "";
+    let hasError = false;
+    let resultErrorReason: string | null = null;
+    let existingEntry: ReturnType<typeof getSession> = null;
+
+    try {
+      // Block until the previous turn on this sessionKey finishes.
+      // Cross-session requests skip this wait (their `prev` is a fresh
+      // Promise.resolve()).
+      await prev.catch(() => {});
+
+      // While queued, OpenClaw may have given up on this request (attempt
+      // timeout, user cancel, etc.) and closed the socket. If so, exit
+      // cleanly without dispatching — the client wouldn't see the output
+      // anyway.
+      if (clientDisconnected) {
+        logger.info(
+          `[contractor-bridge] dropped queued request sessionKey=${sessionKey} (client disconnected during queue wait)`,
+        );
+        return;
+      }
+
+      // Look up the session-map entry NOW (at the head of the queue), not
+      // earlier — the previous turn on the same sessionKey may have just
+      // updated `claudeSessionId` and we want to resume into the latest
+      // one rather than a stale snapshot from request-arrival time.
+      existingEntry = sessionKey ? getSession(workspace, sessionKey) : null;
+
+      // Detect a fresh OpenClaw session even when the bare-reset marker is
+      // absent. The marker only arrives when `/new` is the body of *this*
+      // user turn (see get-reply isBareSessionReset). When `/new` is sent
+      // as a separate slash command (e.g. via Discord's slash UI), OpenClaw
+      // processes the reset in a side lane that doesn't hit the bridge —
+      // it just renames the prior session file aside. The follow-up real
+      // message then arrives on a brand-new OpenClaw session, but as a
+      // normal turn with no marker. Without this check, the bridge happily
+      // resumes the long-stale claudeSessionId from before the reset.
+      //
+      // OpenClaw sends the full conversation history every turn (system +
+      // user/assistant pairs + latest user). A request with zero assistant
+      // turns is therefore a positive signal that the OpenClaw session is
+      // brand-new and any prior claudeSessionId we hold is from a previous
+      // OpenClaw session that the user already abandoned.
+      const hasAssistantHistory = body.messages.some((m) => {
+        if (m.role !== "assistant") return false;
+        if (typeof m.content === "string") return m.content.trim().length > 0;
+        return m.content.some(
+          (c) => c.type === "text" && (c.text ?? "").trim().length > 0,
+        );
+      });
+      const isFreshOpenClawSession = !hasAssistantHistory;
+
+      if ((bareSessionReset || isFreshOpenClawSession) && existingEntry && sessionKey) {
+        const reason = bareSessionReset
+          ? "bare /new detected"
+          : "fresh OpenClaw session (no assistant history in messages[])";
+        logger.info(
+          `[contractor-bridge] ${reason} — dropping prior CLI session sessionKey=${sessionKey} prevClaudeSessionId=${existingEntry.claudeSessionId}`,
        );
        removeSession(workspace, sessionKey);
        existingEntry = null;
      }
-    let resumeSessionId = existingEntry?.state === "active" ? existingEntry.claudeSessionId : null;
+      const resumeSessionId =
+        existingEntry?.state === "active" ? existingEntry.claudeSessionId : null;

      // Bootstrap is passed as the system prompt on every turn (stateless — not persisted in session files).
-    // For Claude: --system-prompt fully replaces any prior system prompt each invocation.
+      // For Claude: --append-system-prompt appends to the built-in prompt each invocation.
      // For Gemini: written to workspace/GEMINI.md, read dynamically by Gemini CLI each turn.
      // This keeps persona and skills current without needing to track first-turn state.
      const systemPrompt = buildBootstrap({
@@ -182,22 +333,8 @@ export function createBridgeServer(config: BridgeServerConfig): http.Server {
        workspaceContextFiles,
      });

-    // Start SSE response
-    res.writeHead(200, {
-      "Content-Type": "text/event-stream",
-      "Cache-Control": "no-cache",
-      "Connection": "keep-alive",
-      "Transfer-Encoding": "chunked",
-    });
-
-    const completionId = `chatcmpl-bridge-${randomUUID().slice(0, 8)}`;
-    let newSessionId = "";
-    let hasError = false;
-    let resultErrorReason: string | null = null;
-
      const openclawTools = body.tools ?? [];

-    try {
      const dispatchIter = isGemini
        ? dispatchToGemini({
            prompt: dispatchPrompt,
@@ -208,6 +345,7 @@ export function createBridgeServer(config: BridgeServerConfig): http.Server {
            openclawTools,
            bridgePort: port,
            bridgeApiKey: apiKey,
+            signal: abortController.signal,
          })
        : dispatchToClaude({
            prompt: dispatchPrompt,
@@ -219,17 +357,21 @@ export function createBridgeServer(config: BridgeServerConfig): http.Server {
            openclawTools,
            bridgePort: port,
            bridgeApiKey: apiKey,
+            signal: abortController.signal,
          });

+      try {
        for await (const event of dispatchIter) {
          if (event.type === "text") {
-          sseWrite(res, buildChunk(completionId, event.text));
+            if (!clientDisconnected) sseWrite(res, buildChunk(completionId, event.text));
          } else if (event.type === "done") {
            newSessionId = event.sessionId;
          } else if (event.type === "result_error") {
-          // CLI returned a terminal error (typically context overflow). The
-          // text was already streamed via prior `text` events; record the
-          // session so we can drop it below and log the reason.
+            // CLI signaled a fatal-but-graceful error (context overflow,
+            // refusal, billing, etc.) via `is_error: true` on the result
+            // event. The text was already streamed via prior `text` events;
+            // record the reason so the bridge can drop the session-map entry
+            // below.
            logger.warn(
              `[contractor-bridge] ${isGemini ? "gemini" : "claude"} result_error reason=${event.reason} sessionId=${event.sessionId} message=${event.message.substring(0, 200)}`,
            );
@@ -238,18 +380,23 @@ export function createBridgeServer(config: BridgeServerConfig): http.Server {
          } else if (event.type === "error") {
            logger.warn(`[contractor-bridge] ${isGemini ? "gemini" : "claude"} error: ${event.message}`);
            hasError = true;
+            if (!clientDisconnected) {
              sseWrite(res, buildChunk(completionId, `[contractor-bridge error: ${event.message}]`));
            }
          }
+        }
      } catch (err) {
        logger.warn(`[contractor-bridge] dispatch error: ${String(err)}`);
        hasError = true;
+        if (!clientDisconnected) {
          sseWrite(res, buildChunk(completionId, `[contractor-bridge dispatch failed: ${String(err)}]`));
        }
+      }

+      if (!clientDisconnected && !res.writableEnded) {
        sseWrite(res, buildStopChunk(completionId));
        sseWrite(res, "[DONE]");
-    res.end();
+      }

      // Session-map persistence:
      //  - Successful turn → upsert with the latest claudeSessionId so the next
@@ -259,7 +406,13 @@ export function createBridgeServer(config: BridgeServerConfig): http.Server {
      //    into the same poisoned session and re-erroring.
      //  - Stream/transport error before any sessionId was captured → mark the
      //    prior entry orphaned (existing behavior).
-    if (resultErrorReason && sessionKey) {
+      //  - Aborted (client disconnected) → don't touch the session-map; the
+      //    subprocess may have already updated its own session file on disk,
+      //    and the next turn will resume from whatever it left there.
+      if (clientDisconnected) {
+        // No-op: aborted turn; preserve whatever the prior entry was so the
+        // next turn (likely an OpenClaw retry of the same prompt) can resume.
+      } else if (resultErrorReason && sessionKey) {
        logger.info(
          `[contractor-bridge] dropping CLI session after terminal error sessionKey=${sessionKey} reason=${resultErrorReason}`,
        );
@@ -282,6 +435,29 @@ export function createBridgeServer(config: BridgeServerConfig): http.Server {
      } else if (hasError && sessionKey && existingEntry) {
        markOrphaned(workspace, sessionKey);
      }
+    } finally {
+      clearInterval(heartbeat);
+      req.off("close", onClose);
+
+      // Release our slot so the next request on this sessionKey can proceed.
+      // Must happen even on abort/throw, otherwise the chain stalls forever.
+      releaseSlot();
+
+      // Garbage-collect the queue entry if no later request chained on us.
+      // The Map identity check is the only safe way to tell — newer requests
+      // overwrite the entry with their own chain tail.
+      if (sessionKey && queueBySession.get(sessionKey) === myChainTail) {
+        queueBySession.delete(sessionKey);
+      }
+
+      if (!res.writableEnded) {
+        try {
+          res.end();
+        } catch {
+          /* ignore */
+        }
+      }
+    }
  }

  const server = http.createServer(async (req, res) => {
@@ -420,14 +596,47 @@ export function createBridgeServer(config: BridgeServerConfig): http.Server {
          return;
        }

-        const execFn = (toolInstance as { execute: (id: string, args: unknown) => Promise<{ content?: Array<{ type: string; text?: string }> }> }).execute;
-        const toolResult = await execFn(randomUUID(), toolArgs);
+        const execFn = (toolInstance as { execute: (id: string, args: unknown) => Promise<unknown> }).execute;
+        const toolResultRaw = await execFn(randomUUID(), toolArgs);

-        // Extract text content from AgentToolResult
-        const text = (toolResult.content ?? [])
+        // Normalize the result shape. OpenClaw plugins return one of:
+        //   (a) AgentToolResult — { content: [{type:'text', text:'...'}, ...] }
+        //       used by tools that wrap via asContent() (e.g. all
+        //       Dialectic.OpenclawPlugin tools).
+        //   (b) raw JSON-able object — { ok:true, ...domain fields }
+        //       used by tools that return data directly (e.g. every
+        //       Fabric.OpenclawPlugin tool: fabric-channel-list,
+        //       fabric-guild-list, fabric-send-message, etc).
+        //   (c) null / undefined — fire-and-forget (rare).
+        //
+        // Pre-2026-05-24 the bridge ONLY handled shape (a); shape (b)
+        // silently returned "(no result)" to Claude Code (bug:
+        // fabric-channel-list and friends were unusable from contractor
+        // agents even though the tool actually ran successfully).
+        let text = "";
+        if (toolResultRaw == null) {
+          text = "";
+        } else if (
+          typeof toolResultRaw === "object" &&
+          Array.isArray((toolResultRaw as { content?: unknown }).content)
+        ) {
+          // Shape (a)
+          const content = (toolResultRaw as { content: Array<{ type: string; text?: string }> }).content;
+          text = content
            .filter((c) => c.type === "text" && c.text)
            .map((c) => c.text as string)
            .join("\n");
+        } else if (typeof toolResultRaw === "string") {
+          text = toolResultRaw;
+        } else {
+          // Shape (b) — serialize the whole object as JSON. Claude Code
+          // is happy to parse JSON tool results.
+          try {
+            text = JSON.stringify(toolResultRaw);
+          } catch {
+            text = String(toolResultRaw);
+          }
+        }

        sendJson(res, 200, { result: text || "(no result)" });
      } catch (err) {
--- a/scripts/install.mjs
+++ b/scripts/install.mjs
@@ -67,10 +67,15 @@ function install() {
  // 3. Update openclaw.json
  const cfg = readConfig();

-  // Add provider
+  // Add provider — spread existing first so user-added fields
+  // (e.g. timeoutSeconds, extraHeaders) survive reinstall. Script-managed
+  // fields (baseUrl/apiKey/api/models) are then overridden authoritatively
+  // since they're tied to the constants and model catalog above.
  cfg.models = cfg.models ?? {};
  cfg.models.providers = cfg.models.providers ?? {};
+  const existingProvider = cfg.models.providers[PLUGIN_ID] ?? {};
  cfg.models.providers[PLUGIN_ID] = {
+    ...existingProvider,
    baseUrl: `http://127.0.0.1:${BRIDGE_PORT}/v1`,
    apiKey: BRIDGE_API_KEY,
    api: "openai-completions",
Author	SHA1	Message	Date
hzhang	0f49edf59c	fix(bridge): recover inline-prefixed metadata in user message body OpenClaw's canonical convention is to emit metadata envelopes (chat_id, sender, reply target, …) as SEPARATE user-role messages folded into the openai-completions request right after the real one — `extractLatestUserMessage` skips those whole. Fabric.OpenclawPlugin's dispatch does not split: it passes metadata blocks and the real user content as ONE merged user-message body, separated by blank lines. With the prior filter that meant the entire turn was dropped with "no user message found" (HTTP 400) because the first line matched a sentinel — the actual prompt sitting after the metadata blocks never reached the bridge. When the whole-body check fails for a single-message body, walk past leading sentinel-prefixed blocks (sentinel header + optional ```json code fence + blank-line separator) and use whatever non-metadata block follows. Falls back to the previous "skip entirely" semantics when the body is metadata-only. End-user symptom that surfaced this: every contractor agent (Claude / Gemini) subscribed to a Fabric channel silently failed to reply to sub-discussion messages during recruitment — fabric dispatch said "completed" in 1.6s but trajectory had `assistantTexts: []`, `terminalError: non_deliverable_terminal_turn`, `errorMessage: "400 \"no user message found\""`. Surfaced recruiting developer1 on prod-t2 2026-05-31.	2026-05-31 20:52:15 +01:00
hzhang	037e92b421	fix(bridge): /mcp/execute handles raw-object tool results (not just AgentToolResult) OpenClaw plugins return tool results in one of two shapes: (a) AgentToolResult — { content: [{type:'text', text:'...'}] } used when the plugin wraps via asContent() helper. Every Dialectic.OpenclawPlugin tool follows this pattern. (b) raw JSON-able object — { ok:true, ...domain fields } used when the plugin returns data directly. Every Fabric.OpenclawPlugin tool follows this pattern (fabric-channel-list, fabric-guild-list, fabric-send-message, fabric-channel-set-purpose, etc). The bridge's /mcp/execute handler only handled shape (a). When a contractor agent (developer / contractor-test) called any fabric tool through Claude Code, the bridge ran the tool successfully but fell back to the literal string '(no result)' because toolResult.content was undefined. Claude Code then dutifully rendered '(no result)' as the tool result. Reproduced on prod: openclaw agent --agent developer -m 'Call fabric-channel-list ...' → claude code session called mcp__openclaw__fabric-channel-list → bridge logged: mcp/execute tool=fabric-channel-list ... → bridge replied: { result: '(no result)' } → claude code rendered: '' Fix: normalize the result in the bridge. If toolResult is null → empty string; if it has a .content array → join the text segments (shape a); if it's a string → use directly; else → JSON.stringify the whole thing (shape b). Falls back to '(no result)' only when all of those produce empty string. Verified on prod after fix: agent receives real {"ok":true,"count":1,"channels":[...]} JSON payload (one real prod-push-test channel) in the response.	2026-05-24 09:33:12 +01:00
zhi	0b24330787	fix(bridge): emit empty content delta as heartbeat; preserve user provider fields on reinstall OpenClaw's LLM idle watchdog (default 120s) fires on lack of model progress, not lack of bytes — an SSE comment frame (": keepalive\n\n") keeps the TCP socket alive but isn't recognized as progress, so a long quiet tool-call phase still idles out. When that happens OpenClaw falls back to re-sending the prior turn's assistant text (pi-embedded:1308 fallbackAnswerText), producing duplicate-Discord-message symptoms. Heartbeat now emits a real chat.completion.chunk with an empty content delta every 30s. Clients drop empty deltas; the upstream idle watchdog should count it as model progress because it's a real event on the canonical streaming channel. scripts/install.mjs now spreads the existing provider entry before overriding script-managed fields, so user-added fields like timeoutSeconds survive reinstall. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 08:53:22 +00:00
zhi	1b7cd6b215	fix(bridge): skip new untrusted-metadata envelopes too, not just legacy header extractLatestUserMessage used isRuntimeContextMessage to skip envelopes OpenClaw splices into the request as extra role=user messages. It only recognized the legacy "OpenClaw runtime context for the immediately preceding user message" header, but current OpenClaw emits a different family of envelopes — INBOUND_META_SENTINELS in strip-inbound-meta-*.js: "Conversation info (untrusted metadata):", "Sender (untrusted metadata):", reply target / thread starter / forwarded / chat history / untrusted context. These slipped through the filter, so the newest-first scan picked the Conversation info envelope as the "latest user message" and forwarded only chat_id / sender JSON to claude. Claude saw no actual prompt and replied with a stock greeting, while the user's real message a few slots earlier was ignored. Add the seven inbound-meta headers to isRuntimeContextMessage, matched by exact equality of the trimmed first line to avoid swallowing user text that happens to mention the phrase. Must stay in sync with INBOUND_META_SENTINELS in OpenClaw's strip-inbound-meta module — any new envelope type added upstream needs to be appended here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:03:02 +00:00
zhi	cce85a9be8	fix(bridge): reset claude session when OpenClaw sends no assistant history The May 7 fix made the bridge detect /new turns by scanning messages for the bare-reset marker ("A new session was started via /new or /reset"). That handles the case where /new is the body of the current user turn, but misses a very common path: the user types `/new` as a standalone slash command. OpenClaw processes those in a side lane (e.g. agent:<id>:discord:slash:<chat>) that doesn't go through the bridge — it just renames the old session file aside. The follow-up real message then lands on a brand-new OpenClaw session, but as a normal turn with `softResetTriggered=false`, non-empty body, not bare /new — so isBareSessionReset is false in OpenClaw (get-reply isBareSessionReset condition) and the marker is never injected. The bridge keeps resuming the long-stale claudeSessionId from before the reset. OpenClaw always sends the full conversation history each turn (system + user/assistant pairs + latest user). A request with zero assistant turns in messages[] is therefore a positive signal that the OpenClaw session is brand-new and any prior claudeSessionId we hold belongs to an abandoned OpenClaw session. Treat "no assistant history" as equivalent to bareSessionReset: removeSession + existingEntry = null, so dispatchToClaude is called without --resume and claude starts a fresh CLI session whose id we then store. Also covers any future OpenClaw reset path that resets the session without injecting the marker (idle timeout new-session, admin tooling, etc.). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 16:40:16 +00:00
zhi	5fca8f5da1	fix: don't block one-shot openclaw subcommands; migrate to current plugin SDK Bridge server lifecycle: - Move createBridgeServer() out of register() into an api.on("gateway_start", ...) handler. register() runs in every CLI subprocess that loads plugins (e.g. `openclaw completion`, `openclaw doctor`); eagerly binding the bridge HTTP listener there could pin those processes when no gateway is already holding the port. - Call server.unref() so the listener never pins the host's event loop, even if startup somehow runs outside the gateway. Plugin SDK convention update: - Wrap default export with definePluginEntry({ id, name, description, register }) per the current openclaw plugin authoring contract. - Switch imports from the deprecated root barrel "openclaw/plugin-sdk" to focused "openclaw/plugin-sdk/core" / "openclaw/plugin-sdk/plugin-entry". - Modernize openclaw.plugin.json: drop version/main, add activation.onStartup so gateway_start fires for this plugin at boot, declare commandAliases for the contractor-agents CLI command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 07:54:11 +00:00
zhi	2e64e9ce02	fix(bridge): abort propagation, SSE heartbeat, per-session FIFO queue Three coordinated fixes for the duplicate-Discord-message bug where the same prompt would be answered by two different claude subprocesses running in parallel. Root cause: handleChatCompletions had no concurrency control and no way to detect when OpenClaw closed the upstream HTTP connection. When OpenClaw's idle watchdog tripped (default 120s of stream silence), it would close the socket and retry the prompt — but the original claude subprocess kept running, and the bridge spawned a second one alongside it. Both eventually streamed back, both got delivered to Discord. Native (non-bridge) flow doesn't hit this because OpenClaw's fetch is abort-aware end-to-end: attempt timeout fires AbortSignal, fetch closes the socket, the model provider sees it, work stops. Bridge broke the chain at "spawn subprocess" — this restores it. Changes: * SSE heartbeat (server.ts): write a `: keepalive\n\n` SSE comment every 30s while a turn is in flight. Counts as bytes on the wire so upstream idle timer resets, but is a spec-mandated no-op for the OpenAI stream parser. Eliminates the 120s-silence trigger that was causing OpenClaw to give up on long tool-call sequences in the first place. * Abort propagation (server.ts + both adapters): hook req.on('close') to an AbortController and pass signal: through to dispatchToClaude / dispatchToGemini. Adapters listen on signal abort and call markDone → scheduleCleanup which SIGTERMs the child process group (3s grace for claude, 5s for gemini) then SIGKILLs. Mirrors what native fetch does when its caller aborts. * Per-sessionKey FIFO queue (server.ts): same-session turns serialize via a Map<sessionKey, Promise<void>> chain so a user firing multiple Discord messages back-to-back gets them processed in order rather than spawning concurrent subprocesses (which would corrupt the shared --resume session file). Cross-session requests live on independent chains and run in parallel. Subtle correctness points: * getSession() moved to head-of-queue so we resume into the latest claudeSessionId from the just-finished prior turn instead of a stale request-arrival snapshot. * Aborted turns skip session-map persistence — the subprocess may have already updated its own session file on disk, so the next retry resumes from there. * Queue chain GC uses Map identity check so we don't delete an entry that a later request has already chained onto. * prev.then(() => mySlot, () => mySlot) tolerates a crashed prior turn so the chain doesn't poison forever. * writeHead(200) before queue wait so OpenClaw sees response status immediately; heartbeat covers the queue-wait quiet period. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 23:58:17 +00:00
h z	b94e0d25f6	Merge pull request 'fix(bridge): skip OpenClaw runtime-context envelope when picking prompt' (#3 ) from fix/skip-runtime-context-message into main Reviewed-on: #3	2026-04-29 08:32:52 +00:00
zhi	91acce9b32	fix(bridge): skip OpenClaw runtime-context envelope when picking prompt OpenClaw emits its runtime-context block as a separate custom_message; the openai-completions adapter folds that into the request as an extra role=user message after the real user input. extractLatestUserMessage was taking the last user message unconditionally, so Claude received only the metadata envelope and replied "your message came through empty". Walk user messages backward, skip ones starting with the runtime-context marker, and return the most recent real user message instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:19:59 +00:00