fix(bridge): scope CLI sessions per OpenClaw session and reset on /new #2

Merged
hzhang merged 1 commit from fix/per-openclaw-session-cli-mapping into main 2026-04-29 07:27:10 +00:00

The bridge was keying claudeSessionId by agentId alone, so every Discord
channel, DM, and cron run for a single agent shared one Claude CLI
session. Two consequences in the wild:

  • Cross-channel context bleed: an 8.7MB session for the developer
    agent mixed references from channels 1474327736242798612 and
    1498579994044010566 plus the operator DM, all in one --resume thread.
  • /new had no effect on the CLI side. OpenClaw rotated its session
    file but the bridge kept --resume-ing the same long-lived
    claudeSessionId, eventually exceeding the 1M-token model context
    (debug log showed "prompt is too long: 1179616 tokens > 1000000
    maximum").
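
To make the collision concrete, here is a minimal sketch comparing the old agentId-only key with the composite `${agentId}::${chatId}` key named in the changes below. Only the key format and the buildSessionKey name come from this PR; the function signature, the Turn shape, the countCliSessions helper, and the chat IDs are illustrative assumptions, not the bridge's actual code.

```typescript
// Hypothetical shape of an incoming turn; the real bridge types may differ.
interface Turn {
  agentId: string;
  chatId: string;
}

// Composite key in the `${agentId}::${chatId}` form described in this PR.
// The signature is assumed for illustration.
function buildSessionKey(agentId: string, chatId: string): string {
  return `${agentId}::${chatId}`;
}

// Counts how many distinct CLI sessions a sequence of turns would map to:
// every unseen key starts a session, every known key resumes one.
function countCliSessions(turns: Turn[], keyOf: (t: Turn) => string): number {
  const seen = new Set<string>();
  for (const t of turns) {
    seen.add(keyOf(t));
  }
  return seen.size;
}

// Placeholder chat IDs standing in for two channels and a DM.
const turns: Turn[] = [
  { agentId: "developer", chatId: "chat-a" },
  { agentId: "developer", chatId: "chat-b" },
  { agentId: "developer", chatId: "dm-operator" },
  { agentId: "developer", chatId: "chat-a" },
];

// Old behavior: one shared session for the whole agent.
const sharedSessions = countCliSessions(turns, (t) => t.agentId);

// New behavior: one session per (agent, chat) pair.
const scopedSessions = countCliSessions(turns, (t) =>
  buildSessionKey(t.agentId, t.chatId),
);

console.log(sharedSessions, scopedSessions); // prints "1 3"
```

With the agentId-only key, all four turns land in a single resumable session, which is how unrelated channels ended up in the same context.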

Changes:

  • input-filter: extract chat_id from the Conversation-info
    untrusted-metadata block (scanning all messages, since runtimeOnly
    turns put it in the system prompt) and detect a bare /new or /reset
    via the BARE_SESSION_RESET_PROMPT_BASE marker. Add buildSessionKey
    (${agentId}::${chatId}) and a resolveDispatchPrompt fallback for the
    empty user message that OpenClaw sends on bare resets.

  • server: use the composite session key for getSession/putSession;
    on bareSessionReset, removeSession before dispatching so the CLI
    starts a fresh session; on a CLI result_error (typically
    prompt_too_long) drop the entry too so the next turn doesn't
    re-resume into the poisoned context.

  • claude/sdk-adapter: surface CLI terminal errors via a new
    result_error event (carries reason + sessionId) so the bridge
    can react instead of just streaming the synthetic
    "Prompt is too long" assistant text and silently re-using the
    same session.

  • index: convert register() to synchronous (OpenClaw rejects async
    register with "plugin register must be synchronous"); replace the
    pre-bind port probe with a server-level EADDRINUSE handler.

  • .gitignore: ignore node_modules/ and dist/.
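
The server-side behavior in the list above (composite key lookup, removal on a bare reset, and eviction on result_error) can be sketched roughly as follows. This is an assumption-laden illustration, not the bridge's implementation: handleTurn, RunCli, the Map-backed store, and the event and turn shapes are hypothetical, and the resolveDispatchPrompt fallback is omitted. Only the key format, the bareSessionReset flag, and the result_error event carrying reason + sessionId come from this PR.

```typescript
// Hypothetical event shapes; the PR only states that result_error
// carries a reason and a sessionId.
type CliEvent =
  | { type: "result"; sessionId: string }
  | { type: "result_error"; reason: string; sessionId: string };

// Hypothetical per-turn input after input-filter has run.
interface FilteredTurn {
  agentId: string;
  chatId: string;
  bareSessionReset: boolean;
}

// Stand-in for getSession/putSession/removeSession: key -> claudeSessionId.
type SessionStore = Map<string, string>;

// Stand-in for dispatching to the CLI, resuming resumeId when it is defined.
type RunCli = (resumeId: string | undefined) => CliEvent;

function handleTurn(
  sessions: SessionStore,
  turn: FilteredTurn,
  runCli: RunCli,
): CliEvent {
  const key = `${turn.agentId}::${turn.chatId}`;

  // Bare /new or /reset: forget the mapping so the CLI starts fresh.
  if (turn.bareSessionReset) {
    sessions.delete(key);
  }

  const event = runCli(sessions.get(key));

  if (event.type === "result_error") {
    // Terminal CLI error (typically prompt_too_long): drop the entry so
    // the next turn does not resume the same oversized session.
    sessions.delete(key);
  } else {
    sessions.set(key, event.sessionId);
  }
  return event;
}

// Usage with a fake CLI that resumes when given an id, else mints one.
const demoSessions: SessionStore = new Map();
let demoCounter = 0;
const demoCli: RunCli = (resumeId) => ({
  type: "result",
  sessionId: resumeId ?? `cli-${++demoCounter}`,
});

const demoTurn = { agentId: "developer", chatId: "chat-a" };
handleTurn(demoSessions, { ...demoTurn, bareSessionReset: false }, demoCli);
handleTurn(demoSessions, { ...demoTurn, bareSessionReset: false }, demoCli);
handleTurn(demoSessions, { ...demoTurn, bareSessionReset: true }, demoCli);

console.log(demoSessions.get("developer::chat-a")); // prints "cli-2"
```

Passing the store and the CLI runner in as parameters is a choice made here for testability; the real server presumably wires these differently.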

hzhang added 1 commit 2026-04-29 07:27:01 +00:00
hzhang merged commit 4e015c677b into main 2026-04-29 07:27:10 +00:00
hzhang deleted branch fix/per-openclaw-session-cli-mapping 2026-04-29 07:27:10 +00:00
Reference: nav/ContractorAgent#2