fix(bridge): scope CLI sessions per OpenClaw session and reset on /new #2

hzhang · 2026-04-29T07:27:00Z

hzhang commented

2026-04-29 07:27:00 +00:00

The bridge was keying claudeSessionId by agentId alone, so every Discord
channel, DM, and cron run for a single agent shared one Claude CLI
session. Two consequences in the wild:

  - Cross-channel context bleed: 8.7MB session for `developer` mixed
    references from channels 1474327736242798612 and 1498579994044010566
    plus the operator DM all in one --resume thread.
  - `/new` had no effect on the CLI side. OpenClaw rotated its session
    file but the bridge kept --resume-ing the same long-lived
    claudeSessionId, eventually crossing the 1M model context (debug log
    showed `prompt is too long: 1179616 tokens > 1000000 maximum`).
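
The sharing described above can be illustrated with a minimal sketch. It assumes the bridge kept an in-memory map from agentId to claudeSessionId; the store shape, the `resolveSessionOld` helper, and the placeholder session ids are hypothetical and are not taken from the bridge source. Only the agent name and channel ids come from the report.

```typescript
// Hypothetical store keyed the old way: by agentId only.
const sessionsByAgent = new Map<string, string>();

// Returns the stored CLI session id for the agent, creating one on first use.
// The chat id is accepted but ignored, which is the behaviour being illustrated.
function resolveSessionOld(agentId: string, _chatId: string): string {
  let id = sessionsByAgent.get(agentId);
  if (id === undefined) {
    // Placeholder id for illustration, not a real Claude CLI session id.
    id = `cli-session-${sessionsByAgent.size + 1}`;
    sessionsByAgent.set(agentId, id);
  }
  return id;
}

const fromChannelA = resolveSessionOld("developer", "1474327736242798612");
const fromChannelB = resolveSessionOld("developer", "1498579994044010566");

// Both channels end up resuming the same thread.
const sharedAcrossChannels = fromChannelA === fromChannelB;
```

Under that assumption every caller for `developer` resumes one thread, so the session can only grow until the context limit is hit.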

Changes:

  * input-filter: extract `chat_id` from the Conversation-info
    untrusted-metadata block (scanning all messages, since runtimeOnly
    turns put it in the system prompt) and detect bare `/new`/`/reset`
    via the BARE_SESSION_RESET_PROMPT_BASE marker. Add buildSessionKey
    `${agentId}::${chatId}` and resolveDispatchPrompt fallback for the
    empty user message that OpenClaw sends on bare resets.

  * server: use the composite session key for getSession/putSession;
    on bareSessionReset, removeSession before dispatching so the CLI
    starts a fresh session; on a CLI result_error (typically
    prompt_too_long) drop the entry too so the next turn doesn't
    re-resume into the poisoned context.

  * claude/sdk-adapter: surface CLI terminal errors via a new
    `result_error` event (carries reason + sessionId) so the bridge
    can react instead of just streaming the synthetic
    "Prompt is too long" assistant text and silently re-using the
    same session.

  * index: convert register() to synchronous (OpenClaw rejects async
    register with "plugin register must be synchronous"); replace the
    pre-bind port probe with a server-level EADDRINUSE handler.

  * .gitignore: ignore node_modules/ and dist/.
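
The session handling in the input-filter and server items can be sketched roughly as follows. This is not the bridge's actual code: the `${agentId}::${chatId}` key format and the names buildSessionKey, getSession, putSession, removeSession, bareSessionReset, and result_error come from the description above, while the `SessionStore` class, the `sessionToResume` and `onResultError` helpers, all signatures, and the sample session ids are assumptions made for illustration.

```typescript
type SessionKey = string;

// Assumed event shape; the description only says it carries reason + sessionId.
interface ResultErrorEvent {
  type: "result_error";
  reason: string;
  sessionId: string;
}

// Composite key, so each chat of an agent maps to its own CLI session.
function buildSessionKey(agentId: string, chatId: string): SessionKey {
  return `${agentId}::${chatId}`;
}

// Hypothetical in-memory store; the real store's shape is not shown in this PR.
class SessionStore {
  private readonly entries = new Map<SessionKey, string>();

  getSession(key: SessionKey): string | undefined {
    return this.entries.get(key);
  }

  putSession(key: SessionKey, claudeSessionId: string): void {
    this.entries.set(key, claudeSessionId);
  }

  removeSession(key: SessionKey): void {
    this.entries.delete(key);
  }
}

// On a bare /new or /reset, forget the mapping first so nothing is resumed.
function sessionToResume(
  store: SessionStore,
  key: SessionKey,
  bareSessionReset: boolean,
): string | undefined {
  if (bareSessionReset) {
    store.removeSession(key);
  }
  return store.getSession(key);
}

// On a terminal CLI error, drop the entry so the next turn starts fresh.
function onResultError(
  store: SessionStore,
  key: SessionKey,
  event: ResultErrorEvent,
): string {
  store.removeSession(key);
  return `dropped ${key} after ${event.reason}`;
}

const store = new SessionStore();
const keyA = buildSessionKey("developer", "1474327736242798612");
const keyB = buildSessionKey("developer", "1498579994044010566");
store.putSession(keyA, "session-a");
store.putSession(keyB, "session-b");

const afterReset = sessionToResume(store, keyA, true);
const untouched = sessionToResume(store, keyB, false);
const note = onResultError(store, keyB, {
  type: "result_error",
  reason: "prompt_too_long",
  sessionId: "session-b",
});
const afterError = store.getSession(keyB);
```

In this sketch a reset in one channel leaves the other channel's session alone, and an entry dropped after `prompt_too_long` resolves to `undefined` on the next turn, which a caller would treat as "start without --resume".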

hzhang added 1 commit 2026-04-29 07:27:01 +00:00

fix(bridge): scope CLI sessions per OpenClaw session and reset on /new 992f4d8703

hzhang merged commit 4e015c677b into main

2026-04-29 07:27:10 +00:00

hzhang deleted branch fix/per-openclaw-session-cli-mapping

2026-04-29 07:27:10 +00:00

hzhang referenced this issue from a commit

2026-04-29 07:27:10 +00:00

Merge pull request 'fix(bridge): scope CLI sessions per OpenClaw session and reset on /new' (#2) from fix/per-openclaw-session-cli-mapping into main

Reference: nav/ContractorAgent#2