fix(bridge): scope CLI sessions per OpenClaw session and reset on /new #2
The bridge was keying claudeSessionId by agentId alone, so every Discord
channel, DM, and cron run for a single agent shared one Claude CLI
session. Two consequences in the wild:

- Cross-channel context bleed: 8.7MB session for `developer` mixed
  references from channels 1474327736242798612 and 1498579994044010566
  plus the operator DM all in one --resume thread.
- `/new` had no effect on the CLI side. OpenClaw rotated its session
  file but the bridge kept --resume-ing the same long-lived
  claudeSessionId, eventually crossing the 1M model context (debug log
  showed `prompt is too long: 1179616 tokens > 1000000 maximum`).

Changes:

* input-filter: extract `chat_id` from the Conversation-info
  untrusted-metadata block (scanning all messages, since runtimeOnly
  turns put it in the system prompt) and detect bare `/new`/`/reset`
  via the BARE_SESSION_RESET_PROMPT_BASE marker. Add buildSessionKey
  `${agentId}::${chatId}` and resolveDispatchPrompt fallback for the
  empty user message that OpenClaw sends on bare resets.
* server: use the composite session key for getSession/putSession;
  on bareSessionReset, removeSession before dispatching so the CLI
  starts a fresh session; on a CLI result_error (typically
  prompt_too_long) drop the entry too so the next turn doesn't
  re-resume into the poisoned context.
* claude/sdk-adapter: surface CLI terminal errors via a new
  `result_error` event (carries reason + sessionId) so the bridge
  can react instead of just streaming the synthetic
  "Prompt is too long" assistant text and silently re-using the
  same session.
* index: convert register() to synchronous (OpenClaw rejects async
  register with "plugin register must be synchronous"); replace the
  pre-bind port probe with a server-level EADDRINUSE handler.
* .gitignore: ignore node_modules/ and dist/.
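
For reviewers, the session scoping and eviction rules described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only `buildSessionKey`, `getSession`, `putSession`, `removeSession`, `bareSessionReset`, and `result_error` are names from the description. The in-memory `Map`, the `Turn` and `CliOutcome` types, and the `dispatch` and `runCli` helpers are assumptions made for the example.

```typescript
// Hypothetical sketch: Turn, CliOutcome, dispatch and runCli are
// illustrative names, not taken from the bridge.

type SessionKey = string;

// Composite key: one CLI session per (agent, chat) pair.
function buildSessionKey(agentId: string, chatId: string): SessionKey {
  return `${agentId}::${chatId}`;
}

// In-memory stand-in for the bridge's session store (assumption).
const sessions = new Map<SessionKey, string>();

const getSession = (key: SessionKey): string | undefined => sessions.get(key);
const putSession = (key: SessionKey, claudeSessionId: string): void => {
  sessions.set(key, claudeSessionId);
};
const removeSession = (key: SessionKey): void => {
  sessions.delete(key);
};

interface Turn {
  agentId: string;
  chatId: string;
  bareSessionReset: boolean; // true for a bare /new or /reset
}

// Outcome of one CLI run, reduced to what the store cares about (assumption).
type CliOutcome =
  | { kind: "ok"; sessionId: string }
  | { kind: "result_error"; reason: string; sessionId: string };

// Runs one turn and returns the session id that was resumed,
// or undefined when the CLI was started with a fresh session.
function dispatch(
  turn: Turn,
  runCli: (resumeId: string | undefined) => CliOutcome,
): string | undefined {
  const key = buildSessionKey(turn.agentId, turn.chatId);
  if (turn.bareSessionReset) {
    removeSession(key); // forget the old CLI session before dispatching
  }
  const resumeId = getSession(key);
  const outcome = runCli(resumeId);
  if (outcome.kind === "result_error") {
    removeSession(key); // e.g. prompt_too_long: never resume this context again
  } else {
    putSession(key, outcome.sessionId);
  }
  return resumeId;
}
```

With this shape, two channels of the same agent never share a stored session id, and both a bare reset and a terminal CLI error leave the next turn for that chat starting from a fresh session.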
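
The index change can be illustrated the same way. The sketch below assumes a hypothetical `register(server, port, log)` signature and a structural `BridgeServer` interface; neither is OpenClaw's actual plugin API nor this PR's code. It only shows the idea of attaching an `error` handler before calling `listen`, so that a busy port is reported through the event rather than discovered by an awaited pre-bind probe.

```typescript
// Minimal server shape this sketch needs (assumption for illustration).
interface BridgeServer {
  on(event: "error", handler: (err: Error & { code?: string }) => void): unknown;
  listen(port: number): unknown;
}

// Synchronous by construction: nothing is awaited and no Promise is returned.
function register(
  server: BridgeServer,
  port: number,
  log: (msg: string) => void,
): void {
  // Bind failures are delivered later through the 'error' event,
  // so the handler must be attached before listen() is called.
  server.on("error", (err) => {
    if (err.code === "EADDRINUSE") {
      log(`port ${port} already in use; bridge not started`);
      return;
    }
    log(`bridge server error: ${err.message}`);
  });
  server.listen(port);
}
```

A Node `http.Server` returned by `createServer()` provides both methods and emits `error` with `code === "EADDRINUSE"` when the port is already taken, so it could be passed in as `server` here.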