ContractorAgent

nav/ContractorAgent

Commit Graph


Author

SHA1

Message

Date

zhi

2e64e9ce02

fix(bridge): abort propagation, SSE heartbeat, per-session FIFO queue

Three coordinated fixes for the duplicate-Discord-message bug where the
same prompt would be answered by two different claude subprocesses
running in parallel.

Root cause: handleChatCompletions had no concurrency control and no
way to detect when OpenClaw closed the upstream HTTP connection. When
OpenClaw's idle watchdog tripped (default 120s of stream silence), it
would close the socket and retry the prompt — but the original claude
subprocess kept running, and the bridge spawned a second one alongside
it. Both eventually streamed back, both got delivered to Discord.

Native (non-bridge) flow doesn't hit this because OpenClaw's fetch is
abort-aware end-to-end: attempt timeout fires AbortSignal, fetch closes
the socket, the model provider sees it, work stops. The bridge broke the
chain at "spawn subprocess"; this change restores it.

Changes:

* SSE heartbeat (server.ts): write a `: keepalive\n\n` SSE comment
  every 30s while a turn is in flight. It counts as bytes on the wire,
  so the upstream idle timer resets, but SSE parsers (including the
  OpenAI stream parser) ignore comment lines per the spec. Eliminates
  the 120s-silence trigger that was causing OpenClaw to give up on long
  tool-call sequences in the first place.

* Abort propagation (server.ts + both adapters): hook req.on('close')
  to an AbortController and pass signal: through to dispatchToClaude /
  dispatchToGemini. Adapters listen on signal abort and call markDone
  → scheduleCleanup which SIGTERMs the child process group (3s grace
  for claude, 5s for gemini) then SIGKILLs. Mirrors what native fetch
  does when its caller aborts.

* Per-sessionKey FIFO queue (server.ts): same-session turns serialize
  via a Map<sessionKey, Promise<void>> chain so a user firing multiple
  Discord messages back-to-back gets them processed in order rather
  than spawning concurrent subprocesses (which would corrupt the shared
  --resume session file). Cross-session requests live on independent
  chains and run in parallel.
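
The heartbeat and abort-propagation changes above could look roughly like the sketch below. It is not the code from server.ts or the adapters: `startHeartbeat`, `abortOnClose`, `terminateOnAbort`, and the two small interfaces are hypothetical names chosen for illustration. Only the keepalive frame, the 30s interval, the `close` hook, and the SIGTERM-then-SIGKILL sequence come from the commit message. The `kill` callback is injected so the sketch does not assume how the real adapters signal the child process group.

```typescript
// Hypothetical minimal shapes, so the sketch does not need a live HTTP server.
interface ClosableRequest {
  on(event: "close", listener: () => void): unknown;
}
interface WritableResponse {
  write(chunk: string): unknown;
}

// An SSE comment line: bytes on the wire, ignored by SSE parsers.
export const KEEPALIVE_FRAME = ": keepalive\n\n";

// Write a keepalive comment periodically; the returned function stops it.
export function startHeartbeat(res: WritableResponse, intervalMs = 30_000): () => void {
  const timer = setInterval(() => res.write(KEEPALIVE_FRAME), intervalMs);
  return () => clearInterval(timer);
}

// Turn "the caller closed the connection" into an AbortSignal.
export function abortOnClose(req: ClosableRequest): AbortSignal {
  const controller = new AbortController();
  req.on("close", () => controller.abort());
  return controller.signal;
}

// On abort: SIGTERM first, SIGKILL after the grace period.
// The returned function cancels the pending SIGKILL (call it once the child exits).
export function terminateOnAbort(
  signal: AbortSignal,
  kill: (sig: "SIGTERM" | "SIGKILL") => void,
  graceMs: number,
): () => void {
  let killTimer: ReturnType<typeof setTimeout> | undefined;
  const onAbort = () => {
    kill("SIGTERM");
    killTimer = setTimeout(() => kill("SIGKILL"), graceMs);
  };
  if (signal.aborted) onAbort();
  else signal.addEventListener("abort", onAbort, { once: true });
  return () => {
    if (killTimer !== undefined) clearTimeout(killTimer);
  };
}
```

In a handler this would be wired as `const signal = abortOnClose(req)` and `const stop = startHeartbeat(res)`, with `stop()` called when the turn finishes. The grace period would be 3000 ms for claude and 5000 ms for gemini, per the commit message.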

Subtle correctness points:

* getSession() moved to head-of-queue so we resume into the latest
  claudeSessionId from the just-finished prior turn instead of a stale
  request-arrival snapshot.
* Aborted turns skip session-map persistence — the subprocess may have
  already updated its own session file on disk, so the next retry
  resumes from there.
* Queue chain GC uses Map identity check so we don't delete an entry
  that a later request has already chained onto.
* prev.then(() => mySlot, () => mySlot) tolerates a crashed prior turn
  so the chain doesn't poison forever.
* writeHead(200) before queue wait so OpenClaw sees response status
  immediately; heartbeat covers the queue-wait quiet period.
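
A minimal sketch of the per-sessionKey promise chain, including the identity-checked cleanup and the crash-tolerant `then`, is shown below. It assumes a module-level map and a hypothetical `runSerialized` helper; the actual structure inside server.ts may differ.

```typescript
// One promise chain per session key; the map entry is the current tail.
const chains = new Map<string, Promise<void>>();

// Hypothetical helper for inspection: number of sessions with queued work.
export function pendingSessions(): number {
  return chains.size;
}

export async function runSerialized<T>(
  sessionKey: string,
  task: () => Promise<T>,
): Promise<T> {
  const prev = chains.get(sessionKey) ?? Promise.resolve();

  // mySlot settles when this turn is finished, whatever the outcome.
  let release!: () => void;
  const mySlot = new Promise<void>((resolve) => {
    release = resolve;
  });

  // Both handlers return mySlot, so a rejected prior turn cannot poison the chain.
  const tail = prev.then(() => mySlot, () => mySlot);
  chains.set(sessionKey, tail);

  // Wait for our turn; swallow any failure from the previous turn.
  await prev.catch(() => undefined);
  try {
    // Session state would be read here (head of queue), not at request arrival.
    return await task();
  } finally {
    release();
    // Only delete the entry if no later request has chained onto it.
    if (chains.get(sessionKey) === tail) chains.delete(sessionKey);
  }
}
```

Two calls with the same key run one after the other; calls with different keys sit on separate chains and overlap freely. A task that throws still rejects its own returned promise, but the `finally` block releases the slot, so the next queued turn proceeds.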

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-07 23:58:17 +00:00

hzhang

07a0f06e2e

refactor: restructure to plugin/ + services/ layout and add per-turn bootstrap injection

- Migrate src/ → plugin/ (plugin/core/, plugin/web/, plugin/commands/)
  and src/mcp/ → services/ per OpenClaw plugin dev spec
- Add Gemini CLI backend (plugin/core/gemini/sdk-adapter.ts) with GEMINI.md
  system-prompt injection
- Inject bootstrap as stateless system prompt on every turn instead of
  first turn only: Claude via --system-prompt, Gemini via workspace/GEMINI.md;
  eliminates isFirstTurn branch, keeps skills in sync with OpenClaw snapshots
- Fix session-map-store defensive parsing (sessions ?? []) to handle bare {}
  reset files without crashing on .find()
- Add docs/TEST_FLOW.md with E2E test scenarios and expected outcomes
- Add docs/claude/BRIDGE_MODEL_FINDINGS.md with contractor-probe results
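
The defensive-parsing fix can be illustrated with a small sketch. The `SessionEntry` shape and the `findSession` name are assumptions made for the example; only the `sessions ?? []` fallback and the `.find()` call are taken from the commit message.

```typescript
// Hypothetical entry shape; the real store may carry more fields.
interface SessionEntry {
  sessionKey: string;
  claudeSessionId: string;
}

// A reset file may be a bare {} with no sessions array at all.
interface SessionMapFile {
  sessions?: SessionEntry[];
}

export function findSession(raw: string, sessionKey: string): SessionEntry | undefined {
  const parsed = JSON.parse(raw) as SessionMapFile;
  // Fall back to an empty list so .find() never runs on undefined.
  const sessions = parsed.sessions ?? [];
  return sessions.find((s) => s.sessionKey === sessionKey);
}
```

With this fallback, `findSession("{}", "any-key")` returns `undefined` instead of throwing a TypeError.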

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-11 21:21:32 +01:00

2 Commits