fix(bridge): strip NODE_OPTIONS --inspect before spawning claude/gemini #5

Merged
hzhang merged 1 commit from fix/strip-inspect-from-spawn-env into main 2026-05-31 20:04:55 +00:00

Summary

  • Bridge spawn of claude/gemini inherited process.env wholesale.
  • When gateway runs with NODE_OPTIONS=--inspect=127.0.0.1:9229 (debug systemd drop-in), the Node-based CLI children try to bind the same inspector port → EADDRINUSE → silent exit (no stdout/stderr).
  • End-user symptom: bridge returns [contractor-bridge error: claude did not return a session_id] in ~0.5s with empty stderrSummary — opaque diagnostic.

Fix

  • Filter NODE_OPTIONS tokens: drop --inspect* / --inspect-brk* / --debug*, keep everything else (e.g. --max-old-space-size).
  • Applied to both claude/sdk-adapter.ts and gemini/sdk-adapter.ts.
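
A minimal sketch of the token filter described above, not the code from either adapter. The helper names `stripInspectFlags` and `buildChildEnv` are hypothetical, and the sketch assumes NODE_OPTIONS is a plain whitespace-separated list (quoted values containing spaces are not handled).

```typescript
// Hypothetical helpers for illustration only; the real adapters may differ.
type Env = Record<string, string | undefined>;

// Matches --inspect*, --inspect-brk*, and the legacy --debug* flags.
const DEBUG_FLAG = /^--(inspect|debug)/;

// Returns NODE_OPTIONS with debugger flags removed, or undefined if nothing is left.
function stripInspectFlags(nodeOptions: string | undefined): string | undefined {
  if (!nodeOptions) return undefined;
  const kept = nodeOptions
    .split(/\s+/)
    .filter((token) => token.length > 0 && !DEBUG_FLAG.test(token));
  return kept.length > 0 ? kept.join(" ") : undefined;
}

// Copies the parent env and sanitizes NODE_OPTIONS in the copy only.
function buildChildEnv(parentEnv: Env): Env {
  const env: Env = { ...parentEnv };
  const sanitized = stripInspectFlags(env.NODE_OPTIONS);
  if (sanitized === undefined) {
    delete env.NODE_OPTIONS; // avoid passing an empty NODE_OPTIONS to the child
  } else {
    env.NODE_OPTIONS = sanitized;
  }
  return env;
}
```

Under these assumptions, a call site would pass `buildChildEnv(process.env)` as the `env` option of the spawn call instead of `{...process.env}`, so the gateway's own environment is left untouched.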

Test plan

  • Repro confirmed on prod-t2: NODE_OPTIONS=--inspect=127.0.0.1:9229 claude -p hi → empty output; without env → full stream-json.
  • After deploy: developer1 (Cody, claude bridge) should respond to sub-discussion messages on Fabric channel.
  • developer2 (gemini bridge) same code path.
hzhang added 1 commit 2026-05-31 20:04:54 +00:00
claude-code and gemini-cli are both Node binaries. When the parent
gateway is launched with `NODE_OPTIONS=--inspect=127.0.0.1:9229` (for
debugging), spawn(child).env = {...process.env} propagates the flag into
the child. The child Node then tries to bind the same inspector port,
fails EADDRINUSE, and exits SILENTLY (no stdout, no stderr).
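
The inheritance itself can be observed with a small sketch like the following. The helper name `childNodeOptions` is hypothetical, and the example deliberately uses a harmless flag rather than `--inspect` so that it runs without touching the inspector port.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: spawns a child Node process with the given env and
// returns the NODE_OPTIONS value that child sees.
function childNodeOptions(env: Record<string, string | undefined>): string {
  const result = spawnSync(
    process.execPath,
    ["-e", "process.stdout.write(process.env.NODE_OPTIONS || '')"],
    { env, encoding: "utf8" },
  );
  return result.stdout;
}

// Spreading the parent env passes NODE_OPTIONS through to the child unchanged.
const seenByChild = childNodeOptions({
  ...process.env,
  NODE_OPTIONS: "--max-old-space-size=512",
});
```

Here `seenByChild` holds whatever the parent set, which is why a debug-only flag in the gateway's environment reaches every Node-based CLI it spawns.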

Bridge sees an empty stream and reports `claude did not return a
session_id` with an empty stderr summary — extremely opaque diagnostic
that took non-trivial digging to root-cause.

Sanitize NODE_OPTIONS before spawn: keep everything except
`--inspect*` / `--inspect-brk*` / `--debug*`. Operators that legitimately
need other NODE_OPTIONS values (e.g. `--max-old-space-size`) keep them.

Verified end-user repro on prod-t2 2026-05-31: with
`Environment=NODE_OPTIONS=--inspect=127.0.0.1:9229` in the gateway
systemd drop-in, `claude -p "hi" --output-format stream-json --verbose`
spawned from the bridge returned ZERO bytes; running the exact same
command from a shell without the env var returned the full init →
assistant → result stream in ~6s. Surfaced while recruiting developer1
(Cody, contractor-claude-bridge).
hzhang merged commit d726c3c35d into main 2026-05-31 20:04:55 +00:00
Reference: nav/ContractorAgent#5