fix: wait for gateway ready before post-install model validation #2

Merged
hzhang merged 2 commits from :feat/whispergate-mvp into feat/whispergate-mvp 2026-02-26 11:15:14 +00:00

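Problem

The install script runs the model visibility check immediately after gateway restart, but the restart is asynchronous (systemd). When the check runs, the new process has not finished initializing, so models list does not yet show the newly registered whisper-gateway/no-reply, and the script triggers a rollback.

This is the root cause of the P0/P1 issues in the HANDOFF document.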

Fix

  1. waitForGatewayReady(): after the restart, poll openclaw gateway status and continue only once the RPC probe reports ok (up to 30s)
  2. validateNoReplyModelAvailable() now retries: 5 attempts, 2s apart, to cover the model catalog refresh delay
  3. CONFIG.example.json: contextWindow: 4096 → 200000, maxTokens: 64 → 8192 (OpenClaw requires a contextWindow of at least 16000)
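The first two items can be sketched roughly as follows. This is a minimal illustration, not the code from the install script: `waitUntilReady`, `retryWithDelay`, and the injected `probe` / `check` callbacks are hypothetical names, and how the real script runs `openclaw gateway status` or parses its output is not shown in this PR.

```javascript
// Small promise-based sleep used by both helpers.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls `probe()` until it returns a truthy value or the
// timeout elapses. Resolves to true when ready, false on timeout.
// In the real script, `probe` would wrap the gateway status check (assumption).
async function waitUntilReady(probe, { timeoutMs = 30_000, intervalMs = 1_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (await probe()) return true;
    } catch {
      // The gateway may not be accepting connections yet; treat as "not ready".
    }
    await sleep(intervalMs);
  }
  return false;
}

// Hypothetical helper: runs `check()` up to `attempts` times, waiting `delayMs`
// between failures. Returns the first successful result, or rethrows the last error.
async function retryWithDelay(check, { attempts = 5, delayMs = 2_000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await check();
    } catch (err) {
      lastError = err;
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

The defaults mirror the numbers in the list above (30s ceiling, 5 attempts, 2s interval). The 1s poll interval is an assumption, since the PR does not state one.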

Testing

  • node --check scripts/install-whispergate-openclaw.mjs ✅
  • Manual verification: after the provider config is written, models list only refreshes once the gateway has restarted, which confirms the race condition
zhi added 1 commit 2026-02-26 08:46:03 +00:00
Root cause: gateway restart is async (systemd), but validateNoReplyModelAvailable()
ran immediately after, hitting a race condition where the new gateway process
hadn't finished initializing yet. This caused 'model not listed' validation
failures, triggering config rollback even though the config was correct.

Changes:
- Add waitForGatewayReady() that polls 'openclaw gateway status' for RPC probe
- Add retry loop (5 attempts, 2s interval) to validateNoReplyModelAvailable()
- Fix CONFIG.example.json: contextWindow 4096->200000, maxTokens 64->8192
  (OpenClaw requires minimum 16000 contextWindow)
zhi force-pushed feat/whispergate-mvp from de04e21aa1 to fd6c4dd3a2 2026-02-26 08:47:48 +00:00 Compare
zhi added 1 commit 2026-02-26 08:56:29 +00:00
Root cause: PluginHookAgentContext in before_model_resolve only has
agentId, sessionKey, sessionId, workspaceDir, messageProvider.
senderId, channelId, input are NOT available in this hook phase.

The plugin was reading ctx.senderId (undefined) -> inHumanList=false
for ALL Discord sessions -> shouldUseNoReply=true -> all silenced.

Fix: use event.prompt which contains the full user message including
the 'Conversation info (untrusted metadata)' JSON block, and extract
sender_id from there. Same fix applied to before_prompt_build.
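A rough sketch of the extraction described in this commit message, under stated assumptions: the label string is taken from the wording above, and the metadata is assumed to be a single JSON object appearing somewhere after that label. `extractSenderId` is a hypothetical name, and the plugin's actual parsing may differ.

```javascript
// Assumed label, copied from the commit message wording.
const META_LABEL = 'Conversation info (untrusted metadata)';

// Returns sender_id (as a string) from the first JSON object that follows the
// label in `prompt`, or undefined if the label, object, or field is missing.
function extractSenderId(prompt) {
  if (typeof prompt !== 'string') return undefined;
  const labelAt = prompt.indexOf(META_LABEL);
  if (labelAt === -1) return undefined;
  const start = prompt.indexOf('{', labelAt);
  if (start === -1) return undefined;

  // Scan to the matching closing brace, ignoring braces inside JSON strings.
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < prompt.length; i++) {
    const ch = prompt[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) {
      try {
        const meta = JSON.parse(prompt.slice(start, i + 1));
        return meta.sender_id == null ? undefined : String(meta.sender_id);
      } catch {
        return undefined; // Malformed block: behave as if no sender was found.
      }
    }
  }
  return undefined;
}
```

Because the block is labelled untrusted and sits inside the prompt text, a caller would probably want to treat a missing or unparsable sender_id as "unknown sender" rather than assume a default.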
hzhang merged commit 6b3d89634a into feat/whispergate-mvp 2026-02-26 11:15:14 +00:00

Reference: nav/Dirigent#2