- Migrate src/ → plugin/ (plugin/core/, plugin/web/, plugin/commands/)
and src/mcp/ → services/ per OpenClaw plugin dev spec
- Add Gemini CLI backend (plugin/core/gemini/sdk-adapter.ts) with GEMINI.md
system-prompt injection
- Inject bootstrap as stateless system prompt on every turn instead of
first turn only: Claude via --system-prompt, Gemini via workspace/GEMINI.md;
eliminates isFirstTurn branch, keeps skills in sync with OpenClaw snapshots
- Fix session-map-store defensive parsing (sessions ?? []) to handle bare {}
reset files without crashing on .find()
- Add docs/TEST_FLOW.md with E2E test scenarios and expected outcomes
- Add docs/claude/BRIDGE_MODEL_FINDINGS.md with contractor-probe results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ContractorAgent E2E Test Flow
End-to-end test scenarios for the contractor-agent plugin. Each scenario describes the precondition, the action, and the expected observable outcome.
Setup
ST-1 — Clean install
Precondition: Plugin not installed; test agent workspaces do not exist.
Steps:
- `node scripts/install.mjs --uninstall` (idempotent if already gone)
- Remove test agents from `openclaw.json` if present
- `node scripts/install.mjs --install`
- `openclaw gateway restart`
Expected:
- `GET /health` → `{"ok":true,"service":"contractor-bridge"}`
- `GET /v1/models` → list includes `contractor-claude-bridge` and `contractor-gemini-bridge`
- OpenClaw logs: `[contractor-agent] plugin registered (bridge port: 18800)`
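The ST-1 expectations can be checked mechanically once the two responses are in hand. The following is a minimal sketch, not part of the plugin: `verifyInstall` and `REQUIRED_MODELS` are hypothetical names, and the function assumes the caller has already extracted the model ids from the `/v1/models` response, since this document does not specify that response's shape.

```typescript
// Shape of the /health body listed in ST-1.
interface HealthBody {
  ok: boolean;
  service: string;
}

// Model ids that ST-1 expects GET /v1/models to include.
const REQUIRED_MODELS: string[] = [
  "contractor-claude-bridge",
  "contractor-gemini-bridge",
];

// Returns human-readable failures; an empty list means the ST-1 checks passed.
// `modelIds` is assumed to be the ids already pulled out of the /v1/models body.
function verifyInstall(health: HealthBody, modelIds: string[]): string[] {
  const failures: string[] = [];
  if (health.ok !== true) {
    failures.push("health.ok is not true");
  }
  if (health.service !== "contractor-bridge") {
    failures.push("unexpected service: " + health.service);
  }
  for (const id of REQUIRED_MODELS) {
    if (modelIds.indexOf(id) === -1) {
      failures.push("missing model: " + id);
    }
  }
  return failures;
}
```

Returning a list of failures instead of throwing on the first one makes it easier to see every broken expectation in a single test run.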
ST-2 — Provision Claude contractor agent
Command: openclaw contractor-agents add --agent-id claude-e2e --workspace /tmp/claude-e2e-workspace --contractor claude
Expected:
- `openclaw.json` agents list contains `claude-e2e` with model `contractor-agent/contractor-claude-bridge`
- `/tmp/claude-e2e-workspace/.openclaw/contractor-agent/session-map.json` exists (empty sessions)
- Workspace files created by `openclaw agents add` are present (SOUL.md, IDENTITY.md, etc.)
ST-3 — Provision Gemini contractor agent
Command: openclaw contractor-agents add --agent-id gemini-e2e --workspace /tmp/gemini-e2e-workspace --contractor gemini
Expected: same as ST-2 but model is contractor-agent/contractor-gemini-bridge.
Core Bridge Behaviour
CB-1 — First turn bootstraps persona (Claude)
Precondition: claude-e2e has no active session (session-map empty).
Request: POST /v1/chat/completions, model contractor-agent/contractor-claude-bridge,
system message contains Runtime: agent=claude-e2e | repo=/tmp/claude-e2e-workspace,
user message: "Introduce yourself briefly."
Expected:
- Response streams SSE chunks, terminates with `[DONE]`
- Response reflects the persona from `SOUL.md`/`IDENTITY.md`: agent uses the name and tone defined in the workspace files (no generic "I'm Claude …" dry response)
- `session-map.json` now contains one entry with `contractor=claude`, `state=active`, and a non-empty `claudeSessionId`
Mechanism: bootstrap is injected only on the first turn; it embeds the content of SOUL.md, IDENTITY.md, USER.md, MEMORY.md inline so the agent adopts the persona immediately without needing to read files itself.
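To illustrate the inline embedding described in the CB-1 mechanism, here is a rough sketch of how the four workspace files might be folded into one bootstrap string. `buildBootstrap` is a hypothetical helper and the `## <file name>` section header is an assumed layout; the real bootstrap format is not shown in this document.

```typescript
// Workspace files that the CB-1 mechanism says are embedded in the bootstrap.
const PERSONA_FILES: string[] = ["SOUL.md", "IDENTITY.md", "USER.md", "MEMORY.md"];

// `files` maps a file name to its content; reading from disk is left to the caller.
// The "## <name>" header is an illustrative assumption, not the plugin's format.
function buildBootstrap(files: Record<string, string>): string {
  const sections: string[] = [];
  for (const name of PERSONA_FILES) {
    const content = files[name];
    // Skip missing or blank files rather than embedding empty sections.
    if (content === undefined || content.trim() === "") {
      continue;
    }
    sections.push("## " + name + "\n" + content.trim());
  }
  return sections.join("\n\n");
}
```

Iterating over a fixed file list keeps the section order stable regardless of how the caller assembled `files`.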
CB-2 — Session resume retains context (Claude)
Precondition: CB-1 has run; session-map holds an active session.
Request: same headers, user message: "What did I ask you in my first message?"
Expected:
- Agent recalls the previous question without re-reading files
- `session-map.json` `lastActivityAt` updated; `claudeSessionId` unchanged
Mechanism: bridge detects existing active session, passes --resume <sessionId> to
Claude Code; bootstrap is NOT re-injected.
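The resume decision in CB-2 could look roughly like the sketch below. The field names `contractor`, `state`, `claudeSessionId`, and `lastActivityAt` come from this document; `sessionKey`, `planTurn`, and `TurnPlan` are hypothetical, and the lookup defaults a missing `sessions` array to an empty list so that a bare `{}` reset file is treated as "no session".

```typescript
// `sessionKey` is a hypothetical lookup key; the other fields appear in this document.
interface SessionEntry {
  sessionKey: string;
  contractor: "claude" | "gemini";
  state: string;
  claudeSessionId: string;
  lastActivityAt: string;
}

interface SessionMap {
  sessions?: SessionEntry[];
}

interface TurnPlan {
  resumeArgs: string[]; // extra CLI arguments for the contractor process
  injectBootstrap: boolean; // true only when no active session exists
}

function planTurn(map: SessionMap, sessionKey: string): TurnPlan {
  // A reset file may be a bare {}, so default to an empty list before searching.
  const sessions = map.sessions ?? [];
  for (const entry of sessions) {
    const resumable =
      entry.sessionKey === sessionKey &&
      entry.state === "active" &&
      entry.claudeSessionId !== "";
    if (resumable) {
      return { resumeArgs: ["--resume", entry.claudeSessionId], injectBootstrap: false };
    }
  }
  return { resumeArgs: [], injectBootstrap: true };
}
```

With an empty or reset session map the plan falls through to the bootstrap path, which matches the CB-1 precondition.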
CB-3 — MCP tool relay — contractor_echo (Claude)
Precondition: active session from CB-1; request includes contractor_echo tool definition.
Request: user message: "Use the contractor_echo tool to echo: hello"
Expected:
- Agent calls the `mcp__openclaw__contractor_echo` tool via the MCP proxy
- Bridge relays the call to `POST /mcp/execute` → OpenClaw plugin registry
- Response confirms the echo with a timestamp: `"Echo confirmed: hello at …"`
Mechanism: bridge writes an MCP config file pointing to
services/openclaw-mcp-server.mjs before each claude invocation; the MCP server
forwards tool calls to the bridge /mcp/execute endpoint which resolves them through
the OpenClaw global plugin registry.
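As an illustration of the MCP config the bridge writes, the sketch below builds an object in the common `mcpServers` layout (`command`, `args`, `env`). The helper `buildMcpConfig`, the `BRIDGE_URL` environment variable, and the `127.0.0.1` host are assumptions made for the example; only the server script path, the `openclaw` alias, and port 18800 come from this document.

```typescript
interface McpServerSpec {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerSpec>;
}

// BRIDGE_URL is a hypothetical variable name used to tell the MCP server
// where the bridge's /mcp/execute endpoint lives.
function buildMcpConfig(serverScript: string, bridgeUrl: string): McpConfig {
  return {
    mcpServers: {
      // The alias "openclaw" is what yields names like mcp__openclaw__contractor_echo.
      openclaw: {
        command: "node",
        args: [serverScript],
        env: { BRIDGE_URL: bridgeUrl },
      },
    },
  };
}

// Host is an assumption; the port matches the bridge port logged in ST-1.
const mcpConfig = buildMcpConfig(
  "services/openclaw-mcp-server.mjs",
  "http://127.0.0.1:18800"
);
const mcpConfigJson = JSON.stringify(mcpConfig, null, 2);
```

The serialized `mcpConfigJson` is what would be written to disk before each invocation.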
CB-4 — First turn bootstraps persona (Gemini)
Same as CB-1 but model contractor-agent/contractor-gemini-bridge, agent gemini-e2e.
Expected: persona from SOUL.md/IDENTITY.md is reflected; session entry has
contractor=gemini.
Mechanism: bootstrap is identical; Gemini CLI receives the full prompt via -p.
MCP config is written to workspace/.gemini/settings.json (Gemini's project settings
path) instead of an --mcp-config flag.
CB-5 — Session resume (Gemini)
Same as CB-2 but for Gemini.
Expected: agent recalls prior context via --resume <UUID>.
CB-6 — MCP tool relay (Gemini)
Same as CB-3 but for Gemini. Gemini sees the tool as mcp_openclaw_contractor_echo
(single-underscore FQN; server alias is openclaw with no underscores).
Expected: echo confirmed in response.
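The naming difference between CB-3 and CB-6 comes down to the separator each CLI puts between the `mcp` prefix, the server alias, and the tool name. A small sketch, with `toolFqn` as a hypothetical helper name:

```typescript
type Contractor = "claude" | "gemini";

// Claude joins the parts with a double underscore, Gemini with a single one,
// as described in CB-3 and CB-6.
function toolFqn(contractor: Contractor, serverAlias: string, toolName: string): string {
  const separator = contractor === "claude" ? "__" : "_";
  return ["mcp", serverAlias, toolName].join(separator);
}
```

For example, `toolFqn("gemini", "openclaw", "contractor_echo")` gives `mcp_openclaw_contractor_echo`. Because the Gemini form uses a single underscore, an alias containing underscores would make the name ambiguous, which is why the alias `openclaw` has none.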
Skill Invocation
SK-1 — Agent reads and executes a skill script (Claude)
Precondition: fresh session (session-map cleared); skill
contractor-test-skill is installed in ~/.openclaw/skills/contractor-test-skill/.
System message includes:
<available_skills>
<skill>
<name>contractor-test-skill</name>
<description>Test skill for verifying that a contractor agent can discover and invoke a workspace script…</description>
<location>/home/hzhang/.openclaw/skills/contractor-test-skill</location>
</skill>
</available_skills>
Request: user message: "Run the contractor test skill and show me the output."
Expected:
- Agent reads `SKILL.md` at the given `<location>`
- Expands `{baseDir}` → `/home/hzhang/.openclaw/skills/contractor-test-skill`
- Executes `scripts/test.sh` via Bash
- Response contains: `=== contractor-test-skill: PASSED ===` followed by `Timestamp: <ISO-8601>`
Why first turn matters: the bootstrap embeds the <available_skills> block in the
system prompt sent to Claude on the first turn. Claude retains this in session memory
for subsequent turns. If the session is already active when the skill-bearing request
arrives, Claude won't know about the skill.
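The discovery steps in SK-1 (find the skill in `<available_skills>`, then expand `{baseDir}` to its location) can be sketched as follows. The helpers `tagText`, `extractSkills`, and `expandBaseDir` are hypothetical, and the regex-based parsing assumes the flat, un-nested block shown above rather than general XML.

```typescript
interface SkillRef {
  name: string;
  description: string;
  location: string;
}

// Returns the trimmed text of the first <tag>…</tag> pair, or "" if absent.
function tagText(source: string, tag: string): string {
  const match = new RegExp("<" + tag + ">([\\s\\S]*?)</" + tag + ">").exec(source);
  return match ? match[1].trim() : "";
}

// Collects every <skill> entry inside the <available_skills> block.
function extractSkills(systemMessage: string): SkillRef[] {
  const block = tagText(systemMessage, "available_skills");
  const skills: SkillRef[] = [];
  const skillPattern = /<skill>([\s\S]*?)<\/skill>/g;
  let match: RegExpExecArray | null = skillPattern.exec(block);
  while (match !== null) {
    skills.push({
      name: tagText(match[1], "name"),
      description: tagText(match[1], "description"),
      location: tagText(match[1], "location"),
    });
    match = skillPattern.exec(block);
  }
  return skills;
}

// Replaces every {baseDir} placeholder with the skill's location.
function expandBaseDir(text: string, location: string): string {
  return text.split("{baseDir}").join(location);
}
```

In the test flow itself the agent performs these steps by reading `SKILL.md`; the sketch only mirrors the logic for use in assertions or tooling.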
SK-2 — Agent reads and executes a skill script (Gemini)
Same as SK-1 but for Gemini. Gemini reads SKILL.md and executes the script.
Expected: same PASSED output block.
Skills Injection Timing
Background — How OpenClaw injects skills
OpenClaw rebuilds the system prompt (including <available_skills>) on every turn
and sends it to the model. The skill list comes from a snapshot cached in the
session entry, refreshed under the following conditions:
| Trigger | Refresh? |
|---|---|
| First turn in a new session | ✅ Always |
| Skills directory file changed (file watcher detects version bump) | ✅ Yes |
| `openclaw gateway restart` (session entry survives) | ❌ No (old snapshot reused) |
| `openclaw gateway restart` + session reset / new session | ✅ Yes (first-turn logic runs) |
| Skill filter changes | ✅ Yes |
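
The refresh table can be read as a single predicate. The sketch below is one way to express it, with `TurnContext` and its field names invented for illustration; it is not OpenClaw's actual implementation.

```typescript
// Hypothetical description of what happened since the snapshot was cached.
interface TurnContext {
  isFirstTurnOfSession: boolean;
  skillsVersionChanged: boolean; // file watcher detected a version bump
  skillFilterChanged: boolean;
  gatewayRestarted: boolean; // listed for completeness; has no effect on its own
}

// A gateway restart alone never refreshes the snapshot. It only does so when
// it coincides with a new session, which is covered by isFirstTurnOfSession.
function shouldRefreshSnapshot(ctx: TurnContext): boolean {
  return ctx.isFirstTurnOfSession || ctx.skillsVersionChanged || ctx.skillFilterChanged;
}
```

Note that `gatewayRestarted` is deliberately unused, which mirrors the "old snapshot reused" row of the table.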
Implication for the contractor bridge
The bridge receives the skills block on every incoming turn but currently only uses it in the first-turn bootstrap. Subsequent turns carry updated skills if OpenClaw refreshed the snapshot, but the bridge does not re-inject them into the running Claude/Gemini session.
Consequence: if skills are added or removed while a contractor session is active, the agent won't see the change until the session is reset (session-map cleared) and a new bootstrap is sent.
This is intentional for v1: contractor sessions are meant to be long-lived, and skill changes mid-session are uncommon. If needed, explicitly clearing the session map forces a new bootstrap on the next turn.
Error Cases
ER-1 — No active session, no workspace in system message
Request: system message has no Runtime: line.
Expected: bridge logs a warning and falls back to /tmp as the workspace. The session key is empty,
so the response may succeed but the session is not persisted.
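A sketch of the fallback in ER-1, assuming the `Runtime:` line has exactly the `agent=… | repo=…` form used in CB-1 and that neither value contains whitespace. `parseRuntime`, `RuntimeInfo`, and the `persist` flag are hypothetical names.

```typescript
interface RuntimeInfo {
  agentId: string;
  workspace: string;
  persist: boolean; // false when the bridge had to fall back
}

// Extracts agent and workspace from a "Runtime: agent=<id> | repo=<path>" line.
// Without such a line, falls back to /tmp and marks the turn as non-persistent.
function parseRuntime(systemMessage: string): RuntimeInfo {
  const match = /Runtime:\s*agent=(\S+)\s*\|\s*repo=(\S+)/.exec(systemMessage);
  if (match === null) {
    return { agentId: "", workspace: "/tmp", persist: false };
  }
  return { agentId: match[1], workspace: match[2], persist: true };
}
```

Returning an explicit `persist` flag lets the caller skip the session-map write without re-checking for an empty key.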
ER-2 — Gemini CLI not installed
Request: model contractor-agent/contractor-gemini-bridge.
Expected: dispatchToGemini spawn fails, bridge streams
[contractor-bridge dispatch failed: …] error chunk, then [DONE].
Session is not persisted; if a prior session existed, it is marked orphaned.
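The error stream in ER-2 might be framed as below. This is a sketch under the assumption that the bridge emits OpenAI-style `chat.completion.chunk` objects over SSE, which this document implies but does not spell out; `sseData` and `dispatchFailureFrames` are hypothetical names.

```typescript
// Wraps a payload in a single SSE "data:" frame.
function sseData(payload: string): string {
  return "data: " + payload + "\n\n";
}

// Builds the two frames ER-2 expects: one error chunk, then the [DONE] sentinel.
// The chunk layout is assumed to follow the OpenAI streaming format.
function dispatchFailureFrames(reason: string): string[] {
  const chunk = {
    object: "chat.completion.chunk",
    choices: [
      {
        index: 0,
        delta: { content: "[contractor-bridge dispatch failed: " + reason + "]" },
        finish_reason: null,
      },
    ],
  };
  return [sseData(JSON.stringify(chunk)), sseData("[DONE]")];
}
```

Sending the failure as ordinary content followed by `[DONE]` means a streaming client sees a normally terminated response rather than a dropped connection.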
ER-3 — MCP tool not registered in plugin registry
Request: tool unknown_tool called via /mcp/execute.
Expected: POST /mcp/execute returns {"error":"Tool 'unknown_tool' not registered…"} with HTTP status 200.
The agent receives the error text as the tool result and surfaces it in its reply.
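A minimal sketch of the lookup behind ER-3. `executeTool`, `ToolHandler`, and the `result` field of the success body are assumptions; the error text reproduces only the prefix visible above, since the full message is truncated in this document.

```typescript
type ToolHandler = (args: Record<string, unknown>) => string;

interface ExecuteResponse {
  status: number;
  body: { result?: string; error?: string };
}

// Resolves a tool by name. An unknown tool is reported in the body with
// status 200 so the agent receives the error text as an ordinary tool result.
function executeTool(
  registry: Record<string, ToolHandler>,
  name: string,
  args: Record<string, unknown>
): ExecuteResponse {
  const known = Object.prototype.hasOwnProperty.call(registry, name);
  if (!known) {
    // Only the prefix of the real message is known; the rest is elided in the source.
    return { status: 200, body: { error: "Tool '" + name + "' not registered" } };
  }
  return { status: 200, body: { result: registry[name](args) } };
}
```

Using `hasOwnProperty` avoids treating inherited keys such as `toString` as registered tools.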