ContractorAgent/docs/TEST_FLOW.md
hzhang 07a0f06e2e refactor: restructure to plugin/ + services/ layout and add per-turn bootstrap injection
- Migrate src/ → plugin/ (plugin/core/, plugin/web/, plugin/commands/)
  and src/mcp/ → services/ per OpenClaw plugin dev spec
- Add Gemini CLI backend (plugin/core/gemini/sdk-adapter.ts) with GEMINI.md
  system-prompt injection
- Inject bootstrap as stateless system prompt on every turn instead of
  first turn only: Claude via --system-prompt, Gemini via workspace/GEMINI.md;
  eliminates isFirstTurn branch, keeps skills in sync with OpenClaw snapshots
- Fix session-map-store defensive parsing (sessions ?? []) to handle bare {}
  reset files without crashing on .find()
- Add docs/TEST_FLOW.md with E2E test scenarios and expected outcomes
- Add docs/claude/BRIDGE_MODEL_FINDINGS.md with contractor-probe results

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 21:21:32 +01:00


ContractorAgent E2E Test Flow

End-to-end test scenarios for the contractor-agent plugin. Each scenario describes the precondition, the action, and the expected observable outcome.


Setup

ST-1 — Clean install

Precondition: Plugin not installed; test agent workspaces do not exist.

Steps:

  1. node scripts/install.mjs --uninstall (idempotent if already gone)
  2. Remove test agents from openclaw.json if present
  3. node scripts/install.mjs --install
  4. openclaw gateway restart

Expected:

  • GET /health → {"ok":true,"service":"contractor-bridge"}
  • GET /v1/models → list includes contractor-claude-bridge and contractor-gemini-bridge
  • OpenClaw logs: [contractor-agent] plugin registered (bridge port: 18800)
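The first two expectations can be checked mechanically once the responses are fetched. The following is a minimal sketch, not the project's own test code: the `/health` body is the one quoted above, while the `{ data: [{ id }] }` shape for `/v1/models` is an assumption (OpenAI-style list), and the helper names are hypothetical.

```typescript
// Assumed response shapes. /health is quoted in this document; the
// /v1/models list layout is an assumption (OpenAI-style).
interface HealthBody { ok: boolean; service: string }
interface ModelList { data: { id: string }[] }

const REQUIRED_MODELS = ["contractor-claude-bridge", "contractor-gemini-bridge"];

// True only for the exact health payload the bridge is expected to return.
function isHealthy(body: HealthBody): boolean {
  return body.ok === true && body.service === "contractor-bridge";
}

// Returns the required bridge models that are absent from the list.
function missingModels(list: ModelList): string[] {
  const ids = list.data.map((m) => m.id);
  return REQUIRED_MODELS.filter((id) => ids.indexOf(id) === -1);
}
```

An empty array from `missingModels` means both bridge models are registered.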

ST-2 — Provision Claude contractor agent

Command: openclaw contractor-agents add --agent-id claude-e2e --workspace /tmp/claude-e2e-workspace --contractor claude

Expected:

  • openclaw.json agents list contains claude-e2e with model contractor-agent/contractor-claude-bridge
  • /tmp/claude-e2e-workspace/.openclaw/contractor-agent/session-map.json exists (empty sessions)
  • Workspace files created by openclaw agents add are present (SOUL.md, IDENTITY.md, etc.)
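A freshly provisioned or reset session-map.json may hold no sessions at all, so a reader should tolerate both an empty list and a bare `{}`. The sketch below illustrates that defensive parsing. The field names `contractor`, `state`, `claudeSessionId` and `lastActivityAt` appear in this document; the rest of the structure and the function name are assumptions.

```typescript
// Entry fields named in this document; optionality is an assumption.
interface SessionEntry {
  contractor: "claude" | "gemini";
  state: string;
  claudeSessionId?: string;
  lastActivityAt?: string;
}

interface SessionMap { sessions: SessionEntry[] }

// Parse session-map.json text, treating a missing "sessions" key
// (for example a bare {} left by a reset) as an empty list.
function parseSessionMap(raw: string): SessionMap {
  const parsed: Partial<SessionMap> = JSON.parse(raw) ?? {};
  return { sessions: parsed.sessions ?? [] };
}
```

With this shape, a later `.find()` over `sessions` is safe even on a file that was reset to `{}`.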

ST-3 — Provision Gemini contractor agent

Command: openclaw contractor-agents add --agent-id gemini-e2e --workspace /tmp/gemini-e2e-workspace --contractor gemini

Expected: same as ST-2 but model is contractor-agent/contractor-gemini-bridge.


Core Bridge Behaviour

CB-1 — First turn bootstraps persona (Claude)

Precondition: claude-e2e has no active session (session-map empty).

Request: POST /v1/chat/completions, model contractor-agent/contractor-claude-bridge, system message contains Runtime: agent=claude-e2e | repo=/tmp/claude-e2e-workspace, user message: "Introduce yourself briefly."

Expected:

  • Response streams SSE chunks, terminates with [DONE]
  • Response reflects the persona from SOUL.md/IDENTITY.md: the agent uses the name and tone defined in the workspace files rather than giving a generic "I'm Claude …" reply
  • session-map.json now contains one entry with contractor=claude, state=active, and a non-empty claudeSessionId
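The streaming expectation can be asserted on the raw response text. This is a minimal sketch assuming OpenAI-style SSE framing, where each event is a `data: <payload>` line; the helper names are hypothetical.

```typescript
// Collect the payload of every "data:" line in an SSE response body.
// Assumes one payload per line, as in OpenAI-style streaming.
function sseDataPayloads(stream: string): string[] {
  return stream
    .split(/\r?\n/)
    .filter((line) => line.indexOf("data:") === 0)
    .map((line) => line.slice("data:".length).trim());
}

// True when the stream is non-empty and its last payload is [DONE].
function terminatesWithDone(stream: string): boolean {
  const payloads = sseDataPayloads(stream);
  return payloads.length > 0 && payloads[payloads.length - 1] === "[DONE]";
}
```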

Mechanism: bootstrap is injected only on the first turn; it embeds the content of SOUL.md, IDENTITY.md, USER.md, MEMORY.md inline so the agent adopts the persona immediately without needing to read files itself.
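To illustrate what embedding the files inline could look like, here is a small sketch. The four file names come from this document; the heading layout of the resulting prompt and the `buildBootstrap` name are assumptions, not the bridge's actual format.

```typescript
// Workspace files whose content is embedded in the bootstrap.
const PERSONA_FILES = ["SOUL.md", "IDENTITY.md", "USER.md", "MEMORY.md"];

// Hypothetical helper: join the available files into one prompt string,
// one labelled section per file, skipping files that are missing.
function buildBootstrap(files: Record<string, string | undefined>): string {
  const sections: string[] = [];
  for (const name of PERSONA_FILES) {
    const content = files[name];
    if (content === undefined) continue;
    sections.push("## " + name + "\n" + content.trim());
  }
  return sections.join("\n\n");
}
```

The caller would read the files from the agent workspace and pass their contents keyed by file name.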


CB-2 — Session resume retains context (Claude)

Precondition: CB-1 has run; session-map holds an active session.

Request: same headers, user message: "What did I ask you in my first message?"

Expected:

  • Agent recalls the previous question without re-reading files
  • session-map.json lastActivityAt updated; claudeSessionId unchanged

Mechanism: bridge detects existing active session, passes --resume <sessionId> to Claude Code; bootstrap is NOT re-injected.
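The branch described here can be sketched as a pure function that plans the CLI arguments for a turn. `--resume` is named in this document; invoking the CLI in print mode with `-p`, the argument order, and the `planClaudeTurn` name are assumptions made for illustration.

```typescript
interface Turn { prompt: string; args: string[] }

// With an active session, resume it and send only the user message.
// Without one, prepend the bootstrap to the first prompt.
function planClaudeTurn(userMessage: string, bootstrap: string, sessionId?: string): Turn {
  if (sessionId) {
    return { prompt: userMessage, args: ["-p", userMessage, "--resume", sessionId] };
  }
  const prompt = bootstrap + "\n\n" + userMessage;
  return { prompt, args: ["-p", prompt] };
}
```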


CB-3 — MCP tool relay — contractor_echo (Claude)

Precondition: active session from CB-1; request includes contractor_echo tool definition.

Request: user message: "Use the contractor_echo tool to echo: hello"

Expected:

  • Agent calls the mcp__openclaw__contractor_echo tool via the MCP proxy
  • Bridge relays the call to POST /mcp/execute → OpenClaw plugin registry
  • Response confirms the echo with a timestamp: "Echo confirmed: hello at …"

Mechanism: bridge writes an MCP config file pointing to services/openclaw-mcp-server.mjs before each claude invocation; the MCP server forwards tool calls to the bridge /mcp/execute endpoint which resolves them through the OpenClaw global plugin registry.
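A sketch of the MCP config the bridge might write is shown below. The `mcpServers` layout follows the common MCP client config format and the `openclaw` alias appears in the tool names in this document; the `BRIDGE_URL` environment variable and the `buildMcpConfig` name are hypothetical.

```typescript
interface McpServer { command: string; args: string[]; env?: Record<string, string> }
interface McpConfig { mcpServers: Record<string, McpServer> }

// Build a config that launches the MCP server script with node.
// BRIDGE_URL is a hypothetical variable telling the server where
// the bridge's /mcp/execute endpoint lives.
function buildMcpConfig(serverScript: string, bridgeUrl: string): McpConfig {
  return {
    mcpServers: {
      openclaw: {
        command: "node",
        args: [serverScript],
        env: { BRIDGE_URL: bridgeUrl },
      },
    },
  };
}
```

For example, `buildMcpConfig("services/openclaw-mcp-server.mjs", "http://127.0.0.1:18800")` yields an object that can be serialized with `JSON.stringify` and written to the config file.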


CB-4 — First turn bootstraps persona (Gemini)

Same as CB-1 but model contractor-agent/contractor-gemini-bridge, agent gemini-e2e.

Expected: persona from SOUL.md/IDENTITY.md is reflected; session entry has contractor=gemini.

Mechanism: bootstrap is identical; Gemini CLI receives the full prompt via -p. MCP config is written to workspace/.gemini/settings.json (Gemini's project settings path) instead of an --mcp-config flag.
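Because settings.json is a shared project file, writing the MCP entry should preserve whatever else is already there. The following sketch merges one server entry under an `mcpServers` key, which is an assumption about the settings layout; the `mergeMcpServer` name and the sample keys are illustrative.

```typescript
type Json = Record<string, unknown>;

// Return a copy of the settings with one MCP server added or replaced,
// leaving unrelated keys and other server entries untouched.
function mergeMcpServer(settings: Json, alias: string, server: Json): Json {
  const existing = (settings["mcpServers"] ?? {}) as Json;
  return { ...settings, mcpServers: { ...existing, [alias]: server } };
}
```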


CB-5 — Session resume (Gemini)

Same as CB-2 but for Gemini.

Expected: agent recalls prior context via --resume <UUID>.


CB-6 — MCP tool relay (Gemini)

Same as CB-3 but for Gemini. Gemini sees the tool as mcp_openclaw_contractor_echo (single-underscore FQN; server alias is openclaw with no underscores).

Expected: echo confirmed in response.
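The naming difference between the two backends is easy to get wrong in assertions, so a tiny helper can make it explicit. This sketch only encodes the two names given in this document; the `toolFqn` function itself is hypothetical.

```typescript
type Contractor = "claude" | "gemini";

// Claude joins the parts with a double underscore, Gemini with a single one.
function toolFqn(contractor: Contractor, server: string, tool: string): string {
  const separator = contractor === "claude" ? "__" : "_";
  return ["mcp", server, tool].join(separator);
}
```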


Skill Invocation

SK-1 — Agent reads and executes a skill script (Claude)

Precondition: fresh session (session-map cleared); skill contractor-test-skill is installed in ~/.openclaw/skills/contractor-test-skill/. System message includes:

<available_skills>
  <skill>
    <name>contractor-test-skill</name>
    <description>Test skill for verifying that a contractor agent can discover and invoke a workspace script…</description>
    <location>/home/hzhang/.openclaw/skills/contractor-test-skill</location>
  </skill>
</available_skills>

Request: user message: "Run the contractor test skill and show me the output."

Expected:

  • Agent reads SKILL.md at the given <location>
  • Expands {baseDir} → /home/hzhang/.openclaw/skills/contractor-test-skill
  • Executes scripts/test.sh via Bash
  • Response contains:
    === contractor-test-skill: PASSED ===
    Timestamp: <ISO-8601>
    

Why first turn matters: the bootstrap embeds the <available_skills> block in the system prompt sent to Claude on the first turn. Claude retains this in session memory for subsequent turns. If the session is already active when the skill-bearing request arrives, Claude won't know about the skill.
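Two of the SK-1 expectations, the {baseDir} expansion and the PASSED output, can be sketched as small checks. Both helper names are hypothetical, and the timestamp pattern assumes the script prints an ISO-8601 date-time with a `T` separator.

```typescript
// Substitute every {baseDir} placeholder with the skill's <location>.
function expandBaseDir(text: string, location: string): string {
  return text.split("{baseDir}").join(location);
}

// Check for the PASSED banner and a timestamp line. The date-time
// pattern is an assumption about which ISO-8601 form is printed.
function looksPassed(output: string): boolean {
  const banner = /^\s*=== contractor-test-skill: PASSED ===\s*$/m.test(output);
  const stamp = /^\s*Timestamp: \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/m.test(output);
  return banner && stamp;
}
```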


SK-2 — Agent reads and executes a skill script (Gemini)

Same as SK-1 but for Gemini. Gemini reads SKILL.md and executes the script.

Expected: same PASSED output block.


Skills Injection Timing

Background — How OpenClaw injects skills

OpenClaw rebuilds the system prompt (including <available_skills>) on every turn and sends it to the model. The skill list comes from a snapshot cached in the session entry, refreshed under the following conditions:

| Trigger | Refresh? |
| --- | --- |
| First turn in a new session | Always |
| Skills directory file changed (file watcher detects version bump) | Yes |
| openclaw gateway restart (session entry survives) | No (old snapshot reused) |
| openclaw gateway restart + session reset / new session | Yes (first-turn logic runs) |
| Skill filter changes | Yes |

Implication for the contractor bridge

The bridge receives the skills block on every incoming turn but currently only uses it in the first-turn bootstrap. Subsequent turns carry updated skills if OpenClaw refreshed the snapshot, but the bridge does not re-inject them into the running Claude/Gemini session.

Consequence: if skills are added or removed while a contractor session is active, the agent won't see the change until the session is reset (session-map cleared) and a new bootstrap is sent.

This is intentional for v1: contractor sessions are meant to be long-lived, and skill changes mid-session are uncommon. If needed, explicitly clearing the session map forces a new bootstrap on the next turn.
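A test harness could detect when a running session has drifted from the current skills block and decide to clear the session map. The sketch below is one hypothetical way to do that by comparing skill names; it is not part of the bridge, and the helper names are invented for illustration.

```typescript
// Extract the sorted skill names from an <available_skills> block.
function skillNames(systemMessage: string): string[] {
  const block = /<available_skills>([\s\S]*?)<\/available_skills>/.exec(systemMessage);
  if (!block) return [];
  const names: string[] = [];
  const namePattern = /<name>([^<]*)<\/name>/g;
  let match: RegExpExecArray | null;
  while ((match = namePattern.exec(block[1])) !== null) {
    names.push(match[1].trim());
  }
  return names.sort();
}

// True when the skills seen at bootstrap differ from the current turn's.
function skillsChanged(bootstrapped: string, current: string): boolean {
  return skillNames(bootstrapped).join("\n") !== skillNames(current).join("\n");
}
```

When `skillsChanged` returns true, clearing the session map before the next turn would force a fresh bootstrap.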


Error Cases

ER-1 — No active session, no workspace in system message

Request: system message has no Runtime: line.

Expected: the bridge logs a warning and falls back to /tmp as the workspace. The session key is empty, so the response may succeed but the session is not persisted.
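The fallback can be sketched as a parser for the Runtime: line. The `agent=… | repo=…` format is taken from the request shown in CB-1; the return shape and the `parseRuntime` name are assumptions.

```typescript
interface RuntimeInfo { agent: string; repo: string; fromFallback: boolean }

// Parse "Runtime: agent=<id> | repo=<path>" out of a system message.
// If the line or either field is missing, fall back to /tmp with an
// empty agent id, mirroring the behaviour described for ER-1.
function parseRuntime(systemMessage: string): RuntimeInfo {
  const line = /Runtime:([^\n]*)/.exec(systemMessage);
  const fields: Record<string, string> = {};
  if (line) {
    for (const part of line[1].split("|")) {
      const eq = part.indexOf("=");
      if (eq > 0) fields[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
    }
  }
  const agent = fields["agent"] ?? "";
  const repo = fields["repo"] ?? "";
  if (!agent || !repo) return { agent: "", repo: "/tmp", fromFallback: true };
  return { agent, repo, fromFallback: false };
}
```

A caller would log a warning and skip persistence whenever `fromFallback` is true.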


ER-2 — Gemini CLI not installed

Request: model contractor-agent/contractor-gemini-bridge.

Expected: the dispatchToGemini spawn fails; the bridge streams a [contractor-bridge dispatch failed: …] error chunk followed by [DONE]. The session is not persisted; if a prior session existed, it is marked orphaned.
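The two side effects of a failed dispatch, the error chunk and the orphaned session, can be sketched as follows. The `active` and `orphaned` states and the chunk prefix come from this document; the `key` field and both function names are hypothetical.

```typescript
type SessionState = "active" | "orphaned";
interface Session { key: string; state: SessionState }

// Format the error text streamed to the client before [DONE].
function dispatchFailureChunk(reason: string): string {
  return "[contractor-bridge dispatch failed: " + reason + "]";
}

// Return a copy of the list in which the matching active session,
// if any, is marked orphaned. Other sessions are left unchanged.
function markOrphaned(sessions: Session[], key: string): Session[] {
  return sessions.map((s): Session =>
    s.key === key && s.state === "active" ? { ...s, state: "orphaned" } : s
  );
}
```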


ER-3 — MCP tool not registered in plugin registry

Request: tool unknown_tool called via /mcp/execute.

Expected: POST /mcp/execute returns {"error":"Tool 'unknown_tool' not registered…"} with HTTP status 200. The agent receives the error text as the tool result and surfaces it in its reply.
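Since the error arrives with a 200 status, the caller has to inspect the body rather than the status code. A minimal sketch of that handling is below; the `error` field is taken from the response shown above, while the `result` field and the `toolResultText` name are assumptions.

```typescript
// Assumed body shape: "error" is shown in this document, "result" is a
// hypothetical field for the success case.
interface ExecuteBody { result?: string; error?: string }

// Turn a /mcp/execute body into the text handed back to the agent.
// An error is passed through as the tool result instead of throwing.
function toolResultText(body: ExecuteBody): string {
  if (typeof body.error === "string") return body.error;
  return body.result ?? "";
}
```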