- Migrate src/ → plugin/ (plugin/core/, plugin/web/, plugin/commands/)
and src/mcp/ → services/ per OpenClaw plugin dev spec
- Add Gemini CLI backend (plugin/core/gemini/sdk-adapter.ts) with GEMINI.md
system-prompt injection
- Inject bootstrap as stateless system prompt on every turn instead of
first turn only: Claude via --system-prompt, Gemini via workspace/GEMINI.md;
eliminates isFirstTurn branch, keeps skills in sync with OpenClaw snapshots
- Fix session-map-store defensive parsing (sessions ?? []) to handle bare {}
reset files without crashing on .find()
- Add docs/TEST_FLOW.md with E2E test scenarios and expected outcomes
- Add docs/claude/BRIDGE_MODEL_FINDINGS.md with contractor-probe results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
237 lines
8.2 KiB
Markdown
# ContractorAgent E2E Test Flow

End-to-end test scenarios for the contractor-agent plugin. Each scenario describes
the precondition, the action, and the expected observable outcome.

---

## Setup

### ST-1 — Clean install

**Precondition**: Plugin not installed; test agent workspaces do not exist.

**Steps**:
1. `node scripts/install.mjs --uninstall` (idempotent if already gone)
2. Remove test agents from `openclaw.json` if present
3. `node scripts/install.mjs --install`
4. `openclaw gateway restart`

**Expected**:
- `GET /health` → `{"ok":true,"service":"contractor-bridge"}`
- `GET /v1/models` → list includes `contractor-claude-bridge` and `contractor-gemini-bridge`
- OpenClaw logs: `[contractor-agent] plugin registered (bridge port: 18800)`
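
The two HTTP expectations above can be turned into small assertion helpers. This is only a sketch: the `/health` shape is taken from the expected output, while the `/v1/models` shape (`{ data: [{ id }] }`, as in OpenAI-style model lists) is an assumption, and the helper names are hypothetical.

```typescript
// Assumed response shapes; only the /health fields appear in the test flow itself.
interface HealthBody {
  ok: boolean;
  service: string;
}

interface ModelsBody {
  data: Array<{ id: string }>; // assumption: OpenAI-style model list
}

const REQUIRED_MODELS: string[] = [
  "contractor-claude-bridge",
  "contractor-gemini-bridge",
];

// True when the bridge reports itself healthy under the expected service name.
function healthIsOk(body: HealthBody): boolean {
  return body.ok === true && body.service === "contractor-bridge";
}

// Returns the required bridge models that are absent from the list.
function missingModels(body: ModelsBody): string[] {
  return REQUIRED_MODELS.filter(
    (required) => !body.data.some((model) => model.id === required),
  );
}
```

A test runner would feed the parsed JSON bodies of `GET /health` and `GET /v1/models` into these helpers and fail ST-1 if `healthIsOk` is false or `missingModels` is non-empty.
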

---

### ST-2 — Provision Claude contractor agent

**Command**: `openclaw contractor-agents add --agent-id claude-e2e --workspace /tmp/claude-e2e-workspace --contractor claude`

**Expected**:
- `openclaw.json` agents list contains `claude-e2e` with model `contractor-agent/contractor-claude-bridge`
- `/tmp/claude-e2e-workspace/.openclaw/contractor-agent/session-map.json` exists (empty sessions)
- Workspace files created by `openclaw agents add` are present (SOUL.md, IDENTITY.md, etc.)

---

### ST-3 — Provision Gemini contractor agent

**Command**: `openclaw contractor-agents add --agent-id gemini-e2e --workspace /tmp/gemini-e2e-workspace --contractor gemini`

**Expected**: same as ST-2 but model is `contractor-agent/contractor-gemini-bridge`.

---

## Core Bridge Behaviour

### CB-1 — First turn bootstraps persona (Claude)

**Precondition**: `claude-e2e` has no active session (session-map empty).

**Request**: POST `/v1/chat/completions`, model `contractor-agent/contractor-claude-bridge`,
system message contains `Runtime: agent=claude-e2e | repo=/tmp/claude-e2e-workspace`,
user message: `"Introduce yourself briefly."`

**Expected**:
- Response streams SSE chunks and terminates with `[DONE]`
- Response reflects the persona from `SOUL.md`/`IDENTITY.md`: the agent uses the name
  and tone defined in the workspace files rather than a generic "I'm Claude …" reply
- `session-map.json` now contains one entry with `contractor=claude`, `state=active`,
  and a non-empty `claudeSessionId`

**Mechanism**: bootstrap is injected only on the first turn; it embeds the content of
SOUL.md, IDENTITY.md, USER.md, MEMORY.md inline so the agent adopts the persona
immediately without needing to read files itself.
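
To make the inline embedding concrete, here is a minimal sketch of how a bootstrap prompt could be assembled from the four workspace files. The function name, the per-file heading layout, and the handling of missing files are all assumptions for illustration; the actual bootstrap format is not shown in this document.

```typescript
// File names come from the mechanism described above; everything else is hypothetical.
const PERSONA_FILES: string[] = ["SOUL.md", "IDENTITY.md", "USER.md", "MEMORY.md"];

// `files` maps a file name to its already-read content (undefined if absent).
function buildBootstrap(files: Record<string, string | undefined>): string {
  const sections: string[] = [];
  for (const name of PERSONA_FILES) {
    const content = files[name];
    // Skip missing or empty files instead of embedding blank sections.
    if (content === undefined || content.trim() === "") continue;
    sections.push(`## ${name}\n\n${content.trim()}`);
  }
  return sections.join("\n\n");
}
```
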

---

### CB-2 — Session resume retains context (Claude)

**Precondition**: CB-1 has run; session-map holds an active session.

**Request**: same headers, user message: `"What did I ask you in my first message?"`

**Expected**:
- Agent recalls the previous question without re-reading files
- `session-map.json` `lastActivityAt` updated; `claudeSessionId` unchanged

**Mechanism**: bridge detects existing active session, passes `--resume <sessionId>` to
Claude Code; bootstrap is NOT re-injected.
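
The resume decision can be sketched as a lookup in the session map followed by building the extra CLI arguments. The entry fields `contractor`, `state`, `claudeSessionId`, and `lastActivityAt` appear in this document; the `sessionKey` field and both function names are assumptions. The `sessions ?? []` guard mirrors the defensive parsing mentioned in the commit message, so a bare `{}` file does not crash on `.find()`.

```typescript
interface SessionEntry {
  sessionKey: string; // hypothetical field name for the lookup key
  contractor: "claude" | "gemini";
  state: string; // e.g. "active" or "orphaned"
  claudeSessionId?: string;
  lastActivityAt?: string;
}

interface SessionMapFile {
  sessions?: SessionEntry[]; // may be absent after a reset to a bare {}
}

function findActiveSession(rawJson: string, sessionKey: string): SessionEntry | undefined {
  const parsed = JSON.parse(rawJson) as SessionMapFile;
  const sessions = parsed.sessions ?? []; // tolerate a bare {} file
  return sessions.find(
    (entry) => entry.sessionKey === sessionKey && entry.state === "active",
  );
}

// Extra arguments for the Claude invocation: resume only when a session id is known.
function resumeArgs(entry: SessionEntry | undefined): string[] {
  return entry?.claudeSessionId ? ["--resume", entry.claudeSessionId] : [];
}
```
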

---

### CB-3 — MCP tool relay — contractor_echo (Claude)

**Precondition**: active session from CB-1; request includes `contractor_echo` tool definition.

**Request**: user message: `"Use the contractor_echo tool to echo: hello"`

**Expected**:
- Agent calls the `mcp__openclaw__contractor_echo` tool via the MCP proxy
- Bridge relays the call to `POST /mcp/execute` → OpenClaw plugin registry
- Response confirms the echo with a timestamp: `"Echo confirmed: hello at …"`

**Mechanism**: bridge writes an MCP config file pointing to
`services/openclaw-mcp-server.mjs` before each `claude` invocation; the MCP server
forwards tool calls to the bridge `/mcp/execute` endpoint which resolves them through
the OpenClaw global plugin registry.
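
As an illustration of the config file the bridge writes, the sketch below builds an object in the `mcpServers` layout that Claude Code reads from an MCP config file. Launching the server with `node`, the absence of any environment variables, and the function name are assumptions; how the real bridge passes its `/mcp/execute` address to the server is not described here.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// The alias "openclaw" is what yields the mcp__openclaw__* tool names seen by Claude.
function buildMcpConfig(serverScriptPath: string): McpConfig {
  return {
    mcpServers: {
      openclaw: {
        command: "node", // assumption: the .mjs server is started with node
        args: [serverScriptPath],
      },
    },
  };
}

// Serialized form that would be written to disk before each invocation.
const exampleConfigJson: string = JSON.stringify(
  buildMcpConfig("services/openclaw-mcp-server.mjs"),
  null,
  2,
);
```
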

---

### CB-4 — First turn bootstraps persona (Gemini)

Same as CB-1 but model `contractor-agent/contractor-gemini-bridge`, agent `gemini-e2e`.

**Expected**: persona from `SOUL.md`/`IDENTITY.md` is reflected; session entry has
`contractor=gemini`.

**Mechanism**: bootstrap is identical; Gemini CLI receives the full prompt via `-p`.
MCP config is written to `workspace/.gemini/settings.json` (Gemini's project settings
path) instead of an `--mcp-config` flag.

---

### CB-5 — Session resume (Gemini)

Same as CB-2 but for Gemini.

**Expected**: agent recalls prior context via `--resume <UUID>`.

---

### CB-6 — MCP tool relay (Gemini)

Same as CB-3 but for Gemini. Gemini sees the tool as `mcp_openclaw_contractor_echo`
(single-underscore FQN; server alias is `openclaw` with no underscores).

**Expected**: echo confirmed in response.
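
The naming difference between CB-3 and CB-6 can be captured in one helper, which is handy when a test needs to assert on the tool name each contractor reports. The helper is hypothetical; it only encodes the two name forms quoted in this document (double underscores for Claude, single underscores for Gemini).

```typescript
type Contractor = "claude" | "gemini";

// Builds the fully qualified tool name as each CLI is described to expose it.
function toolFqn(contractor: Contractor, serverAlias: string, toolName: string): string {
  const separator = contractor === "claude" ? "__" : "_";
  return `mcp${separator}${serverAlias}${separator}${toolName}`;
}
```
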

---

## Skill Invocation

### SK-1 — Agent reads and executes a skill script (Claude)

**Precondition**: fresh session (session-map cleared); skill
`contractor-test-skill` is installed in `~/.openclaw/skills/contractor-test-skill/`.
System message includes:

```xml
<available_skills>
  <skill>
    <name>contractor-test-skill</name>
    <description>Test skill for verifying that a contractor agent can discover and invoke a workspace script…</description>
    <location>/home/hzhang/.openclaw/skills/contractor-test-skill</location>
  </skill>
</available_skills>
```

**Request**: user message: `"Run the contractor test skill and show me the output."`

**Expected**:
- Agent reads `SKILL.md` at the given `<location>`
- Expands `{baseDir}` → `/home/hzhang/.openclaw/skills/contractor-test-skill`
- Executes `scripts/test.sh` via Bash
- Response contains:
  ```
  === contractor-test-skill: PASSED ===
  Timestamp: <ISO-8601>
  ```
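
The `{baseDir}` expansion and the final output check in the steps above are simple enough to sketch. Both helper names are hypothetical, and the sample `SKILL.md` line used in the usage note is invented for illustration; the real file content is not shown in this document.

```typescript
// Replaces every {baseDir} placeholder with the skill's <location> path.
function expandBaseDir(skillMarkdown: string, location: string): string {
  return skillMarkdown.split("{baseDir}").join(location);
}

// Checks the agent's reply for the PASSED banner expected by SK-1.
function replyShowsPass(reply: string): boolean {
  return reply.indexOf("=== contractor-test-skill: PASSED ===") !== -1;
}
```

For example, a hypothetical `SKILL.md` line such as `bash {baseDir}/scripts/test.sh` would expand to a command under the skill's `<location>` directory, and the test passes when `replyShowsPass` returns true for the streamed reply.
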

**Why first turn matters**: the bootstrap embeds the `<available_skills>` block in the
system prompt sent to Claude on the first turn. Claude retains this in session memory
for subsequent turns. If the session is already active when the skill-bearing request
arrives, Claude won't know about the skill.

---

### SK-2 — Agent reads and executes a skill script (Gemini)

Same as SK-1 but for Gemini. Gemini reads SKILL.md and executes the script.

**Expected**: same `PASSED` output block.

---

## Skills Injection Timing

### Background — How OpenClaw injects skills

OpenClaw rebuilds the system prompt (including `<available_skills>`) on **every turn**
and sends it to the model. The skill list comes from a **snapshot** cached in the
session entry, refreshed under the following conditions:

| Trigger | Refresh? |
|---------|----------|
| First turn in a new session | ✅ Always |
| Skills directory file changed (file watcher detects version bump) | ✅ Yes |
| `openclaw gateway restart` (session entry survives) | ❌ No — old snapshot reused |
| `openclaw gateway restart` + session reset / new session | ✅ Yes — first-turn logic runs |
| Skill filter changes | ✅ Yes |
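
The table reduces to a single predicate: the snapshot is refreshed when any of three conditions holds, and a gateway restart on its own is not one of them. The sketch below restates the table under that reading; the type and field names are hypothetical and do not come from OpenClaw.

```typescript
interface TurnContext {
  isFirstTurnOfSession: boolean; // new or reset session
  skillsVersionBumped: boolean; // file watcher saw a change in the skills directory
  skillFilterChanged: boolean;
  gatewayRestarted: boolean; // recorded for completeness; it does not trigger a refresh
}

function shouldRefreshSkillSnapshot(ctx: TurnContext): boolean {
  // A restart matters only indirectly, when it coincides with a new session.
  return ctx.isFirstTurnOfSession || ctx.skillsVersionBumped || ctx.skillFilterChanged;
}
```
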

### Implication for the contractor bridge

The bridge receives the skills block on **every** incoming turn but currently only uses
it in the **first-turn bootstrap**. Subsequent turns carry updated skills if OpenClaw
refreshed the snapshot, but the bridge does not re-inject them into the running
Claude/Gemini session.

**Consequence**: if skills are added or removed while a contractor session is active,
the agent won't see the change until the session is reset (session-map cleared) and a
new bootstrap is sent.

This is intentional for v1: contractor sessions are meant to be long-lived, and skill
changes mid-session are uncommon. If needed, explicitly clearing the session map forces
a new bootstrap on the next turn.

---

## Error Cases

### ER-1 — No active session, no workspace in system message

**Request**: system message has no `Runtime:` line.

**Expected**: the bridge logs a warning and falls back to the `/tmp` workspace with an
empty session key. The response may still succeed, but the session is not persisted.
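
The fallback can be sketched as a parser for the `Runtime:` line shown in CB-1. The regular expression, the type, and the function name are assumptions based only on the single example line `Runtime: agent=claude-e2e | repo=/tmp/claude-e2e-workspace`; the real line may carry more fields, and the warning log is omitted here.

```typescript
interface RuntimeTarget {
  agentId: string; // empty when no Runtime line was found
  workspace: string;
  persistSession: boolean; // false in the fallback case, as ER-1 expects
}

// Matches "Runtime: agent=<id> | repo=<path>" anywhere in the system message.
const RUNTIME_LINE = /Runtime:\s*agent=([^\s|]+)\s*\|\s*repo=([^\s|]+)/;
const FALLBACK_WORKSPACE = "/tmp";

function resolveRuntime(systemMessage: string): RuntimeTarget {
  const match = RUNTIME_LINE.exec(systemMessage);
  if (match === null) {
    // ER-1: no Runtime line, so use the fallback workspace and do not persist.
    return { agentId: "", workspace: FALLBACK_WORKSPACE, persistSession: false };
  }
  return {
    agentId: match[1] ?? "",
    workspace: match[2] ?? FALLBACK_WORKSPACE,
    persistSession: true,
  };
}
```
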

---

### ER-2 — Gemini CLI not installed

**Request**: model `contractor-agent/contractor-gemini-bridge`.

**Expected**: the `dispatchToGemini` spawn fails; the bridge streams a
`[contractor-bridge dispatch failed: …]` error chunk, then `[DONE]`.
The session is not persisted; if a prior session existed, it is marked `orphaned`.

---

### ER-3 — MCP tool not registered in plugin registry

**Request**: tool `unknown_tool` called via `/mcp/execute`.

**Expected**: `POST /mcp/execute` returns `{"error":"Tool 'unknown_tool' not registered…"}` with HTTP status 200.
The agent receives the error text as the tool result and surfaces it in its reply.
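
The ER-3 behaviour, an error payload delivered with a 200 status so that it reaches the agent as an ordinary tool result, can be sketched as follows. The registry type, handler signature, and result shape are hypothetical, and the error string reproduces only the visible prefix of the message quoted above, whose remainder is elided in this document.

```typescript
type ToolHandler = (args: Record<string, unknown>) => string;

interface ExecuteResponse {
  status: number;
  body: { result?: string; error?: string };
}

function executeTool(
  registry: Map<string, ToolHandler>,
  toolName: string,
  args: Record<string, unknown>,
): ExecuteResponse {
  const handler = registry.get(toolName);
  if (handler === undefined) {
    // Still status 200: the error travels back to the agent as tool output.
    return { status: 200, body: { error: `Tool '${toolName}' not registered` } };
  }
  return { status: 200, body: { result: handler(args) } };
}
```
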