refactor: restructure to plugin/ + services/ layout and add per-turn bootstrap injection
- Migrate src/ → plugin/ (plugin/core/, plugin/web/, plugin/commands/)
and src/mcp/ → services/ per OpenClaw plugin dev spec
- Add Gemini CLI backend (plugin/core/gemini/sdk-adapter.ts) with GEMINI.md
system-prompt injection
- Inject bootstrap as stateless system prompt on every turn instead of
first turn only: Claude via --system-prompt, Gemini via workspace/GEMINI.md;
eliminates isFirstTurn branch, keeps skills in sync with OpenClaw snapshots
- Fix session-map-store defensive parsing (sessions ?? []) to handle bare {}
reset files without crashing on .find()
- Add docs/TEST_FLOW.md with E2E test scenarios and expected outcomes
- Add docs/claude/BRIDGE_MODEL_FINDINGS.md with contractor-probe results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
236
docs/TEST_FLOW.md
Normal file
@@ -0,0 +1,236 @@
# ContractorAgent E2E Test Flow

End-to-end test scenarios for the contractor-agent plugin. Each scenario describes
the precondition, the action, and the expected observable outcome.

---

## Setup

### ST-1 — Clean install

**Precondition**: Plugin not installed; test agent workspaces do not exist.

**Steps**:
1. `node scripts/install.mjs --uninstall` (idempotent if already gone)
2. Remove test agents from `openclaw.json` if present
3. `node scripts/install.mjs --install`
4. `openclaw gateway restart`

**Expected**:
- `GET /health` → `{"ok":true,"service":"contractor-bridge"}`
- `GET /v1/models` → list includes `contractor-claude-bridge` and `contractor-gemini-bridge`
- OpenClaw logs: `[contractor-agent] plugin registered (bridge port: 18800)`

---

### ST-2 — Provision Claude contractor agent

**Command**: `openclaw contractor-agents add --agent-id claude-e2e --workspace /tmp/claude-e2e-workspace --contractor claude`

**Expected**:
- `openclaw.json` agents list contains `claude-e2e` with model `contractor-agent/contractor-claude-bridge`
- `/tmp/claude-e2e-workspace/.openclaw/contractor-agent/session-map.json` exists (empty sessions)
- Workspace files created by `openclaw agents add` are present (SOUL.md, IDENTITY.md, etc.)
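
A check for the "empty sessions" expectation could be sketched as below. The entry field names (`contractor`, `state`, `claudeSessionId`, `lastActivityAt`) are the ones this document mentions; the top-level `sessions` key follows the commit message, while the `parseSessionMap` helper and the exact types are assumptions for illustration only. The parser tolerates a bare `{}` file, as the commit message describes.

```typescript
// Hypothetical entry shape; only the field names come from this document.
interface SessionMapEntry {
  contractor: "claude" | "gemini";
  state: string;
  claudeSessionId?: string;
  lastActivityAt?: string;
}

// Parse session-map.json content defensively: a bare `{}` yields no sessions
// instead of crashing on a later `.find()`.
function parseSessionMap(raw: string): SessionMapEntry[] {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) return [];
  const sessions = (parsed as { sessions?: unknown }).sessions ?? [];
  return Array.isArray(sessions) ? (sessions as SessionMapEntry[]) : [];
}
```

With this helper, ST-2 passes when `parseSessionMap(fileContent).length === 0`.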

---

### ST-3 — Provision Gemini contractor agent

**Command**: `openclaw contractor-agents add --agent-id gemini-e2e --workspace /tmp/gemini-e2e-workspace --contractor gemini`

**Expected**: same as ST-2 but model is `contractor-agent/contractor-gemini-bridge`.

---

## Core Bridge Behaviour

### CB-1 — First turn bootstraps persona (Claude)

**Precondition**: `claude-e2e` has no active session (session-map empty).

**Request**: POST `/v1/chat/completions`, model `contractor-agent/contractor-claude-bridge`,
system message contains `Runtime: agent=claude-e2e | repo=/tmp/claude-e2e-workspace`,
user message: `"Introduce yourself briefly."`

**Expected**:
- Response streams SSE chunks, terminates with `[DONE]`
- Response reflects the persona from `SOUL.md`/`IDENTITY.md`: the agent uses the name
  and tone defined in the workspace files rather than a generic "I'm Claude …" response
- `session-map.json` now contains one entry with `contractor=claude`, `state=active`,
  and a non-empty `claudeSessionId`

**Mechanism**: bootstrap is injected only on the first turn; it embeds the content of
SOUL.md, IDENTITY.md, USER.md, MEMORY.md inline so the agent adopts the persona
immediately without needing to read files itself.
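
The bridge has to recover the agent id and workspace from the `Runtime:` line in the system message before it can look up a session. A minimal sketch, assuming the line always has the `key=value | key=value` shape shown in the request above (the `parseRuntimeLine` name and the `RuntimeInfo` type are hypothetical, not the plugin's actual code):

```typescript
interface RuntimeInfo {
  agentId: string;
  workspace: string;
}

// Find the "Runtime:" line and split it into key=value fields.
// Returns null when the line or a required field is missing (see ER-1).
function parseRuntimeLine(systemMessage: string): RuntimeInfo | null {
  const line = systemMessage
    .split("\n")
    .find((l) => l.trim().startsWith("Runtime:"));
  if (!line) return null;

  const fields = new Map<string, string>();
  for (const part of line.trim().slice("Runtime:".length).split("|")) {
    const eq = part.indexOf("=");
    if (eq > 0) fields.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }

  const agentId = fields.get("agent");
  const workspace = fields.get("repo");
  return agentId && workspace ? { agentId, workspace } : null;
}
```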

---

### CB-2 — Session resume retains context (Claude)

**Precondition**: CB-1 has run; session-map holds an active session.

**Request**: same headers, user message: `"What did I ask you in my first message?"`

**Expected**:
- Agent recalls the previous question without re-reading files
- `session-map.json` `lastActivityAt` updated; `claudeSessionId` unchanged

**Mechanism**: bridge detects existing active session, passes `--resume <sessionId>` to
Claude Code; bootstrap is NOT re-injected.
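
The resume branch described above can be sketched as a small argument builder. This is an illustration under assumptions: `buildClaudeArgs` and `ClaudeTurn` are hypothetical names, and only the `-p` and `--resume` flags that appear in this commit's docs are used.

```typescript
interface ClaudeTurn {
  prompt: string;
  sessionId?: string; // claudeSessionId from session-map.json, if any
}

// Build the argv for a non-interactive `claude` invocation.
// An existing session id adds `--resume <sessionId>`; otherwise a new session starts.
function buildClaudeArgs(turn: ClaudeTurn): string[] {
  const args: string[] = ["-p"];
  if (turn.sessionId) args.push("--resume", turn.sessionId);
  args.push(turn.prompt);
  return args;
}
```

For CB-2 this would produce `-p --resume <sessionId> "What did I ask you in my first message?"`.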

---

### CB-3 — MCP tool relay — contractor_echo (Claude)

**Precondition**: active session from CB-1; request includes `contractor_echo` tool definition.

**Request**: user message: `"Use the contractor_echo tool to echo: hello"`

**Expected**:
- Agent calls the `mcp__openclaw__contractor_echo` tool via the MCP proxy
- Bridge relays the call to `POST /mcp/execute` → OpenClaw plugin registry
- Response confirms the echo with a timestamp: `"Echo confirmed: hello at …"`

**Mechanism**: bridge writes an MCP config file pointing to
`services/openclaw-mcp-server.mjs` before each `claude` invocation; the MCP server
forwards tool calls to the bridge `/mcp/execute` endpoint which resolves them through
the OpenClaw global plugin registry.
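
The MCP config the bridge writes could look roughly like the object built below. The `mcpServers` layout is the usual MCP client config shape, and the server alias `openclaw` comes from the tool names in this document. The `CONTRACTOR_BRIDGE_URL` environment variable and the `buildMcpConfig` helper are assumptions for illustration; the real server may learn the bridge address differently.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Build a config that launches the MCP server script under the alias "openclaw".
function buildMcpConfig(serverScript: string, bridgeUrl: string): McpConfig {
  return {
    mcpServers: {
      openclaw: {
        command: "node",
        args: [serverScript],
        // Hypothetical variable name: tells the MCP server where /mcp/execute lives.
        env: { CONTRACTOR_BRIDGE_URL: bridgeUrl },
      },
    },
  };
}
```

Serializing the result with `JSON.stringify(config, null, 2)` gives the file content to write before each invocation.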

---

### CB-4 — First turn bootstraps persona (Gemini)

Same as CB-1 but model `contractor-agent/contractor-gemini-bridge`, agent `gemini-e2e`.

**Expected**: persona from `SOUL.md`/`IDENTITY.md` is reflected; session entry has
`contractor=gemini`.

**Mechanism**: bootstrap is identical; Gemini CLI receives the full prompt via `-p`.
MCP config is written to `workspace/.gemini/settings.json` (Gemini's project settings
path) instead of an `--mcp-config` flag.

---

### CB-5 — Session resume (Gemini)

Same as CB-2 but for Gemini.

**Expected**: agent recalls prior context via `--resume <UUID>`.

---

### CB-6 — MCP tool relay (Gemini)

Same as CB-3 but for Gemini. Gemini sees the tool as `mcp_openclaw_contractor_echo`
(single-underscore FQN; server alias is `openclaw` with no underscores).

**Expected**: echo confirmed in response.
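
The two naming schemes seen in CB-3 and CB-6 differ only in the separator. A small sketch that reproduces both names from this document (the `toolFqn` helper is hypothetical):

```typescript
type Backend = "claude" | "gemini";

// Claude exposes MCP tools as mcp__<server>__<tool> (double underscore);
// Gemini, per CB-6, as mcp_<server>_<tool> (single underscore).
function toolFqn(backend: Backend, server: string, tool: string): string {
  return backend === "claude"
    ? `mcp__${server}__${tool}`
    : `mcp_${server}_${tool}`;
}
```

Because the Gemini form uses a single underscore, keeping the server alias free of underscores (as `openclaw` is) avoids ambiguity when reading the name back.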

---

## Skill Invocation

### SK-1 — Agent reads and executes a skill script (Claude)

**Precondition**: fresh session (session-map cleared); skill
`contractor-test-skill` is installed in `~/.openclaw/skills/contractor-test-skill/`.
System message includes:

```xml
<available_skills>
  <skill>
    <name>contractor-test-skill</name>
    <description>Test skill for verifying that a contractor agent can discover and invoke a workspace script…</description>
    <location>/home/hzhang/.openclaw/skills/contractor-test-skill</location>
  </skill>
</available_skills>
```

**Request**: user message: `"Run the contractor test skill and show me the output."`

**Expected**:
- Agent reads `SKILL.md` at the given `<location>`
- Expands `{baseDir}` → `/home/hzhang/.openclaw/skills/contractor-test-skill`
- Executes `scripts/test.sh` via Bash
- Response contains:
  ```
  === contractor-test-skill: PASSED ===
  Timestamp: <ISO-8601>
  ```

**Why first turn matters**: the bootstrap embeds the `<available_skills>` block in the
system prompt sent to Claude on the first turn. Claude retains this in session memory
for subsequent turns. If the session is already active when the skill-bearing request
arrives, Claude won't know about the skill.
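
The `{baseDir}` expansion in the expected steps is a plain placeholder substitution. In the test flow the agent performs it itself while reading `SKILL.md`; the sketch below only shows the result a tester should expect. The `expandBaseDir` helper and the sample command line are hypothetical, since the actual `SKILL.md` text is not part of this document.

```typescript
// Replace every {baseDir} placeholder with the skill's <location> value.
function expandBaseDir(text: string, skillLocation: string): string {
  return text.split("{baseDir}").join(skillLocation);
}

// Hypothetical SKILL.md instruction line, for illustration only.
const expanded = expandBaseDir(
  "bash {baseDir}/scripts/test.sh",
  "/home/hzhang/.openclaw/skills/contractor-test-skill",
);
```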

---

### SK-2 — Agent reads and executes a skill script (Gemini)

Same as SK-1 but for Gemini. Gemini reads SKILL.md and executes the script.

**Expected**: same `PASSED` output block.

---

## Skills Injection Timing

### Background — How OpenClaw injects skills

OpenClaw rebuilds the system prompt (including `<available_skills>`) on **every turn**
and sends it to the model. The skill list comes from a **snapshot** cached in the
session entry, refreshed under the following conditions:

| Trigger | Refresh? |
|---------|----------|
| First turn in a new session | ✅ Always |
| Skills directory file changed (file watcher detects version bump) | ✅ Yes |
| `openclaw gateway restart` (session entry survives) | ❌ No — old snapshot reused |
| `openclaw gateway restart` + session reset / new session | ✅ Yes — first-turn logic runs |
| Skill filter changes | ✅ Yes |
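
The table above reduces to a simple predicate. The sketch below restates the table's rules and is not OpenClaw's actual code; the `TurnContext` field names are invented for illustration.

```typescript
// Hypothetical inputs describing what changed since the snapshot was cached.
interface TurnContext {
  isFirstTurnInSession: boolean;
  skillsVersionChanged: boolean; // file watcher detected a version bump
  skillFilterChanged: boolean;
  gatewayRestarted: boolean;
}

// A gateway restart alone does not refresh the snapshot: the session entry
// survives and the cached snapshot is reused. A restart combined with a new
// session refreshes only because the first-turn rule applies.
function shouldRefreshSkillSnapshot(ctx: TurnContext): boolean {
  return (
    ctx.isFirstTurnInSession ||
    ctx.skillsVersionChanged ||
    ctx.skillFilterChanged
  );
}
```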

### Implication for the contractor bridge

The bridge receives the skills block on **every** incoming turn but currently only uses
it in the **first-turn bootstrap**. Subsequent turns carry updated skills if OpenClaw
refreshed the snapshot, but the bridge does not re-inject them into the running
Claude/Gemini session.

**Consequence**: if skills are added or removed while a contractor session is active,
the agent won't see the change until the session is reset (session-map cleared) and a
new bootstrap is sent.

This is intentional for v1: contractor sessions are meant to be long-lived, and skill
changes mid-session are uncommon. If needed, explicitly clearing the session map forces
a new bootstrap on the next turn.

---

## Error Cases

### ER-1 — No active session, no workspace in system message

**Request**: system message has no `Runtime:` line.

**Expected**: bridge logs a warning and falls back to the `/tmp` workspace with an empty
session key; the response may succeed, but the session is not persisted.

---

### ER-2 — Gemini CLI not installed

**Request**: model `contractor-agent/contractor-gemini-bridge`.

**Expected**: `dispatchToGemini` spawn fails, bridge streams
`[contractor-bridge dispatch failed: …]` error chunk, then `[DONE]`.
Session is not persisted; if a prior session existed, it is marked `orphaned`.

---

### ER-3 — MCP tool not registered in plugin registry

**Request**: tool `unknown_tool` called via `/mcp/execute`.

**Expected**: `POST /mcp/execute` returns `{"error":"Tool 'unknown_tool' not registered…"}` (HTTP 200).
The agent receives the error text as the tool result and surfaces it in its reply.
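
The ER-3 behaviour (an error payload with a 200 status rather than an HTTP error) can be sketched as below. The registry type, the `result` field, and the `executeTool` helper are assumptions; the error text reproduces only the visible prefix of the message, since the remainder is elided in this document.

```typescript
type ToolHandler = (args: Record<string, unknown>) => unknown;

interface ExecuteResponse {
  status: number;
  body: { result?: unknown; error?: string };
}

// Resolve a tool by name. An unknown tool still yields status 200 so the MCP
// server can hand the error text back to the agent as the tool result.
function executeTool(
  registry: Map<string, ToolHandler>,
  name: string,
  args: Record<string, unknown>,
): ExecuteResponse {
  const handler = registry.get(name);
  if (!handler) {
    return { status: 200, body: { error: `Tool '${name}' not registered` } };
  }
  return { status: 200, body: { result: handler(args) } };
}
```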
258
docs/claude/BRIDGE_MODEL_FINDINGS.md
Normal file
@@ -0,0 +1,258 @@
# Bridge Model Probe Findings

## Purpose

Document actual test results from running the `contractor-probe` test plugin against a live
OpenClaw gateway. Resolves the two critical unknowns identified in the earlier feasibility review.

Test setup: installed `contractor-probe` plugin that exposes an OpenAI-compatible HTTP server
on port 8799 and logs every raw request body to `/tmp/contractor-probe-requests.jsonl`.
Created a `probe-test` agent with `model: contractor-probe/contractor-probe-bridge`.
Sent two consecutive messages via `openclaw agent --channel qa-channel`.

---

## Finding 1 — Custom Model Registration Mechanism

**There is no plugin SDK `registerModelProvider` call.**

The actual mechanism used by dirigent (and confirmed working for contractor-probe) is:

### Step 1 — Add provider to `openclaw.json`

Under `models.providers`, add an entry pointing to a local OpenAI-compatible HTTP server:

```json
"contractor-probe": {
  "baseUrl": "http://127.0.0.1:8799/v1",
  "apiKey": "probe-local",
  "api": "openai-completions",
  "models": [{
    "id": "contractor-probe-bridge",
    "name": "Contractor Probe Bridge (test)",
    "reasoning": false,
    "input": ["text"],
    "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
    "contextWindow": 200000,
    "maxTokens": 4096
  }]
}
```

### Step 2 — Plugin starts a sidecar HTTP server

The plugin's `register()` function starts a Node.js HTTP server on `gateway_start` (protected
by a `globalThis` flag to prevent double-start on hot-reload). The server implements:

- `GET /v1/models` — model list
- `POST /v1/chat/completions` — model inference (must support streaming)
- `POST /v1/responses` — responses API variant (optional)

### Step 3 — Agent uses `provider/model` as primary model

```json
{
  "id": "my-agent",
  "model": { "primary": "contractor-probe/contractor-probe-bridge" }
}
```

### What this means for ContractorAgent

The `contractor-claude-bridge` model should be registered the same way:

1. Install script writes a provider entry to `openclaw.json` pointing to the bridge sidecar port
2. Plugin starts the bridge sidecar on `gateway_start`
3. `openclaw contractor-agents add` sets the agent's primary model to `contractor-claude-bridge`

No plugin SDK model registration API exists or is needed.

---

## Finding 2 — Exact Payload Sent to Custom Model

OpenClaw sends a standard OpenAI Chat Completions request to the sidecar on every turn.

### Endpoint and transport

```
POST /v1/chat/completions
Content-Type: application/json
stream: true  ← streaming is always requested; sidecar MUST emit SSE
```

### Message array structure

**Turn 1 (2 messages):**

| index | role | content |
|-------|------|---------|
| 0 | `system` | Full OpenClaw agent context (~28,000 chars) — rebuilt every turn |
| 1 | `user` | `[Sat 2026-04-11 08:32 GMT+1] hello from probe test` |

**Turn 2 (3 messages):**

| index | role | content |
|-------|------|---------|
| 0 | `system` | Same full context (~28,000 chars) |
| 1 | `user` | `[Sat 2026-04-11 08:32 GMT+1] hello from probe test` |
| 2 | `user` | `[Sat 2026-04-11 08:34 GMT+1] and this is the second message` |

Note: the probe sidecar did not emit proper SSE. As a result, turn 2 shows no assistant message
between the two user messages. Once the bridge sidecar returns well-formed SSE, OpenClaw should
include assistant turns in history. Needs follow-up verification with streaming.

### System prompt contents

The system prompt is assembled by OpenClaw from:
- OpenClaw base instructions (tool call style, scheduling rules, ACP guidance)
- Workspace context files (AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md)
- Skill definitions
- Tool guidance text

In the test, total system prompt was 28,942 chars. This is rebuilt from scratch on every turn.

### User message format

```
[Day YYYY-MM-DD HH:MM TZ] <message text>
```

Example: `[Sat 2026-04-11 08:32 GMT+1] hello from probe test`
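
If the bridge needs the bare message text, the timestamp prefix can be split off with a regular expression. This is a sketch based only on the single format observed above; the `parseUserMessage` helper and the `StampedMessage` type are hypothetical, and other timezone spellings may occur that this pattern has not been checked against.

```typescript
interface StampedMessage {
  day: string;
  date: string;
  time: string;
  tz: string;
  text: string;
}

// Matches "[Sat 2026-04-11 08:32 GMT+1] message text".
const STAMP = /^\[(\w{3}) (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}) ([^\]]+)\] ([\s\S]*)$/;

// Returns null when the content does not carry the expected prefix.
function parseUserMessage(content: string): StampedMessage | null {
  const m = STAMP.exec(content);
  if (!m) return null;
  return {
    day: m[1] ?? "",
    date: m[2] ?? "",
    time: m[3] ?? "",
    tz: m[4] ?? "",
    text: m[5] ?? "",
  };
}
```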

### Tool definitions

Full OpenAI function definitions are sent on every request as `tools: [...]`. In the test run,
37 tools were included (read, edit, exec, cron, message, sessions_spawn, dirigent tools, etc.).

### Other request fields

| field | observed value | notes |
|-------|---------------|-------|
| `model` | `contractor-probe-bridge` | model id as configured |
| `stream` | `true` | always; bridge must stream |
| `store` | `false` | |
| `max_completion_tokens` | `4096` | from provider model config |

### What this means for ContractorAgent

**The input filter is critical.** On every turn, OpenClaw sends:

- A large system prompt (28K+ chars) that repeats unchanged
- The full accumulated user message history
- No (or incomplete) assistant message history

The bridge model must NOT forward this verbatim to Claude Code. Instead:

1. Extract only the latest user message from the messages array (last `user` role entry)
2. Strip the OpenClaw system prompt entirely — Claude Code maintains its own live context
3. On first turn: inject a one-time bootstrap block telling Claude it is operating as an
   OpenClaw contractor agent, with workspace path and session key
4. On subsequent turns: forward only the latest user message text

This keeps Claude as the owner of its own conversational context and avoids dual-context drift.
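
The four filter steps above can be sketched as follows. This is an illustration of the rule as stated in this finding, assuming message `content` is always a plain string; `latestUserMessage`, `buildForwardedPrompt`, and the bootstrap text passed in are hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Step 1: take the last entry with role "user"; everything else is ignored,
// which also drops the OpenClaw system prompt (step 2).
function latestUserMessage(messages: ChatMessage[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === "user") return m.content;
  }
  return null;
}

// Steps 3 and 4: prepend the bootstrap block on the first turn only.
function buildForwardedPrompt(
  messages: ChatMessage[],
  isFirstTurn: boolean,
  bootstrap: string,
): string | null {
  const latest = latestUserMessage(messages);
  if (latest === null) return null;
  return isFirstTurn ? `${bootstrap}\n\n${latest}` : latest;
}
```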

**The bridge sidecar must support SSE streaming.** OpenClaw always sets `stream: true`. A
non-streaming response causes assistant turn data to be dropped from OpenClaw's session history.
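
A well-formed streamed reply is a sequence of `data:` events, each carrying a `chat.completion.chunk` object, closed by `data: [DONE]`. The sketch below shows that framing in the OpenAI Chat Completions streaming format; the helper names are hypothetical, and a real sidecar would write each event to the HTTP response as it is produced rather than join them into one string.

```typescript
// Serialize one streamed chunk as a server-sent event.
function sseChunk(
  id: string,
  model: string,
  content: string,
  finishReason: "stop" | null = null,
): string {
  const chunk = {
    id,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      { index: 0, delta: content ? { content } : {}, finish_reason: finishReason },
    ],
  };
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Terminator that tells the client the stream is complete.
const SSE_DONE = "data: [DONE]\n\n";

// Frame a whole reply: content chunks, a final "stop" chunk, then [DONE].
function sseStream(id: string, model: string, parts: string[]): string {
  const events = parts.map((p) => sseChunk(id, model, p));
  events.push(sseChunk(id, model, "", "stop"));
  return events.join("") + SSE_DONE;
}
```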

---

## Finding 3 — Claude Code Session Continuation Identifier

From Claude Code documentation research:

### Session ID format

UUIDs assigned at session creation. Example: `bc1a7617-0651-443d-a8f1-efeb2957b8c2`

### Session storage

```
~/.claude/projects/<encoded-cwd>/<session-id>.jsonl
```

`<encoded-cwd>` is the absolute working directory path with every non-alphanumeric character
replaced by `-`:
- `/home/user/project` → `-home-user-project`
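
The encoding rule above, and the resulting session file path, can be sketched as below. Both helpers are hypothetical and simply restate the rule in this section; the home directory is passed in explicitly so the example stays self-contained.

```typescript
// Replace every non-alphanumeric character of the absolute path with "-".
function encodeCwd(cwd: string): string {
  return cwd.replace(/[^a-zA-Z0-9]/g, "-");
}

// Reconstruct <home>/.claude/projects/<encoded-cwd>/<session-id>.jsonl
function sessionFilePath(home: string, cwd: string, sessionId: string): string {
  return `${home}/.claude/projects/${encodeCwd(cwd)}/${sessionId}.jsonl`;
}
```

This is only useful for recovery or inspection; normal operation resumes by session id.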

### CLI resumption

```bash
# Non-interactive mode with session resume
claude -p --resume <session-uuid> "next user message"
```

`-p` is print/non-interactive mode. The session UUID is passed as the resume argument.
With a structured output format (for example `--output-format json`), the output includes the
session ID, so the caller can capture it for the next turn.

### SDK resumption (TypeScript)

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "next user message",
  options: {
    resume: sessionId, // UUID from previous turn
    allowedTools: ["Read", "Edit", "Glob"],
  }
})) {
  if (message.type === "result") {
    const nextSessionId = message.session_id; // capture for next turn
  }
}
```

### Session enumeration

```typescript
import { listSessions, getSessionInfo } from "@anthropic-ai/claude-agent-sdk";

const sessions = await listSessions(); // keyed by session UUID
const info = await getSessionInfo(sessionId);
```

### What this means for ContractorAgent

The `SessionMapEntry.claudeSessionId` field should store the UUID returned by `message.session_id`
after each Claude turn. On the next turn, pass it as `options.resume`.

The session file path can be reconstructed from the session ID and workspace path if needed for
recovery or inspection, but direct SDK resumption is the primary path.

---

## Finding 4 — Sidecar Port Conflict and `globalThis` Guard

During testing, the probe sidecar failed to start with `EADDRINUSE` when `openclaw agent --local`
was used alongside a running gateway, because both tried to spawn the server process.

This is exactly the hot-reload / double-start problem documented in LESSONS_LEARNED items 1, 3,
and 7. The fix for the bridge sidecar:

1. Check a lock file (e.g. `/tmp/contractor-bridge-sidecar.lock`) before starting
2. If the lock file exists and the PID is alive, skip start
3. Protect the `startSidecar` call with a `globalThis` flag
4. Clean up the lock file on `gateway_stop`
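
The start decision in steps 1 to 3 can be sketched as a pure function. This is an assumption-laden illustration: the flag name, the `LockInfo` shape, and `shouldStartSidecar` are hypothetical, and reading or deleting the lock file is left to the caller. The liveness check is injected so the logic stays testable; in Node it is commonly implemented with `process.kill(pid, 0)`, which throws when the process does not exist.

```typescript
// Hypothetical content of the lock file: the PID of the owning process.
interface LockInfo {
  pid: number;
}

// Hypothetical flag name stored on globalThis to survive plugin hot-reloads.
const STARTED_FLAG = "__contractorBridgeSidecarStarted";

// Decide whether this process should start the sidecar.
function shouldStartSidecar(
  lock: LockInfo | null,
  isPidAlive: (pid: number) => boolean,
): boolean {
  const g = globalThis as unknown as Record<string, unknown>;
  if (g[STARTED_FLAG]) return false; // already started in this process
  if (lock && isPidAlive(lock.pid)) return false; // a live process owns the lock
  g[STARTED_FLAG] = true; // stale or missing lock: claim the start
  return true;
}
```

A stale lock (PID no longer alive) is treated as absent, so a crashed gateway does not block the next start.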

---

## Open Questions Resolved

| Question | Status | Answer |
|----------|--------|--------|
| Custom model registration API? | ✅ Resolved | `openclaw.json` provider config + OpenAI-compatible sidecar |
| Claude session continuation identifier? | ✅ Resolved | UUID via `message.session_id`, resume via `options.resume` |
| Does OpenClaw include assistant history? | ⚠️ Partial | Appears yes when sidecar streams correctly; needs retest with SSE |
| Streaming required? | ✅ Resolved | Yes, `stream: true` always sent; non-streaming drops assistant history |

---

## Immediate Next Steps (Updated)

1. **Add SSE streaming to probe sidecar** — retest to confirm assistant messages appear in turn 3
2. **Build the real bridge sidecar** — implement SSE passthrough from Claude SDK output
3. **Implement input filter** — extract latest user message, strip system prompt
4. **Implement session map store** — persist UUID → OpenClaw session key mapping
5. **Implement bootstrap injection** — first-turn only; include workspace path and session key
6. **Add lock file to sidecar** — prevent double-start (LESSONS_LEARNED lesson 7)