docs: add initial project plan
# ClawSpec Project Plan

## 1. Project Overview

**ClawSpec** is an automation testing framework for OpenClaw plugins.

Its purpose is to provide a deterministic, reproducible way to validate plugin behavior in a realistic OpenClaw runtime environment without relying on an actual LLM provider.

Instead of calling real language models, ClawSpec will run OpenClaw instances against a rule-based fake model service. This allows plugin developers to test message handling, tool-calling flows, plugin configuration, integration boundaries, and observable side effects with stable results.

---

## 2. Problem Statement

OpenClaw plugin testing is currently hard to standardize because:

- plugin behavior often depends on runtime integration rather than isolated pure functions
- real LLM responses are non-deterministic and expensive
- testing usually requires manual setup of OpenClaw, plugin installation, configuration, and message simulation
- there is no unified way to express plugin test environments and expected outcomes

ClawSpec aims to solve this by offering:

- declarative test environment definitions
- deterministic model behavior via rules instead of real LLMs
- automated provisioning of OpenClaw runtime units
- repeatable execution of plugin integration tests
- structured validation and test reporting

---

## 3. Project Goals

### Primary Goals

- Define test environments for one or more OpenClaw runtime units
- Install and configure plugins automatically per test spec
- Provide a fake model service that responds according to declarative rules
- Execute test cases against running OpenClaw units
- Verify expected behavior through built-in assertions and custom verifier scripts
- Produce reproducible test artifacts and reports

### Secondary Goals

- Support multiple OpenClaw versions for compatibility testing
- Support multi-unit scenarios in a single test suite
- Provide reusable example specs for plugin developers
- Make local plugin integration testing fast enough for everyday development

### Non-Goals for V1

- Full UI dashboard
- Large-scale distributed execution
- Performance benchmarking
- Fuzzing or random conversation generation
- Automatic support for every possible provider protocol
- Replacing unit tests inside plugin repositories

---

## 4. Product Positioning

ClawSpec is **not** a generic unit test runner.

It is a **scenario-driven integration testing framework** for validating the behavior of:

- the OpenClaw runtime
- installed plugins
- model interaction boundaries
- tool-calling flows
- message outputs
- side effects exposed through plugins or configured backends

The core value is deterministic validation of complex runtime behavior.

---

## 5. High-Level Architecture

ClawSpec is expected to contain four major components.

### 5.1 Spec Loader

Responsible for:

- reading the project spec file
- validating its structure against the schema
- normalizing runtime definitions
- producing an execution plan for the runner

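As a rough sketch of the loader's job, the following shows minimal validation plus normalization into a plan. All type and function names here (`SpecFile`, `ExecutionPlan`, `buildExecutionPlan`) are placeholders invented for illustration, and a real implementation would delegate structural checks to a JSON Schema validator such as `ajv` rather than hand-written code:

```typescript
// Placeholder shapes loosely based on the spec example in section 7.
interface SpecFile {
  version: string;
  clawUnits: { unitId: string }[];
  testCases: { testId: string; targetUnitId: string }[];
}

interface ExecutionPlan {
  unitsToStart: string[];
  orderedTests: string[];
}

// Validate minimal invariants, then normalize the spec into a plan the
// runner can execute: which units to start and which tests to run.
function buildExecutionPlan(spec: SpecFile): ExecutionPlan {
  if (spec.version !== "v1") {
    throw new Error(`unsupported spec version: ${spec.version}`);
  }
  const unitIds = new Set(spec.clawUnits.map((u) => u.unitId));
  for (const t of spec.testCases) {
    if (!unitIds.has(t.targetUnitId)) {
      throw new Error(`test ${t.testId} targets unknown unit ${t.targetUnitId}`);
    }
  }
  return {
    unitsToStart: [...unitIds],
    orderedTests: spec.testCases.map((t) => t.testId),
  };
}
```

Failing fast on cross-references (a test targeting an undeclared unit) at load time keeps orchestration errors out of the runtime phase.
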
### 5.2 Environment Orchestrator

Responsible for:

- provisioning OpenClaw test units
- generating or managing Docker Compose definitions
- preparing workspace directories and mounted volumes
- installing plugins
- applying plugin configuration
- starting and stopping test environments

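A hypothetical sketch of the "generating Docker Compose definitions" step: deriving one Compose service entry per unit from the spec. The field names (`unitId`, `image`) mirror the spec example in section 7, and the volume path assumes the default `workspaceRoot`; real code would merge all services into one document and serialize it to YAML:

```typescript
// Minimal unit shape, taken from the section 7 spec example.
interface ClawUnit {
  unitId: string;
  image: string;
}

// Build the Compose `services` entry for one unit: pinned image,
// shared test network, and a per-unit workspace mount.
function composeService(unit: ClawUnit, networkName: string) {
  return {
    [unit.unitId]: {
      image: unit.image,
      networks: [networkName],
      volumes: [`./.clawspec/workspaces/${unit.unitId}:/workspace`],
    },
  };
}
```
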
### 5.3 Fake Model Service

Responsible for:

- exposing a model-compatible endpoint for OpenClaw
- receiving model requests from test units
- matching incoming requests against declarative rules
- returning deterministic text responses and/or tool-calling instructions
- logging interactions for debugging and verification

### 5.4 Test Runner

Responsible for:

- selecting target test cases
- injecting input events/messages
- collecting outputs, logs, and tool-call traces
- evaluating built-in assertions
- executing optional verifier scripts
- producing final pass/fail results and artifacts

---

## 6. Core Design Principles

### Determinism First

The framework should avoid real LLM randomness in automated tests.

### Runtime Realism

Tests should run against realistic OpenClaw environments, not only mocked plugin internals.

### Declarative Configuration

Test environments and cases should be defined in configuration files rather than hard-coded scripts.

### Extensible Verification

Built-in assertions should cover common cases, while custom scripts should support project-specific validation.

### Reproducible Artifacts

All important outputs should be captured for debugging, including logs, matched model rules, tool-call traces, and verifier results.

---

## 7. Proposed Spec Structure

The initial idea is to define a single spec file that describes:

- OpenClaw runtime units
- plugin installation and configuration
- test cases
- fake model behavior
- expected validation steps

A normalized V1 structure may look like this:

```json
{
  "version": "v1",
  "environment": {
    "networkName": "clawspec-net",
    "workspaceRoot": "./.clawspec/workspaces",
    "artifactsRoot": "./.clawspec/artifacts"
  },
  "clawUnits": [
    {
      "unitId": "calendar-agent",
      "openClawVersion": "0.5.0",
      "image": "ghcr.io/openclaw/openclaw:0.5.0",
      "plugins": [
        {
          "pluginName": "harborforge-calendar",
          "installCommand": "openclaw plugins add harborforge-calendar",
          "configs": {
            "backendUrl": "http://fake-backend:8080",
            "agentId": "calendar-test-agent"
          }
        }
      ]
    }
  ],
  "testCases": [
    {
      "testId": "calendar-reminder-basic",
      "targetUnitId": "calendar-agent",
      "input": {
        "channel": "discord",
        "chatType": "direct",
        "message": "Remind me about the meeting at 3pm tomorrow"
      },
      "modelRules": [
        {
          "receive": ".*Remind me about the meeting at 3pm tomorrow.*",
          "action": {
            "type": "tool_call_then_respond",
            "toolName": "harborforge_calendar_create",
            "toolParameters": {
              "title": "Meeting",
              "time": "tomorrow 15:00"
            },
            "text": "Done, I have noted it down"
          }
        }
      ],
      "expected": [
        {
          "type": "tool_called",
          "toolName": "harborforge_calendar_create"
        },
        {
          "type": "message_contains",
          "value": "Done, I have noted it down"
        }
      ],
      "verifier": {
        "type": "script",
        "path": "./verifiers/calendar-reminder-basic.sh"
      }
    }
  ]
}
```

This is only a starting point; the exact schema should be refined during implementation.

---

## 8. Fake Model Service Design

The fake model service is one of the most important parts of ClawSpec.

It should behave like a deterministic model backend that OpenClaw can call during tests.

### Responsibilities

- receive model requests from OpenClaw
- inspect request content and context
- match against the rule set in declared order
- return predefined outputs
- support text-only responses
- support tool-calling responses
- support tool-call-plus-final-text patterns
- emit logs showing which rule matched and what output was generated

### Why This Matters

Without this service, tests would depend on live model providers, causing:

- unstable results
- variable tool-calling behavior
- token costs
- difficult reproduction of failures

The fake model service turns model behavior into a controlled part of the test spec.

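The core of the service can be sketched as first-match rule resolution. The rule shape (a `receive` regex plus an `action` payload) follows the spec example in section 7; the function name and signature are illustrative, not a final API, and the real service would wrap this in an HTTP endpoint speaking whatever model-provider protocol OpenClaw expects:

```typescript
// Rule shape, mirroring the `modelRules` entries in the section 7 spec.
interface ModelRule {
  receive: string; // regex tested against the incoming request text
  action: { type: string; text?: string; toolName?: string };
}

// Rules are checked in declared order and the first match wins, which is
// what makes the response for a given request deterministic.
function matchRule(rules: ModelRule[], requestText: string): ModelRule["action"] | undefined {
  for (const rule of rules) {
    if (new RegExp(rule.receive).test(requestText)) {
      return rule.action;
    }
  }
  return undefined; // no rule matched: log it and return a fixed fallback
}
```

Declared-order matching also gives test authors an easy escape hatch: a catch-all `".*"` rule at the end of the list acts as the deterministic fallback response.
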
---

## 9. Verification Model

ClawSpec should support two layers of verification.

### 9.1 Built-in Assertions

Common assertions should be supported directly by the framework, such as:

- `message_contains`
- `message_equals`
- `tool_called`
- `tool_called_with`
- `exit_code`
- `log_contains`

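A sketch of how a few of these assertions could be evaluated against collected run artifacts. The artifact shape (`messages`, `toolCalls`, `logs`) is an assumption for illustration, not a committed interface:

```typescript
// Assumed shape of what the runner collects from one test execution.
interface RunArtifacts {
  messages: string[];
  toolCalls: { toolName: string; parameters: Record<string, unknown> }[];
  logs: string[];
}

// A subset of the built-in assertion types listed above.
type Assertion =
  | { type: "message_contains"; value: string }
  | { type: "tool_called"; toolName: string }
  | { type: "log_contains"; value: string };

// Each assertion reduces to a pure predicate over the artifacts, which
// keeps evaluation deterministic and easy to report on.
function evaluateAssertion(a: Assertion, run: RunArtifacts): boolean {
  switch (a.type) {
    case "message_contains":
      return run.messages.some((m) => m.includes(a.value));
    case "tool_called":
      return run.toolCalls.some((c) => c.toolName === a.toolName);
    case "log_contains":
      return run.logs.some((l) => l.includes(a.value));
  }
}
```
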
### 9.2 External Verifier Scripts

Custom verifier scripts should be supported for advanced cases, such as:

- checking database state
- validating generated files
- verifying HTTP side effects
- checking plugin-specific external systems

This combination keeps common tests simple while preserving flexibility.

---

## 10. Execution Flow

A typical ClawSpec test run should look like this:

1. Load and validate the spec file
2. Prepare workspace and artifact directories
3. Materialize runtime environment definitions
4. Start the fake model service
5. Start the target OpenClaw unit(s)
6. Install and configure required plugins
7. Inject test input into the target unit
8. Let the runtime interact with the fake model service
9. Collect outputs, logs, tool traces, and events
10. Evaluate expected assertions
11. Run the external verifier script, if defined
12. Produce the final result summary and artifact bundle
13. Tear down the environment unless retention is requested

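The flow above can be sketched as a run loop with the steps stubbed out. All step names are placeholders; the point is the control-flow shape, in particular that teardown (step 13) runs in `finally` unless the caller asked to retain the environment. A real runner would make each step asynchronous and surface per-step failures in the report:

```typescript
// A step records its name into a trace; a real step would do work and
// may throw, which must still trigger teardown.
type Step = (trace: string[]) => void;

const step = (name: string): Step => (trace) => {
  trace.push(name);
};

function runSuite(retainEnvironment = false): string[] {
  const trace: string[] = [];
  const steps: Step[] = [
    step("load-spec"), step("prepare-dirs"), step("materialize-env"),
    step("start-fake-model"), step("start-units"), step("install-plugins"),
    step("inject-input"), step("collect-outputs"), step("evaluate-assertions"),
    step("run-verifier"), step("report"),
  ];
  try {
    for (const s of steps) s(trace);
  } finally {
    // Step 13: always tear down unless retention was requested,
    // even when an earlier step failed.
    if (!retainEnvironment) trace.push("teardown");
  }
  return trace;
}
```
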
---

## 11. Proposed Directory Structure

```text
ClawSpec/
├── README.md
├── PROJECT_PLAN.md
├── docs/
│   ├── architecture.md
│   ├── spec-schema.md
│   ├── fake-model.md
│   └── runner.md
├── schema/
│   └── clawspec.schema.json
├── examples/
│   ├── basic.json
│   └── calendar-plugin.json
├── docker/
│   ├── compose.template.yml
│   └── fake-model.Dockerfile
├── src/
│   ├── cli/
│   ├── spec/
│   ├── orchestrator/
│   ├── model/
│   ├── runner/
│   └── report/
├── verifiers/
│   └── examples/
└── .clawspec/
    ├── workspaces/
    └── artifacts/
```

---

## 12. Recommended Tech Stack

### Preferred Language

**TypeScript / Node.js**

Reasoning:

- fits well with OpenClaw ecosystem conventions
- convenient for JSON schema validation
- good support for CLI tooling
- convenient for implementing the fake HTTP model service
- straightforward Docker and subprocess orchestration

### Suggested Libraries

- `ajv` for schema validation
- `commander` or `yargs` for the CLI
- `execa` for shell and Docker command orchestration
- `fastify` or `express` for the fake model service
- `yaml` for optional YAML support in the future
- `vitest` for ClawSpec self-tests

---

## 13. V1 Scope

The first version should focus on the smallest useful end-to-end workflow.

### V1 Must Include

- load one spec file
- validate the basic schema
- start one OpenClaw test unit
- install one or more plugins in that unit
- apply plugin configuration entries
- start one fake model service
- inject one test input
- support rule-based text responses
- support rule-based tool-calling responses
- support basic assertions:
  - message contains
  - tool called
  - script verifier
- generate logs and a pass/fail summary

### V1 Should Avoid

- complex multi-turn state machines
- distributed execution
- UI dashboard
- performance benchmarking
- broad provider emulation beyond test needs
- advanced matrix test expansion

---

## 14. Milestone Proposal

### Milestone 0 - Project Bootstrap

- initialize repository layout
- define coding conventions
- write initial README and project plan
- select runtime and libraries

### Milestone 1 - Spec Definition

- draft spec schema v0.1
- implement spec parser and validation
- add example specs

### Milestone 2 - Fake Model Service

- define internal rule format
- implement rule matcher
- implement deterministic response generation
- add request/response logging

### Milestone 3 - Environment Orchestrator

- generate runtime environment configuration
- start and stop OpenClaw containers
- apply plugin install commands
- apply plugin configs

### Milestone 4 - Test Runner

- inject test inputs
- collect runtime outputs
- evaluate assertions
- execute verifier scripts
- output structured test reports

### Milestone 5 - First Real Plugin Demo

- create an example test suite for a real OpenClaw plugin
- validate the full workflow end to end
- document limitations and next steps

---

## 15. Risks and Open Questions

### Runtime Interface Risk

The exact model-provider interface expected by OpenClaw must be verified early; the fake model service depends on matching this contract closely enough for tests to be meaningful.

### Plugin Installation Variability

Different plugins may require different setup flows. ClawSpec must decide how much it standardizes versus how much it leaves to custom setup hooks.

### Observable Output Boundaries

Some plugins expose behavior through logs, some through tool calls, and some through external HTTP effects. The framework must define what counts as the authoritative observable result.

### Docker/Image Strategy

The project needs a clear policy for:

- official base images
- local image overrides
- plugin source mounting during local development
- OpenClaw version pinning

### Test Case Reuse

It may be useful later to split infrastructure definitions, model rules, and assertions into reusable modules rather than keeping everything in one file.

---

## 16. Success Criteria

ClawSpec can be considered successful in its first phase if:

- a plugin developer can define a test spec without writing framework code
- a test run is reproducible across machines with the same environment
- plugin integration behavior can be validated without a real LLM
- failed runs produce enough artifacts to diagnose the issue quickly
- at least one real plugin can be tested end to end using the framework

---

## 17. Next Recommended Deliverables

After this plan, the next most useful documents are:

1. `README.md` — concise positioning and quick start
2. `docs/spec-schema.md` — formalize the spec design
3. `schema/clawspec.schema.json` — machine-validatable V0 schema
4. `docs/fake-model.md` — define fake model request/response behavior
5. `TASKLIST.md` or a milestone tracker — implementation breakdown

---

## 18. Summary

ClawSpec should become a deterministic integration testing framework for OpenClaw plugins.

Its core idea is simple:

- run real OpenClaw runtime environments
- replace real LLM behavior with a rule-driven fake model service
- execute declarative test cases
- verify runtime behavior with stable, repeatable assertions

If implemented well, ClawSpec can become the standard foundation for plugin-level automated testing in the OpenClaw ecosystem.