docs: rewrite project plan in Chinese

This commit is contained in:
nav
2026-04-06 17:55:53 +00:00
parent 723a6d9903
commit 6ba9110f07


@@ -1,162 +1,208 @@
# ClawSpec Project Plan
## 1. Project Overview
**ClawSpec** is a framework for **automated testing of OpenClaw plugins**.
Its goal is not to replace the unit tests that already live inside plugin repositories, but to provide a repeatable, orchestratable, verifiable automated testing approach for **plugin integration behavior inside a real OpenClaw runtime**.
The core idea of ClawSpec:
- declare test scenarios in a configuration file
- start one or more OpenClaw test units (claw units)
- automatically install and configure the target plugins
- replace the real LLM with a **rule-driven fake model service**
- use rules to control model replies, tool calls, and when a test terminates
- automatically run assertions and verifier scripts after the test ends or times out
- produce stable, reproducible test results and debugging artifacts
In essence, the problem it solves is:
> How to run stable integration tests for OpenClaw plugins without depending on a real LLM, without hand-built environments, and without manually poking around and eyeballing logs.
---
## 2. Problem Statement
OpenClaw plugin testing currently suffers from several practical problems:
- much plugin behavior is not pure functions, but depends on the OpenClaw runtime context
- plugins often have to work together with model calls, tool calls, message routing, and configuration entries
- using a real LLM introduces non-determinism, cost, and reproducibility problems
- hand-building test environments is expensive and the process is not standardized
- there is no unified way to describe test environments, test inputs, and expected results
ClawSpec therefore needs to provide a set of standardized capabilities:
- describe test environments and test cases with declarative configuration
- replace the real LLM with a fake model to keep tests deterministic
- automatically bring up the OpenClaw runtime and complete plugin installation and configuration
- support single-unit and multi-unit collaboration scenarios
- automatically validate results and generate reports after the test ends
---
## 3. Project Goals ## 3. 项目目标
### Primary Goals ### 3.1 核心目标
- Define test environments for one or more OpenClaw runtime units - 定义一个测试配置文件格式,用于描述 OpenClaw 测试场景
- Install and configure plugins automatically per test spec - 根据配置自动创建一个或多个 OpenClaw 测试实体
- Provide a fake model service that responds according to declarative rules - 自动安装和配置测试所需插件
- Execute test cases against running OpenClaw units - 提供规则驱动的假模型服务,不接真实 LLM
- Verify expected behavior through built-in assertions and custom verifier scripts - 支持单轮和多轮对话测试
- Produce reproducible test artifacts and reports - 支持基于断言和脚本的结果验证
- 输出稳定、可重现、便于排查的测试产物
### Secondary Goals ### 3.2 次级目标
- Support multiple OpenClaw versions for compatibility testing - 在需要时支持不同 OpenClaw 版本的兼容性测试
- Support multi-unit scenarios in a single test suite - 支持多个 OpenClaw 实体之间的协作测试
- Provide reusable example specs for plugin developers - 为插件开发者提供可复用的示例测试配置
- Make local plugin integration testing fast enough for everyday development - 让日常本地插件集成测试足够轻量、足够快
### Non-Goals for V1 ### 3.3 非目标(至少不在 V1
- Full UI dashboard - 替代插件仓库内部单元测试
- Large-scale distributed execution - 一上来做图形化管理界面
- Performance benchmarking - 一上来做分布式大规模并发执行
- Fuzzing or random conversation generation - 一上来做性能压测、压力测试、模糊测试
- Automatic support for every possible provider protocol - 一上来模拟所有模型供应商的全部协议细节
- Replacing unit tests inside plugin repositories
--- ---
## 4. Product Positioning ## 4. 产品定位
ClawSpec is **not** a generic unit test runner. ClawSpec **不是一个通用测试框架**,也**不是一个纯单元测试工具**。
It is a **scenario-driven integration testing framework** for validating the behavior of: 它更准确的定位是:
- OpenClaw runtime > 一个面向 OpenClaw 插件生态的、基于场景编排的、确定性集成测试框架。
- installed plugins
- model interaction boundaries
- tool-calling flows
- message outputs
- side effects exposed through plugins or configured backends
The core value is deterministic validation of complex runtime behavior. 它关注的对象是以下整体行为:
- OpenClaw runtime 本身
- 已安装插件的行为
- 模型请求与回复边界
- 工具调用链路
- 消息输出
- 插件对外部系统产生的可观察副作用
所以它测试的不是“某个函数返回值对不对”,而是:
> 当 OpenClaw + 插件 + 模型交互 + 工具链路一起工作时,最终表现是否符合预期。
--- ---
## 5. Core Design Principles
### 5.1 Determinism First
Automated tests should avoid depending on the randomness of a real LLM. The fake model service must be a controlled variable in the test process.
### 5.2 Runtime Realism
The goal is to test against something close to the real OpenClaw runtime, not just to mock plugin internals.
### 5.3 Declarative Configuration
Test environments, test inputs, model rules, and assertion conditions should all be described in configuration as much as possible, not scattered across scripts.
### 5.4 Simple by Default, Extensible on Demand
Most plugin tests only need a single claw unit, so single-unit testing should be the default path; multi-unit collaboration should be supported, but must not complicate single-unit tests.
### 5.5 Debuggability
Every test run should leave behind enough artifacts for troubleshooting: logs, rule-match records, tool-call traces, verifier script output, and so on.
---
## 6. The Role of claw units
One key concept needs to be made explicit here:
`clawUnits` exists **not primarily to test multiple versions or configurations at once**, but to support:
- a plugin that needs to act across multiple OpenClaw units
- plugin behavior that depends on multiple agents / runtimes collaborating
- test scenarios that inherently involve multiple OpenClaw nodes
For example:
- unit A sends a message and unit B responds
- a plugin observes the behavior of another unit
- plugin interplay in multi-agent collaboration scenarios
Therefore:
- **the vast majority of plugin tests should need only one `clawUnit`**
- `multi-unit` exists for collaboration tests, not for cramming every variation into one scenario
- if you only need to test version compatibility or configuration differences, running multiple separate test scenarios is a better fit than relying on several units running at once
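As an illustration of this stance, a hedged sketch of how a two-unit collaboration scenario might be declared (the `clawUnits`, `unitId`, and `plugins` field names follow this plan's draft vocabulary but are not a finalized schema):

```json
{
  "clawUnits": [
    { "unitId": "sender-agent", "plugins": ["relay-plugin"] },
    { "unitId": "receiver-agent", "plugins": ["relay-plugin"] }
  ]
}
```

A single-unit test would simply list one entry here, which keeps the default path untouched.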
---
## 7. High-Level Architecture
ClawSpec can be split into four core modules.
### 7.1 Spec Loader
Responsible for:
- reading the test spec file
- validating that the spec structure is legal
- normalizing fields
- producing an internal execution plan
### 7.2 Environment Orchestrator
Responsible for:
- creating the OpenClaw test environment
- generating and managing Docker Compose definitions
- preparing workspace, mount, and artifact directories
- installing plugins
- writing plugin configuration
- starting and tearing down test environments
### 7.3 Fake Model Service
Responsible for:
- exposing a model endpoint for OpenClaw to call
- receiving model requests from OpenClaw
- deciding what to return based on configured rules
- producing deterministic text replies or tool calls
- calling `test-finished` when a termination condition is met
- logging requests, rule hits, and outputs
### 7.4 Test Runner
Responsible for:
- selecting test cases
- injecting input events or test messages
- collecting output messages, tool calls, logs, and events
- deciding when a test has ended
- running assertions and external verifier scripts
- aggregating results and producing a report
---
## 8. Overall Shape of the Test Spec
ClawSpec needs a single spec file that describes:
- the test environment
- one or more claw units
- the OpenClaw version or image each unit uses
- the plugins to install, and their configuration
- test inputs
- fake model behavior rules
- assertions and verifier scripts
A proposed spec structure looks like this:
```json
{
@@ -187,6 +233,7 @@ A normalized V1 structure may look like this:
    {
      "testId": "calendar-reminder-basic",
      "targetUnitId": "calendar-agent",
      "timeout": 15000,
      "input": {
        "channel": "discord",
        "chatType": "direct",
@@ -204,6 +251,16 @@ A normalized V1 structure may look like this:
          },
          "text": "I've noted it down for you"
        }
      },
      {
        "receive": ".*I've noted it down for you.*",
        "action": {
          "type": "tool_call",
          "toolName": "test-finished",
          "toolParameters": {
            "reason": "expected final response observed"
          }
        }
      }
    ],
    "expected": [
@@ -225,89 +282,176 @@ A normalized V1 structure may look like this:
}
```
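The Spec Loader's first validation pass over such a file could start as a small hand-rolled check before adopting a full schema validator. A minimal sketch, assuming the draft field names above (`clawUnits`, `testCases`, `testId`, `timeout`); the interface shape is illustrative, not a finalized type:

```typescript
// Hypothetical shape of the parts of a spec the loader checks first.
interface ClawSpecDraft {
  clawUnits?: { unitId?: string }[];
  testCases?: { testId?: string; timeout?: number }[];
}

// Collect human-readable errors instead of throwing on the first problem,
// so a bad spec reports everything wrong with it in one run.
function validateSpec(spec: ClawSpecDraft): string[] {
  const errors: string[] = [];
  if (!spec.clawUnits || spec.clawUnits.length === 0) {
    errors.push("spec must declare at least one claw unit");
  }
  for (const tc of spec.testCases ?? []) {
    if (!tc.testId) errors.push("every test case needs a testId");
    if (typeof tc.timeout !== "number" || tc.timeout <= 0) {
      errors.push(`test case ${tc.testId ?? "?"} needs a positive timeout`);
    }
  }
  return errors;
}
```

In a real implementation this step would be replaced or backed by `ajv` against `schema/clawspec.schema.json`.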
This is only a directional draft; the exact schema still needs to be formalized during implementation.
---
## 9. Fake Model Service Design
The fake model service is one of ClawSpec's key capabilities.
Without it, tests depend on a real model, which brings:
- unstable results
- unpredictable tool-calling behavior
- rising cost
- failures that are hard to reproduce
So the fake model in ClawSpec should not just "return some arbitrary sentence"; it should be a **rule-driven test model**.
### 9.1 Responsibilities of the Fake Model
- receive model requests from OpenClaw
- read the rule set for the current test case
- decide the response action by rule order or match priority
- return a text reply, a tool call, or a combination of both
- trigger `test-finished` when the termination condition is met
- emit detailed logs recording every rule hit and the action returned
### 9.2 Actions the Rules Should Support
At minimum:
- plain text replies
- tool calls
- a tool call first, then a text reply
- explicitly ending the test (calling `test-finished`)
- a default behavior on no match (e.g. return an empty result, raise an error, or record the miss)
### 9.3 Multi-Turn Conversation Support
One thing should be made explicit here:
> Multi-turn conversation itself is not hard; there is no need to design it as a complex state machine from day one.
A practical, easy-to-implement rule set is:
- every test case must define a `timeout`
- every test case's `modelRules` must include a **termination rule**
- when the termination rule is hit, the fake model calls the `test-finished` tool
- as soon as `test-finished` is observed, the test can move to the result-checking phase
- if `test-finished` is not observed within `timeout`, the runner also stops waiting and starts checking results
The benefits:
- multi-turn conversations are supported naturally
- no complex state machine is needed up front
- the termination condition is explicit
- timeout handling is simple and uniform
In other words, the first version of ClawSpec's multi-turn mechanism can be built entirely on three things:
- rule matching
- an explicit termination signal, `test-finished`
- a uniform timeout
---
## 10. Test Termination Mechanism
Every test case in ClawSpec should have two termination dimensions.
### 10.1 Explicit Termination
The fake model issues a tool call:
- `test-finished`
to signal that the test has reached its intended endpoint.
This means:
- the test logic has reached its designed termination condition
- the runner can stop waiting for further interaction turns
- assertions and verifier scripts can start running
### 10.2 Timeout Termination
Every test case must provide a `timeout` field.
If `test-finished` is not observed within the timeout:
- the runner stops waiting
- the logs, messages, and tool calls collected so far become the test result input
- assertions and verifier scripts run automatically
The point of this is to:
- prevent tests from hanging forever
- allow scenarios that do not require an explicit termination tool to still be evaluated
- give developers a single, uniform entry point for diagnosing failures
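The two dimensions combine into one decision the runner makes on every observation tick. A minimal sketch as a pure function (the names `resolveEndState`, `observedToolCalls`, and the state labels are illustrative assumptions):

```typescript
type EndState = "finished" | "timed_out" | "waiting";

// Decide whether a test run should stop, given the tool calls observed so
// far and how long the run has been going. Explicit termination wins over
// timeout, so a test-finished seen at the deadline still counts as finished.
function resolveEndState(
  observedToolCalls: string[],
  elapsedMs: number,
  timeoutMs: number
): EndState {
  if (observedToolCalls.includes("test-finished")) return "finished";
  if (elapsedMs >= timeoutMs) return "timed_out";
  return "waiting";
}
```

Keeping this as a pure function makes the termination logic itself trivially unit-testable, independent of containers or timers.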
---
## 11. Verification Model
ClawSpec needs to support at least two layers of verification.
### 11.1 Built-in Assertions
For common, high-frequency cases, such as:
- `message_contains`
- `message_equals`
- `tool_called`
- `tool_called_with`
- `log_contains`
- `exit_code`
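A hedged sketch of how such assertions might appear in a test case's `expected` array (the `type`/`value`/`toolName` field names and the `calendar.create` tool are illustrative assumptions, pending the formal schema):

```json
{
  "expected": [
    { "type": "message_contains", "value": "noted" },
    { "type": "tool_called", "toolName": "calendar.create" }
  ]
}
```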
These assertions are executed directly by the framework and cover most common plugin tests.
### 11.2 External Verifier Scripts
For complex or highly project-specific checks, for example:
- is the database state correct
- was a certain file generated
- did a certain HTTP callback happen
- did the state of an external service change
These cases can be run via:
```json
{
  "verifier": {
    "type": "script",
    "path": "./verifiers/check-result.sh"
  }
}
```
Built-in assertions cover the common ground; script verifiers preserve extensibility.
---
## 12. Execution Flow
A typical ClawSpec test run should look like this:
1. Load and validate the test spec
2. Prepare the workspace, artifacts, and other directories
3. Generate the runtime environment definition from the spec
4. Start the fake model service
5. Start the target OpenClaw unit(s)
6. Install and configure the target plugins
7. Inject test input into the target unit
8. Let OpenClaw interact with the fake model for one or more turns
9. Continuously watch for `test-finished`
10. If the termination signal is hit, move on to result verification
11. If the timeout is reached, stop waiting and move on to result verification
12. Collect logs, messages, tool-call records, and event traces
13. Run built-in assertions
14. Run the external verifier script, if defined
15. Output pass/fail, a summary, and artifact paths
16. Tear down the environment according to configuration
---
## 13. Proposed Directory Structure
```text
ClawSpec/
@@ -322,7 +466,7 @@ ClawSpec/
│   └── clawspec.schema.json
├── examples/
│   ├── basic.json
│   └── multi-unit.json
├── docker/
│   ├── compose.template.yml
│   └── fake-model.Dockerfile
@@ -342,170 +486,197 @@ ClawSpec/
```
---
## 14. Recommended Tech Stack
### 14.1 Preferred Language
**TypeScript / Node.js** is the recommended default.
Reasoning:
- closest fit to the OpenClaw ecosystem
- convenient for JSON / schema / CLI work
- cheap to implement the fake model HTTP service
- convenient for driving Docker, the OpenClaw CLI, and external scripts
### 14.2 Suggested Libraries
- `ajv`: schema validation
- `commander` or `yargs`: CLI
- `execa`: subprocess and command orchestration
- `fastify` or `express`: fake model service
- `yaml`: future YAML spec support
- `vitest`: ClawSpec's own tests
---
## 15. V1 Scope
The first version should focus on the smallest useful end-to-end loop, not early expansion.
### 15.1 V1 Must Include
- load one spec file
- validate the basic schema
- start one OpenClaw test unit
- install one or more plugins in that unit
- apply plugin configuration entries
- start one fake model service
- inject one test input
- support single-turn and multi-turn interaction
- require every test case to include a `timeout`
- require every test case to define an explicit termination rule that ends via `test-finished`
- support rule-driven text replies
- support rule-driven tool calls
- support basic assertions:
  - `message_contains`
  - `tool_called`
  - script verifier
- output logs and a pass/fail summary
### 15.2 V1 Can Skip
- a graphical UI
- distributed execution
- performance testing
- advanced provider protocol emulation
- complex matrix test expansion
- an over-engineered conversation state machine
In other words, V1 already has real value as soon as this chain works:
- the environment comes up
- the plugins get installed
- the fake model replies by rule
- the test can terminate
- the results can be verified
---
## 16. Milestone Proposal
### Milestone 0: Project Bootstrap
- initialize the repository layout
- set up basic project scaffolding
- write the README and project plan
- settle the tech stack
### Milestone 1: Spec Definition
- finalize the spec field design
- write `docs/spec-schema.md`
- write the basic schema file
- provide one minimal example
### Milestone 2: Fake Model Service
- define the rule structure
- implement rule matching
- implement text and tool-call responses
- implement the `test-finished` termination mechanism
- implement request and rule-hit logging
### Milestone 3: Environment Orchestration
- start OpenClaw containers or runtime instances
- install plugins
- write plugin configuration
- manage the test lifecycle
### Milestone 4: Test Runner
- inject test inputs
- collect outputs and tool-call traces
- handle timeouts and termination rules
- run assertions and verifier scripts
- output reports
### Milestone 5: Real Plugin Validation
- build an end-to-end example around one real OpenClaw plugin
- validate that the framework design actually supports real needs
- document limitations and next evolution steps
---
## 17. Risks and Open Questions
### 17.1 OpenClaw's Model Provider Interface
For the fake model service to work, the interface contract OpenClaw uses when calling a model must be understood well enough. This needs to be verified early.
### 17.2 Plugin Installation Variability
Different plugins may need different installation methods, initialization steps, configuration entries, and extra dependencies. ClawSpec needs to decide what becomes a built-in capability and what is left to setup hooks or scripts.
### 17.3 Observable Result Boundaries
Some plugins surface results in messages, some in tool calls, some in external side effects. The framework needs a clear definition of what counts as the authoritative result source.
### 17.4 Docker / Image Strategy
To be clarified:
- how to choose base images
- how to manage OpenClaw versions
- how to mount plugin source code during local development
- how to balance fast iteration against environment stability
### 17.5 Rule Expressiveness
If the rules are too simple, complex plugins cannot be tested; if they are too complex, the spec becomes hard to write. A balance between expressiveness and maintainability is needed.
---
## 18. Success Criteria
ClawSpec's first phase can be considered successful if:
- a plugin developer can describe a test scenario without writing framework code
- the same spec produces the same results in the same environment, repeatably
- plugin integration behavior can be validated without a real LLM
- failed runs leave enough artifacts to diagnose the problem quickly
- at least one real plugin completes end-to-end automated testing through ClawSpec
---
## 19. Next Recommended Deliverables
After this project plan, the most valuable documents to write next are:
1. `README.md`
   - explain in plain language what the project is, what problem it solves, and how to get started quickly
2. `docs/spec-schema.md`
   - write down the spec structure formally
   - pin down fields like `clawUnits`, `testCases`, `modelRules`, `timeout`, and `test-finished`
3. `schema/clawspec.schema.json`
   - provide a machine-readable schema usable for validation directly
4. `docs/fake-model.md`
   - specify the fake model service's input/output protocol, rule matching, and termination mechanism
5. `TASKLIST.md`
   - break the milestones down into executable tasks
---
## 20. Summary
ClawSpec's goal can be summed up in one sentence:
> Run stable, repeatable automated integration tests for plugins inside a real OpenClaw runtime, replacing the real LLM with a rule-driven fake model.
Its key value:
- test integration behavior in a real runtime
- eliminate LLM randomness with the fake model
- unify test scenarios with declarative configuration
- close out multi-turn tests with the explicit termination rule, `test-finished` + `timeout`
- balance generality and extensibility with assertions plus script verifiers
If this framework lands, it is well suited to become foundational test infrastructure for OpenClaw plugin development.