reset project and add new yonexus communication plan

nav
2026-03-31 13:59:40 +00:00
parent 00ffef0d8e
commit 83e02829e7
29 changed files with 1465 additions and 2111 deletions

PLAN.md

@@ -1,127 +1,567 @@
# Yonexus — Project Plan
## 1) Goal
Build an OpenClaw plugin that models organization hierarchy and agent identities, supports supervisor relationships, provides query tools for agents, and uses shared memory per scope (org/department/team).
## 1. Goal
## 2) Core Concepts
- **Hierarchy**: Organization → Department → Team → Agent
- **Supervisor**: each agent may have exactly one supervisor
- **Identity**: an agent can hold multiple identities across teams/departments
- **Schema-driven metadata**: configurable fields with per-field queryability
- **Scope memory**: shared memory for org/department/team (using `memory_store`, compatible with memory-lancedb-pro)
Yonexus is an OpenClaw plugin for **cross-instance communication** between multiple OpenClaw deployments.
## 3) Storage Strategy
- **Structure & identity data**: in-memory + JSON persistence (no memory_store)
- **Shared memory**: memory_store keyed by scope (`org:{id}`, `dept:{id}`, `team:{id}`)
- **Filesystem resources** (OpenClaw install dir `${openclaw dir}`):
- Create a data-only folder at `${openclaw dir}/yonexus` (no plugin code here)
- `yonexus/organizations/<org-name>/` contains: `teams/`, `docs/`, `notes/`, `knowledge/`, `rules/`, `lessons/`, `workflows/`
- On **create_organization**: create `<org-name>` folder and its subfolders
- On **create_team**: create `organizations/<org-name>/teams/<team-name>/` with `agents/`, `docs/`, `notes/`, `knowledge/`, `rules/`, `lessons/`, `workflows/`
- On **assign_identity**: create `organizations/<org-name>/teams/<team-name>/agents/<agent-id>/` with `docs/`, `notes/`, `knowledge/`, `rules/`, `lessons/`, `workflows/`
A Yonexus network contains:
- exactly one instance with role `main`
- one or more instances with role `follower`
## 4) Permissions Model (B)
Roles:
- Org Admin
- Dept Admin
- Team Lead
- Agent
The plugin provides:
- a WebSocket-based communication layer between OpenClaw instances
- pairing and identity verification for followers
- persistent follower registry and trust state on the main node
- heartbeat-based follower status tracking
- a rule-based message dispatch mechanism
- TypeScript function interfaces for other plugin/runtime code
This project is **not** an organization/identity management plugin anymore. All prior goals are discarded.
---
## 2. High-Level Architecture
### 2.1 Roles
Each OpenClaw instance running Yonexus must be configured with a `role`:
- `main`
- `follower`
Role semantics:
- `main` is the hub/server for all Yonexus communication
- `follower` connects outbound to the `main` instance
### 2.2 Network Topology
- The `main` instance must expose a fixed reachable IP/domain and run a WebSocket service.
- `follower` instances do not need fixed IP/domain.
- All `follower` instances connect to the `main` WebSocket endpoint.
- No direct follower-to-follower communication is required in v1.
- Messages between followers, if needed, are relayed by `main`.
### 2.3 Runtime Lifecycle
- On OpenClaw gateway startup:
- if role is `main`, Yonexus starts a WebSocket server through a hook
- if role is `follower`, Yonexus starts a WebSocket client and attempts to connect to `mainHost`
---
## 3. Configuration Model
## 3.1 Common Config
```ts
role: "main" | "follower"
```
## 3.2 Follower Config
Required when `role === "follower"`:
```ts
mainHost: string
identifier: string
```
Semantics:
- `mainHost`: WebSocket endpoint of the main instance (`ip:port` or full URL)
- `identifier`: unique follower identity inside the Yonexus network
## 3.3 Main Config
Required when `role === "main"`:
```ts
followerIdentifiers: string[]
```
Semantics:
- `followerIdentifiers`: allowlist of follower identifiers that are permitted to pair/connect
## 3.4 Validation Rules
### Main
- must have `role = main`
- must provide `followerIdentifiers`
- must be reachable at a stable IP/domain (this is provisioned outside the plugin itself)
### Follower
- must have `role = follower`
- must provide `mainHost`
- must provide `identifier`
### Shared
- invalid or missing role-specific fields must fail plugin initialization
- unknown follower identifiers must be rejected by `main`
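
The validation rules above can be sketched as a single function that either returns a typed config or throws, so that invalid input fails plugin initialization. This is a minimal sketch, not the plugin's actual loader: the `YonexusConfig` type and `validateConfig` name are assumptions, and how OpenClaw hands raw config to a plugin is not specified in this plan.

```ts
// Hypothetical config shape; field names follow sections 3.1 to 3.3.
type YonexusConfig =
  | { role: "main"; followerIdentifiers: string[] }
  | { role: "follower"; mainHost: string; identifier: string };

// Returns a validated config or throws on missing/invalid role-specific fields.
function validateConfig(raw: unknown): YonexusConfig {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("invalid configuration: expected an object");
  }
  const cfg = raw as Record<string, unknown>;

  if (cfg.role === "main") {
    const ids = cfg.followerIdentifiers;
    const valid =
      Array.isArray(ids) &&
      ids.every((id) => typeof id === "string" && id.length > 0);
    if (!valid) {
      throw new Error("invalid configuration: main requires followerIdentifiers");
    }
    return { role: "main", followerIdentifiers: ids as string[] };
  }

  if (cfg.role === "follower") {
    const { mainHost, identifier } = cfg;
    if (typeof mainHost !== "string" || mainHost.length === 0) {
      throw new Error("invalid configuration: follower requires mainHost");
    }
    if (typeof identifier !== "string" || identifier.length === 0) {
      throw new Error("invalid configuration: follower requires identifier");
    }
    return { role: "follower", mainHost, identifier };
  }

  throw new Error('invalid configuration: role must be "main" or "follower"');
}
```

Rejecting unknown follower identifiers happens later, at connection time on `main`, so it is intentionally not part of this function.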
---
## 4. Main Responsibilities
The `main` instance must maintain a registry keyed by follower `identifier`.
Each follower record contains at minimum:
- `identifier`
- `publicKey`
- `secret`
- pairing state
- pairing expiration data
- connection status
- security counters/window data
- heartbeat timestamps
- last known connection/session metadata
The registry must use:
- in-memory runtime state for active operations
- persistent on-disk storage for restart survival
### 4.1 Persistent Main Registry Model
Proposed shape:
```ts
interface FollowerRecord {
identifier: string;
publicKey?: string;
secret?: string;
pairingStatus: "unpaired" | "pending" | "paired" | "revoked";
pairingCode?: string;
pairingExpiresAt?: number;
status: "online" | "offline" | "unstable";
lastHeartbeatAt?: number;
lastAuthenticatedAt?: number;
recentNonces: Array<{
nonce: string;
timestamp: number;
}>;
recentHandshakeAttempts: number[];
createdAt: number;
updatedAt: number;
}
```
Notes:
- `recentNonces` stores only the recent nonce window needed for replay detection
- `recentHandshakeAttempts` stores timestamps for rate-limiting / unsafe reconnect detection
- actual field names can change during implementation, but these semantics must remain
---
## 5. Pairing and Authentication Flow
## 5.1 First Connection: Key Generation
When a follower connects to main for the first time:
- the follower generates a public/private key pair locally
- the private key remains only on the follower
- the public key is sent to `main` during handshake
If `main` sees that:
- the follower identifier is allowed, and
- no valid `secret` is currently associated with that identifier
then `main` must enter pairing flow.
## 5.2 Pairing Flow
### Step A: Pairing Request
`main` responds with a pairing request containing:
- a random pairing string
- an expiration time
### Step B: Pairing Confirmation
If the follower sends that random pairing string back to `main` before expiration:
- pairing succeeds
### Step C: Secret Issuance
After successful pairing:
- `main` generates a random `secret`
- `main` returns that `secret` to the follower
- `main` stores follower `publicKey` + `secret`
- `follower` stores private key + secret locally
If pairing expires before confirmation:
- pairing fails
- follower must restart the pairing process
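
Steps A to C can be sketched on the main side as two state transitions over a follower record. This is an illustrative sketch under assumptions: `PairingState`, `startPairing`, `confirmPairing`, and the failure reason strings are hypothetical names, the random source is injected by the caller as `randomToken`, and the handling of a mismatched pairing string (rejected without ending the pending pairing) is not specified in this plan.

```ts
// Subset of the follower record that pairing touches (names follow section 4.1).
interface PairingState {
  pairingStatus: "unpaired" | "pending" | "paired" | "revoked";
  pairingCode?: string;
  pairingExpiresAt?: number;
  publicKey?: string;
  secret?: string;
}

type PairingResult =
  | { ok: true; secret: string }
  | { ok: false; reason: "pairing_required" | "pairing_expired" | "code_mismatch" };

// Step A: main issues a random pairing string with an expiration time.
function startPairing(
  rec: PairingState,
  publicKey: string,
  now: number,
  ttlMs: number,
  randomToken: () => string,
): { pairingCode: string; expiresAt: number } {
  const pairingCode = randomToken();
  const expiresAt = now + ttlMs;
  rec.pairingStatus = "pending";
  rec.publicKey = publicKey;
  rec.pairingCode = pairingCode;
  rec.pairingExpiresAt = expiresAt;
  delete rec.secret; // any previous secret is no longer valid
  return { pairingCode, expiresAt };
}

// Steps B and C: confirm the pairing string before expiry, then issue a secret.
function confirmPairing(
  rec: PairingState,
  presentedCode: string,
  now: number,
  randomToken: () => string,
): PairingResult {
  const code = rec.pairingCode;
  const expiresAt = rec.pairingExpiresAt;
  if (rec.pairingStatus !== "pending" || code === undefined || expiresAt === undefined) {
    return { ok: false, reason: "pairing_required" };
  }
  if (now >= expiresAt) {
    // Expired: the follower must restart the pairing process.
    rec.pairingStatus = "unpaired";
    delete rec.pairingCode;
    delete rec.pairingExpiresAt;
    return { ok: false, reason: "pairing_expired" };
  }
  if (presentedCode !== code) {
    return { ok: false, reason: "code_mismatch" };
  }
  const secret = randomToken();
  rec.pairingStatus = "paired";
  rec.secret = secret;
  delete rec.pairingCode;
  delete rec.pairingExpiresAt;
  return { ok: true, secret };
}
```

Passing `now` and `randomToken` in as parameters keeps the flow deterministic in tests; a real implementation would need a cryptographically secure random source.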
## 5.3 Reconnection Authentication Flow
After pairing is complete, future follower authentication must use:
- the stored `secret`
- a 24-character random nonce
- current UTC Unix timestamp
The follower builds a plaintext proof payload from:
- `secret`
- `nonce`
- `timestamp`
Concatenation order:
```text
secret + nonce + timestamp
```
The follower encrypts/signs this payload using its private key and sends it to `main`.
`main` verifies:
1. the follower identifier is known and paired
2. the public key matches stored state
3. decrypted/verified payload contains the correct `secret`
4. timestamp difference from current UTC time is less than 10 seconds
5. nonce does not collide with the recent nonce window
6. handshake attempts in the last 10 seconds do not exceed 10
If all checks pass:
- authentication succeeds
- follower is considered authenticated for the connection/session
If any check fails:
- authentication fails
- main may downgrade/revoke trust state
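
Checks 3 to 6 above can be sketched as one function over the follower's rolling security state. This is a sketch under stated assumptions, not the final protocol: `AuthWindowState`, `buildProofPayload`, `checkProof`, and the failure names are hypothetical; timestamps are taken to be in seconds; `verifiedPayload` stands for the plaintext recovered after the signature/decryption step, which is left out because the primitive is still an open question; and the order in which the checks run is a choice made for this example.

```ts
const MAX_SKEW_SECONDS = 10; // check 4
const ATTEMPT_WINDOW_SECONDS = 10; // check 6
const MAX_ATTEMPTS_IN_WINDOW = 10; // check 6
const NONCE_WINDOW_SIZE = 10; // "last 10 nonces" from section 5.4

interface AuthWindowState {
  secret: string;
  recentNonces: Array<{ nonce: string; timestamp: number }>;
  recentHandshakeAttempts: number[];
}

type AuthFailure = "rate_limited" | "bad_secret" | "stale_timestamp" | "nonce_replay";

// Concatenation order from the plan: secret + nonce + timestamp.
function buildProofPayload(secret: string, nonce: string, timestamp: number): string {
  return secret + nonce + String(timestamp);
}

// Returns null when all checks pass, otherwise the first failed check.
function checkProof(
  state: AuthWindowState,
  nonce: string,
  timestamp: number,
  verifiedPayload: string,
  nowSeconds: number,
): AuthFailure | null {
  // Record this attempt and keep only attempts inside the rolling window.
  state.recentHandshakeAttempts = state.recentHandshakeAttempts.filter(
    (t) => nowSeconds - t < ATTEMPT_WINDOW_SECONDS,
  );
  state.recentHandshakeAttempts.push(nowSeconds);
  if (state.recentHandshakeAttempts.length > MAX_ATTEMPTS_IN_WINDOW) {
    return "rate_limited";
  }
  // A production implementation should prefer a constant-time comparison here.
  if (verifiedPayload !== buildProofPayload(state.secret, nonce, timestamp)) {
    return "bad_secret";
  }
  if (Math.abs(nowSeconds - timestamp) >= MAX_SKEW_SECONDS) {
    return "stale_timestamp";
  }
  if (state.recentNonces.some((n) => n.nonce === nonce)) {
    return "nonce_replay";
  }
  // Accept: remember the nonce, trimming to the recent window.
  state.recentNonces.push({ nonce, timestamp });
  state.recentNonces = state.recentNonces.slice(-NONCE_WINDOW_SIZE);
  return null;
}
```

Checks 1 and 2 (known, paired identifier and matching public key) would run before this function, against the registry.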
## 5.4 Unsafe Condition Handling
The connection is considered unsafe and must return to pairing flow if either is true:
- more than 10 handshake attempts occur within 10 seconds
- the presented nonce collides with one of the last 10 nonces observed within the recent window
When unsafe:
- existing trust state must no longer be accepted for authentication
- the follower must re-pair
- main should clear or rotate the stored `secret`
- main should reset security windows as part of re-pairing
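
The unsafe-condition rule and the reset it triggers can be sketched as a predicate plus a state reset. This is a minimal sketch: `TrustState`, `isUnsafe`, and `resetToPairing` are hypothetical names, timestamps are assumed to be in seconds, and the sketch clears the secret rather than rotating it, which is only one of the two options the plan allows.

```ts
interface TrustState {
  pairingStatus: "unpaired" | "pending" | "paired" | "revoked";
  secret?: string;
  pairingCode?: string;
  pairingExpiresAt?: number;
  recentNonces: Array<{ nonce: string; timestamp: number }>;
  recentHandshakeAttempts: number[];
}

// True when either unsafe condition from this section holds.
function isUnsafe(rec: TrustState, presentedNonce: string, nowSeconds: number): boolean {
  const attemptsInWindow = rec.recentHandshakeAttempts.filter(
    (t) => nowSeconds - t < 10,
  ).length;
  const nonceCollision = rec.recentNonces
    .slice(-10)
    .some((n) => n.nonce === presentedNonce);
  return attemptsInWindow > 10 || nonceCollision;
}

// Drops trust so the follower has to go through pairing again.
function resetToPairing(rec: TrustState): void {
  rec.pairingStatus = "unpaired";
  delete rec.secret; // old secret must no longer authenticate
  delete rec.pairingCode;
  delete rec.pairingExpiresAt;
  rec.recentNonces = []; // security windows restart with the new pairing
  rec.recentHandshakeAttempts = [];
}
```

Whether the stored public key survives the reset is left untouched here, since that is one of the open questions in section 15.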
---
## 6. Heartbeat and Follower Status
The main instance must track each follower's liveness state:
- `online`
- `unstable`
- `offline`
## 6.1 Heartbeat Rules
Each follower must send a heartbeat to main every 5 minutes.
## 6.2 Status Transitions
### online
A follower is `online` when:
- it has an active authenticated WebSocket connection, and
- main has received a recent heartbeat
### unstable
A follower becomes `unstable` when:
- no heartbeat has been received for 7 minutes
### offline
A follower becomes `offline` when:
- no heartbeat has been received for 11 minutes
When a follower becomes `offline`:
- main must close/terminate the WebSocket connection for that follower
## 6.3 Status Evaluation Strategy
Main should run a periodic status sweep timer to evaluate heartbeat freshness.
Recommended initial interval:
- every 30 to 60 seconds
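
The 7-minute and 11-minute thresholds translate into a small pure function that the sweep timer can call for each follower. This is a sketch under assumptions: `evaluateStatus`, `sweepStatuses`, and `SweepEntry` are hypothetical names, timestamps are in milliseconds, and only heartbeat freshness is evaluated (the "active authenticated connection" part of `online` is assumed to be checked elsewhere).

```ts
type FollowerStatus = "online" | "unstable" | "offline";

const MINUTE_MS = 60 * 1000;
const UNSTABLE_AFTER_MS = 7 * MINUTE_MS;
const OFFLINE_AFTER_MS = 11 * MINUTE_MS;

// Maps heartbeat age to a status; a follower with no heartbeat yet is offline.
function evaluateStatus(lastHeartbeatAt: number | undefined, nowMs: number): FollowerStatus {
  if (lastHeartbeatAt === undefined) return "offline";
  const age = nowMs - lastHeartbeatAt;
  if (age >= OFFLINE_AFTER_MS) return "offline";
  if (age >= UNSTABLE_AFTER_MS) return "unstable";
  return "online";
}

interface SweepEntry {
  status: FollowerStatus;
  lastHeartbeatAt?: number;
}

// One sweep pass: update statuses and disconnect followers that just went offline.
function sweepStatuses(
  followers: Map<string, SweepEntry>,
  nowMs: number,
  disconnect: (identifier: string) => void,
): void {
  followers.forEach((entry, identifier) => {
    const next = evaluateStatus(entry.lastHeartbeatAt, nowMs);
    if (next === "offline" && entry.status !== "offline") {
      disconnect(identifier); // main closes the WebSocket for this follower
    }
    entry.status = next;
  });
}
```

With a 5-minute heartbeat interval, one missed heartbeat leaves a follower `online` until the 7-minute mark, and two missed heartbeats push it to `offline` at 11 minutes.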
---
## 7. Messaging Model
Yonexus provides rule-based message dispatch over WebSocket.
## 7.1 Base Message Format
All application messages must use the format:
```text
${rule_identifier}::${message_content}
```
Constraints:
- `rule_identifier` is a string token
- `message_content` is the remainder payload as string
## 7.2 Main-Side Rewriting
When `main` receives a message from a follower, before rule matching it must rewrite the message into:
```text
${rule_identifier}::${sender_identifier}::${message_content}
```
This ensures rule processors on `main` can identify which follower sent the message.
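
The rewrite can be sketched as a parse on the first `::` followed by re-assembly with the sender inserted. This is a sketch, not the plugin's parser: `parseMessage` and `rewriteForMain` are hypothetical names, and splitting on the first delimiter only (so that `message_content` may itself contain `::`) is an assumption this plan does not state.

```ts
const DELIMITER = "::";

// Splits on the first delimiter; everything after it is the content.
function parseMessage(raw: string): { ruleIdentifier: string; content: string } {
  const idx = raw.indexOf(DELIMITER);
  if (idx <= 0) {
    throw new Error("malformed message: expected rule_identifier::message_content");
  }
  return {
    ruleIdentifier: raw.slice(0, idx),
    content: raw.slice(idx + DELIMITER.length),
  };
}

// Main-side rewrite: insert the sender identifier after the rule identifier.
function rewriteForMain(raw: string, senderIdentifier: string): string {
  const { ruleIdentifier, content } = parseMessage(raw);
  return ruleIdentifier + DELIMITER + senderIdentifier + DELIMITER + content;
}
```

For example, `rewriteForMain("chat::hello", "follower-a")` yields `chat::follower-a::hello`.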
## 7.3 Builtin Rule Namespace
The reserved rule identifier is:
```text
builtin
```
It is used internally for:
- handshake
- pairing
- heartbeat
- protocol/system messages
User code must not be allowed to register handlers for `builtin`.
---
## 8. Rule Registration and Dispatch
## 8.1 Public API
```ts
registerRule(rule: string, processor: (message: string) => unknown): void
```
## 8.2 Rule Format
`rule` must use the format:
```text
${rule_identifier}
```
Validation rules:
- must be non-empty
- must not contain the message delimiter sequence (`::`), since that would make the message ambiguous to parse
- must not equal `builtin`
## 8.3 Dispatch Rules
When Yonexus receives a message over WebSocket:
- it iterates registered rules in registration order
- it finds the first matching rule
- it invokes the corresponding processor
- only the first match is used
Clarification for implementation:
- matching should initially be exact match on `rule_identifier`
- if pattern-based matching is desired later, that must be explicitly added in a future phase
If no rule matches:
- the message is ignored or logged as unhandled, depending on runtime policy
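
Registration and first-match dispatch can be sketched as a small registry class. This is a minimal sketch under assumptions: `RuleRegistry` and its `dispatch` method are hypothetical names, matching is exact on `rule_identifier` as the clarification above suggests, duplicate registrations are rejected, and unmatched messages are reported by returning `false` so the caller can ignore or log them.

```ts
type Processor = (message: string) => unknown;

class RuleRegistry {
  private rules: Array<{ rule: string; processor: Processor }> = [];

  registerRule(rule: string, processor: Processor): void {
    if (rule.length === 0) throw new Error("rule must be non-empty");
    if (rule.indexOf("::") !== -1) throw new Error("rule must not contain '::'");
    if (rule === "builtin") throw new Error("rule 'builtin' is reserved");
    if (this.rules.some((r) => r.rule === rule)) {
      throw new Error("duplicate rule registration: " + rule);
    }
    this.rules.push({ rule, processor });
  }

  // Invokes the first processor whose rule matches; returns false if none does.
  dispatch(message: string): boolean {
    const idx = message.indexOf("::");
    if (idx <= 0) return false; // malformed: no rule identifier
    const ruleIdentifier = message.slice(0, idx);
    for (const entry of this.rules) {
      if (entry.rule === ruleIdentifier) {
        entry.processor(message); // processor receives the full received string
        return true;
      }
    }
    return false;
  }
}
```

With exact matching and duplicates rejected, registration order cannot change the outcome; it only starts to matter if pattern-based matching is added later.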
---
## 9. TypeScript API Surface
## 9.1 sendMessageToMain
```ts
sendMessageToMain(message: string): Promise<void>
```
Rules:
- Supervisor is **not** a role (no inherent permissions)
- Registration **not** self-service
- only configured agent list or human via slash command
- allowed only on `follower`
- calling from `main` must throw an error
- sends message to connected `main`
- message must already conform to `${rule_identifier}::${message_content}`
Permission matrix (recommended):
- create_department → Org Admin
- create_team → Org Admin, Dept Admin (same dept)
- assign_identity → Org Admin, Dept Admin (same dept), Team Lead (same team)
- register_agent → Org Admin, Dept Admin, Team Lead (scope-limited)
- set_supervisor → Org Admin, Dept Admin (same dept)
- query → all roles, but only schema fields with `queryable: true`
## 9.2 sendMessageToFollower
## 5) Schema Configuration (example)
```json
{
"position": { "type": "string", "queryable": true },
"discord_user_id": { "type": "string", "queryable": true },
"git_user_name": { "type": "string", "queryable": true },
"department": { "type": "string", "queryable": false },
"team": { "type": "string", "queryable": false }
}
```ts
sendMessageToFollower(identifier: string, message: string): Promise<void>
```
## 6) Tool/API Surface (MVP)
- `create_organization(name)`
- `create_department(name, orgId)`
- `create_team(name, deptId)`
- `register_agent(agentId, name)`
- `assign_identity(agentId, deptId, teamId, meta)`
- `set_supervisor(actor, agentId, supervisorId)`
- `whoami(agentId)` → identities + supervisor + roles
- `query_agents(filters, options)` → list; supports `eq | contains | regex`
Rules:
- allowed only on `main`
- calling from `follower` must throw an error
- target follower must be known and currently connected/authenticated
- message must already conform to `${rule_identifier}::${message_content}`
Query example:
```json
{
"filters": [
{"field":"discord_user_id","op":"eq","value":"123"},
{"field":"git_user_name","op":"regex","value":"^hang"}
],
"options": {"limit": 20, "offset": 0}
}
## 9.3 registerRule
```ts
registerRule(rule: string, processor: (message: string) => unknown): void
```
## 7) Data Model (MVP)
- Organization { id, name }
- Department { id, name, orgId }
- Team { id, name, deptId }
- Agent { id, name, roles[] }
- Identity { id, agentId, deptId, teamId, meta }
- Supervisor { agentId, supervisorId }
Rules:
- rejects `builtin`
- rejects duplicate rule registration unless an explicit override mode is added later
- processors are invoked with the final received string after any main-side rewrite
## 8) Milestones
**Phase 0 (Design)**
- finalize schema
- confirm permission rules
---
**Phase 1 (MVP)**
- storage + JSON persistence
- core models + tools
- query DSL
- scope memory adapter
## 10. Hooks and Runtime Integration
**Phase 2 (v1)**
- policy refinements
- better query pagination & filtering
- management commands & validation
## 10.1 Main Hook
## 9) Project Structure (recommended)
```
openclaw-plugin-yonexus/
├─ plugin.json
├─ src/
│ ├─ index.ts
│ ├─ store/ # in-memory + JSON persistence
│ ├─ models/
│ ├─ permissions/
│ ├─ tools/
│ ├─ memory/
│ └─ utils/
├─ scripts/
│ └─ install.sh
├─ dist/
│ └─ yonexus/ # build output target
└─ data/
└─ org.json
```
The plugin must register a hook so that when OpenClaw gateway starts:
- Yonexus initializes internal state
- Yonexus starts a WebSocket server
- Yonexus begins follower status sweep tasks
## 10) Install Script Requirement
- Provide `scripts/install.sh`
- It should register the OpenClaw plugin name as **`yonexus`**
- Build artifacts must be placed into **`dist/yonexus`**
## 10.2 Follower Runtime Behavior
## 11) Notes & Decisions
- Structure data is not stored in memory_store.
- Shared memory uses memory_store (compatible with memory-lancedb-pro).
- Queryable fields are whitelisted via schema.
On startup, follower should:
- load local identity/secret/private key state
- connect to `mainHost`
- perform pairing or authentication flow
- start periodic heartbeats when authenticated
- attempt reconnect when disconnected
## 10.3 Persistence Requirements
### Main persists:
- follower registry
- public keys
- secrets
- pairing state
- security/rate-limit windows, if they are needed across restarts (otherwise main must reset them safely on startup)
### Follower persists:
- identifier
- private key
- current secret
- minimal pairing/auth state needed for reconnect
---
## 11. Storage Strategy
## 11.1 Main Storage
Main needs a local data file for follower registry persistence.
Suggested persisted sections:
- trusted followers
- pairing pending records
- last known status metadata
- security-related rolling records when persistence is desirable
## 11.2 Follower Storage
Follower needs a local secure data file for:
- private key
- secret
- identifier
- optional last successful connection metadata
## 11.3 Security Notes
- private key must never be sent to main
- secret must be treated as sensitive material
- storage format should support future encryption-at-rest, but plaintext local file may be acceptable in initial implementation if clearly documented as a limitation
---
## 12. Error Handling
The plugin should define structured errors for at least:
- invalid configuration
- invalid role usage
- unauthorized identifier
- pairing required
- pairing expired
- handshake verification failed
- replay/nonce collision detected
- rate limit / unsafe handshake detected
- follower not connected
- duplicate rule registration
- reserved rule registration
- malformed message
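
One way to make these errors structured is a single error class carrying a machine-readable code. This is a sketch, not a committed design: `YonexusError`, the code strings, and the `assertRole` helper are all hypothetical names derived from the list above.

```ts
type YonexusErrorCode =
  | "INVALID_CONFIGURATION"
  | "INVALID_ROLE_USAGE"
  | "UNAUTHORIZED_IDENTIFIER"
  | "PAIRING_REQUIRED"
  | "PAIRING_EXPIRED"
  | "HANDSHAKE_VERIFICATION_FAILED"
  | "NONCE_COLLISION"
  | "UNSAFE_HANDSHAKE_RATE"
  | "FOLLOWER_NOT_CONNECTED"
  | "DUPLICATE_RULE"
  | "RESERVED_RULE"
  | "MALFORMED_MESSAGE";

class YonexusError extends Error {
  readonly code: YonexusErrorCode;

  constructor(code: YonexusErrorCode, message: string) {
    super(message);
    this.code = code;
    this.name = "YonexusError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Guard for role-restricted APIs such as sendMessageToMain / sendMessageToFollower.
function assertRole(
  current: "main" | "follower",
  required: "main" | "follower",
  api: string,
): void {
  if (current !== required) {
    throw new YonexusError(
      "INVALID_ROLE_USAGE",
      api + " is only allowed on " + required + ", but this instance is " + current,
    );
  }
}
```

Callers can then branch on `error.code` instead of parsing message text, for example to distinguish `PAIRING_EXPIRED` from `PAIRING_REQUIRED`.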
---
## 13. Initial Implementation Phases
## Phase 0 — Protocol and Skeleton
- finalize config schema
- define persisted data models
- define protocol message types for builtin traffic
- define hook startup behavior
- define rule registry behavior
## Phase 1 — Main/Follower Transport MVP
- main WebSocket server startup
- follower WebSocket client startup
- reconnect logic
- basic builtin protocol channel
- persistent registry scaffolding
## Phase 2 — Pairing and Authentication
- follower keypair generation
- pairing request/confirmation flow
- secret issuance and persistence
- signed/encrypted handshake proof verification
- nonce/replay protection
- unsafe-condition reset to pairing
## Phase 3 — Heartbeat and Status Tracking
- follower heartbeat sender
- main heartbeat receiver
- periodic sweep
- status transitions: online / unstable / offline
- forced disconnect on offline
## Phase 4 — Public APIs and Message Dispatch
- `sendMessageToMain`
- `sendMessageToFollower`
- `registerRule`
- first-match dispatch
- main-side sender rewrite behavior
## Phase 5 — Hardening and Docs
- integration tests
- failure-path coverage
- restart recovery checks
- protocol docs
- operator setup docs for main/follower deployment
---
## 14. Non-Goals for Initial Version
Not required in the first version unless explicitly added later:
- direct follower-to-follower sockets
- multi-main clustering
- distributed consensus
- message ordering guarantees across reconnects
- end-to-end application payload encryption beyond the handshake/authentication requirements
- UI management panel
---
## 15. Open Questions To Confirm Later
These should be resolved before implementation starts:
1. Is the handshake primitive meant to be:
- asymmetric encryption with private/public key, or
- digital signature with verification by public key?
Recommended: **signature**, not “private-key encryption” wording.
2. Should `mainHost` accept only full WebSocket URLs (`ws://` / `wss://`) or also raw `ip:port` strings?
3. Should pairing require explicit operator approval on main, or is allowlist membership enough for automatic pairing?
4. On unsafe condition, should the old public key be retained or must the follower generate a brand-new keypair?
5. Should main queue outbound messages for offline followers, or should the send fail immediately?
6. Are rule identifiers exact strings only, or should regex/prefix matching exist in future?
---
## 16. Immediate Next Deliverables
After this plan, the next files to create should be:
- `FEAT.md` — feature checklist derived from this plan
- `README.md` — concise operator/developer overview
- `plugin.json` — plugin config schema and entry declaration
- protocol notes for builtin messages
- implementation task breakdown