# Yonexus — Project Plan

## 1. Goal

Yonexus is an OpenClaw plugin for **cross-instance communication** between multiple OpenClaw deployments.

A Yonexus network contains:
- exactly one instance with role `main`
- one or more instances with role `follower`

The plugin provides:
- a WebSocket-based communication layer between OpenClaw instances
- pairing and identity verification for followers
- persistent follower registry and trust state on the main node
- heartbeat-based follower status tracking
- a rule-based message dispatch mechanism
- TypeScript function interfaces for other plugin/runtime code

This project is **not** an organization/identity management plugin anymore. All prior goals are discarded.

---

## 2. High-Level Architecture

### 2.1 Roles

Each OpenClaw instance running Yonexus must be configured with a `role`:
- `main`
- `follower`

Role semantics:
- `main` is the hub/server for all Yonexus communication
- `follower` connects outbound to the `main` instance

### 2.2 Network Topology

- The `main` instance must expose a fixed reachable IP/domain and run a WebSocket service.
- `follower` instances do not need a fixed IP/domain.
- All `follower` instances connect to the `main` WebSocket endpoint.
- No direct follower-to-follower communication is required in v1.
- Messages between followers, if needed, are relayed by `main`.

### 2.3 Runtime Lifecycle

- On OpenClaw gateway startup:
  - if role is `main`, Yonexus starts a WebSocket server through a hook
  - if role is `follower`, Yonexus starts a WebSocket client and attempts to connect to `mainHost`

---

## 3. Configuration Model

## 3.1 Common Config

```ts
role: "main" | "follower"
```

## 3.2 Follower Config

Required when `role === "follower"`:

```ts
mainHost: string
identifier: string
```

Semantics:
- `mainHost`: WebSocket endpoint of the main instance (`ip:port` or full URL)
- `identifier`: unique follower identity inside the Yonexus network

## 3.3 Main Config

Required when `role === "main"`:

```ts
followerIdentifiers: string[]
```

Semantics:
- `followerIdentifiers`: allowlist of follower identifiers that are permitted to pair/connect

## 3.4 Validation Rules

### Main
- must have `role = main`
- must provide `followerIdentifiers`
- must expose a stable/reachable IP/domain outside the plugin itself

### Follower
- must have `role = follower`
- must provide `mainHost`
- must provide `identifier`

### Shared
- invalid or missing role-specific fields must fail plugin initialization
- unknown follower identifiers must be rejected by `main`
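
The validation rules above could be enforced at plugin initialization roughly as follows. This is a minimal sketch: the type names `MainConfig`, `FollowerConfig`, `YonexusConfig` and the function `validateConfig` are hypothetical, and only the field names come from this section.

```ts
// Hypothetical config shapes; only the field names are taken from section 3.
interface MainConfig {
  role: "main";
  followerIdentifiers: string[];
}

interface FollowerConfig {
  role: "follower";
  mainHost: string;
  identifier: string;
}

type YonexusConfig = MainConfig | FollowerConfig;

// Validates an untyped config object and throws on the first problem found,
// so that plugin initialization fails when role-specific fields are missing.
function validateConfig(raw: Record<string, unknown>): YonexusConfig {
  if (raw.role === "main") {
    const ids = raw.followerIdentifiers;
    const allStrings =
      Array.isArray(ids) && ids.every((id) => typeof id === "string" && id.length > 0);
    if (!allStrings) {
      throw new Error("main requires followerIdentifiers: string[]");
    }
    return { role: "main", followerIdentifiers: ids as string[] };
  }
  if (raw.role === "follower") {
    const { mainHost, identifier } = raw;
    if (typeof mainHost !== "string" || mainHost.length === 0) {
      throw new Error("follower requires mainHost");
    }
    if (typeof identifier !== "string" || identifier.length === 0) {
      throw new Error("follower requires identifier");
    }
    return { role: "follower", mainHost, identifier };
  }
  throw new Error('role must be "main" or "follower"');
}
```

The discriminated union lets later code narrow on `role` instead of re-checking which fields exist. Reachability of the main host is outside the plugin and is not checked here.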

---

## 4. Main Responsibilities

The `main` instance must maintain a registry keyed by follower `identifier`.

Each follower record contains at minimum:
- `identifier`
- `publicKey`
- `secret`
- pairing state
- pairing expiration data
- connection status
- security counters/window data
- heartbeat timestamps
- last known connection/session metadata

The registry must use:
- in-memory runtime state for active operations
- persistent on-disk storage for restart survival

### 4.1 Persistent Main Registry Model

Proposed shape:

```ts
interface FollowerRecord {
  identifier: string;
  publicKey?: string;
  secret?: string;
  pairingStatus: "unpaired" | "pending" | "paired" | "revoked";
  pairingCode?: string;
  pairingExpiresAt?: number;
  status: "online" | "offline" | "unstable";
  lastHeartbeatAt?: number;
  lastAuthenticatedAt?: number;
  recentNonces: Array<{
    nonce: string;
    timestamp: number;
  }>;
  recentHandshakeAttempts: number[];
  createdAt: number;
  updatedAt: number;
}
```

Notes:
- `recentNonces` stores only the recent nonce window needed for replay detection
- `recentHandshakeAttempts` stores timestamps for rate-limiting / unsafe reconnect detection
- actual field names can change during implementation, but these semantics must remain

---

## 5. Pairing and Authentication Flow

## 5.1 First Connection: Key Generation

When a follower connects to main for the first time:
- the follower generates a public/private key pair locally
- the private key remains only on the follower
- the public key is sent to `main` during handshake

If `main` sees that:
- the follower identifier is allowed, and
- no valid `secret` is currently associated with that identifier

then `main` must enter pairing flow.

## 5.2 Pairing Flow

### Step A: Pairing Request
`main` responds with a pairing request containing:
- a random pairing string
- an expiration time

### Step B: Pairing Confirmation
If the follower sends that random pairing string back to `main` before expiration:
- pairing succeeds

### Step C: Secret Issuance
After successful pairing:
- `main` generates a random `secret`
- `main` returns that `secret` to the follower
- `main` stores follower `publicKey` + `secret`
- `follower` stores private key + secret locally

If pairing expires before confirmation:
- pairing fails
- follower must restart the pairing process
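
Steps A and B can be sketched on the main side as two small functions. The names `PendingPairing`, `createPairingRequest`, and `confirmPairing` are hypothetical, and the time-to-live and millisecond units are assumptions, since the plan does not fix a pairing lifetime.

```ts
interface PendingPairing {
  pairingCode: string;
  pairingExpiresAt: number; // Unix time in milliseconds (assumed unit)
}

// Step A: main creates a pairing request. The code generator is injected so
// the caller can supply a cryptographically secure random source.
function createPairingRequest(
  generateCode: () => string,
  now: number,
  ttlMs: number,
): PendingPairing {
  return { pairingCode: generateCode(), pairingExpiresAt: now + ttlMs };
}

type PairingResult = "paired" | "expired" | "mismatch";

// Step B: main checks the string echoed back by the follower.
// Expiry is checked first, so a correct code that arrives late still fails.
function confirmPairing(
  pending: PendingPairing,
  echoedCode: string,
  now: number,
): PairingResult {
  if (now >= pending.pairingExpiresAt) return "expired";
  return echoedCode === pending.pairingCode ? "paired" : "mismatch";
}
```

Only a `"paired"` result would lead on to Step C (secret issuance). A hardened version would probably compare the codes in constant time.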

## 5.3 Reconnection Authentication Flow

After pairing is complete, future follower authentication must use:
- the stored `secret`
- a 24-character random nonce
- the current UTC Unix timestamp

The follower builds a plaintext proof payload from:
- `secret`
- `nonce`
- `timestamp`

Concatenation order:

```text
secret + nonce + timestamp
```

The follower encrypts/signs this payload using its private key and sends it to `main`.

`main` verifies:
1. the follower identifier is known and paired
2. the public key matches stored state
3. the decrypted/verified payload contains the correct `secret`
4. the timestamp differs from current UTC time by less than 10 seconds
5. the nonce does not collide with the recent nonce window
6. handshake attempts in the last 10 seconds do not exceed 10

If all checks pass:
- authentication succeeds
- follower is considered authenticated for the connection/session

If any check fails:
- authentication fails
- main may downgrade/revoke trust state
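
The proof exchange could look like the sketch below, assuming the signature option from the open questions (Ed25519 via Node's built-in `node:crypto` module) rather than encryption. The helpers `buildPayload`, `createProof`, and `verifyProof` are hypothetical. Main is assumed to rebuild the payload from its stored `secret` and the presented nonce and timestamp, which covers checks 3 and 4 only; the nonce-window and rate-limit checks are separate.

```ts
import { generateKeyPairSync, randomBytes, sign, verify, KeyObject } from "node:crypto";

// Builds the plaintext proof payload: secret + nonce + timestamp.
function buildPayload(secret: string, nonce: string, timestamp: number): Buffer {
  return Buffer.from(`${secret}${nonce}${timestamp}`, "utf8");
}

// Follower side: 18 random bytes encode to exactly 24 base64url characters.
function createProof(secret: string, privateKey: KeyObject, nowSeconds: number) {
  const nonce = randomBytes(18).toString("base64url");
  const signature = sign(null, buildPayload(secret, nonce, nowSeconds), privateKey);
  return { nonce, timestamp: nowSeconds, signature };
}

// Main side: reject stale timestamps, then verify the signature over the
// payload rebuilt from the stored secret (assumed verification strategy).
function verifyProof(
  proof: { nonce: string; timestamp: number; signature: Buffer },
  storedSecret: string,
  storedPublicKey: KeyObject,
  nowSeconds: number,
): boolean {
  if (Math.abs(nowSeconds - proof.timestamp) >= 10) return false;
  const payload = buildPayload(storedSecret, proof.nonce, proof.timestamp);
  return verify(null, payload, storedPublicKey, proof.signature);
}

// Usage: the follower generates its key pair once, at first connection.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const proof = createProof("example-secret", privateKey, 1_700_000_000);
console.log(verifyProof(proof, "example-secret", publicKey, 1_700_000_003)); // true
```

Because the secret is part of the signed payload and never sent in the clear, a signature made with a different secret fails verification even when the key pair is correct.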

## 5.4 Unsafe Condition Handling

The connection is considered unsafe and must return to pairing flow if either is true:
- more than 10 handshake attempts occur within 10 seconds
- the presented nonce collides with one of the last 10 nonces observed within the recent window

When unsafe:
- existing trust state must no longer be accepted for authentication
- the follower must re-pair
- main should clear or rotate the stored `secret`
- main should reset security windows as part of re-pairing
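
One way to detect both unsafe conditions is to update the two rolling windows on every handshake attempt. The sketch below is illustrative only: `SecurityWindow` and `recordAttempt` are hypothetical names, timestamps are assumed to be Unix seconds, and "last 10 nonces" is read as a simple count-based history.

```ts
interface SecurityWindow {
  recentNonces: Array<{ nonce: string; timestamp: number }>;
  recentHandshakeAttempts: number[];
}

const WINDOW_SECONDS = 10; // rate-limit window from section 5.4
const MAX_ATTEMPTS = 10; // more than this many attempts in the window is unsafe
const NONCE_HISTORY = 10; // number of recent nonces kept for replay detection

// Records one handshake attempt and reports whether it trips either
// unsafe condition. Mutates the window in place.
function recordAttempt(win: SecurityWindow, nonce: string, now: number): "ok" | "unsafe" {
  // Drop attempts that fell out of the 10-second window, then count this one.
  win.recentHandshakeAttempts = win.recentHandshakeAttempts.filter(
    (t) => now - t < WINDOW_SECONDS,
  );
  win.recentHandshakeAttempts.push(now);

  // Check for a replay before adding the new nonce to the history.
  const replayed = win.recentNonces.some((entry) => entry.nonce === nonce);
  win.recentNonces.push({ nonce, timestamp: now });
  win.recentNonces = win.recentNonces.slice(-NONCE_HISTORY);

  if (replayed || win.recentHandshakeAttempts.length > MAX_ATTEMPTS) return "unsafe";
  return "ok";
}
```

On `"unsafe"`, main would then invalidate the stored `secret`, reset both windows, and send the follower back to pairing.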

---

## 6. Heartbeat and Follower Status

The main instance must track each follower’s liveness state:
- `online`
- `unstable`
- `offline`

## 6.1 Heartbeat Rules

Each follower must send a heartbeat to main every 5 minutes.

## 6.2 Status Transitions

### online
A follower is `online` when:
- it has an active authenticated WebSocket connection, and
- main has received a heartbeat within the last 7 minutes

### unstable
A follower becomes `unstable` when:
- no heartbeat has been received for 7 minutes

### offline
A follower becomes `offline` when:
- no heartbeat has been received for 11 minutes

When a follower becomes `offline`:
- main must close/terminate the WebSocket connection for that follower

## 6.3 Status Evaluation Strategy

Main should run a periodic status sweep timer to evaluate heartbeat freshness.

Recommended initial interval:
- every 30 to 60 seconds
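
The thresholds above map directly onto a pure function of heartbeat age, which the sweep timer can call for every record. In this sketch `evaluateStatus` and `sweep` are hypothetical names and timestamps are assumed to be in milliseconds. It looks only at heartbeat age, so the "active authenticated connection" condition for `online` would need an additional check.

```ts
type FollowerStatus = "online" | "unstable" | "offline";

const UNSTABLE_AFTER_MS = 7 * 60 * 1000;
const OFFLINE_AFTER_MS = 11 * 60 * 1000;

// Derives liveness from the age of the last heartbeat. A follower that has
// never sent a heartbeat is treated as offline (an assumption).
function evaluateStatus(lastHeartbeatAt: number | undefined, now: number): FollowerStatus {
  if (lastHeartbeatAt === undefined) return "offline";
  const age = now - lastHeartbeatAt;
  if (age >= OFFLINE_AFTER_MS) return "offline";
  if (age >= UNSTABLE_AFTER_MS) return "unstable";
  return "online";
}

interface SweepRecord {
  identifier: string;
  status: FollowerStatus;
  lastHeartbeatAt?: number;
}

// One sweep pass: updates each record's status and returns the identifiers
// that just went offline, whose sockets main should close.
function sweep(records: SweepRecord[], now: number): string[] {
  const toDisconnect: string[] = [];
  for (const record of records) {
    const next = evaluateStatus(record.lastHeartbeatAt, now);
    if (next === "offline" && record.status !== "offline") {
      toDisconnect.push(record.identifier);
    }
    record.status = next;
  }
  return toDisconnect;
}
```

Running `sweep` from a `setInterval` every 30 to 60 seconds would match the recommended interval. Keeping the evaluation pure makes the thresholds easy to test without timers.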

---

## 7. Messaging Model

Yonexus provides rule-based message dispatch over WebSocket.

## 7.1 Base Message Format

All application messages must use the format:

```text
${rule_identifier}::${message_content}
```

Constraints:
- `rule_identifier` is a string token
- `message_content` is the remaining payload, as a string
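
A parser for this format only needs to split at the first `::`. The sketch below uses hypothetical names (`parseMessage`, `formatMessage`, `ParsedMessage`) and assumes that an empty rule identifier makes a message malformed.

```ts
const DELIMITER = "::";

interface ParsedMessage {
  ruleIdentifier: string;
  content: string;
}

// Splits at the first delimiter only, so the content may itself contain "::".
function parseMessage(raw: string): ParsedMessage {
  const index = raw.indexOf(DELIMITER);
  if (index <= 0) {
    throw new Error("malformed message: missing rule identifier or delimiter");
  }
  return {
    ruleIdentifier: raw.slice(0, index),
    content: raw.slice(index + DELIMITER.length),
  };
}

// Inverse of parseMessage for well-formed identifiers.
function formatMessage(ruleIdentifier: string, content: string): string {
  return `${ruleIdentifier}${DELIMITER}${content}`;
}
```

For example, `parseMessage("chat::hello::world")` yields the rule identifier `chat` and the content `hello::world`.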

## 7.2 Main-Side Rewriting

When `main` receives a message from a follower, before rule matching it must rewrite the message into:

```text
${rule_identifier}::${sender_identifier}::${message_content}
```

This ensures rule processors on `main` can identify which follower sent the message.
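
The rewrite amounts to inserting the sender between the rule identifier and the content. A minimal sketch, with `rewriteForMain` as a hypothetical helper name:

```ts
// Inserts the sender identifier between the rule identifier and the content.
// The split happens at the first "::" so the content is passed through untouched.
function rewriteForMain(raw: string, senderIdentifier: string): string {
  const index = raw.indexOf("::");
  if (index <= 0) throw new Error("malformed message");
  const ruleIdentifier = raw.slice(0, index);
  const content = raw.slice(index + 2);
  return `${ruleIdentifier}::${senderIdentifier}::${content}`;
}

console.log(rewriteForMain("chat::hello", "follower-a")); // chat::follower-a::hello
```

The sender identifier should come from the authenticated session on `main`, not from anything the follower puts in the message, otherwise a follower could impersonate another one.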

## 7.3 Builtin Rule Namespace

The reserved rule identifier is:

```text
builtin
```

It is used internally for:
- handshake
- pairing
- heartbeat
- protocol/system messages

User code must not be allowed to register handlers for `builtin`.

---

## 8. Rule Registration and Dispatch

## 8.1 Public API

```ts
registerRule(rule: string, processor: (message: string) => unknown): void
```

## 8.2 Rule Format

`rule` must use the format:

```text
${rule_identifier}
```

Validation rules:
- must be non-empty
- must not contain the message delimiter sequence (`::`) in a way that makes the identifier ambiguous when a message is parsed
- must not equal `builtin`

## 8.3 Dispatch Rules

When Yonexus receives a message over WebSocket:
- it iterates registered rules in registration order
- it finds the first matching rule
- it invokes the corresponding processor
- only the first match is used

Clarification for implementation:
- matching should initially be exact match on `rule_identifier`
- if pattern-based matching is desired later, that must be explicitly added in a future phase

If no rule matches:
- the message is ignored or logged as unhandled, depending on runtime policy
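
Registration and first-match dispatch could be sketched as a small registry class. `RuleRegistry` and `Processor` are hypothetical names. The sketch rejects any rule containing `::`, which is a stricter reading of the delimiter rule than the plan states, and it returns `false` for unhandled messages so the caller can apply its own runtime policy.

```ts
type Processor = (message: string) => unknown;

class RuleRegistry {
  // An array keeps registration order, which first-match dispatch relies on.
  private readonly rules: Array<{ rule: string; processor: Processor }> = [];

  registerRule(rule: string, processor: Processor): void {
    if (rule.length === 0) throw new Error("rule must be non-empty");
    if (rule.includes("::")) throw new Error("rule must not contain the delimiter");
    if (rule === "builtin") throw new Error("builtin is reserved");
    if (this.rules.some((entry) => entry.rule === rule)) {
      throw new Error(`duplicate rule: ${rule}`);
    }
    this.rules.push({ rule, processor });
  }

  // Returns true when a processor handled the message, false when unhandled.
  dispatch(message: string): boolean {
    const index = message.indexOf("::");
    if (index <= 0) return false;
    const ruleIdentifier = message.slice(0, index);
    // Exact match on the rule identifier; only the first match is invoked.
    const match = this.rules.find((entry) => entry.rule === ruleIdentifier);
    if (!match) return false;
    match.processor(message);
    return true;
  }
}
```

With exact matching and duplicate rejection there is at most one match per identifier, so the ordered scan only starts to matter if pattern-based matching is added later.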

---

## 9. TypeScript API Surface

## 9.1 sendMessageToMain

```ts
sendMessageToMain(message: string): Promise<void>
```

Rules:
- allowed only on `follower`
- calling from `main` must throw an error
- sends message to connected `main`
- message must already conform to `${rule_identifier}::${message_content}`

## 9.2 sendMessageToFollower

```ts
sendMessageToFollower(identifier: string, message: string): Promise<void>
```

Rules:
- allowed only on `main`
- calling from `follower` must throw an error
- target follower must be known and currently connected/authenticated
- message must already conform to `${rule_identifier}::${message_content}`

## 9.3 registerRule

```ts
registerRule(rule: string, processor: (message: string) => unknown): void
```

Rules:
- rejects `builtin`
- rejects duplicate rule registration unless an explicit override mode is added later
- processors are invoked with the final received string after any main-side rewrite
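
The role restrictions in 9.1 and 9.2 can be enforced with guards that run before any network work. This is a sketch under assumptions: `Transport` is a hypothetical abstraction over the WebSocket layer, `createApi` is a hypothetical factory, and the guards throw synchronously rather than rejecting the returned promise.

```ts
type Role = "main" | "follower";

// Hypothetical transport abstraction; a real plugin would wrap WebSocket connections.
interface Transport {
  sendToMain(message: string): Promise<void>;
  sendToFollower(identifier: string, message: string): Promise<void>;
  isFollowerAuthenticated(identifier: string): boolean;
}

function createApi(role: Role, transport: Transport) {
  return {
    sendMessageToMain(message: string): Promise<void> {
      // Role guard: only followers may send to main.
      if (role !== "follower") {
        throw new Error("sendMessageToMain is only allowed on follower");
      }
      return transport.sendToMain(message);
    },
    sendMessageToFollower(identifier: string, message: string): Promise<void> {
      // Role guard: only main may address followers.
      if (role !== "main") {
        throw new Error("sendMessageToFollower is only allowed on main");
      }
      // The target must currently hold an authenticated connection.
      if (!transport.isFollowerAuthenticated(identifier)) {
        throw new Error(`follower not connected: ${identifier}`);
      }
      return transport.sendToFollower(identifier, message);
    },
  };
}
```

Injecting the transport keeps the guards testable without opening sockets. Message format validation is left out here and would sit next to the role checks.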

---

## 10. Hooks and Runtime Integration

## 10.1 Main Hook

The plugin must register a hook so that when OpenClaw gateway starts:
- Yonexus initializes internal state
- Yonexus starts a WebSocket server
- Yonexus begins follower status sweep tasks

## 10.2 Follower Runtime Behavior

On startup, a follower should:
- load local identity/secret/private key state
- connect to `mainHost`
- perform pairing or authentication flow
- start periodic heartbeats when authenticated
- attempt reconnect when disconnected

## 10.3 Persistence Requirements

### Main persists:
- follower registry
- public keys
- secrets
- pairing state
- security/rate-limit windows, if they are needed across restarts; otherwise main must reset them safely on startup

### Follower persists:
- identifier
- private key
- current secret
- minimal pairing/auth state needed for reconnect

---

## 11. Storage Strategy

## 11.1 Main Storage

Main needs a local data file for follower registry persistence.

Suggested persisted sections:
- trusted followers
- pairing pending records
- last known status metadata
- security-related rolling records when persistence is desirable

## 11.2 Follower Storage

Follower needs a local secure data file for:
- private key
- secret
- identifier
- optional last successful connection metadata

## 11.3 Security Notes

- private key must never be sent to main
- secret must be treated as sensitive material
- storage format should support future encryption-at-rest, but plaintext local file may be acceptable in initial implementation if clearly documented as a limitation

---

## 12. Error Handling

The plugin should define structured errors for at least:
- invalid configuration
- invalid role usage
- unauthorized identifier
- pairing required
- pairing expired
- handshake verification failed
- replay/nonce collision detected
- rate limit / unsafe handshake detected
- follower not connected
- duplicate rule registration
- reserved rule registration
- malformed message
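
One common way to make these errors structured is a single error class carrying a machine-readable code. The class name `YonexusError` and the code strings below are hypothetical; they only mirror the list above one-to-one.

```ts
// One code per error category listed in this section (names are assumptions).
type YonexusErrorCode =
  | "INVALID_CONFIG"
  | "INVALID_ROLE_USAGE"
  | "UNAUTHORIZED_IDENTIFIER"
  | "PAIRING_REQUIRED"
  | "PAIRING_EXPIRED"
  | "HANDSHAKE_FAILED"
  | "NONCE_REPLAY"
  | "UNSAFE_HANDSHAKE"
  | "FOLLOWER_NOT_CONNECTED"
  | "DUPLICATE_RULE"
  | "RESERVED_RULE"
  | "MALFORMED_MESSAGE";

class YonexusError extends Error {
  readonly code: YonexusErrorCode;

  constructor(code: YonexusErrorCode, message: string) {
    super(message);
    this.name = "YonexusError";
    this.code = code;
  }
}

// Callers can branch on the code instead of parsing message text.
function isPairingProblem(error: unknown): boolean {
  return (
    error instanceof YonexusError &&
    (error.code === "PAIRING_REQUIRED" || error.code === "PAIRING_EXPIRED")
  );
}
```

A string-literal union keeps the set of codes checked at compile time, so adding a new category forces every exhaustive `switch` over the code to be revisited.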

---

## 13. Initial Implementation Phases

## Phase 0 — Protocol and Skeleton
- finalize config schema
- define persisted data models
- define protocol message types for builtin traffic
- define hook startup behavior
- define rule registry behavior

## Phase 1 — Main/Follower Transport MVP
- main WebSocket server startup
- follower WebSocket client startup
- reconnect logic
- basic builtin protocol channel
- persistent registry scaffolding

## Phase 2 — Pairing and Authentication
- follower keypair generation
- pairing request/confirmation flow
- secret issuance and persistence
- signed/encrypted handshake proof verification
- nonce/replay protection
- unsafe-condition reset to pairing

## Phase 3 — Heartbeat and Status Tracking
- follower heartbeat sender
- main heartbeat receiver
- periodic sweep
- status transitions: online / unstable / offline
- forced disconnect on offline

## Phase 4 — Public APIs and Message Dispatch
- `sendMessageToMain`
- `sendMessageToFollower`
- `registerRule`
- first-match dispatch
- main-side sender rewrite behavior

## Phase 5 — Hardening and Docs
- integration tests
- failure-path coverage
- restart recovery checks
- protocol docs
- operator setup docs for main/follower deployment

---

## 14. Non-Goals for Initial Version

Not required in the first version unless explicitly added later:
- direct follower-to-follower sockets
- multi-main clustering
- distributed consensus
- message ordering guarantees across reconnects
- end-to-end application payload encryption beyond the handshake/authentication requirements
- UI management panel

---

## 15. Open Questions To Confirm Later

These should be resolved before implementation starts:

1. Is the handshake primitive meant to be:
   - asymmetric encryption with private/public key, or
   - digital signature with verification by public key?

   Recommended: **signature**, not “private-key encryption” wording.

2. Should `mainHost` accept only full WebSocket URLs (`ws://` / `wss://`) or also raw `ip:port` strings?

3. Should pairing require explicit operator approval on main, or is allowlist membership enough for automatic pairing?

4. On unsafe condition, should the old public key be retained or must the follower generate a brand-new keypair?

5. Should main queue outbound messages for offline followers, or should the send fail immediately?

6. Are rule identifiers exact strings only, or should regex/prefix matching exist in the future?

---

## 16. Immediate Next Deliverables

After this plan, the next files to create should be:
- `FEAT.md` — feature checklist derived from this plan
- `README.md` — concise operator/developer overview
- `plugin.json` — plugin config schema and entry declaration
- protocol notes for builtin messages
- implementation task breakdown