# Yonexus — Project Plan
## 1. Goal
Yonexus is a cross-instance communication system for OpenClaw, implemented as **two separate plugins**:
- `Yonexus.Server`
- `Yonexus.Client`
Together they provide:
- communication between multiple OpenClaw instances
- a central WebSocket hub model
- client pairing and authentication
- heartbeat-based client liveness tracking
- rule-based message dispatch
- out-of-band pairing notification to a human administrator via Discord DM
- TypeScript interfaces for higher-level plugin/runtime integrations
This project is no longer a role-switched single plugin. It is now explicitly split into two installable plugins with distinct responsibilities.
---
## 2. Plugin Split
## 2.1 Yonexus.Server
`Yonexus.Server` is installed only on the main OpenClaw instance.
Responsibilities:
- start and maintain the WebSocket server
- accept incoming client connections
- maintain the client registry
- handle pairing flow
- verify authentication proofs
- track heartbeat and connection state
- route or relay messages to connected clients
- rewrite inbound client messages before rule dispatch
- send Discord DM pairing notifications to the human administrator
## 2.2 Yonexus.Client
`Yonexus.Client` is installed on follower OpenClaw instances.
Responsibilities:
- connect to the configured Yonexus server
- generate and persist local keypair on first use
- persist local client identity and secret
- perform pairing confirmation
- perform authenticated reconnect
- send periodic heartbeats
- expose client-side messaging and rule registration APIs
---
## 3. Deployment Model
A Yonexus network contains:
- exactly one OpenClaw instance running `Yonexus.Server`
- one or more OpenClaw instances running `Yonexus.Client`
Topology rules:
- `Yonexus.Server` must be reachable via a fixed IP/domain or another stable, addressable endpoint
- `Yonexus.Client` instances do not need a stable public IP/domain
- all `Yonexus.Client` instances connect outbound to the `Yonexus.Server` WebSocket endpoint
- no direct client-to-client communication is required in v1
- inter-client communication, if needed, is relayed by `Yonexus.Server`
---
## 4. Configuration Model
## 4.1 Yonexus.Server Config
```ts
followerIdentifiers: string[]
notifyBotToken: string
adminUserId: string
listenHost?: string
listenPort: number
publicWsUrl?: string
```
Semantics:
- `followerIdentifiers`: allowlist of client identifiers permitted to pair/connect
- `notifyBotToken`: Discord bot token used to send pairing notifications
- `adminUserId`: Discord user id of the human administrator who receives pairing codes by DM
- `listenHost`: local bind host for WebSocket server
- `listenPort`: local bind port for WebSocket server
- `publicWsUrl`: optional canonical external URL advertised/documented for clients
## 4.2 Yonexus.Client Config
```ts
mainHost: string
identifier: string
notifyBotToken: string
adminUserId: string
```
Semantics:
- `mainHost`: WebSocket endpoint of `Yonexus.Server`
- `identifier`: unique identity of this client inside the Yonexus network
- `notifyBotToken`: required so the client config stays aligned with the shared schema; reserved for possible future client-side notification behavior
- `adminUserId`: human administrator identity reference shared with the Yonexus system
## 4.3 Validation Rules
### Yonexus.Server
- must provide `followerIdentifiers`
- must provide `notifyBotToken`
- must provide `adminUserId`
- must provide `listenPort`
- must be deployed on a reachable/stable endpoint
### Yonexus.Client
- must provide `mainHost`
- must provide `identifier`
- must provide `notifyBotToken`
- must provide `adminUserId`
### Shared
- invalid or missing required fields must fail plugin initialization
- unknown client identifiers must be rejected by `Yonexus.Server`
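The server-side validation rules above could be enforced at plugin initialization roughly as follows. This is a minimal sketch, not the plugin's actual loader: the `validateServerConfig` name, the error wording, and the 1 to 65535 port range are assumptions for illustration.

```ts
// Mirrors the config fields from section 4.1.
interface YonexusServerConfig {
  followerIdentifiers: string[];
  notifyBotToken: string;
  adminUserId: string;
  listenHost?: string;
  listenPort: number;
  publicWsUrl?: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Hypothetical validator: throws so that plugin initialization fails
// when a required field is missing or invalid.
function validateServerConfig(raw: unknown): YonexusServerConfig {
  const cfg = (raw ?? {}) as Partial<YonexusServerConfig>;
  const errors: string[] = [];

  const ids = cfg.followerIdentifiers;
  if (!Array.isArray(ids) || ids.some((id) => !isNonEmptyString(id))) {
    errors.push("followerIdentifiers must be an array of non-empty strings");
  }
  if (!isNonEmptyString(cfg.notifyBotToken)) errors.push("notifyBotToken is required");
  if (!isNonEmptyString(cfg.adminUserId)) errors.push("adminUserId is required");

  // Assumption: any valid TCP port is acceptable.
  const port = cfg.listenPort;
  if (typeof port !== "number" || !Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push("listenPort must be an integer between 1 and 65535");
  }

  if (errors.length > 0) {
    throw new Error("invalid Yonexus.Server configuration: " + errors.join("; "));
  }
  return cfg as YonexusServerConfig;
}
```

A client-side validator would follow the same pattern for `mainHost`, `identifier`, `notifyBotToken`, and `adminUserId`.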
---
## 4.4 Shared Terminology Baseline
These names are normative across umbrella docs, protocol docs, and implementation repos:
- `identifier`: the unique logical name of a client/follower instance.
- `rule_identifier`: the exact-match application route key.
- `builtin`: reserved rule namespace for protocol/system frames.
- `pairingCode`: short-lived out-of-band code delivered to the human admin.
- `secret`: server-issued shared secret used for reconnect proof construction.
- `publicKey` / `privateKey`: client-held signing keypair.
- `nextAction`: server-directed next step returned by `hello_ack`.
Implementations should avoid introducing alternative synonyms for these fields unless there is a versioned migration plan.
---
## 5. Runtime Lifecycle
## 5.1 Yonexus.Server Startup
On OpenClaw gateway startup:
- initialize persistent client registry
- start WebSocket server
- register builtin protocol handlers
- register application rule registry
- start heartbeat/status sweep timer
## 5.2 Yonexus.Client Startup
On OpenClaw gateway startup:
- load local persisted identity, private key, and secret state
- generate keypair if absent
- connect to `mainHost`
- perform pairing or authentication flow depending on local state
- start heartbeat schedule after successful authentication
- attempt reconnect when disconnected
---
## 6. Server Registry and Persistence
`Yonexus.Server` must maintain a registry keyed by client `identifier`.
Each client record contains at minimum:
- `identifier`
- `publicKey`
- `secret`
- pairing state
- pairing expiration data
- pairing notification metadata
- connection status
- security counters/window data
- heartbeat timestamps
- last known session metadata
The registry must use:
- in-memory runtime state for active connections and recent security windows
- persistent on-disk storage for durable trust state
### 6.1 Proposed Server Record Shape
```ts
interface ClientRecord {
identifier: string;
publicKey?: string;
secret?: string;
pairingStatus: "unpaired" | "pending" | "paired" | "revoked";
pairingCode?: string;
pairingExpiresAt?: number;
pairingNotifiedAt?: number;
pairingNotifyStatus?: "pending" | "sent" | "failed";
status: "online" | "offline" | "unstable";
lastHeartbeatAt?: number;
lastAuthenticatedAt?: number;
recentNonces: Array<{
nonce: string;
timestamp: number;
}>;
recentHandshakeAttempts: number[];
createdAt: number;
updatedAt: number;
}
```
---
## 7. Pairing and Authentication
## 7.1 First Connection and Key Generation
When a client connects to the server for the first time:
- `Yonexus.Client` generates a public/private key pair locally
- the private key remains only on the client instance
- the public key is sent to `Yonexus.Server` during handshake
If the server sees that:
- the client identifier is allowed, and
- there is no valid `secret` currently associated with that identifier
then the server must enter pairing flow.
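The decision above can be expressed as a small pure function. The sketch below is illustrative only: the `nextAction` values (`"rejected"`, `"pair_required"`, `"auth_required"`) are hypothetical, and "valid secret" is assumed to mean a `paired` record with a non-empty `secret`.

```ts
// Hypothetical nextAction values; the plan does not fix the exact strings.
type NextAction = "rejected" | "pair_required" | "auth_required";

interface TrustState {
  pairingStatus: "unpaired" | "pending" | "paired" | "revoked";
  secret?: string;
}

function decideNextAction(
  identifier: string,
  allowlist: ReadonlySet<string>,
  record: TrustState | undefined,
): NextAction {
  // Unknown identifiers are rejected outright (section 4.3).
  if (!allowlist.has(identifier)) return "rejected";

  // Assumption: a secret is only "valid" while the record is paired.
  const hasValidSecret =
    record !== undefined &&
    record.pairingStatus === "paired" &&
    typeof record.secret === "string" &&
    record.secret.length > 0;

  return hasValidSecret ? "auth_required" : "pair_required";
}
```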
## 7.2 Pairing Flow
### Step A: Pairing Request Creation
`Yonexus.Server` generates:
- a random pairing string
- an expiration time
The pairing string must **not** be sent to the client over WebSocket.
Instead, `Yonexus.Server` uses `notifyBotToken` to send a Discord DM to `adminUserId` containing:
- the client `identifier`
- the generated `pairingCode`
- the expiration time
### Step B: Pairing Confirmation
The client must provide the pairing code back to the server before expiration.
How the client operator obtains the pairing code is intentionally out-of-band from the Yonexus WebSocket channel. The server only trusts that the code came through some human-mediated path.
If the client sends the correct pairing code before expiration:
- pairing succeeds
### Step C: Secret Issuance
After successful pairing:
- `Yonexus.Server` generates a random `secret`
- `Yonexus.Server` returns that `secret` to the client
- `Yonexus.Server` stores client `publicKey` + `secret`
- `Yonexus.Client` stores private key + secret locally
If Discord DM delivery fails:
- pairing must not proceed
- server should mark the pairing attempt as failed or pending-error
- client must not receive a usable pairing code through the protocol channel
If pairing expires before confirmation:
- pairing fails
- the client must restart the pairing process
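A minimal sketch of pairing-code creation and confirmation, assuming Node.js and its built-in `node:crypto` module. The code length, alphabet, five-minute lifetime, millisecond timestamps, and the `createPairing` / `confirmPairing` names are all assumptions; the plan only requires a random string and an expiration time. Discord DM delivery is out of scope here.

```ts
import { randomInt } from "node:crypto";

// Assumption: an alphabet without look-alike characters (no I, O, 0, 1).
const PAIRING_ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

interface PendingPairing {
  identifier: string;
  pairingCode: string;
  pairingExpiresAt: number; // Unix milliseconds (assumption)
}

type ConfirmResult = "paired" | "expired" | "wrong_code";

// Step A: generate the code and its expiration time on the server.
function createPairing(
  identifier: string,
  nowMs: number,
  ttlMs: number = 5 * 60 * 1000,
  length: number = 8,
): PendingPairing {
  let pairingCode = "";
  for (let i = 0; i < length; i++) {
    pairingCode += PAIRING_ALPHABET.charAt(randomInt(PAIRING_ALPHABET.length));
  }
  return { identifier, pairingCode, pairingExpiresAt: nowMs + ttlMs };
}

// Step B: check the code the client sends back before expiration.
// A hardened version would likely use a constant-time comparison.
function confirmPairing(pending: PendingPairing, submittedCode: string, nowMs: number): ConfirmResult {
  if (nowMs >= pending.pairingExpiresAt) return "expired";
  return submittedCode === pending.pairingCode ? "paired" : "wrong_code";
}
```

Only a `"paired"` result would lead to Step C (secret issuance); the other two outcomes leave the client unpaired.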
## 7.3 Reconnect Authentication Flow
After pairing is complete, future client authentication must use:
- the stored `secret`
- a 24-character random nonce
- current UTC Unix timestamp
The client builds a proof payload from:
- `secret`
- `nonce`
- `timestamp`
Logical concatenation order:
```text
secret + nonce + timestamp
```
Implementation recommendation:
- use a canonical serialized object and sign its bytes rather than naive string concatenation in code
The client signs the proof using its private key and sends it to the server.
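One way to realize the signing step with Node.js's built-in `node:crypto` module and Ed25519 (the v1 default named in section 16) is sketched below. It assumes one particular reading of the plan: the client sends `nonce`, `timestamp`, and the signature, and the server rebuilds the signed bytes from its own stored `secret`, so a wrong secret makes verification fail. The function names, JSON canonical form, and base64 encodings are illustrative assumptions.

```ts
import { KeyObject, generateKeyPairSync, randomBytes, sign, verify } from "node:crypto";

interface SignedProof {
  nonce: string;
  timestamp: number; // UTC Unix seconds (assumption)
  signature: string; // base64 (assumption)
}

// Canonical serialization: a JSON object with a fixed key order,
// rather than naive string concatenation.
function canonicalProofBytes(secret: string, nonce: string, timestamp: number): Buffer {
  return Buffer.from(JSON.stringify({ secret, nonce, timestamp }), "utf8");
}

// Client side: 18 random bytes encode to exactly 24 base64url characters.
function buildSignedProof(secret: string, privateKey: KeyObject, nowSec: number): SignedProof {
  const nonce = randomBytes(18).toString("base64url");
  const signature = sign(null, canonicalProofBytes(secret, nonce, nowSec), privateKey).toString("base64");
  return { nonce, timestamp: nowSec, signature };
}

// Server side: rebuild the bytes from the stored secret and check the signature
// against the stored public key.
function verifySignedProof(proof: SignedProof, storedSecret: string, publicKey: KeyObject): boolean {
  const bytes = canonicalProofBytes(storedSecret, proof.nonce, proof.timestamp);
  return verify(null, bytes, publicKey, Buffer.from(proof.signature, "base64"));
}

// Usage: generate a client keypair once, then sign on each reconnect.
const demoKeys = generateKeyPairSync("ed25519");
const demoProof = buildSignedProof("example-secret", demoKeys.privateKey, Math.floor(Date.now() / 1000));
const demoValid = verifySignedProof(demoProof, "example-secret", demoKeys.publicKey);
```

Signature verification covers only the public-key and secret checks; timestamp freshness, nonce reuse, and rate limits are separate checks.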
The server verifies:
1. identifier is known and paired
2. public key matches stored state
3. proof contains the correct `secret`
4. timestamp difference from current time is less than 10 seconds
5. nonce does not collide with the recent nonce window
6. handshake attempts in the last 10 seconds do not exceed 10
If all checks pass:
- authentication succeeds
- the client is considered authenticated for the session
If any check fails:
- authentication fails
- server may downgrade or revoke trust state
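Checks 4 to 6 from the list above (timestamp freshness, nonce reuse, and handshake rate) can be sketched as one function over the per-client security window. The function name, result strings, second-based timestamps, and the order in which the checks run are assumptions; signature and identifier checks are assumed to happen elsewhere.

```ts
interface SecurityWindow {
  recentNonces: Array<{ nonce: string; timestamp: number }>;
  recentHandshakeAttempts: number[]; // Unix seconds (assumption)
}

type HandshakeCheck = "ok" | "stale_timestamp" | "nonce_replay" | "rate_exceeded";

const MAX_SKEW_SEC = 10; // timestamp must differ from "now" by less than this
const RATE_WINDOW_SEC = 10; // sliding window for counting attempts
const MAX_ATTEMPTS = 10; // more than this many attempts in the window is unsafe
const NONCE_WINDOW_SIZE = 10; // number of recent nonces remembered

function checkHandshake(
  win: SecurityWindow,
  nonce: string,
  timestampSec: number,
  nowSec: number,
): HandshakeCheck {
  // Every attempt counts toward the rate limit, including failed ones.
  win.recentHandshakeAttempts = win.recentHandshakeAttempts.filter((t) => nowSec - t < RATE_WINDOW_SEC);
  win.recentHandshakeAttempts.push(nowSec);
  if (win.recentHandshakeAttempts.length > MAX_ATTEMPTS) return "rate_exceeded";

  if (Math.abs(nowSec - timestampSec) >= MAX_SKEW_SEC) return "stale_timestamp";

  if (win.recentNonces.some((entry) => entry.nonce === nonce)) return "nonce_replay";

  // Remember the accepted nonce, keeping only the most recent ones.
  win.recentNonces.push({ nonce, timestamp: nowSec });
  if (win.recentNonces.length > NONCE_WINDOW_SIZE) win.recentNonces.shift();
  return "ok";
}
```

In this sketch `"nonce_replay"` and `"rate_exceeded"` correspond to the unsafe conditions that force a return to pairing, while `"stale_timestamp"` is an ordinary authentication failure.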
## 7.4 Unsafe Condition Handling
The connection is considered unsafe and must return to pairing flow if either is true:
- more than 10 handshake attempts occur within 10 seconds
- the presented nonce collides with one of the last 10 nonces observed within the recent window
When unsafe:
- existing trust state must no longer be accepted for authentication
- the client must re-pair
- server should clear or rotate the stored `secret`
- server should reset security windows as part of re-pairing
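The unsafe-condition reset can be sketched as a pure function over the server record. This version clears the `secret` rather than rotating it and keeps the existing `publicKey`; both are assumptions, and key retention in particular is still listed as an open question in section 17. The `resetToPairing` name and the trimmed `TrustRecord` shape are illustrative.

```ts
// Trimmed version of the ClientRecord shape from section 6.1.
interface TrustRecord {
  identifier: string;
  publicKey?: string;
  secret?: string;
  pairingStatus: "unpaired" | "pending" | "paired" | "revoked";
  recentNonces: Array<{ nonce: string; timestamp: number }>;
  recentHandshakeAttempts: number[];
  updatedAt: number;
}

// Returns a new record that can no longer authenticate and must re-pair.
function resetToPairing(record: TrustRecord, nowMs: number): TrustRecord {
  const next: TrustRecord = {
    ...record,
    pairingStatus: "unpaired",
    recentNonces: [], // security windows are reset as part of re-pairing
    recentHandshakeAttempts: [],
    updatedAt: nowMs,
  };
  delete next.secret; // existing trust state must no longer be accepted
  return next;
}
```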
---
## 8. Heartbeat and Client Status
The server must track each client's liveness state:
- `online`
- `unstable`
- `offline`
## 8.1 Heartbeat Rules
Each client must send a heartbeat to the server every 5 minutes.
## 8.2 Status Transitions
### online
A client is `online` when:
- it has an active authenticated WebSocket connection, and
- the server has received a recent heartbeat
### unstable
A client becomes `unstable` when:
- no heartbeat has been received for 7 minutes
### offline
A client becomes `offline` when:
- no heartbeat has been received for 11 minutes
When a client becomes `offline`:
- the server must close/terminate the WebSocket connection for that client
## 8.3 Status Evaluation Strategy
The server should run a periodic status sweep timer.
Recommended interval:
- every 30 to 60 seconds
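The status transitions and sweep can be sketched as below. The thresholds come from section 8.2; the function names, millisecond timestamps, the `closeConnection` callback, and treating a client with no recorded heartbeat as `offline` are assumptions. The sketch also omits the requirement that `online` needs an active authenticated connection.

```ts
type ClientStatus = "online" | "unstable" | "offline";

interface LivenessEntry {
  lastHeartbeatAt?: number; // Unix milliseconds (assumption)
  status: ClientStatus;
}

const UNSTABLE_AFTER_MS = 7 * 60 * 1000;
const OFFLINE_AFTER_MS = 11 * 60 * 1000;

// Derive the liveness state from the time since the last heartbeat.
function evaluateStatus(lastHeartbeatAt: number | undefined, nowMs: number): ClientStatus {
  if (lastHeartbeatAt === undefined) return "offline";
  const silenceMs = nowMs - lastHeartbeatAt;
  if (silenceMs >= OFFLINE_AFTER_MS) return "offline";
  if (silenceMs >= UNSTABLE_AFTER_MS) return "unstable";
  return "online";
}

// One sweep pass; closeConnection is a hypothetical callback that
// terminates the WebSocket of a client that has just gone offline.
function sweepStatuses(
  clients: Map<string, LivenessEntry>,
  nowMs: number,
  closeConnection: (identifier: string) => void,
): void {
  clients.forEach((entry, identifier) => {
    const next = evaluateStatus(entry.lastHeartbeatAt, nowMs);
    if (next === "offline" && entry.status !== "offline") closeConnection(identifier);
    entry.status = next;
  });
}
```

A server would call `sweepStatuses` from a timer, for example `setInterval(() => sweepStatuses(clients, Date.now(), close), 30 * 1000)`.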
---
## 9. Messaging Model
Yonexus provides rule-based message dispatch over WebSocket.
## 9.1 Base Message Format
All application messages must use the format:
```text
${rule_identifier}::${message_content}
```
## 9.2 Server-Side Rewriting
When `Yonexus.Server` receives a message from a client, before rule matching it must rewrite the message into:
```text
${rule_identifier}::${sender_identifier}::${message_content}
```
This ensures server-side processors can identify which client sent the message.
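A minimal sketch of parsing the base format and applying the server-side rewrite. It assumes the first `::` separates the rule identifier from the content, so the content itself may contain `::`; the function names and error text are illustrative.

```ts
interface ParsedMessage {
  ruleIdentifier: string;
  content: string;
}

// Split on the first "::" only; an empty rule identifier is malformed.
function parseMessage(raw: string): ParsedMessage {
  const separatorIndex = raw.indexOf("::");
  if (separatorIndex <= 0) {
    throw new Error("malformed message: expected ${rule_identifier}::${message_content}");
  }
  return {
    ruleIdentifier: raw.slice(0, separatorIndex),
    content: raw.slice(separatorIndex + 2),
  };
}

// Insert the sender identifier between the rule identifier and the content.
function rewriteInbound(raw: string, senderIdentifier: string): string {
  const { ruleIdentifier, content } = parseMessage(raw);
  return `${ruleIdentifier}::${senderIdentifier}::${content}`;
}
```

For example, `rewriteInbound("chat::hello", "client-a")` yields `chat::client-a::hello`.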
## 9.3 Builtin Rule Namespace
The reserved rule identifier is:
```text
builtin
```
It is used internally for:
- handshake
- pairing
- heartbeat
- protocol/system messages
User code must not be allowed to register handlers for `builtin`.
---
## 10. TypeScript API Surface
## 10.1 Yonexus.Client API
```ts
sendMessageToServer(message: string): Promise<void>
```
Rules:
- sends message to connected `Yonexus.Server`
- message must already conform to `${rule_identifier}::${message_content}`
```ts
registerRule(rule: string, processor: (message: string) => unknown): void
```
Rules:
- rejects `builtin`
- rejects duplicate rule registration unless explicit override support is added later
## 10.2 Yonexus.Server API
```ts
sendMessageToClient(identifier: string, message: string): Promise<void>
```
Rules:
- target client must be known and currently connected/authenticated
- message must already conform to `${rule_identifier}::${message_content}`
```ts
registerRule(rule: string, processor: (message: string) => unknown): void
```
Rules:
- rejects `builtin`
- rejects duplicate rule registration unless explicit override support is added later
- processors are invoked with the final received string after any server-side rewrite
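The `registerRule` rules on both sides can be sketched with one small registry class. The `RuleRegistry` class, its `dispatch` method, and returning `undefined` for an unregistered rule are assumptions; only the rejection of `builtin` and of duplicates comes from the rules above.

```ts
type RuleProcessor = (message: string) => unknown;

class RuleRegistry {
  private readonly rules = new Map<string, RuleProcessor>();

  registerRule(rule: string, processor: RuleProcessor): void {
    if (rule === "builtin") {
      throw new Error("reserved rule registration: builtin");
    }
    if (this.rules.has(rule)) {
      throw new Error("duplicate rule registration: " + rule);
    }
    this.rules.set(rule, processor);
  }

  // Exact-match lookup on the rule identifier before the first "::".
  // The processor receives the full string, including any server-side rewrite.
  dispatch(message: string): unknown {
    const separatorIndex = message.indexOf("::");
    if (separatorIndex <= 0) throw new Error("malformed message");
    const processor = this.rules.get(message.slice(0, separatorIndex));
    return processor ? processor(message) : undefined;
  }
}
```

Usage might look like `registry.registerRule("chat", (m) => console.log(m))` followed by `registry.dispatch("chat::client-a::hello")`.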
---
## 11. Hooks and Integration
## 11.1 Yonexus.Server Hooking
`Yonexus.Server` must register hooks so that when OpenClaw gateway starts:
- the WebSocket server is started
- the server registry is initialized
- builtin protocol handling is enabled
- heartbeat sweep begins
## 11.2 Yonexus.Client Behavior
`Yonexus.Client` must:
- connect outbound to `mainHost`
- manage local trust material
- handle pairing/authentication transitions
- emit heartbeats after authentication
- reconnect after disconnect with retry/backoff behavior
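Section 16 fixes the reconnect policy as exponential backoff. One possible delay calculation is sketched below; the base delay, the cap, and the added jitter are assumptions, and `reconnectDelayMs` is a hypothetical helper name.

```ts
// attempt is the zero-based count of consecutive failed reconnects.
// random is injectable so the function can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs: number = 1000,
  maxMs: number = 60000,
): number {
  const exponent = Math.max(0, Math.floor(attempt));
  const cappedMs = Math.min(maxMs, baseMs * Math.pow(2, exponent));
  // "Equal jitter": half of the delay is fixed, half is randomized,
  // which spreads out reconnects when many clients drop at once.
  return Math.floor(cappedMs / 2 + random() * (cappedMs / 2));
}
```

A client would reset `attempt` to zero after a successful authentication and schedule the next connection attempt with `setTimeout(connect, reconnectDelayMs(attempt))`.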
---
## 12. Storage Strategy
## 12.1 Yonexus.Server Storage
Server persists at minimum:
- identifier
- public key
- secret
- trust state
- pairing code + expiry if pairing is pending
- pairing notification metadata
- last known status
- metadata timestamps
May persist or reset on restart:
- recent nonces
- recent handshake attempts
Recommended v1:
- clear rolling security windows on restart
- keep long-lived trust records
## 12.2 Yonexus.Client Storage
Client persists at minimum:
- identifier
- private key
- secret
- optional last successful pair/auth metadata
Security notes:
- private key must never be sent to the server
- secret must be treated as sensitive material
- encryption-at-rest can be a future enhancement, but any plaintext local storage must be documented as a limitation if used initially
---
## 13. Error Handling
Structured errors should exist for at least:
- invalid configuration
- unauthorized identifier
- pairing required
- pairing expired
- pairing notification failure
- handshake verification failure
- replay/nonce collision detected
- unsafe handshake rate detected
- target client not connected
- duplicate rule registration
- reserved rule registration
- malformed message
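These categories could be modeled as a single error class with a machine-readable code. The code strings and the `YonexusError` class below are hypothetical names chosen for illustration, one per category in the list above.

```ts
// One hypothetical code per error category listed above.
type YonexusErrorCode =
  | "INVALID_CONFIG"
  | "UNAUTHORIZED_IDENTIFIER"
  | "PAIRING_REQUIRED"
  | "PAIRING_EXPIRED"
  | "PAIRING_NOTIFICATION_FAILED"
  | "HANDSHAKE_VERIFICATION_FAILED"
  | "NONCE_COLLISION"
  | "UNSAFE_HANDSHAKE_RATE"
  | "TARGET_CLIENT_NOT_CONNECTED"
  | "DUPLICATE_RULE"
  | "RESERVED_RULE"
  | "MALFORMED_MESSAGE";

class YonexusError extends Error {
  readonly code: YonexusErrorCode;
  readonly details: Record<string, unknown> | undefined;

  constructor(code: YonexusErrorCode, message: string, details?: Record<string, unknown>) {
    super(message);
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, YonexusError.prototype);
    this.name = "YonexusError";
    this.code = code;
    this.details = details;
  }
}
```

Callers can then branch on `error.code` instead of parsing message text, for example to distinguish `PAIRING_EXPIRED` from `PAIRING_REQUIRED`.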
---
## 14. Initial Implementation Phases
## Phase 0 — Protocol and Skeleton
- finalize split-plugin configuration schema
- define persistent data models
- define builtin protocol messages
- define startup hooks for both plugins
- define rule registry behavior
- define Discord DM notification flow
## Phase 1 — Transport MVP
- Yonexus.Server WebSocket server startup
- Yonexus.Client WebSocket client startup
- reconnect logic
- builtin protocol channel
- persistent registry/state scaffolding
## Phase 2 — Pairing and Authentication
- client keypair generation
- pairing request creation
- Discord DM notification to admin user
- pairing confirmation flow
- secret issuance and persistence
- signed proof verification
- nonce/replay protection
- unsafe-condition reset to pairing
## Phase 3 — Heartbeat and Status Tracking
- client heartbeat sender
- server heartbeat receiver
- periodic sweep
- status transitions: online / unstable / offline
- forced disconnect on offline
## Phase 4 — Public APIs and Dispatch
- `sendMessageToServer`
- `sendMessageToClient`
- `registerRule`
- exact-match dispatch on `rule_identifier`
- server-side sender rewrite behavior
## Phase 5 — Hardening and Docs
- integration tests
- failure-path coverage
- restart recovery checks
- protocol docs
- operator setup docs for server/client deployment
---
## 15. Non-Goals for Initial Version
Not required in the first version unless explicitly added later:
- direct client-to-client sockets
- multi-server clustering
- distributed consensus
- offline message queues or guaranteed delivery to disconnected clients
- advanced rule matching beyond exact string match
- message ordering guarantees across reconnects
- end-to-end payload encryption beyond the pairing/authentication requirements
- management UI
- admin-side approve/deny control plane beyond human relay of pairing codes
- encryption-at-rest hardening beyond documenting current local storage limitations
---
## 16. v1 Decisions Locked for Current Implementation
The following implementation-boundary decisions are now treated as settled for v1:
1. Signing algorithm default: Ed25519.
2. `mainHost` should be configured as a full `ws://` or `wss://` URL in v1.
3. Human relay of the pairing code is sufficient for v1; richer admin approve/deny control can wait.
4. `heartbeat_ack` remains optional.
5. Client reconnect uses exponential backoff.
6. Rule identifiers are exact-match strings only in v1.
7. Outbound sends to offline clients fail immediately rather than queueing.
## 17. Open Questions To Confirm Later
1. On unsafe condition, should the old public key be retained or should the client generate a new keypair?
2. Should future versions support explicit key rotation without full re-pairing?
3. Should offline clients support queued outbound messages from server in a later version?
4. Are richer admin approval workflows worth adding after v1 stabilizes?
5. Should encryption-at-rest become a hard requirement in v2?
---
## 18. Immediate Next Deliverables
After this plan, the next files to create should be:
- `FEAT.md` — feature checklist derived from this plan
- `README.md` — concise system overview for both plugins
- `plugin.server.json` or equivalent server plugin manifest
- `plugin.client.json` or equivalent client plugin manifest
- implementation task breakdown