From c8681af0acd36c955c1bd8ad81a14d02884a573d Mon Sep 17 00:00:00 2001
From: zhi
Date: Wed, 11 Mar 2026 21:35:28 +0000
Subject: [PATCH] docs: add server connector plan

---
 docs/monitor-server-connector-plan.md | 112 ++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)
 create mode 100644 docs/monitor-server-connector-plan.md

diff --git a/docs/monitor-server-connector-plan.md b/docs/monitor-server-connector-plan.md
new file mode 100644
index 0000000..398971f
--- /dev/null
+++ b/docs/monitor-server-connector-plan.md
@@ -0,0 +1,112 @@
+# HarborForge OpenClaw Server Connector Plugin — Project Plan
+
+## 1) Goal
+Provide a secure, lightweight plugin/agent that connects servers to HarborForge Monitor, streams telemetry in real time, and falls back to HTTP heartbeat when WebSocket is unavailable.
+
+## 2) Scope
+- **Handshake + auth** using backend-issued challenge + RSA-OAEP encrypted payload.
+- **WebSocket telemetry** to `/monitor/server/ws`.
+- **HTTP heartbeat** to `/monitor/server/heartbeat` as fallback.
+- **System metrics**: CPU/Mem/Disk/Swap/Uptime/OpenClaw version/Agents list.
+- **Retry & backoff**, offline handling, and minimal local state.
+
+## 3) Non-Goals
+- No UI in the plugin.
+- No provider billing calls from plugin.
+- No multi-tenant auth beyond challenge + server identifier.
+
+## 4) Architecture
+```
+plugin/
+  config/        # load config & secrets
+  crypto/        # RSA-OAEP encrypt/decrypt helpers
+  collector/     # system + openclaw metrics
+  transport/     # ws + http heartbeat
+  state/         # retry/backoff, last sent, cache
+  main.ts|py     # entry
+```
+
+### 4.1 Config
+- `backend_url`
+- `identifier`
+- `challenge_uuid`
+- `report_interval_sec` (default: 20-30s)
+- `http_fallback_interval_sec` (default: 60s)
+- `log_level`
+
+### 4.2 Security
+- Fetch public key: `GET /monitor/public/server-public-key`
+- Encrypt payload with RSA-OAEP
+- Include `nonce` + `ts` (UTC) to prevent replay
+- **Challenge valid**: 10 minutes
+- **Offline threshold**: 7 minutes
+
+## 5) Communication Flow
+### 5.1 Handshake (WS)
+1. Plugin reads `identifier + challenge_uuid`.
+2. Fetch RSA public key.
+3. Encrypt payload: `{identifier, challenge_uuid, nonce, ts}`.
+4. Connect WS `/monitor/server/ws` and send `encrypted_payload`.
+5. On success: begin periodic telemetry push.
+
+### 5.2 Fallback (HTTP)
+If WS fails:
+- POST telemetry to `/monitor/server/heartbeat` with same payload fields.
+- Retry with exponential backoff (cap 5–10 min).
+
+## 6) Telemetry Schema (example)
+```
+{
+  "identifier": "vps.t1",
+  "openclaw_version": "x.y.z",
+  "cpu_pct": 12.5,
+  "mem_pct": 41.2,
+  "disk_pct": 62.0,
+  "swap_pct": 0.0,
+  "agents": [ { "id": "a1", "name": "agent", "status": "running" } ],
+  "last_seen_at": "2026-03-11T21:00:00Z"
+}
+```
+
+## 7) Reliability
+- Automatic reconnect on WS drop
+- HTTP fallback if WS unavailable > 2 intervals
+- Exponential backoff on failures
+- Local cache for last successful payload
+
+## 8) Deployment Options
+- **Systemd service** (preferred for VPS)
+- **Docker container** (optional)
+- Single-binary build if using Go/Rust
+
+## 9) Milestones
+**M1 – POC (2–3 days)**
+- CLI config loader + HTTP heartbeat
+- See online + metrics in Monitor
+
+**M2 – WS realtime (2–3 days)**
+- Full handshake + WS streaming
+- Reconnect & fallback logic
+
+**M3 – Packaging (1–2 days)**
+- systemd unit + sample config
+- installation script
+
+**M4 – Hardening & Docs (1–2 days)**
+- logging, metrics, docs
+- troubleshooting guide
+
+## 10) Deliverables
+- Plugin source
+- Config template + systemd unit
+- Integration docs
+- Test script + example payloads
+
+## 11) Open Questions
+- Preferred language (Go/Python/Node/Rust)?
+- How to read OpenClaw agent list (API vs local state)?
+- Required log format / retention?
+
+---
+
+**Next step:** confirm preferred runtime (Go/Python/Node) and I will scaffold the project structure + first heartbeat implementation.
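The retry rules in sections 5.2 and 7 of the plan (exponential backoff with a cap, HTTP fallback once WS has been unavailable for more than 2 intervals) could be sketched roughly as follows. This is only an illustration, not the plugin's implementation: the runtime is still an open question and Python is used here for brevity. The function names, the 1 s base delay, the 300 s cap (the low end of the plan's 5–10 min range), and the 30 s report interval (the high end of the 20-30s default) are all assumptions.

```python
def backoff_delay(attempt: int, base_sec: float = 1.0, cap_sec: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Plain exponential backoff: base_sec * 2**attempt, never above cap_sec.
    base_sec and cap_sec are assumed values, not taken from the plan.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    # Clamp the exponent so a long outage cannot produce an enormous integer.
    return min(cap_sec, base_sec * (2 ** min(attempt, 32)))


def should_fall_back(ws_down_sec: float, report_interval_sec: float = 30.0) -> bool:
    """True once WS has been unavailable for more than two report intervals."""
    return ws_down_sec > 2 * report_interval_sec


if __name__ == "__main__":
    # Show the first few retry delays and the fallback decision at 61 s of downtime.
    print([backoff_delay(n) for n in range(5)])
    print(should_fall_back(61.0))
```

With these assumed defaults the delay doubles from 1 s until it reaches the 300 s cap, and the HTTP heartbeat path takes over after just over 60 s without a WebSocket connection. A real implementation would probably add random jitter to the delay so that many servers do not retry in lockstep.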