docs: add server connector plan
docs/monitor-server-connector-plan.md
# HarborForge OpenClaw Server Connector Plugin — Project Plan

## 1) Goal
Provide a secure, lightweight plugin/agent that connects servers to HarborForge Monitor, streams telemetry in real time, and falls back to HTTP heartbeat when WebSocket is unavailable.

## 2) Scope
- **Handshake + auth** using backend-issued challenge + RSA-OAEP encrypted payload.
- **WebSocket telemetry** to `/monitor/server/ws`.
- **HTTP heartbeat** to `/monitor/server/heartbeat` as fallback.
- **System metrics**: CPU/Mem/Disk/Swap/Uptime/OpenClaw version/Agents list.
- **Retry & backoff**, offline handling, and minimal local state.
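
As a rough illustration of the system-metrics item, the sketch below derives memory, swap, and disk percentages using only the Python standard library. It assumes a Linux host that exposes `/proc/meminfo` with a `MemAvailable` field; the function names (`parse_meminfo`, `used_pct`, `memory_metrics`, `disk_pct`) are hypothetical and not part of the plan, and CPU sampling is left out because it needs two readings over time.

```python
from __future__ import annotations

import shutil


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_pct(total: int, free: int) -> float:
    """Percentage in use, rounded to one decimal; 0.0 when total is zero."""
    if total <= 0:
        return 0.0
    return round(100.0 * (total - free) / total, 1)


def memory_metrics(meminfo_text: str) -> dict[str, float]:
    """Compute mem_pct and swap_pct from meminfo text (assumed Linux format)."""
    info = parse_meminfo(meminfo_text)
    return {
        "mem_pct": used_pct(info.get("MemTotal", 0), info.get("MemAvailable", 0)),
        "swap_pct": used_pct(info.get("SwapTotal", 0), info.get("SwapFree", 0)),
    }


def disk_pct(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total <= 0:
        return 0.0
    return round(100.0 * usage.used / usage.total, 1)


# Hand-written sample in /proc/meminfo format, used instead of the real file.
SAMPLE = """MemTotal:        8000000 kB
MemAvailable:    6000000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""
```

On a real host the collector would read `/proc/meminfo` and pass its text to `memory_metrics`; parsing from a string keeps the logic testable without touching the filesystem.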

## 3) Non-Goals
- No UI in the plugin.
- No provider billing calls from plugin.
- No multi-tenant auth beyond challenge + server identifier.

## 4) Architecture
```
plugin/
  config/      # load config & secrets
  crypto/      # RSA-OAEP encrypt/decrypt helpers
  collector/   # system + openclaw metrics
  transport/   # ws + http heartbeat
  state/       # retry/backoff, last sent, cache
  main.ts|py   # entry
```

### 4.1 Config
- `backend_url`
- `identifier`
- `challenge_uuid`
- `report_interval_sec` (default: 20-30s)
- `http_fallback_interval_sec` (default: 60s)
- `log_level`
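
One way the config keys above could be modeled is a small frozen dataclass with a validating loader. This is only a sketch: the class name `ConnectorConfig` and function `load_config` are hypothetical, the 30-second report default is an assumption that picks the upper end of the 20-30s range, and the `"info"` log level is an assumption because the plan names no default.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConnectorConfig:
    backend_url: str
    identifier: str
    challenge_uuid: str
    report_interval_sec: int = 30          # assumption: upper end of 20-30s
    http_fallback_interval_sec: int = 60   # default stated in the plan
    log_level: str = "info"                # assumption: no default in the plan


REQUIRED = ("backend_url", "identifier", "challenge_uuid")


def load_config(raw: dict) -> ConnectorConfig:
    """Validate a plain dict (e.g. parsed from a config file) into a config."""
    missing = [key for key in REQUIRED if not raw.get(key)]
    if missing:
        raise ValueError("missing required config keys: " + ", ".join(missing))

    known = {f.name for f in fields(ConnectorConfig)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError("unknown config keys: " + ", ".join(unknown))

    cfg = ConnectorConfig(**raw)
    if cfg.report_interval_sec <= 0 or cfg.http_fallback_interval_sec <= 0:
        raise ValueError("intervals must be positive")
    return cfg
```

Rejecting unknown keys is a design choice that surfaces typos such as `report_interval` early; a more lenient loader could ignore them instead.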

### 4.2 Security
- Fetch public key: `GET /monitor/public/server-public-key`
- Encrypt payload with RSA-OAEP
- Include `nonce` + `ts` (UTC) to prevent replay
- **Challenge valid**: 10 minutes
- **Offline threshold**: 7 minutes
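
To show why `nonce` and `ts` together prevent replay, here is a sketch of the check a receiver might run after decrypting the payload. The plan does not describe the backend's actual logic, so everything here is an assumption: the `ReplayGuard` class is hypothetical, and the 5-minute freshness window is an illustrative value, separate from the 10-minute challenge validity.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


class ReplayGuard:
    """Reject payloads that are stale or reuse a nonce (hypothetical sketch)."""

    def __init__(self, max_age: timedelta = timedelta(minutes=5)):
        self.max_age = max_age              # assumed freshness window
        self._seen: dict[str, datetime] = {}

    def accept(self, nonce: str, ts: datetime, now: datetime | None = None) -> bool:
        now = now or datetime.now(timezone.utc)

        # Forget nonces whose timestamp has aged out; a replay carrying such
        # a timestamp is already rejected by the freshness check below.
        self._seen = {n: t for n, t in self._seen.items()
                      if now - t <= self.max_age}

        if abs(now - ts) > self.max_age:
            return False                    # too old, or too far in the future
        if nonce in self._seen:
            return False                    # same nonce seen inside the window
        self._seen[nonce] = ts
        return True
```

The timestamp bounds how long nonces must be remembered, and the nonce catches replays inside that window; neither field is sufficient alone. Timestamps are expected to be timezone-aware UTC values.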

## 5) Communication Flow
### 5.1 Handshake (WS)
1. Plugin reads `identifier + challenge_uuid`.
2. Fetch RSA public key.
3. Encrypt payload: `{identifier, challenge_uuid, nonce, ts}`.
4. Connect WS `/monitor/server/ws` and send `encrypted_payload`.
5. On success: begin periodic telemetry push.
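
Steps 1 to 4 can be sketched as building the plaintext and wrapping the ciphertext for sending. The RSA-OAEP call itself is injected as a callable because it needs a third-party crypto library; the function names, the 16-byte hex nonce, the ISO 8601 `ts` format, and the base64 JSON envelope are all assumptions, since the plan does not fix a wire encoding.

```python
from __future__ import annotations

import base64
import json
import secrets
from datetime import datetime, timezone
from typing import Callable


def build_handshake_plaintext(identifier: str, challenge_uuid: str,
                              now: datetime | None = None) -> bytes:
    """Serialize {identifier, challenge_uuid, nonce, ts} as compact JSON."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    payload = {
        "identifier": identifier,
        "challenge_uuid": challenge_uuid,
        "nonce": secrets.token_hex(16),                # assumed nonce format
        "ts": now.strftime("%Y-%m-%dT%H:%M:%SZ"),      # assumed ts format
    }
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def build_handshake_message(identifier: str, challenge_uuid: str,
                            encrypt: Callable[[bytes], bytes]) -> str:
    """Encrypt the plaintext and wrap it in an assumed JSON envelope.

    `encrypt` stands in for RSA-OAEP encryption with the fetched public key.
    """
    ciphertext = encrypt(build_handshake_plaintext(identifier, challenge_uuid))
    encoded = base64.b64encode(ciphertext).decode("ascii")
    return json.dumps({"encrypted_payload": encoded})
```

RSA-OAEP can only encrypt a short message directly: with a 2048-bit key and SHA-256 the limit is 190 bytes, so the payload has to stay compact or the scheme would need a hybrid approach. Which key size and hash the backend uses is not stated in the plan.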

### 5.2 Fallback (HTTP)
If WS fails:
- POST telemetry to `/monitor/server/heartbeat` with same payload fields.
- Retry with exponential backoff (cap 5–10 min).
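
The retry rule can be sketched as a capped exponential delay. The 5-second base is an assumption, and the 300-second cap picks the low end of the 5–10 minute range so that retries still land inside the 7-minute offline threshold; `backoff_delay` is a hypothetical helper name.

```python
def backoff_delay(attempt: int, base_sec: float = 5.0,
                  cap_sec: float = 300.0) -> float:
    """Delay before retry number `attempt` (0-based), doubling up to a cap."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt, 32)
    return min(cap_sec, base_sec * (2 ** exponent))


# First eight delays with the assumed defaults:
# 5, 10, 20, 40, 80, 160, 300, 300 seconds.
delays = [backoff_delay(n) for n in range(8)]
```

A production version would usually add random jitter to each delay so that many servers recovering from the same outage do not retry in lockstep.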

## 6) Telemetry Schema (example)
```
{
  "identifier": "vps.t1",
  "openclaw_version": "x.y.z",
  "cpu_pct": 12.5,
  "mem_pct": 41.2,
  "disk_pct": 62.0,
  "swap_pct": 0.0,
  "agents": [ { "id": "a1", "name": "agent", "status": "running" } ],
  "last_seen_at": "2026-03-11T21:00:00Z"
}
```
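
The example payload could be mirrored in code with typed records that validate before serializing. This is a sketch only: the class names `Telemetry` and `AgentStatus` are hypothetical, and the 0 to 100 range check is an assumption about what the `*_pct` fields mean. Uptime is listed under Scope but does not appear in the example, so it is left out here.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentStatus:
    id: str
    name: str
    status: str


@dataclass
class Telemetry:
    identifier: str
    openclaw_version: str
    cpu_pct: float
    mem_pct: float
    disk_pct: float
    swap_pct: float
    last_seen_at: str                       # UTC timestamp string
    agents: list[AgentStatus] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a percentage falls outside 0..100 (assumed)."""
        for name in ("cpu_pct", "mem_pct", "disk_pct", "swap_pct"):
            value = getattr(self, name)
            if not 0.0 <= value <= 100.0:
                raise ValueError(f"{name} out of range: {value}")

    def to_json(self) -> str:
        """Validate, then serialize; nested agents become plain objects."""
        self.validate()
        return json.dumps(asdict(self))
```

For instance, `Telemetry("vps.t1", "x.y.z", 12.5, 41.2, 62.0, 0.0, "2026-03-11T21:00:00Z", [AgentStatus("a1", "agent", "running")]).to_json()` produces an object with the same fields as the example above.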

## 7) Reliability
- Automatic reconnect on WS drop
- HTTP fallback if WS unavailable > 2 intervals
- Exponential backoff on failures
- Local cache for last successful payload
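
The fallback rule can be expressed as a pure function of timestamps, which makes it easy to test. The sketch assumes that "intervals" means `report_interval_sec` and that the caller tracks the time of the last successful WS send (or the process start time before any send succeeds); `choose_transport` is a hypothetical name.

```python
def choose_transport(last_ws_ok: float, now: float,
                     report_interval_sec: float, max_missed: int = 2) -> str:
    """Return "http" once WS has been silent for more than `max_missed`
    report intervals, otherwise "ws". Times are seconds on one clock."""
    silent_for = now - last_ws_ok
    if silent_for > max_missed * report_interval_sec:
        return "http"
    return "ws"
```

With a 30-second interval, `choose_transport(100.0, 160.0, 30.0)` still returns `"ws"` because exactly two intervals have elapsed, while `choose_transport(100.0, 161.0, 30.0)` returns `"http"`. Using a monotonic clock for both timestamps avoids surprises when the wall clock is adjusted.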

## 8) Deployment Options
- **Systemd service** (preferred for VPS)
- **Docker container** (optional)
- Single-binary build if using Go/Rust

## 9) Milestones
**M1 – POC (2–3 days)**
- CLI config loader + HTTP heartbeat
- Server appears online with metrics in Monitor

**M2 – WS realtime (2–3 days)**
- Full handshake + WS streaming
- Reconnect & fallback logic

**M3 – Packaging (1–2 days)**
- systemd unit + sample config
- installation script

**M4 – Hardening & Docs (1–2 days)**
- logging, metrics, docs
- troubleshooting guide

## 10) Deliverables
- Plugin source
- Config template + systemd unit
- Integration docs
- Test script + example payloads

## 11) Open Questions
- Preferred language (Go/Python/Node/Rust)?
- How to read OpenClaw agent list (API vs local state)?
- Required log format / retention?

---

**Next step:** confirm preferred runtime (Go/Python/Node) and I will scaffold the project structure + first heartbeat implementation.