HarborForge.OpenclawPlugin/docs/monitor-server-connector-plan.md
2026-03-11 21:35:28 +00:00

# HarborForge OpenClaw Server Connector Plugin — Project Plan
## 1) Goal
Provide a secure, lightweight plugin/agent that connects servers to HarborForge Monitor, streams telemetry in real time, and falls back to HTTP heartbeat when WebSocket is unavailable.
## 2) Scope
- **Handshake + auth** using backend-issued challenge + RSA-OAEP encrypted payload.
- **WebSocket telemetry** to `/monitor/server/ws`.
- **HTTP heartbeat** to `/monitor/server/heartbeat` as fallback.
- **System metrics**: CPU/Mem/Disk/Swap/Uptime/OpenClaw version/Agents list.
- **Retry & backoff**, offline handling, and minimal local state.
## 3) Non-Goals
- No UI in the plugin.
- No provider billing calls from plugin.
- No multi-tenant auth beyond challenge + server identifier.
## 4) Architecture
```
plugin/
  config/      # load config & secrets
  crypto/      # RSA-OAEP encrypt/decrypt helpers
  collector/   # system + openclaw metrics
  transport/   # ws + http heartbeat
  state/       # retry/backoff, last sent, cache
  main.ts|py   # entry
```
### 4.1 Config
- `backend_url`
- `identifier`
- `challenge_uuid`
- `report_interval_sec` (default: 20-30s)
- `http_fallback_interval_sec` (default: 60s)
- `log_level`
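A minimal sketch of how these settings might be loaded, assuming a JSON config file and Python as the runtime (the runtime is still an open question, so this is illustrative only). `PluginConfig` and `load_config` are hypothetical names. The 30 s default for `report_interval_sec` is one value picked from the 20-30s range listed above, and `"info"` as the default `log_level` is an assumption.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PluginConfig:
    """Hypothetical container for the settings listed in section 4.1."""
    backend_url: str
    identifier: str
    challenge_uuid: str
    report_interval_sec: int = 30         # assumption: upper end of the 20-30s range
    http_fallback_interval_sec: int = 60
    log_level: str = "info"               # assumption: the plan names no default


def load_config(raw: str) -> PluginConfig:
    """Parse a JSON document into a PluginConfig, rejecting unknown or missing keys."""
    data = json.loads(raw)
    known = {f.name for f in fields(PluginConfig)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    try:
        cfg = PluginConfig(**data)
    except TypeError as exc:  # raised when a required field is missing
        raise ValueError(f"invalid config: {exc}") from exc
    if cfg.report_interval_sec <= 0 or cfg.http_fallback_interval_sec <= 0:
        raise ValueError("intervals must be positive")
    return cfg
```

Failing fast on unknown keys is a design choice here: a misspelled setting surfaces at startup instead of silently falling back to a default.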
### 4.2 Security
- Fetch public key: `GET /monitor/public/server-public-key`
- Encrypt payload with RSA-OAEP
- Include `nonce` + `ts` (UTC) to prevent replay
- **Challenge validity**: 10 minutes
- **Offline threshold**: 7 minutes
## 5) Communication Flow
### 5.1 Handshake (WS)
1. Plugin reads `identifier + challenge_uuid`.
2. Fetch RSA public key.
3. Encrypt payload: `{identifier, challenge_uuid, nonce, ts}`.
4. Connect WS `/monitor/server/ws` and send `encrypted_payload`.
5. On success: begin periodic telemetry push.
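Steps 1 to 4 could be sketched as follows, assuming Python for illustration. The function names, the 128-bit hex nonce, the timestamp format, and the base64-in-JSON envelope are all assumptions; the plan only fixes the field names. The RSA-OAEP step is injected as a callable (`encrypt`) so the sketch stays dependency-free; a real plugin would pass a function that encrypts with the public key fetched from the backend.

```python
import base64
import json
import secrets
from datetime import datetime, timezone
from typing import Callable


def build_handshake_plaintext(identifier: str, challenge_uuid: str) -> bytes:
    """Serialize {identifier, challenge_uuid, nonce, ts} as compact UTF-8 JSON."""
    payload = {
        "identifier": identifier,
        "challenge_uuid": challenge_uuid,
        "nonce": secrets.token_hex(16),  # assumption: 128-bit random nonce, hex-encoded
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),  # UTC
    }
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def build_handshake_message(
    identifier: str,
    challenge_uuid: str,
    encrypt: Callable[[bytes], bytes],
) -> str:
    """Encrypt the plaintext and wrap it in a JSON envelope for the first WS frame.

    `encrypt` stands in for RSA-OAEP encryption with the backend's public key.
    The envelope shape is hypothetical; only the field name comes from the plan.
    """
    ciphertext = encrypt(build_handshake_plaintext(identifier, challenge_uuid))
    return json.dumps(
        {"encrypted_payload": base64.b64encode(ciphertext).decode("ascii")}
    )
```

RSA-OAEP can only encrypt short plaintexts (190 bytes for a 2048-bit key with SHA-256; key size and hash are assumptions here), which is why the sketch uses compact JSON separators and a short nonce.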
### 5.2 Fallback (HTTP)
If WS fails:
- POST telemetry to `/monitor/server/heartbeat` with same payload fields.
- Retry with exponential backoff (capped at 5-10 min).
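A capped exponential backoff could look like the sketch below, assuming Python for illustration. `backoff_delay` is a hypothetical helper; the 1 s base and the 300 s (5 minute) cap are assumptions, and the optional jitter is an addition that the plan does not mention.

```python
import random


def backoff_delay(
    attempt: int,
    base_sec: float = 1.0,    # assumption: first retry after 1 s
    cap_sec: float = 300.0,   # assumption: 5 minute cap
    jitter: bool = False,
) -> float:
    """Return the wait before retry number `attempt` (0-based).

    The delay doubles with each attempt and never exceeds `cap_sec`.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    # Clamp the exponent so very long outages cannot overflow a float.
    delay = min(cap_sec, base_sec * (2 ** min(attempt, 32)))
    if jitter:
        # Optional "full jitter": spread retries so many servers do not
        # hit the backend at the same instant after an outage.
        delay = random.uniform(0.0, delay)
    return delay
```

With the defaults, successive attempts wait 1, 2, 4, 8, ... seconds until the delay reaches the 300 s cap at the tenth attempt (`attempt=9`).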
## 6) Telemetry Schema (example)
```
{
  "identifier": "vps.t1",
  "openclaw_version": "x.y.z",
  "cpu_pct": 12.5,
  "mem_pct": 41.2,
  "disk_pct": 62.0,
  "swap_pct": 0.0,
  "agents": [ { "id": "a1", "name": "agent", "status": "running" } ],
  "last_seen_at": "2026-03-11T21:00:00Z"
}
```
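A collector might assemble this payload roughly as follows, assuming Python for illustration. `build_telemetry` and `disk_pct` are hypothetical helpers, and the 0-100 range check is an assumption about what the backend expects. Only the disk figure is read from the system here; CPU, memory, and swap values are passed in because how they are collected is not decided in this plan.

```python
import shutil
from datetime import datetime, timezone
from typing import Optional


def disk_pct(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return round(100.0 * usage.used / usage.total, 1)


def build_telemetry(
    identifier: str,
    openclaw_version: str,
    cpu: float,
    mem: float,
    disk: float,
    swap: float,
    agents: list,
    now: Optional[datetime] = None,
) -> dict:
    """Build a dict with the field names from the telemetry schema."""
    for name, value in (("cpu", cpu), ("mem", mem), ("disk", disk), ("swap", swap)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} percentage out of range: {value}")
    now = now or datetime.now(timezone.utc)
    return {
        "identifier": identifier,
        "openclaw_version": openclaw_version,
        "cpu_pct": cpu,
        "mem_pct": mem,
        "disk_pct": disk,
        "swap_pct": swap,
        "agents": agents,
        "last_seen_at": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

A caller would typically pass `disk=disk_pct("/")` and serialize the result with `json.dumps` before sending it over either transport.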
## 7) Reliability
- Automatic reconnect on WS drop
- HTTP fallback if WS is unavailable for more than 2 report intervals
- Exponential backoff on failures
- Local cache for last successful payload
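The fallback rule and the last-payload cache could be tracked with a small state object like the one below, assuming Python for illustration. `TransportState` and its methods are hypothetical, and measuring the outage from the first failed WS attempt is an assumption about how "unavailable" is counted. Timestamps are passed in explicitly (for example from `time.monotonic()`) so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransportState:
    """Hypothetical minimal local state for choosing between WS and HTTP."""
    report_interval_sec: float
    ws_down_since: Optional[float] = None  # time of the first failure in the current outage
    last_payload: Optional[dict] = None    # cache of the last successfully sent payload

    def record_ws_failure(self, now: float) -> None:
        # Keep the earliest failure time so the outage length keeps growing.
        if self.ws_down_since is None:
            self.ws_down_since = now

    def record_success(self, payload: dict, via_ws: bool) -> None:
        self.last_payload = payload
        if via_ws:
            self.ws_down_since = None  # WS is healthy again

    def use_http_fallback(self, now: float) -> bool:
        """True once WS has been down for more than 2 report intervals."""
        if self.ws_down_since is None:
            return False
        return (now - self.ws_down_since) > 2 * self.report_interval_sec
```

A successful HTTP heartbeat deliberately does not clear `ws_down_since`, so the plugin keeps using the fallback until a WS reconnect actually succeeds.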
## 8) Deployment Options
- **Systemd service** (preferred for VPS)
- **Docker container** (optional)
- Single-binary build if using Go/Rust
## 9) Milestones
**M1 POC (2-3 days)**
- CLI config loader + HTTP heartbeat
- Server appears online, with metrics, in Monitor
**M2 WS realtime (2-3 days)**
- Full handshake + WS streaming
- Reconnect & fallback logic
**M3 Packaging (1-2 days)**
- systemd unit + sample config
- installation script
**M4 Hardening & Docs (1-2 days)**
- logging, metrics, docs
- troubleshooting guide
## 10) Deliverables
- Plugin source
- Config template + systemd unit
- Integration docs
- Test script + example payloads
## 11) Open Questions
- Preferred language (Go/Python/Node/Rust)?
- How to read OpenClaw agent list (API vs local state)?
- Required log format / retention?
---
**Next step:** confirm preferred runtime (Go/Python/Node) and I will scaffold the project structure + first heartbeat implementation.