HarborForge.OpenclawPlugin/docs/monitor-server-connector-plan.md
2026-03-11 21:35:28 +00:00

HarborForge OpenClaw Server Connector Plugin — Project Plan

1) Goal

Provide a secure, lightweight plugin/agent that connects servers to HarborForge Monitor, streams telemetry in real time, and falls back to HTTP heartbeat when WebSocket is unavailable.

2) Scope

  • Handshake + auth using backend-issued challenge + RSA-OAEP encrypted payload.
  • WebSocket telemetry to /monitor/server/ws.
  • HTTP heartbeat to /monitor/server/heartbeat as fallback.
  • System metrics: CPU/Mem/Disk/Swap/Uptime/OpenClaw version/Agents list.
  • Retry & backoff, offline handling, and minimal local state.

3) Non-Goals

  • No UI in the plugin.
  • No provider billing calls from plugin.
  • No multi-tenant auth beyond challenge + server identifier.

4) Architecture

plugin/
  config/          # load config & secrets
  crypto/          # RSA-OAEP encrypt/decrypt helpers
  collector/       # system + openclaw metrics
  transport/       # ws + http heartbeat
  state/           # retry/backoff, last sent, cache
  main.ts|py        # entry

4.1 Config

  • backend_url
  • identifier
  • challenge_uuid
  • report_interval_sec (default: 20-30s)
  • http_fallback_interval_sec (default: 60s)
  • log_level
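
A minimal sketch of how these config keys might be loaded and validated. It assumes a JSON config document, picks 30s as a concrete value from the 20-30s range, and invents a default for log_level; the names PluginConfig and load_config are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginConfig:
    backend_url: str
    identifier: str
    challenge_uuid: str
    # The plan gives a 20-30s range; 30 is an assumed concrete default.
    report_interval_sec: int = 30
    http_fallback_interval_sec: int = 60
    # Assumed default; the plan does not specify one.
    log_level: str = "info"


REQUIRED_KEYS = ("backend_url", "identifier", "challenge_uuid")


def load_config(raw: str) -> PluginConfig:
    """Parse a JSON config document and reject missing required keys."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    # Unknown keys raise TypeError here, which surfaces typos early.
    return PluginConfig(**data)
```

Keeping the config immutable (frozen=True) means the transport and collector modules can share one instance without worrying about mid-run changes.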

4.2 Security

  • Fetch public key: GET /monitor/public/server-public-key
  • Encrypt payload with RSA-OAEP
  • Include nonce + ts (UTC) to prevent replay
  • Challenge valid: 10 minutes
  • Offline threshold: 7 minutes
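
The two time windows above can be expressed as simple checks. This is only a sketch: it assumes the challenge window is measured from the time the backend issued the challenge and the offline threshold from the last received telemetry, neither of which the plan states explicitly. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CHALLENGE_TTL = timedelta(minutes=10)
OFFLINE_THRESHOLD = timedelta(minutes=7)


def challenge_is_valid(issued_at: datetime, now: datetime) -> bool:
    """True while the challenge is inside its 10-minute window."""
    age = now - issued_at
    # A negative age means the timestamp is in the future; reject it.
    return timedelta(0) <= age <= CHALLENGE_TTL


def server_is_offline(last_seen_at: datetime, now: datetime) -> bool:
    """True once more than 7 minutes have passed without telemetry."""
    return now - last_seen_at > OFFLINE_THRESHOLD
```

Both functions take `now` as an argument rather than reading the clock, so they can be tested with fixed UTC timestamps.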

5) Communication Flow

5.1 Handshake (WS)

  1. Plugin reads identifier + challenge_uuid.
  2. Fetch RSA public key.
  3. Encrypt payload: {identifier, challenge_uuid, nonce, ts}.
  4. Connect WS /monitor/server/ws and send encrypted_payload.
  5. On success: begin periodic telemetry push.
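
Steps 1 to 4 could be sketched as below. The RSA-OAEP step is passed in as a callable so the example stays within the standard library; in practice it would wrap a crypto library using the fetched public key. The JSON envelope with a base64-encoded encrypted_payload field is an assumption about the wire format, and build_handshake_message is a hypothetical name.

```python
import base64
import json
import secrets
from datetime import datetime, timezone
from typing import Callable


def build_handshake_message(
    identifier: str,
    challenge_uuid: str,
    encrypt: Callable[[bytes], bytes],
) -> str:
    """Build the first WS message: an encrypted {identifier, challenge_uuid, nonce, ts}."""
    payload = {
        "identifier": identifier,
        "challenge_uuid": challenge_uuid,
        # Fresh random nonce per handshake so a captured message cannot be replayed.
        "nonce": secrets.token_hex(16),
        # UTC timestamp, as required by the security section.
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    # Compact separators keep the plaintext small.
    plaintext = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    ciphertext = encrypt(plaintext)
    # Assumed envelope: base64 ciphertext inside a JSON object.
    envelope = {"encrypted_payload": base64.b64encode(ciphertext).decode("ascii")}
    return json.dumps(envelope)
```

RSA-OAEP can only encrypt a short plaintext (190 bytes for a 2048-bit key with SHA-256), so the payload should stay compact; a long identifier may need a larger key or a hybrid scheme.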

5.2 Fallback (HTTP)

If WS fails:

  • POST telemetry to /monitor/server/heartbeat with same payload fields.
  • Retry with exponential backoff (cap 5-10 min).
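
A sketch of the capped exponential backoff for heartbeat retries. The 60s base mirrors the default http_fallback_interval_sec, and a 10-minute cap is assumed here; the optional jitter is an addition not mentioned in the plan, and backoff_delay is a hypothetical name.

```python
import random


def backoff_delay(
    attempt: int,
    base_sec: float = 60.0,
    cap_sec: float = 600.0,
    jitter: float = 0.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Clamp the exponent so very large attempt counts cannot overflow.
    delay = base_sec * (2 ** min(attempt, 32))
    if jitter:
        # Spread retries by +/- `jitter` (a fraction, e.g. 0.1 for 10%).
        delay *= 1 + random.uniform(-jitter, jitter)
    return min(cap_sec, delay)
```

With the defaults the delays run 60, 120, 240, 480 and then stay at 600 seconds. A small jitter helps when many servers lose the backend at the same moment and would otherwise retry in lockstep.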

6) Telemetry Schema (example)

{
  identifier: "vps.t1",
  openclaw_version: "x.y.z",
  cpu_pct: 12.5,
  mem_pct: 41.2,
  disk_pct: 62.0,
  swap_pct: 0.0,
  agents: [ { id: "a1", name: "agent", status: "running" } ],
  last_seen_at: "2026-03-11T21:00:00Z"
}
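
The example schema could be modeled with typed records so that malformed values are caught before sending. This is a sketch: the field names come from the example above, while the 0-100 range check and the class names are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


def utc_now_iso() -> str:
    """Current UTC time in the same format as the schema example."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class AgentInfo:
    id: str
    name: str
    status: str


@dataclass
class Telemetry:
    identifier: str
    openclaw_version: str
    cpu_pct: float
    mem_pct: float
    disk_pct: float
    swap_pct: float
    agents: List[AgentInfo] = field(default_factory=list)
    last_seen_at: str = field(default_factory=utc_now_iso)

    def to_json(self) -> str:
        """Serialize to JSON, rejecting percentages outside 0-100."""
        for name in ("cpu_pct", "mem_pct", "disk_pct", "swap_pct"):
            value = getattr(self, name)
            if not 0.0 <= value <= 100.0:
                raise ValueError(f"{name} out of range: {value}")
        # asdict also converts the nested AgentInfo records to dicts.
        return json.dumps(asdict(self))
```

The same serialized record can be sent over either transport, which keeps the WS and HTTP payloads identical.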

7) Reliability

  • Automatic reconnect on WS drop
  • HTTP fallback once WS has been unavailable for more than 2 report intervals
  • Exponential backoff on failures
  • Local cache for last successful payload
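
One way to implement the switch between WS and HTTP is a small selector that tracks how long the WebSocket has been down. This sketch assumes "2 intervals" means two report intervals and that reconnect attempts continue while the fallback is active; TransportSelector is a hypothetical name.

```python
from typing import Optional


class TransportSelector:
    """Chooses 'ws' or 'http' based on how long the WebSocket has been down."""

    def __init__(self, report_interval_sec: float = 30.0, max_missed_intervals: int = 2):
        self.threshold_sec = report_interval_sec * max_missed_intervals
        self.ws_down_since: Optional[float] = None

    def ws_up(self) -> None:
        """Call when the WebSocket (re)connects."""
        self.ws_down_since = None

    def ws_down(self, now: float) -> None:
        """Call when the WebSocket drops; keeps the first failure time."""
        if self.ws_down_since is None:
            self.ws_down_since = now

    def choose(self, now: float) -> str:
        """Return the transport to use for the next report."""
        if self.ws_down_since is None:
            return "ws"
        if now - self.ws_down_since > self.threshold_sec:
            return "http"
        # Still inside the grace window: keep trying to reconnect.
        return "ws"
```

Times are plain seconds (for example from time.monotonic()), so a clock adjustment on the host does not trigger a false fallback.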

8) Deployment Options

  • Systemd service (preferred for VPS)
  • Docker container (optional)
  • Single-binary build if using Go/Rust

9) Milestones

M1 POC (2-3 days)

  • CLI config loader + HTTP heartbeat
  • Server appears online with metrics in Monitor

M2 WS realtime (2-3 days)

  • Full handshake + WS streaming
  • Reconnect & fallback logic

M3 Packaging (1-2 days)

  • systemd unit + sample config
  • installation script

M4 Hardening & Docs (1-2 days)

  • logging, metrics, docs
  • troubleshooting guide

10) Deliverables

  • Plugin source
  • Config template + systemd unit
  • Integration docs
  • Test script + example payloads

11) Open Questions

  • Preferred language (Go/Python/Node/Rust)?
  • How to read OpenClaw agent list (API vs local state)?
  • Required log format / retention?

Next step: confirm preferred runtime (Go/Python/Node) and I will scaffold the project structure + first heartbeat implementation.