fix(telemetry): discover agents from ~/.openclaw/agents and clean docs

- Fallback to agent directory discovery when agents.json is absent
- Count configured agent workspaces (excluding main)
- Rewrite plugin docs to API key-only flow
This commit is contained in:
zhi
2026-03-20 08:02:19 +00:00
parent 040cde8cad
commit ab42936408
3 changed files with 69 additions and 115 deletions

View File

@@ -1,112 +1,47 @@
# HarborForge OpenClaw Server Connector Plugin — Project Plan
# Monitor Server Connector Plan
## 1) Goal
Provide a secure, lightweight plugin/agent that connects servers to HarborForge Monitor, streams telemetry in real time, and falls back to HTTP heartbeat when WebSocket is unavailable.
## Current design
## 2) Scope
- **Handshake + auth** using backend-issued challenge + RSA-OAEP encrypted payload.
- **WebSocket telemetry** to `/monitor/server/ws`.
- **HTTP heartbeat** to `/monitor/server/heartbeat` as fallback.
- **System metrics**: CPU/Mem/Disk/Swap/Uptime/OpenClaw version/Agents list.
- **Retry & backoff**, offline handling, and minimal local state.
The plugin uses:
## 3) Non-Goals
- No UI in the plugin.
- No provider billing calls from plugin.
- No multi-tenant auth beyond challenge + server identifier.
- **HTTP heartbeat** to `/monitor/server/heartbeat-v2`
- **API Key authentication** via `X-API-Key`
- **Gateway lifecycle hooks**: `gateway_start` / `gateway_stop`
## 4) Architecture
```
plugin/
config/ # load config & secrets
crypto/ # RSA-OAEP encrypt/decrypt helpers
collector/ # system + openclaw metrics
transport/ # ws + http heartbeat
state/ # retry/backoff, last sent, cache
main.ts|py # entry
```
## No longer used
### 4.1 Config
- `backend_url`
- `identifier`
- `challenge_uuid`
- `report_interval_sec` (default: 20-30s)
- `http_fallback_interval_sec` (default: 60s)
- `log_level`
The following design has been retired:
### 4.2 Security
- Fetch public key: `GET /monitor/public/server-public-key`
- Encrypt payload with RSA-OAEP
- Include `nonce` + `ts` (UTC) to prevent replay
- **Challenge valid**: 10 minutes
- **Offline threshold**: 7 minutes
- challenge UUID
- RSA public key fetch
- encrypted handshake payload
- WebSocket telemetry
## 5) Communication Flow
### 5.1 Handshake (WS)
1. Plugin reads `identifier + challenge_uuid`.
2. Fetch RSA public key.
3. Encrypt payload: `{identifier, challenge_uuid, nonce, ts}`.
4. Connect WS `/monitor/server/ws` and send `encrypted_payload`.
5. On success: begin periodic telemetry push.
## Runtime flow
### 5.2 Fallback (HTTP)
If WS fails:
- POST telemetry to `/monitor/server/heartbeat` with same payload fields.
- Retry with exponential backoff (cap 510 min).
1. Gateway loads `harborforge-monitor`
2. Plugin reads config from OpenClaw plugin config
3. On `gateway_start`, plugin launches `server/telemetry.mjs`
4. Sidecar collects:
- system metrics
- OpenClaw version
- plugin version
- configured agents
5. Sidecar posts telemetry to backend with `X-API-Key`
## 6) Telemetry Schema (example)
```
## Payload
```json
{
identifier: "vps.t1",
openclaw_version: "x.y.z",
cpu_pct: 12.5,
mem_pct: 41.2,
disk_pct: 62.0,
swap_pct: 0.0,
agents: [ { id: "a1", name: "agent", status: "running" } ],
last_seen_at: "2026-03-11T21:00:00Z"
"identifier": "vps.t1",
"openclaw_version": "OpenClaw 2026.3.13 (61d171a)",
"plugin_version": "0.1.0",
"agents": [],
"cpu_pct": 10.5,
"mem_pct": 52.1,
"disk_pct": 81.0,
"swap_pct": 0.0,
"load_avg": [0.12, 0.09, 0.03],
"uptime_seconds": 12345
}
```
## 7) Reliability
- Automatic reconnect on WS drop
- HTTP fallback if WS unavailable > 2 intervals
- Exponential backoff on failures
- Local cache for last successful payload
## 8) Deployment Options
- **Systemd service** (preferred for VPS)
- **Docker container** (optional)
- Single-binary build if using Go/Rust
## 9) Milestones
**M1 POC (23 days)**
- CLI config loader + HTTP heartbeat
- See online + metrics in Monitor
**M2 WS realtime (23 days)**
- Full handshake + WS streaming
- Reconnect & fallback logic
**M3 Packaging (12 days)**
- systemd unit + sample config
- installation script
**M4 Hardening & Docs (12 days)**
- logging, metrics, docs
- troubleshooting guide
## 10) Deliverables
- Plugin source
- Config template + systemd unit
- Integration docs
- Test script + example payloads
## 11) Open Questions
- Preferred language (Go/Python/Node/Rust)?
- How to read OpenClaw agent list (API vs local state)?
- Required log format / retention?
---
**Next step:** confirm preferred runtime (Go/Python/Node) and I will scaffold the project structure + first heartbeat implementation.