Compare commits

3 Commits

Author SHA1 Message Date
db986e6bf4 Merge docs/readme-refresh into main
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:55:56 +01:00
e5df1eba0d docs: refresh README — accuracy pass + HarborForge platform context
Verified against current code; fixed stale/inaccurate sections and
documented previously-undocumented features/flags/endpoints. Added a
"Part of the HarborForge platform" reference and role/port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:50:01 +01:00
zhi
e136f1b290 fix: correct telemetry identifier and visibility when containerized
Three related fixes for running Monitor inside a container with
/:/host:ro bind-mounted and network_mode: host.

* config: when HF_MONITER_ROOTFS is set, read the default identifier
  from <rootFS>/etc/hostname instead of os.Hostname(). Under
  network_mode: host the UTS namespace is not shared, so os.Hostname()
  returns a random docker-assigned string that changes across
  recreations, causing the backend to treat each restart as a new
  server.

* telemetry: log gopsutil errors from BuildPayload instead of silently
  swallowing them. Previously a missing /host mount would send a
  payload full of zeroed fields with no indication of failure.

* docker-compose: drop the 'ports:' block. It is silently ignored
  under network_mode: host (the bridge server binds directly on the
  host's 127.0.0.1:MONITOR_PORT).
2026-04-15 23:02:44 +00:00
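
The identifier fix described in the last commit message can be tried without a container by pointing the lookup at a temporary directory. The following is a minimal sketch under stated assumptions: `hostHostname` follows the helper shown in the diff, while `defaultIdentifier` is a hypothetical wrapper name used only here (in the commit the same fallback chain is written inline in `LoadWithOverrides`).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// hostHostname reads <rootFS>/etc/hostname and returns "" when rootFS is
// empty or the file cannot be read.
func hostHostname(rootFS string) string {
	if rootFS == "" {
		return ""
	}
	buf, err := os.ReadFile(filepath.Join(rootFS, "etc", "hostname"))
	if err != nil {
		return ""
	}
	return strings.TrimSpace(string(buf))
}

// defaultIdentifier is a hypothetical wrapper for illustration: prefer the
// host's /etc/hostname, then os.Hostname(), then a fixed placeholder.
func defaultIdentifier(rootFS string) string {
	if name := hostHostname(rootFS); name != "" {
		return name
	}
	if name, err := os.Hostname(); err == nil && name != "" {
		return name
	}
	return "unknown-host"
}

func main() {
	// Simulate a bind-mounted host root with a temporary directory.
	root, err := os.MkdirTemp("", "rootfs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	if err := os.MkdirAll(filepath.Join(root, "etc"), 0o755); err != nil {
		panic(err)
	}
	hostnameFile := filepath.Join(root, "etc", "hostname")
	if err := os.WriteFile(hostnameFile, []byte("web-01\n"), 0o644); err != nil {
		panic(err)
	}

	fmt.Println(defaultIdentifier(root)) // prints "web-01"
	fmt.Println(defaultIdentifier(""))   // falls back to os.Hostname()
}
```

Because the lookup only depends on the path passed in, the identifier stays stable across container recreations as long as the host's `/etc/hostname` does not change.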
4 changed files with 138 additions and 86 deletions

146
README.md

@@ -1,46 +1,54 @@
 # HarborForge.Monitor

-Lightweight Go telemetry client for reporting server hardware status to HarborForge Monitor.
+Lightweight Go telemetry client that reports server hardware status to the HarborForge backend.

-It **does not depend on OpenClaw** and suits plain Linux hosts, VPS instances, Nginx machines, and so on.
+Part of the [HarborForge](../README.md) platform.

-## What Is Collected
+- Role: standalone telemetry agent; **does not depend on OpenClaw**, suitable for plain Linux hosts, VPS, Nginx boxes, etc.
+- Reports to the HarborForge backend (`POST /monitor/server/heartbeat`).
+- Optional local bridge HTTP server on `127.0.0.1:<MONITOR_PORT>` (default port `9100`) for the HarborForge OpenClaw plugin.

-- CPU usage
-- Memory usage
-- Disk usage
-- Swap usage
-- Load Average
-- Uptime
-- Whether Nginx is installed
-- `/etc/nginx/sites-enabled` list
+## Collected Metrics

-## Reporting API
+- CPU usage (`cpu_pct`)
+- Memory usage (`mem_pct`)
+- Disk usage (`disk_pct`, for the root / `rootFs` filesystem)
+- Swap usage (`swap_pct`)
+- Load average (`load_avg` — 1/5/15 min)
+- Uptime (`uptime_seconds`)
+- Nginx installed (`nginx_installed`)
+- `/etc/nginx/sites-enabled` listing (`nginx_sites`)

-The client calls:
+When OpenClaw metadata has been pushed to the bridge, heartbeats are additionally enriched with `openclaw_version`, `plugin_version`, and `agents`.

-- `POST /monitor/server/heartbeat`
-- Header: `X-API-Key`
+## Reporting Endpoint

-## Project Layout
+The client sends:
+
+- `POST <backendUrl>/monitor/server/heartbeat`
+- Header: `X-API-Key: <apiKey>`
+- JSON body: the telemetry payload described above
+
+## Project Structure

 ```text
 HarborForge.Monitor/
-├── cmd/harborforge-monitor/  # Program entry
-├── internal/config/  # Config loading
-├── internal/telemetry/  # Metrics collection and upload
-├── internal/bridge/  # MONITOR_PORT local bridge service
-├── Dockerfile  # Containerized run
-├── docker-compose.yml  # Docker Compose config
+├── cmd/harborforge-monitor/  # Program entry point (main.go)
+├── internal/config/  # Config loading (file + env + flags)
+├── internal/telemetry/  # Metric collection and reporting
+├── internal/bridge/  # Local MONITOR_PORT bridge server
+├── systemd/  # systemd unit file
+├── Dockerfile  # Container build
+├── docker-compose.yml  # Docker Compose configuration
 ├── config.example.json
 └── README.md
 ```

-## Config
+## Configuration

-First register the server in HarborForge Monitor and generate an API Key.
+First register the server in the HarborForge backend and generate an API Key.

-Then prepare a config file, for example `/etc/harborforge-monitor/config.json`:
+Then prepare a config file, e.g. `/etc/harborforge-monitor/config.json`:

 ```json
 {
@@ -49,51 +57,61 @@ HarborForge.Monitor/
   "apiKey": "your-api-key",
   "reportIntervalSec": 30,
   "logLevel": "info",
+  "rootFs": "/host",
   "monitorPort": 9100
 }
 ```

-Environment variable overrides are also supported. For compatibility with your naming, these are preferred:
+Resolution order is: defaults → config file → environment variables → command-line flags (later wins). `apiKey` is required; the process exits if it is empty.

-- `HF_MONITER_BACKEND_URL`
-- `HF_MONITER_IDENTIFIER`
-- `HF_MONITER_API_KEY`
-- `HF_MONITER_REPORT_INTERVAL`
-- `HF_MONITER_LOG_LEVEL`
-- `HF_MONITER_ROOTFS`
+### Environment Variables

-The older, correctly spelled `HF_MONITOR_*` variable names are also accepted.
+Both the (intentionally compatible) `HF_MONITER_*` spelling and the `HF_MONITOR_*` spelling are accepted; `HF_MONITER_*` is checked first.

-### MONITOR_PORT: Plugin Bridge Port
+| Variable | Purpose | Default |
+|----------|---------|---------|
+| `HF_MONITER_BACKEND_URL` / `HF_MONITOR_BACKEND_URL` | Backend base URL | `https://monitor.hangman-lab.top` |
+| `HF_MONITER_IDENTIFIER` / `HF_MONITOR_IDENTIFIER` | Server identifier | hostname |
+| `HF_MONITER_API_KEY` / `HF_MONITOR_API_KEY` | Server API key | (required) |
+| `HF_MONITER_REPORT_INTERVAL` / `HF_MONITOR_REPORT_INTERVAL` | Report interval (seconds) | `30` |
+| `HF_MONITER_LOG_LEVEL` / `HF_MONITOR_LOG_LEVEL` | Log level | `info` |
+| `HF_MONITER_ROOTFS` / `HF_MONITOR_ROOTFS` | Host root filesystem mount (for container use) | (empty) |
+| `MONITOR_PORT` / `HF_MONITOR_PORT` | Local bridge port (`0` = disabled) | `0` |

-When `MONITOR_PORT` is set to a value greater than 0, Monitor starts a local HTTP service on `127.0.0.1:<MONITOR_PORT>` so the HarborForge OpenClaw plugin can query telemetry data.
+When `rootFs` is set, `HOST_PROC` / `HOST_SYS` / `HOST_ETC` / `HOST_VAR` / `HOST_RUN` are derived from it (if not already set) so that gopsutil reads host metrics instead of the container's.

-Supported endpoints:
+### Command-line Flags

-| Endpoint | Description |
-|------|------|
-| `GET /health` | Health check; returns the Monitor version and identifier |
-| `GET /telemetry` | Returns the latest telemetry snapshot |
-| `POST /openclaw` | Receives metadata pushed by the OpenClaw plugin (version, agents, etc.) |
+```text
+-config string  Path to config file (default "/etc/harborforge-monitor/config.json")
+-once  Collect and send telemetry once, then exit
+-print-payload  Print the payload JSON before sending
+-dry-run  Collect telemetry but do not send it
+-version  Print version and exit
+-backend-url string  Override backend URL
+-identifier string  Override identifier
+-api-key string  Override API key
+-report-interval int  Override report interval in seconds
+-log-level string  Override log level
+-rootfs string  Override root filesystem path
+-monitor-port int  Override monitor bridge port
+```

-### OpenClaw Metadata Enrichment
+### MONITOR_PORT — Plugin Bridge

-After the OpenClaw plugin pushes metadata via `POST /openclaw`, Monitor automatically attaches it to the telemetry data in subsequent heartbeats:
+When `MONITOR_PORT` (or `monitorPort`) is greater than 0, Monitor starts a local HTTP server on `127.0.0.1:<MONITOR_PORT>` for the HarborForge OpenClaw plugin to query telemetry.

-- `openclaw_version`: OpenClaw runtime version
-- `plugin_version`: plugin version
-- `agents`: agent list
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/health` | `GET` | Health check; returns status, `monitor_version`, and `identifier` |
+| `/telemetry` | `GET` | Returns the latest cached telemetry snapshot |
+| `/openclaw` | `POST` | Receives OpenClaw metadata (version, plugin version, agents) from the plugin |

-If the plugin has never pushed metadata, these fields are omitted and heartbeat reporting is completely unaffected.
+After the plugin pushes metadata via `POST /openclaw`, Monitor attaches `openclaw_version`, `plugin_version`, and `agents` to subsequent heartbeats. If the plugin never pushes metadata, these fields are omitted and heartbeats are unaffected.

-**Important**: the bridge port is optional. If `MONITOR_PORT` is 0 or unset, the bridge service does not start and Monitor's heartbeat reporting is completely unaffected. Even if the bridge service fails to start, heartbeat reporting keeps working normally.
+**Important:** the bridge is optional. If `MONITOR_PORT` is 0 or unset, the bridge does not start. Even if the bridge fails to start, heartbeat reporting continues normally (bridge errors are logged as non-fatal).

-Environment variables:
-
-- `MONITOR_PORT`: preferred
-- `HF_MONITOR_PORT`: fallback
-
-## Developing Locally
+## Local Development

 ```bash
 go mod tidy
@@ -101,29 +119,29 @@ go build ./cmd/harborforge-monitor
 ./harborforge-monitor -config ./config.example.json -dry-run -once
 ```

-## Running with Docker
+## Docker

-Building the image:
+Build the image:

 ```bash
 docker build -t harborforge-monitor .
 ```

-### Using Docker Compose
+### Docker Compose

 ```bash
-# Set environment variables
 export HF_IDENTIFIER=my-server
 export HF_API_KEY=your-api-key
 export MONITOR_PORT=9100

-# Start
 docker compose up -d
 ```

-### Running Docker Manually
+The compose file runs with `network_mode: host` and mounts the host root filesystem read-only at `/host` (the image defaults `HF_MONITER_ROOTFS=/host`).

-Running with the **host rootfs mounted read-only** is recommended, so that the container collects host information rather than its own:
+### Manual Docker Run
+
+Run with the **host rootfs mounted read-only** so the container collects host metrics instead of its own:

 ```bash
 docker run -d \
@@ -141,14 +159,14 @@ docker run -d \

 ## systemd

-The compiled binary can also be run directly with systemd:
+You can also run the compiled binary via systemd:

 ```bash
-# Build
 go build -o /usr/local/bin/harborforge-monitor ./cmd/harborforge-monitor

-# Copy the systemd unit (see the systemd/ directory)
 cp systemd/harborforge-monitor.service /etc/systemd/system/
 systemctl daemon-reload
 systemctl enable --now harborforge-monitor
 ```
+
+The provided unit runs `harborforge-monitor -config /etc/harborforge-monitor/config.json` as `root` with `Restart=always`.
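
The reporting endpoint documented in the README above can be illustrated with a short request builder. This is a sketch, not the project's actual client: the JSON field names come from the README and the Go field names from the telemetry diff, but the struct layout, the `newHeartbeatRequest` helper, the `Content-Type` header, and the example URL are assumptions, and any extra fields the real payload carries are left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// Payload carries the metrics listed in the README. The field types are
// assumptions made for this sketch.
type Payload struct {
	CPUPct         float64   `json:"cpu_pct"`
	MemPct         float64   `json:"mem_pct"`
	DiskPct        float64   `json:"disk_pct"`
	SwapPct        float64   `json:"swap_pct"`
	LoadAvg        []float64 `json:"load_avg"`
	UptimeSeconds  uint64    `json:"uptime_seconds"`
	NginxInstalled bool      `json:"nginx_installed"`
	NginxSites     []string  `json:"nginx_sites"`
}

// newHeartbeatRequest (hypothetical helper) builds
// POST <backendURL>/monitor/server/heartbeat with the X-API-Key header.
func newHeartbeatRequest(backendURL, apiKey string, p Payload) (*http.Request, error) {
	body, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(backendURL, "/") + "/monitor/server/heartbeat"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-API-Key", apiKey)
	// Assumption: the backend expects a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	p := Payload{CPUPct: 12.5, MemPct: 40.1, LoadAvg: []float64{0.1, 0.2, 0.3}}
	// The URL below is a placeholder, not a real backend.
	req, err := newHeartbeatRequest("https://monitor.example.invalid", "your-api-key", p)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Building the request separately from sending it keeps the sketch free of network access; passing the result to `http.Client.Do` would perform the actual heartbeat.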


@@ -15,8 +15,8 @@ services:
       - MONITOR_PORT=${MONITOR_PORT:-0}
     volumes:
       - /:/host:ro
-    ports:
-      # Expose MONITOR_PORT on 127.0.0.1 only for plugin communication.
-      # Only active when MONITOR_PORT > 0.
-      - "127.0.0.1:${MONITOR_PORT:-9100}:${MONITOR_PORT:-9100}"
+    # network_mode: host shares the host network namespace, so the bridge
+    # server (if MONITOR_PORT > 0) listens directly on the host's
+    # 127.0.0.1:<MONITOR_PORT>. `ports:` is ignored under network_mode:
+    # host, so it is intentionally omitted.
     network_mode: host


@@ -5,6 +5,7 @@ import (
 	"fmt"
 	"os"
 	"path/filepath"
+	"strings"
 )

 type Config struct {
@@ -32,9 +33,19 @@ func Load(path string) (Config, error) {
 }

 func LoadWithOverrides(path string, overrides Overrides) (Config, error) {
+	// If running inside a container with the host FS bind-mounted, prefer
+	// the host's /etc/hostname for the default identifier. The container's
+	// own os.Hostname() is a docker-assigned random string under
+	// network_mode: host (UTS namespace is not shared).
+	rootFSEarly := getenvAny([]string{"HF_MONITER_ROOTFS", "HF_MONITOR_ROOTFS"}, "")
+	defaultIdentifier := hostHostname(rootFSEarly)
+	if defaultIdentifier == "" {
+		defaultIdentifier = hostnameOr("unknown-host")
+	}
+
 	cfg := Config{
 		BackendURL:        getenvAny([]string{"HF_MONITER_BACKEND_URL", "HF_MONITOR_BACKEND_URL"}, "https://monitor.hangman-lab.top"),
-		Identifier:        getenvAny([]string{"HF_MONITER_IDENTIFIER", "HF_MONITOR_IDENTIFIER"}, hostnameOr("unknown-host")),
+		Identifier:        getenvAny([]string{"HF_MONITER_IDENTIFIER", "HF_MONITOR_IDENTIFIER"}, defaultIdentifier),
 		APIKey:            getenvAny([]string{"HF_MONITER_API_KEY", "HF_MONITOR_API_KEY"}, ""),
 		ReportIntervalSec: getenvIntAny([]string{"HF_MONITER_REPORT_INTERVAL", "HF_MONITOR_REPORT_INTERVAL"}, 30),
 		LogLevel:          getenvAny([]string{"HF_MONITER_LOG_LEVEL", "HF_MONITOR_LOG_LEVEL"}, "info"),
@@ -153,11 +164,25 @@ func getenvIntAny(keys []string, fallback int) int {
 }

 func hostnameOr(fallback string) string {
-	name, err := os.Hostname()
-	if err != nil || name == "" {
-		return fallback
+	if name, err := os.Hostname(); err == nil && name != "" {
+		return name
 	}
-	return name
+	return fallback
+}
+
+// hostHostname reads the hostname from <rootFS>/etc/hostname. Used when
+// Monitor runs inside a container and wants the host's hostname rather
+// than the container's UTS namespace hostname (which docker randomizes
+// unless hostname: is set).
+func hostHostname(rootFS string) string {
+	if rootFS == "" {
+		return ""
+	}
+	buf, err := os.ReadFile(filepath.Join(rootFS, "etc", "hostname"))
+	if err != nil {
+		return ""
+	}
+	return strings.TrimSpace(string(buf))
 }

 func applyHostFSEnv(rootFS string) {


@@ -4,6 +4,7 @@ import (
 	"context"
 	"encoding/json"
 	"fmt"
+	"log"
 	"net/http"
 	"os"
 	"os/exec"
@@ -50,12 +51,15 @@ func BuildPayload(ctx context.Context, cfg config.Config) (Payload, error) {
 	}

 	cpuPct, err := cpu.PercentWithContext(ctx, time.Second, false)
-	if err == nil && len(cpuPct) > 0 {
+	if err != nil {
+		log.Printf("telemetry: cpu.Percent failed: %v", err)
+	} else if len(cpuPct) > 0 {
 		payload.CPUPct = round1(cpuPct[0])
 	}

-	vm, err := mem.VirtualMemoryWithContext(ctx)
-	if err == nil {
+	if vm, err := mem.VirtualMemoryWithContext(ctx); err != nil {
+		log.Printf("telemetry: mem.VirtualMemory failed: %v", err)
+	} else {
 		payload.MemPct = round1(vm.UsedPercent)
 	}

@@ -63,28 +67,33 @@ func BuildPayload(ctx context.Context, cfg config.Config) (Payload, error) {
 	if diskPath == "" {
 		diskPath = "/"
 	}
-	diskUsage, err := disk.UsageWithContext(ctx, diskPath)
-	if err == nil {
+	if diskUsage, err := disk.UsageWithContext(ctx, diskPath); err != nil {
+		log.Printf("telemetry: disk.Usage(%s) failed: %v", diskPath, err)
+	} else {
 		payload.DiskPct = round1(diskUsage.UsedPercent)
 	}

-	swapUsage, err := mem.SwapMemoryWithContext(ctx)
-	if err == nil {
+	if swapUsage, err := mem.SwapMemoryWithContext(ctx); err != nil {
+		log.Printf("telemetry: mem.SwapMemory failed: %v", err)
+	} else {
 		payload.SwapPct = round1(swapUsage.UsedPercent)
 	}

-	avg, err := gopsload.AvgWithContext(ctx)
-	if err == nil {
+	if avg, err := gopsload.AvgWithContext(ctx); err != nil {
+		log.Printf("telemetry: load.Avg failed: %v", err)
+	} else {
 		payload.LoadAvg = []float64{round2(avg.Load1), round2(avg.Load5), round2(avg.Load15)}
 	}

-	hostInfo, err := host.InfoWithContext(ctx)
-	if err == nil {
+	if hostInfo, err := host.InfoWithContext(ctx); err != nil {
+		log.Printf("telemetry: host.Info failed: %v", err)
+	} else {
 		payload.UptimeSeconds = hostInfo.Uptime
 	}

-	nginxInstalled, nginxSites, err := detectNginx(cfg.RootFS)
-	if err == nil {
+	if nginxInstalled, nginxSites, err := detectNginx(cfg.RootFS); err != nil {
+		log.Printf("telemetry: detectNginx failed: %v", err)
+	} else {
 		payload.NginxInstalled = nginxInstalled
 		payload.NginxSites = nginxSites
 	}