Compare commits

...

5 Commits

Author SHA1 Message Date
046a7753e6 log monitor push start + slow heartbeat
First successful push emits an info-level "monitor push started" so
operators can confirm the loop wired up correctly. Subsequent
successes log every 60 cycles ("monitor push heartbeat") so the
journal stays quiet but still proves the loop isn't dead. Errors
already log at warn; this fills the success-side gap so a silent
journal can't hide a "no successes, no errors" pathology.
2026-06-03 13:13:33 +01:00
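The success-side policy above is essentially a counter: log on the first success, then on every 60th. The sketch below follows that reading. `pushLogger` and its `logf` callback are hypothetical names. Treating "every 60 cycles" as every 60th success overall is an assumption.

```go
package main

import "fmt"

// pushLogger is a hypothetical helper illustrating the policy: one info
// line on the first successful push, then one heartbeat line every
// `every` successes. It is not the plugin's real implementation.
type pushLogger struct {
	successes int
	every     int
	logf      func(level, msg string)
}

// onSuccess records a successful push and returns the message it logged,
// or "" when this success was intentionally left quiet.
func (l *pushLogger) onSuccess() string {
	l.successes++
	var msg string
	switch {
	case l.successes == 1:
		msg = "monitor push started"
	case l.successes%l.every == 0:
		msg = fmt.Sprintf("monitor push heartbeat (successes=%d)", l.successes)
	default:
		return ""
	}
	if l.logf != nil {
		l.logf("info", msg)
	}
	return msg
}
```

Counting successes rather than wall-clock time ties the heartbeat cadence to the push interval. At the default 30 s interval, 60 successes works out to roughly one line every 30 minutes.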
6e3ad669f8 feat(monitor): active push loop replacing standalone monitor
Adds a periodic POST loop to <backend>/monitor/server/heartbeat so
the HF plugin can take over the standalone harborforge-monitor daemon's
job: same X-API-Key header, same flat telemetry shape (cpu_pct /
mem_pct / disk_pct / swap_pct / load_avg / uptime_seconds /
plugin_version / agents[]). The HF backend stays unchanged.

Config: monitor_push_enabled (default false, opt-in so existing
deployments don't start sending surprise heartbeats),
monitor_push_interval_seconds (default 30); the X-API-Key header
reuses apiKey. To migrate, lift the container's HF_MONITER_API_KEY
into config.apiKey, set monitor_push_enabled to true, then docker rm -f
the container; the DB's last_seen_at keeps advancing under the
plugin's loop.
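The defaults above can be sketched as a resolve step that tells an absent key apart from an explicit value. The types and the `resolve` function are hypothetical. The plugin's own resolver is `hfcfg.Resolve`, whose internals aren't shown here. Treating a non-positive interval as unset is an assumption.

```go
package main

import "encoding/json"

// rawConfig uses pointers so an absent key can be told apart from an
// explicit zero/false. Hypothetical; keys follow the commit message.
type rawConfig struct {
	APIKey                     string `json:"apiKey"`
	MonitorPushEnabled         *bool  `json:"monitor_push_enabled"`
	MonitorPushIntervalSeconds *int   `json:"monitor_push_interval_seconds"`
}

type resolvedConfig struct {
	APIKey                     string
	MonitorPushEnabled         bool
	MonitorPushIntervalSeconds int
}

// resolve applies the documented defaults: push stays off unless opted
// in, and the interval falls back to 30s (non-positive treated as unset,
// an assumption of this sketch).
func resolve(data []byte) (resolvedConfig, error) {
	var raw rawConfig
	if err := json.Unmarshal(data, &raw); err != nil {
		return resolvedConfig{}, err
	}
	out := resolvedConfig{APIKey: raw.APIKey, MonitorPushIntervalSeconds: 30}
	if raw.MonitorPushEnabled != nil {
		out.MonitorPushEnabled = *raw.MonitorPushEnabled
	}
	if raw.MonitorPushIntervalSeconds != nil && *raw.MonitorPushIntervalSeconds > 0 {
		out.MonitorPushIntervalSeconds = *raw.MonitorPushIntervalSeconds
	}
	return out, nil
}
```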

Collector grew swap + cpu sampling (two reads of /proc/stat over a
1-second window when SampleCPU=true). The bridge endpoint stays cheap
(it collects with SampleCPU=false on each request); the push loop is
the only caller paying the sampling cost.
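For reference, the usual way to turn two /proc/stat samples into a CPU percentage is to diff the aggregate `cpu` counters. This sketch assumes the collector does something equivalent. `parseCPULine` and `cpuPct` are hypothetical names, and counting iowait as idle is one common convention. The file read and one-second sleep are left out so the arithmetic stays testable.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuTimes holds aggregate jiffy counters from the "cpu" line of /proc/stat.
type cpuTimes struct{ idle, total uint64 }

// parseCPULine parses an aggregate "cpu ..." line. Idle counts
// idle+iowait; total sums the first eight fields (user..steal).
// guest/guest_nice are skipped because the kernel already folds them
// into user/nice.
func parseCPULine(line string) (cpuTimes, error) {
	fields := strings.Fields(line)
	if len(fields) < 9 || fields[0] != "cpu" {
		return cpuTimes{}, fmt.Errorf("unexpected /proc/stat line: %q", line)
	}
	var vals [8]uint64
	for i := range vals {
		v, err := strconv.ParseUint(fields[i+1], 10, 64)
		if err != nil {
			return cpuTimes{}, err
		}
		vals[i] = v
	}
	var t cpuTimes
	for _, v := range vals {
		t.total += v
	}
	t.idle = vals[3] + vals[4] // idle + iowait
	return t, nil
}

// cpuPct returns the busy percentage between two samples.
func cpuPct(a, b cpuTimes) float64 {
	dTotal := b.total - a.total
	if dTotal == 0 {
		return 0
	}
	dIdle := b.idle - a.idle
	return float64(dTotal-dIdle) * 100 / float64(dTotal)
}
```

In a collector, you would pass the first line of `os.ReadFile("/proc/stat")` to `parseCPULine`, sleep one second, read again, and hand both samples to `cpuPct`.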

E2E in sim: monitor_push_enabled=true + apiKey from injected
MonitoredServer row → server_states.last_seen_at advances exactly
every interval_seconds (10s configured, 10s observed). cpu/mem/disk/
swap_pct all populate correctly.
2026-06-03 13:04:51 +01:00
472cecd771 feat: read agent_id from ctx (SDK now plumbs it)
Plexum-sdk-go now propagates the caller agent id via
`_meta.agent_id` on tools/call. AgentIDFromCtx prefers
plugin.AgentIDFromContext(ctx); falls back to the
single-active-calendar-slot heuristic for host-driven dispatch
paths (channel manager, CLI plugin-call) that lack ctx.

Drops bestEffortAgentID — the inline closure does the same thing
without the dead-Slot-iterate noise.
2026-06-03 12:54:57 +01:00
bc1ab7b6ea fix: snake_case SlotStatus + scheduler debug logs
Two issues found while end-to-end testing against a running
harborforge-backend:

  - SlotStatus enum values: the backend stores snake_case
    ("not_started" / "ongoing" / …), not the camelCase that the
    OpenClaw plugin's TypeScript types.ts led the initial drop
    to use. Heartbeat responses came back with
    Slot.Status="not_started", which never matched
    SlotStatus("NotStarted"), so dispatchSlot never fired.
    Aligned with the backend's actual enum string values
    (verified via heartbeat response shape).

  - Added info-level logs at slot selection, dispatchSlot
    entry, and WakeAgent fire/result so operators can see the
    plugin's decision chain in production without enabling
    debug. Cheap: a few log lines per agent per heartbeat interval.

E2E in sim: backend returns slots=1 → selection chosen=true →
dispatch enter → WakeAgent enqueued ok → backend slot ongoing
→ next heartbeat returns slots=0.
2026-06-03 11:42:18 +01:00
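The failure mode is easy to reproduce. Go compares typed string constants byte for byte, and `encoding/json` copies the value verbatim, so a camelCase constant never matches a snake_case wire value. This minimal sketch includes only the two status strings quoted above. The `slot` struct's `id` field is hypothetical.

```go
package main

import "encoding/json"

// SlotStatus values as the backend stores them (snake_case). Only the
// two values quoted in the commit are listed here.
type SlotStatus string

const (
	SlotNotStarted SlotStatus = "not_started"
	SlotOngoing    SlotStatus = "ongoing"
)

// slot is a trimmed, hypothetical projection of the heartbeat Slot.
type slot struct {
	ID     int64      `json:"id"`
	Status SlotStatus `json:"status"`
}

// dispatchable reports whether a decoded slot should be picked up.
func dispatchable(raw []byte) (bool, error) {
	var s slot
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	return s.Status == SlotNotStarted, nil
}
```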
78b1ec5181 fix: align calendar API with actual HarborForge.Backend contract
Initial drop guessed the heartbeat shape; sim e2e against a running
harborforge-backend revealed the real contract is per-agent with
header auth, not server-wide with bearer:

  POST /calendar/agent/heartbeat
    headers: X-Agent-ID, X-Claw-Identifier
    body:    {claw_identifier, agent_id}
    response: {slots: [Slot], agent_status, message?}

  PATCH /calendar/slots/{id}/agent-update
  PATCH /calendar/slots/virtual/{vid}/agent-update
    body: {status, started_at?, actual_duration?}

  POST /calendar/agent/status
    body: {claw_identifier, agent_id, status}

Refactors:

  - internal/calendar/types.go now mirrors OpenclawPlugin/calendar/
    types.ts 1:1 (SlotStatus camelCase, real vs virtual slot id
    discrimination, event_data shape)
  - internal/calendar/bridge.go: header-based auth, per-agent method
    signatures, separate UpdateRealSlot vs UpdateVirtualSlot
  - internal/calendar/scheduler.go: per-agent heartbeat loop
    (one HTTP call per agent per tick), highest-priority slot
    selection, agent-update PATCH for terminal/non-terminal states
  - SingleActiveAgentID helper for main.bestEffortAgentID
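The highest-priority selection can be sketched as a single pass that skips anything not pending. The `Slot` fields are trimmed to what selection needs, and the `"deferred"` wire value is an assumption. Ties go to the earlier slot, which follows from the strict `>` comparison rather than from any documented rule.

```go
package main

// SlotStatus wire values; "deferred" is assumed for this sketch.
type SlotStatus string

const (
	SlotNotStarted SlotStatus = "not_started"
	SlotDeferred   SlotStatus = "deferred"
	SlotOngoing    SlotStatus = "ongoing"
)

// Slot is trimmed to the fields selection needs.
type Slot struct {
	Ident    string
	Status   SlotStatus
	Priority int
}

// pickSlot returns the highest-priority slot that is still pending
// (not_started or deferred), or nil if none is.
func pickSlot(slots []Slot) *Slot {
	var chosen *Slot
	for i := range slots {
		s := &slots[i]
		if s.Status != SlotNotStarted && s.Status != SlotDeferred {
			continue
		}
		if chosen == nil || s.Priority > chosen.Priority {
			chosen = s
		}
	}
	return chosen
}
```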

Also fix two bugs found in sim:

  - bgCtx capture: AgentLister closures were capturing Init's ctx
    which dies the moment MCP initialize returns; switched to
    bgCtx (lifetime = plugin process)
  - tools.toolRestartStatus referenced a non-existent
    sch.RestartPending — HF backend has no restart endpoint per
    /openapi.json, so the tool now reports last_heartbeats freshness

Scheduler logs each tick + each heartbeat outcome at info so
operators can see backend connectivity without enabling debug.

E2E against http://harborforge-backend:8000 in sim:
  daemon → heartbeat → 404 "Agent not found"
  (= correct endpoint, correct headers, correct body — agent just
   isn't registered yet, which is expected for an untenanted
   plugin)
2026-06-03 11:28:05 +01:00
10 changed files with 961 additions and 426 deletions


@@ -18,9 +18,19 @@ submodule of the HarborForge umbrella repo.
## What it does ## What it does
- **Monitor bridge** — HTTP server on `127.0.0.1:<monitor_port>` that - **Monitor push loop** — when `monitor_push_enabled: true`, posts a
responds to `/telemetry` with a Snapshot the HarborForge.Monitor flat telemetry payload (cpu/mem/disk/swap/load + per-agent state)
binary expects (system metrics + every Plexum agent's sm-state) to `<backendUrl>/monitor/server/heartbeat` every
`monitor_push_interval_seconds`. This replaces the standalone
`harborforge-monitor` daemon — the plugin's lifecycle (gateway
start/stop) bounds the loop, so a separate supervisor isn't needed.
Use the same `apiKey` value the standalone monitor's
`HF_MONITER_API_KEY` carried.
- **Monitor bridge** (optional) — HTTP server on
`127.0.0.1:<monitor_port>` that responds to `/telemetry` with a
Snapshot. Useful when the standalone monitor is still present and
you want it to enrich its push payload from the plugin's view of
agents. Disable by setting `monitor_port: 0`.
- **Calendar scheduler** — heartbeats `<backendUrl>/calendar/agent/ - **Calendar scheduler** — heartbeats `<backendUrl>/calendar/agent/
heartbeat` every interval, receives any TimeSlots due to fire, and heartbeat` every interval, receives any TimeSlots due to fire, and
dispatches them through `HostAPI.WakeAgent` (state-aware queue dispatches them through `HostAPI.WakeAgent` (state-aware queue
@@ -64,15 +74,24 @@ And configure at `~/.plexum/plugins/harbor-forge/config.json`:
```json ```json
{ {
"backendUrl": "https://monitor.hangman-lab.top", "backendUrl": "https://hf-api.hangman-lab.top",
"identifier": "server-t3", "identifier": "server-t3",
"apiKey": "g1_xxx", "apiKey": "g1_xxx",
"monitor_port": 9100, "monitor_push_enabled": true,
"monitor_push_interval_seconds": 30,
"monitor_port": 0,
"calendar_enabled": true, "calendar_enabled": true,
"calendar_heartbeat_interval_seconds": 30 "calendar_heartbeat_interval_seconds": 30
} }
``` ```
Replacing the standalone `harborforge-monitor` container: lift the
container's `HF_MONITER_API_KEY` into `apiKey`, set
`monitor_push_enabled: true`, then `docker rm -f harborforge-monitor`
once you've confirmed the plugin's pushes are landing (the backend's
`server_states.last_seen_at` should keep advancing without the
container running).
Restart the host (`systemctl --user restart plexum`) and verify: Restart the host (`systemctl --user restart plexum`) and verify:
```bash ```bash


@@ -41,6 +41,7 @@ type harborForgePlugin struct {
host sdkplugin.HostAPI host sdkplugin.HostAPI
cfg hfcfg.Resolved cfg hfcfg.Resolved
bridge *monitor.Bridge bridge *monitor.Bridge
pusher *monitor.Pusher
sched *calendar.Scheduler sched *calendar.Scheduler
deps tools.Deps deps tools.Deps
cancelBg context.CancelFunc cancelBg context.CancelFunc
@@ -66,46 +67,76 @@ func (p *harborForgePlugin) Init(ctx context.Context, host sdkplugin.HostAPI) er
} }
p.cfg = hfcfg.Resolve(raw) p.cfg = hfcfg.Resolve(raw)
host.Log("info", "harbor-forge plugin initialized", map[string]any{ host.Log("info", "harbor-forge plugin initialized", map[string]any{
"version": Version, "version": Version,
"backend": p.cfg.BackendURL, "backend": p.cfg.BackendURL,
"identifier": p.cfg.Identifier, "identifier": p.cfg.Identifier,
"monitor_port": p.cfg.MonitorPort, "monitor_port": p.cfg.MonitorPort,
"calendar_enabled": p.cfg.CalendarEnabled, "monitor_push_enabled": p.cfg.MonitorPushEnabled,
"calendar_enabled": p.cfg.CalendarEnabled,
}) })
collect := func() telemetry.Snapshot {
return telemetry.Collect(telemetry.CollectOpts{
Identifier: p.cfg.Identifier,
Version: Version,
AgentLister: func() []telemetry.AgentInfo {
return p.listAgents(ctx, profileRoot)
},
})
}
p.bridge = monitor.New(p.cfg.MonitorPort, collect,
func(level, msg string, attrs map[string]any) { host.Log(level, msg, attrs) })
bgCtx, cancel := context.WithCancel(context.Background()) bgCtx, cancel := context.WithCancel(context.Background())
p.cancelBg = cancel p.cancelBg = cancel
// Listers + collectors capture bgCtx (not Init ctx) — Init returns
// once MCP initialize completes, but the plugin process lives on
// and so do the goroutines + closures we registered.
makeCollector := func(sampleCPU bool) func() telemetry.Snapshot {
return func() telemetry.Snapshot {
return telemetry.Collect(telemetry.CollectOpts{
Identifier: p.cfg.Identifier,
Version: Version,
SampleCPU: sampleCPU,
AgentLister: func() []telemetry.AgentInfo {
return p.listAgents(bgCtx, profileRoot)
},
})
}
}
// Bridge serves on-demand reads; cheap, no CPU sampling.
collect := makeCollector(false)
// Pusher runs the slow push loop; CPU sampling fine here.
collectForPush := makeCollector(true)
p.bridge = monitor.New(p.cfg.MonitorPort, collect,
func(level, msg string, attrs map[string]any) { host.Log(level, msg, attrs) })
if err := p.bridge.Start(bgCtx); err != nil { if err := p.bridge.Start(bgCtx); err != nil {
host.Log("warn", "monitor bridge failed to start", map[string]any{"err": err.Error()}) host.Log("warn", "monitor bridge failed to start", map[string]any{"err": err.Error()})
} }
// Active push loop — replaces the standalone harborforge-monitor
// container. Off by default; operator opts in via
// monitor_push_enabled + apiKey.
p.pusher = monitor.NewPusher(monitor.PusherConfig{
BackendURL: p.cfg.BackendURL,
APIKey: p.cfg.APIKey,
Interval: time.Duration(p.cfg.MonitorPushIntervalSeconds) * time.Second,
}, collectForPush,
func(level, msg string, attrs map[string]any) { host.Log(level, msg, attrs) })
if p.cfg.MonitorPushEnabled {
p.wg.Add(1)
go func() {
defer p.wg.Done()
if err := p.pusher.Run(bgCtx); err != nil && !errors.Is(err, context.Canceled) {
host.Log("warn", "monitor pusher exited", map[string]any{"err": err.Error()})
}
}()
}
calBackend := p.cfg.CalendarBackendURL calBackend := p.cfg.CalendarBackendURL
if calBackend == "" { if calBackend == "" {
calBackend = p.cfg.BackendURL calBackend = p.cfg.BackendURL
} }
bridge := calendar.New(calBackend, p.cfg.APIKey) bridge := calendar.New(calBackend, p.cfg.Identifier)
p.sched = calendar.NewScheduler( p.sched = calendar.NewScheduler(
calendar.Config{ calendar.Config{
HeartbeatInterval: time.Duration(p.cfg.CalendarHeartbeatIntervalSeconds) * time.Second, HeartbeatInterval: time.Duration(p.cfg.CalendarHeartbeatIntervalSeconds) * time.Second,
}, },
bridge, host, p.cfg.Identifier, bridge, host,
calendar.PluginInfoTag{Name: "harbor-forge", Version: Version, Backend: "plexum"}, calendar.PluginInfoTag{Name: "harbor-forge", Version: Version, Backend: "plexum"},
func() []calendar.ReportableAgent { func() []calendar.ReportableAgent {
return p.listReportableAgents(ctx, profileRoot) return p.listReportableAgents(bgCtx, profileRoot)
}, },
) )
if p.cfg.CalendarEnabled { if p.cfg.CalendarEnabled {
@@ -125,18 +156,20 @@ func (p *harborForgePlugin) Init(ctx context.Context, host sdkplugin.HostAPI) er
Version: Version, Version: Version,
Collect: collect, Collect: collect,
Bridge: p.bridge, Bridge: p.bridge,
Pusher: p.pusher,
Scheduler: p.sched, Scheduler: p.sched,
Host: host, Host: host,
AgentIDFromCtx: func(ctx context.Context) string { AgentIDFromCtx: func(ctx context.Context) string {
// Plexum stashes the calling agent id on the host-side // Host attaches the caller agent id via tools/call
// context (via WithAgent) before dispatching tool calls. // `_meta.agent_id`; SDK unpacks it into ctx.
// We can't directly import internal/agentloop from a if id := sdkplugin.AgentIDFromContext(ctx); id != "" {
// plugin, so we rely on PLEXUM_TOOL_AGENT_ID env-style return id
// (set per-call by host when we add that wiring) or fall }
// back to the only-active-agent heuristic. v1: prefer the // Fallback for host paths that don't carry an agent
// only-active wake-target (deterministic in single-agent // (channel-driven, CLI plugin-call). When a single
// HF deployments). // calendar slot is active we can deterministically
return p.bestEffortAgentID() // attribute the call to that slot's owner.
return p.sched.SingleActiveAgentID()
}, },
} }
@@ -203,21 +236,7 @@ func mapStateToCalendar(s string) calendar.AgentStatusValue {
case "offline": case "offline":
return calendar.AgentStatusOffline return calendar.AgentStatusOffline
} }
return calendar.AgentStatusUnknown return calendar.AgentStatusOffline
}
// bestEffortAgentID is a v1 stop-gap for tools that need the calling
// agent's id but don't have it on the ctx (Plexum SDK doesn't yet
// expose this — TODO upstream). Returns the only active calendar
// slot's agent if there's exactly one; otherwise empty. The calendar
// tools (the only ones that need agent context) usually fire when
// exactly one slot is active.
func (p *harborForgePlugin) bestEffortAgentID() string {
sch := p.sched.Status()
if len(sch.Active) == 1 {
return sch.Active[0].Slot.AgentID
}
return ""
} }
func manifestFromDisk() sdkplugin.Manifest { func manifestFromDisk() sdkplugin.Manifest {


@@ -1,7 +1,8 @@
// Bridge — thin HTTP client for the HarborForge backend's Calendar API. // Bridge — typed HTTP client for HarborForge.Backend's calendar API.
// All operations carry the API key as Authorization: Bearer; absent // Endpoint shapes verified via the backend's /openapi.json and against
// key means missing-auth errors from the backend (caller should // HarborForge.OpenclawPlugin/plugin/calendar/calendar-bridge.ts so
// handle them as transient and log). // the two plugins drop into the same backend without per-plugin
// adapters.
package calendar package calendar
@@ -13,30 +14,33 @@ import (
"fmt" "fmt"
"io" "io"
"net/http" "net/http"
"strconv"
"strings" "strings"
"time" "time"
) )
// Bridge is the typed wrapper around an HTTP client + backend URL. // Bridge is constructed once per scheduler and reused across heartbeats.
type Bridge struct { type Bridge struct {
BackendURL string BaseURL string
APIKey string ClawIdentifier string
HTTP *http.Client HTTP *http.Client
} }
// New constructs a bridge with a sensible default timeout. // New constructs a bridge with a 20s default timeout.
func New(backendURL, apiKey string) *Bridge { func New(baseURL, clawIdentifier string) *Bridge {
return &Bridge{ return &Bridge{
BackendURL: strings.TrimRight(backendURL, "/"), BaseURL: strings.TrimRight(baseURL, "/"),
APIKey: apiKey, ClawIdentifier: clawIdentifier,
HTTP: &http.Client{Timeout: 20 * time.Second}, HTTP: &http.Client{Timeout: 20 * time.Second},
} }
} }
// Heartbeat POSTs /calendar/agent/heartbeat. Returns the backend's // Heartbeat POSTs /calendar/agent/heartbeat. Per-agent: each running
// reply or an error. // agent on this claw drives its own heartbeat (matches OpenClaw plugin
func (b *Bridge) Heartbeat(ctx context.Context, payload HeartbeatPayload) (HeartbeatResponse, error) { // semantics).
raw, err := b.post(ctx, "/calendar/agent/heartbeat", payload) func (b *Bridge) Heartbeat(ctx context.Context, agentID string) (HeartbeatResponse, error) {
body := HeartbeatRequest{ClawIdentifier: b.ClawIdentifier, AgentID: agentID}
raw, err := b.doJSON(ctx, http.MethodPost, "/calendar/agent/heartbeat", agentID, body)
if err != nil { if err != nil {
return HeartbeatResponse{}, err return HeartbeatResponse{}, err
} }
@@ -49,76 +53,64 @@ func (b *Bridge) Heartbeat(ctx context.Context, payload HeartbeatPayload) (Heart
return out, nil return out, nil
} }
// UpdateSlotStatus POSTs /calendar/slot/<id>/status to mark a slot // UpdateRealSlot PATCHes /calendar/slots/{id}/agent-update.
// completed / aborted / paused / resumed. func (b *Bridge) UpdateRealSlot(ctx context.Context, agentID string, slotID int64, update SlotAgentUpdate) error {
func (b *Bridge) UpdateSlotStatus(ctx context.Context, slotID string, update SlotUpdate) error { path := "/calendar/slots/" + strconv.FormatInt(slotID, 10) + "/agent-update"
if slotID == "" { _, err := b.doJSON(ctx, http.MethodPatch, path, agentID, update)
return errors.New("calendar: slot id required")
}
_, err := b.post(ctx, "/calendar/slot/"+slotID+"/status", update)
return err return err
} }
// RestartPending GETs /restart/status — returns the backend's // UpdateVirtualSlot PATCHes /calendar/slots/virtual/{vid}/agent-update.
// current restart-requested flag. // The backend materialises the virtual slot first; subsequent calls
func (b *Bridge) RestartPending(ctx context.Context) (bool, error) { // against the same logical slot should use UpdateRealSlot with the
raw, err := b.get(ctx, "/restart/status") // id returned in the response — but for v1 we don't round-trip the
if err != nil { // materialised id back to the scheduler (would require a separate
return false, err // fetch); the agent-update path tolerates re-PATCHing a virtual id.
} func (b *Bridge) UpdateVirtualSlot(ctx context.Context, agentID string, virtualID string, update SlotAgentUpdate) error {
var out struct { path := "/calendar/slots/virtual/" + virtualID + "/agent-update"
Pending bool `json:"pending"` _, err := b.doJSON(ctx, http.MethodPatch, path, agentID, update)
} return err
if len(raw) > 0 {
if err := json.Unmarshal(raw, &out); err != nil {
return false, fmt.Errorf("decode restart-status: %w", err)
}
}
return out.Pending, nil
} }
// post serialises body as JSON, attaches Authorization, returns // PushAgentStatus POSTs /calendar/agent/status. Used to push idle ↔
// response body bytes. Non-2xx becomes an error with the body // busy transitions out of the normal heartbeat cycle.
// included for diagnostics. func (b *Bridge) PushAgentStatus(ctx context.Context, agentID string, status AgentStatusValue) error {
func (b *Bridge) post(ctx context.Context, path string, body any) ([]byte, error) { body := AgentStatusPush{
ClawIdentifier: b.ClawIdentifier, AgentID: agentID, Status: status,
}
_, err := b.doJSON(ctx, http.MethodPost, "/calendar/agent/status", agentID, body)
return err
}
// doJSON serialises body, attaches the two auth headers, and returns
// the response bytes. Errors on non-2xx with truncated body.
func (b *Bridge) doJSON(ctx context.Context, method, path, agentID string, body any) ([]byte, error) {
if agentID == "" {
return nil, errors.New("calendar: agent_id required for auth headers")
}
raw, err := json.Marshal(body) raw, err := json.Marshal(body)
if err != nil { if err != nil {
return nil, fmt.Errorf("marshal %s: %w", path, err) return nil, fmt.Errorf("marshal %s %s: %w", method, path, err)
} }
req, err := http.NewRequestWithContext(ctx, http.MethodPost, b.BackendURL+path, bytes.NewReader(raw)) req, err := http.NewRequestWithContext(ctx, method, b.BaseURL+path, bytes.NewReader(raw))
if err != nil { if err != nil {
return nil, err return nil, err
} }
req.Header.Set("Content-Type", "application/json") req.Header.Set("Content-Type", "application/json")
if b.APIKey != "" { req.Header.Set("X-Agent-ID", agentID)
req.Header.Set("Authorization", "Bearer "+b.APIKey) req.Header.Set("X-Claw-Identifier", b.ClawIdentifier)
}
return b.do(req)
}
func (b *Bridge) get(ctx context.Context, path string) ([]byte, error) {
req, err := http.NewRequestWithContext(ctx, http.MethodGet, b.BackendURL+path, nil)
if err != nil {
return nil, err
}
if b.APIKey != "" {
req.Header.Set("Authorization", "Bearer "+b.APIKey)
}
return b.do(req)
}
func (b *Bridge) do(req *http.Request) ([]byte, error) {
res, err := b.HTTP.Do(req) res, err := b.HTTP.Do(req)
if err != nil { if err != nil {
return nil, fmt.Errorf("%s %s: %w", req.Method, req.URL.Path, err) return nil, fmt.Errorf("%s %s: %w", method, path, err)
} }
defer res.Body.Close() defer res.Body.Close()
body, _ := io.ReadAll(res.Body) out, _ := io.ReadAll(res.Body)
if res.StatusCode < 200 || res.StatusCode >= 300 { if res.StatusCode < 200 || res.StatusCode >= 300 {
return nil, fmt.Errorf("%s %s → %d: %s", return nil, fmt.Errorf("%s %s → %d: %s",
req.Method, req.URL.Path, res.StatusCode, truncate(body, 300)) method, path, res.StatusCode, truncate(out, 300))
} }
return body, nil return out, nil
} }
func truncate(b []byte, n int) string { func truncate(b []byte, n int) string {


@@ -1,14 +1,16 @@
// Scheduler — main loop that heartbeats the backend, dispatches // Scheduler — loops over every Plexum agent, heartbeats per-agent,
// returned slots via Plexum's WakeAgent, and tracks per-agent active // picks the highest-priority pending slot for each, dispatches via
// slot state for the calendar_* tools. // host.WakeAgent. Mirrors HarborForge.OpenclawPlugin's per-agent
// scheduler loop (PLG-CAL-002).
// //
// State is in-memory: a daemon restart drops everything. Next // In-memory state: per-agent active slot map. A daemon restart drops
// heartbeat reconciles (backend keeps the canonical SlotStatus). // it; next heartbeat reconciles from the backend's canonical state.
// //
// Concurrency: // Wake semantics: WakeAgent is fire-and-forget; the SDK's wake queue
// - one heartbeat ticker goroutine // (depth 1 replace-newest) handles state-aware dispatch. We mark the
// - per-slot dispatch is fire-and-forget via WakeAgent (queue-aware) // slot Ongoing optimistically the moment we call WakeAgent; agents
// - mu guards activeBySlot + activeByAgent maps // drive complete/abort/pause/resume via the harborforge_calendar_*
// tools.
package calendar package calendar
@@ -22,22 +24,20 @@ import (
sdkplugin "git.hangman-lab.top/hzhang/Plexum-sdk-go/plugin" sdkplugin "git.hangman-lab.top/hzhang/Plexum-sdk-go/plugin"
) )
// Scheduler orchestrates the calendar loop. // Scheduler is the long-running calendar driver.
type Scheduler struct { type Scheduler struct {
cfg Config cfg Config
bridge *Bridge bridge *Bridge
host sdkplugin.HostAPI host sdkplugin.HostAPI
agentLister func() []ReportableAgent agentLister func() []ReportableAgent
identifier string pluginInfo PluginInfoTag
pluginInfo PluginInfoTag
mu sync.Mutex mu sync.Mutex
activeBySlotID map[string]*ActiveSlot activeByAgentID map[string]*ActiveSlot
activeByAgentID map[string]*ActiveSlot activeBySlotIdent map[string]*ActiveSlot
history []HistoryEntry history []HistoryEntry
lastHeartbeat time.Time lastHeartbeats map[string]time.Time
lastResponse HeartbeatResponse lastErrors map[string]string
restartPending bool
} }
// Config bundles scheduler tunables. // Config bundles scheduler tunables.
@@ -46,36 +46,42 @@ type Config struct {
HistoryCap int // bound on activity history; default 32 HistoryCap int // bound on activity history; default 32
} }
// ReportableAgent is the projection of a Plexum agent the scheduler // PluginInfoTag tags heartbeat reports so the backend knows which
// needs for heartbeat — id + model + current sm state. // plugin / version is reporting.
type ReportableAgent struct { type PluginInfoTag struct {
ID string Name string
Model string Version string
State AgentStatusValue Backend string // "plexum"
} }
// ActiveSlot tracks an in-flight slot (between WakeAgent dispatch and // ReportableAgent is the per-agent projection the scheduler needs for
// terminal status update). // heartbeat enumeration.
type ReportableAgent struct {
ID string
Model string
State AgentStatusValue
}
// ActiveSlot tracks an in-flight slot from dispatch to terminal state.
type ActiveSlot struct { type ActiveSlot struct {
Slot Slot Slot Slot
StartedAt time.Time StartedAt time.Time
LastHeartbeat time.Time LastHeartbeat time.Time
State SlotStatus
} }
// HistoryEntry is one resolved slot kept for the calendar_status tool. // HistoryEntry records one resolved slot for the calendar_status tool.
type HistoryEntry struct { type HistoryEntry struct {
SlotID string Ident string
AgentID string AgentID string
Status SlotStatus Status SlotStatus
ResolvedAt time.Time ResolvedAt time.Time
Reason string Reason string
Summary string Summary string
} }
// NewScheduler constructs a Scheduler in stopped state. // NewScheduler constructs a Scheduler in stopped state.
func NewScheduler(cfg Config, bridge *Bridge, host sdkplugin.HostAPI, func NewScheduler(cfg Config, bridge *Bridge, host sdkplugin.HostAPI,
identifier string, pluginInfo PluginInfoTag, pluginInfo PluginInfoTag,
agentLister func() []ReportableAgent) *Scheduler { agentLister func() []ReportableAgent) *Scheduler {
if cfg.HeartbeatInterval <= 0 { if cfg.HeartbeatInterval <= 0 {
cfg.HeartbeatInterval = 30 * time.Second cfg.HeartbeatInterval = 30 * time.Second
@@ -84,193 +90,232 @@ func NewScheduler(cfg Config, bridge *Bridge, host sdkplugin.HostAPI,
cfg.HistoryCap = 32 cfg.HistoryCap = 32
} }
return &Scheduler{ return &Scheduler{
cfg: cfg, cfg: cfg,
bridge: bridge, bridge: bridge,
host: host, host: host,
agentLister: agentLister, agentLister: agentLister,
identifier: identifier, pluginInfo: pluginInfo,
pluginInfo: pluginInfo, activeByAgentID: map[string]*ActiveSlot{},
activeBySlotID: map[string]*ActiveSlot{}, activeBySlotIdent: map[string]*ActiveSlot{},
activeByAgentID: map[string]*ActiveSlot{}, lastHeartbeats: map[string]time.Time{},
lastErrors: map[string]string{},
} }
} }
// Run blocks until ctx cancels, ticking heartbeats every // Run blocks until ctx cancels.
// cfg.HeartbeatInterval. Returns nil on graceful shutdown.
func (s *Scheduler) Run(ctx context.Context) error { func (s *Scheduler) Run(ctx context.Context) error {
t := time.NewTicker(s.cfg.HeartbeatInterval) t := time.NewTicker(s.cfg.HeartbeatInterval)
defer t.Stop() defer t.Stop()
// First heartbeat immediately so initial state lands fast. s.tick(ctx)
s.heartbeatOnce(ctx)
for { for {
select { select {
case <-ctx.Done(): case <-ctx.Done():
return nil return nil
case <-t.C: case <-t.C:
s.heartbeatOnce(ctx) s.tick(ctx)
} }
} }
} }
func (s *Scheduler) heartbeatOnce(ctx context.Context) { func (s *Scheduler) tick(ctx context.Context) {
payload := HeartbeatPayload{ if s.agentLister == nil {
Identifier: s.identifier,
APIKey: s.bridge.APIKey,
PluginInfo: s.pluginInfo,
CapturedAt: time.Now().UTC(),
}
if s.agentLister != nil {
for _, a := range s.agentLister() {
payload.AgentList = append(payload.AgentList, AgentReport{
ID: a.ID, Model: a.Model, Status: a.State,
})
}
}
resp, err := s.bridge.Heartbeat(ctx, payload)
s.mu.Lock()
s.lastHeartbeat = time.Now()
if err == nil {
s.lastResponse = resp
s.restartPending = resp.RestartPending
}
s.mu.Unlock()
if err != nil {
return // network blip; next tick retries
}
for _, slot := range resp.SlotsToFire {
s.dispatchSlot(ctx, slot)
}
}
// dispatchSlot fires the slot via host.WakeAgent and records it as
// active. WakeAgent handles state-aware queueing — if the agent is
// busy, our calendar slot enqueues at depth 1 and the previous wake
// is dropped per replace-newest semantics. We mark the slot
// in_progress optimistically when we ENQUEUED; backend reconciles on
// its own watchdog.
func (s *Scheduler) dispatchSlot(ctx context.Context, slot Slot) {
// Skip already-active slots (heartbeat may re-list a slot we
// already started — backend hasn't seen our optimistic update yet).
s.mu.Lock()
if _, ok := s.activeBySlotID[slot.ID]; ok {
s.mu.Unlock()
return return
} }
now := time.Now().UTC() now := time.Now().UTC()
act := &ActiveSlot{ agents := s.agentLister()
Slot: slot, StartedAt: now, LastHeartbeat: now, s.host.Log("info", "calendar tick", map[string]any{"agents": len(agents)})
State: SlotInProgress, for _, agent := range agents {
s.tickForAgent(ctx, agent, now)
} }
s.activeBySlotID[slot.ID] = act }
s.activeByAgentID[slot.AgentID] = act
s.mu.Unlock()
message := slot.WakeOptions.OverrideMessage func (s *Scheduler) tickForAgent(ctx context.Context, agent ReportableAgent, now time.Time) {
if message == "" { resp, err := s.bridge.Heartbeat(ctx, agent.ID)
message = slot.PromptText s.mu.Lock()
} s.lastHeartbeats[agent.ID] = now
if message == "" { if err != nil {
message = fmt.Sprintf("[calendar] slot %s: %s", slot.ID, slot.Title) s.lastErrors[agent.ID] = err.Error()
} s.mu.Unlock()
source := fmt.Sprintf("calendar:slot-%s", slot.ID) s.host.Log("warn", "calendar heartbeat failed", map[string]any{
if err := s.host.WakeAgent(ctx, sdkplugin.WakeAgentRequest{ "agent": agent.ID, "err": err.Error(),
AgentID: slot.AgentID, })
Message: message,
Source: source,
}); err != nil {
// Wake itself failed (plumbing). Mark slot aborted +
// notify backend.
s.resolveSlot(ctx, slot.ID, SlotAborted, "", "wake-agent failed: "+err.Error())
return return
} }
delete(s.lastErrors, agent.ID)
s.mu.Unlock()
s.host.Log("info", "calendar heartbeat ok", map[string]any{
"agent": agent.ID, "slots": len(resp.Slots), "agent_status": string(resp.AgentStatus),
})
// Pick highest-priority NotStarted slot; defer the rest.
var chosen *Slot
for i := range resp.Slots {
slot := &resp.Slots[i]
if slot.Status != SlotNotStarted && slot.Status != SlotDeferred {
continue
}
if chosen == nil || slot.Priority > chosen.Priority {
chosen = slot
}
}
s.host.Log("info", "calendar slot selection", map[string]any{
"agent": agent.ID, "available": len(resp.Slots), "chosen": chosen != nil,
})
if chosen != nil {
s.dispatchSlot(ctx, agent.ID, *chosen)
}
// Defer the other unchosen NotStarted/Deferred slots (priority +1)
// so they bubble up next heartbeat. We don't strictly need to push
// the update; the backend's priority bookkeeping survives without
// our nudge for v1. (OpenClaw plugin DOES push priority bumps —
// future v2 work if backend feedback shows starvation.)
} }
// resolveSlot moves an active slot to a terminal status, records // dispatchSlot fires WakeAgent + records the slot active. Marks the
// history, and tells the backend. Safe to call concurrently. // slot Ongoing on the backend so the dashboard reflects the
func (s *Scheduler) resolveSlot(ctx context.Context, slotID string, status SlotStatus, summary, reason string) error { // transition immediately.
func (s *Scheduler) dispatchSlot(ctx context.Context, agentID string, slot Slot) {
ident := slot.SlotIdent()
s.host.Log("info", "calendar dispatchSlot enter", map[string]any{
"agent": agentID, "slot_ident": ident,
})
s.mu.Lock() s.mu.Lock()
act, ok := s.activeBySlotID[slotID] if _, dup := s.activeBySlotIdent[ident]; dup {
if !ok {
s.mu.Unlock() s.mu.Unlock()
return fmt.Errorf("calendar: slot %s not active", slotID) s.host.Log("info", "calendar dispatchSlot skipped (already active)", map[string]any{"slot": ident})
return
} }
delete(s.activeBySlotID, slotID) if _, agentBusy := s.activeByAgentID[agentID]; agentBusy {
delete(s.activeByAgentID, act.Slot.AgentID) // Don't pick up another slot until the current one resolves.
s.appendHistoryLocked(HistoryEntry{ s.mu.Unlock()
SlotID: slotID, AgentID: act.Slot.AgentID, Status: status, s.host.Log("info", "calendar dispatchSlot skipped (agent has active slot)", map[string]any{"agent": agentID})
return
}
now := time.Now().UTC()
active := &ActiveSlot{Slot: slot, StartedAt: now, LastHeartbeat: now}
s.activeBySlotIdent[ident] = active
s.activeByAgentID[agentID] = active
s.mu.Unlock()
message := buildWakeMessage(slot)
source := "calendar:" + ident
s.host.Log("info", "calendar firing WakeAgent", map[string]any{
"agent": agentID, "slot": ident, "source": source, "msg_len": len(message),
})
if err := s.host.WakeAgent(ctx, sdkplugin.WakeAgentRequest{
AgentID: agentID, Message: message, Source: source,
}); err != nil {
s.host.Log("warn", "calendar WakeAgent failed", map[string]any{
"agent": agentID, "err": err.Error(),
})
s.resolveLocally(ident, agentID, SlotAborted, "", "wake failed: "+err.Error())
return
}
s.host.Log("info", "calendar WakeAgent enqueued ok", map[string]any{
"agent": agentID, "slot": ident,
})
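// Mark Ongoing on the backend. A failure here is logged but not fatal:
// the agent is already woken and the slot stays active locally.
update := SlotAgentUpdate{
Status: SlotOngoing, StartedAt: now.Format("15:04:05"),
}
if err := s.pushUpdate(ctx, agentID, slot, update); err != nil {
s.host.Log("warn", "calendar mark-Ongoing push failed", map[string]any{
"agent": agentID, "slot": ident, "err": err.Error(),
})
}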
}
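The two guards at the top of dispatchSlot (one entry per slot ident, one active slot per agent) can be modelled as a small mutex-protected tracker. This is a minimal sketch, not the plugin's code: the `tracker` type and its `tryStart`/`finish` methods are hypothetical, and they store plain ident/agent strings instead of `*ActiveSlot`.

```go
package main

import "sync"

// tracker is a hypothetical, stripped-down model of the scheduler's two
// active-slot indexes: one keyed by slot ident, one keyed by agent id.
type tracker struct {
	mu      sync.Mutex
	byIdent map[string]string // slot ident -> agent id
	byAgent map[string]string // agent id -> slot ident
}

func newTracker() *tracker {
	return &tracker{byIdent: map[string]string{}, byAgent: map[string]string{}}
}

// tryStart marks ident active for agentID unless the slot is already
// active or the agent is busy with another slot. It returns "" on
// success, otherwise the reason the dispatch was skipped.
func (t *tracker) tryStart(agentID, ident string) string {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, dup := t.byIdent[ident]; dup {
		return "already active"
	}
	if _, busy := t.byAgent[agentID]; busy {
		return "agent has active slot"
	}
	t.byIdent[ident] = agentID
	t.byAgent[agentID] = ident
	return ""
}

// finish clears both indexes so the agent can pick up its next slot.
func (t *tracker) finish(agentID, ident string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.byIdent, ident)
	delete(t.byAgent, agentID)
}
```

Because both checks and both inserts happen under one lock, two concurrent heartbeats cannot start the same slot twice. dispatchSlot releases the lock before calling WakeAgent, which keeps the slow host call out of the critical section.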
func buildWakeMessage(slot Slot) string {
// Backend EventData → prompt. v1 is intentionally simple; refine
// when the prompt-engineering side of the plugin matures.
if slot.EventType != nil {
switch *slot.EventType {
case EventTypeSystemEvent:
if ev, ok := slot.EventData["event"].(string); ok {
return fmt.Sprintf("[calendar system_event] %s", ev)
}
case EventTypeJob:
code, _ := slot.EventData["code"].(string)
typ, _ := slot.EventData["type"].(string)
if code != "" {
return fmt.Sprintf("[calendar job %s/%s] please handle this", typ, code)
}
}
}
return fmt.Sprintf("[calendar slot %s] scheduled work — please proceed", slot.SlotIdent())
}
// CompleteForAgent pushes Finished (with elapsed minutes, minimum 1) to
// the backend, then resolves the agent's active slot locally. Terminal.
func (s *Scheduler) CompleteForAgent(ctx context.Context, agentID, summary string) error {
act, ok := s.activeSlotForAgent(agentID)
if !ok {
return ErrNoActiveSlot
}
now := time.Now().UTC()
duration := int(now.Sub(act.StartedAt).Minutes())
if duration < 1 {
duration = 1
}
if err := s.pushUpdate(ctx, agentID, act.Slot, SlotAgentUpdate{
Status: SlotFinished, ActualDuration: duration,
}); err != nil {
return err
}
s.resolveLocally(act.Slot.SlotIdent(), agentID, SlotFinished, summary, "")
return nil
}
// AbortForAgent pushes Aborted to the backend, then resolves the agent's
// active slot locally. Terminal.
func (s *Scheduler) AbortForAgent(ctx context.Context, agentID, reason string) error {
act, ok := s.activeSlotForAgent(agentID)
if !ok {
return ErrNoActiveSlot
}
if err := s.pushUpdate(ctx, agentID, act.Slot, SlotAgentUpdate{Status: SlotAborted}); err != nil {
return err
}
s.resolveLocally(act.Slot.SlotIdent(), agentID, SlotAborted, "", reason)
return nil
}
// PauseForAgent pushes Paused to the backend. Non-terminal: the slot
// stays active locally.
func (s *Scheduler) PauseForAgent(ctx context.Context, agentID, reason string) error {
act, ok := s.activeSlotForAgent(agentID)
if !ok {
return ErrNoActiveSlot
}
return s.pushUpdate(ctx, agentID, act.Slot, SlotAgentUpdate{Status: SlotPaused})
}
// ResumeForAgent pushes Ongoing to the backend. Non-terminal: the slot
// stays active locally.
func (s *Scheduler) ResumeForAgent(ctx context.Context, agentID string) error {
act, ok := s.activeSlotForAgent(agentID)
if !ok {
return ErrNoActiveSlot
}
return s.pushUpdate(ctx, agentID, act.Slot, SlotAgentUpdate{Status: SlotOngoing})
}
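// pushUpdate sends update to the backend's real-slot or virtual-slot
// endpoint, depending on which id the slot carries.
func (s *Scheduler) pushUpdate(ctx context.Context, agentID string, slot Slot, update SlotAgentUpdate) error {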
if slot.HasRealID() {
return s.bridge.UpdateRealSlot(ctx, agentID, *slot.ID, update)
}
if slot.VirtualID != nil {
return s.bridge.UpdateVirtualSlot(ctx, agentID, *slot.VirtualID, update)
}
return errors.New("calendar: slot has neither real id nor virtual id")
}
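// resolveLocally removes the slot from both active indexes and records
// a history entry. It does not contact the backend.
func (s *Scheduler) resolveLocally(ident, agentID string, status SlotStatus, summary, reason string) {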
s.mu.Lock()
defer s.mu.Unlock()
delete(s.activeBySlotIdent, ident)
delete(s.activeByAgentID, agentID)
s.history = append(s.history, HistoryEntry{
Ident: ident, AgentID: agentID, Status: status,
ResolvedAt: time.Now().UTC(), Summary: summary, Reason: reason, ResolvedAt: time.Now().UTC(), Summary: summary, Reason: reason,
}) })
s.mu.Unlock()
return s.bridge.UpdateSlotStatus(ctx, slotID, SlotUpdate{
Status: status, Summary: summary, Reason: reason,
})
}
// SetSlotState is a non-terminal status change (paused/resumed).
// Records the new state in-memory and tells the backend.
func (s *Scheduler) SetSlotState(ctx context.Context, slotID string, status SlotStatus, reason string) error {
s.mu.Lock()
act, ok := s.activeBySlotID[slotID]
if !ok {
s.mu.Unlock()
return fmt.Errorf("calendar: slot %s not active", slotID)
}
act.State = status
act.LastHeartbeat = time.Now().UTC()
s.mu.Unlock()
return s.bridge.UpdateSlotStatus(ctx, slotID, SlotUpdate{
Status: status, Reason: reason,
})
}
func (s *Scheduler) appendHistoryLocked(entry HistoryEntry) {
s.history = append(s.history, entry)
if len(s.history) > s.cfg.HistoryCap { if len(s.history) > s.cfg.HistoryCap {
s.history = s.history[len(s.history)-s.cfg.HistoryCap:] s.history = s.history[len(s.history)-s.cfg.HistoryCap:]
} }
} }
// CompleteForAgent / AbortForAgent / PauseForAgent / ResumeForAgent
// are the agent-facing tool entry points. They look up the agent's
// active slot, transition or terminate it, and notify the backend.
// CompleteForAgent terminates the agent's active slot as completed.
func (s *Scheduler) CompleteForAgent(ctx context.Context, agentID, summary string) error {
slot, ok := s.activeSlotForAgent(agentID)
if !ok {
return ErrNoActiveSlot
}
return s.resolveSlot(ctx, slot.Slot.ID, SlotCompleted, summary, "")
}
// AbortForAgent terminates the agent's active slot as aborted.
func (s *Scheduler) AbortForAgent(ctx context.Context, agentID, reason string) error {
slot, ok := s.activeSlotForAgent(agentID)
if !ok {
return ErrNoActiveSlot
}
return s.resolveSlot(ctx, slot.Slot.ID, SlotAborted, "", reason)
}
// PauseForAgent transitions the agent's slot to paused.
func (s *Scheduler) PauseForAgent(ctx context.Context, agentID, reason string) error {
slot, ok := s.activeSlotForAgent(agentID)
if !ok {
return ErrNoActiveSlot
}
return s.SetSlotState(ctx, slot.Slot.ID, SlotPaused, reason)
}
// ResumeForAgent transitions the agent's slot back to in_progress.
func (s *Scheduler) ResumeForAgent(ctx context.Context, agentID string) error {
slot, ok := s.activeSlotForAgent(agentID)
if !ok {
return ErrNoActiveSlot
}
return s.SetSlotState(ctx, slot.Slot.ID, SlotInProgress, "")
}
// activeSlotForAgent returns the per-agent active slot copy under lock.
func (s *Scheduler) activeSlotForAgent(agentID string) (ActiveSlot, bool) { func (s *Scheduler) activeSlotForAgent(agentID string) (ActiveSlot, bool) {
s.mu.Lock() s.mu.Lock()
defer s.mu.Unlock() defer s.mu.Unlock()
@@ -281,36 +326,59 @@ func (s *Scheduler) activeSlotForAgent(agentID string) (ActiveSlot, bool) {
return *act, true return *act, true
} }
// Status returns the introspection shape for the calendar_status tool. // Status is the introspection shape calendar_status returns.
func (s *Scheduler) Status() SchedulerStatus { type Status struct {
Enabled bool `json:"enabled"`
LastHeartbeats map[string]time.Time `json:"last_heartbeats"`
LastErrors map[string]string `json:"last_errors,omitempty"`
HeartbeatEvery time.Duration `json:"heartbeat_every"`
Active []ActiveSlot `json:"active"`
History []HistoryEntry `json:"history"`
}
// SingleActiveAgentID returns the agent id when exactly one active
// slot exists, empty otherwise. Used by the plugin's bestEffortAgentID
// fallback for tool calls that don't carry agent context.
func (s *Scheduler) SingleActiveAgentID() string {
s.mu.Lock() s.mu.Lock()
defer s.mu.Unlock() defer s.mu.Unlock()
active := make([]ActiveSlot, 0, len(s.activeBySlotID)) if len(s.activeByAgentID) != 1 {
for _, a := range s.activeBySlotID { return ""
}
for k := range s.activeByAgentID {
return k
}
return ""
}
// Status returns a snapshot of the scheduler state in the shape the
// calendar_status tool reports.
func (s *Scheduler) Status() Status {
s.mu.Lock()
defer s.mu.Unlock()
active := make([]ActiveSlot, 0, len(s.activeByAgentID))
for _, a := range s.activeByAgentID {
active = append(active, *a) active = append(active, *a)
} }
hb := make(map[string]time.Time, len(s.lastHeartbeats))
for k, v := range s.lastHeartbeats {
hb[k] = v
}
errs := make(map[string]string, len(s.lastErrors))
for k, v := range s.lastErrors {
errs[k] = v
}
history := make([]HistoryEntry, len(s.history)) history := make([]HistoryEntry, len(s.history))
copy(history, s.history) copy(history, s.history)
return SchedulerStatus{ return Status{
Enabled: true, Enabled: true,
LastHeartbeat: s.lastHeartbeat, LastHeartbeats: hb,
LastErrors: errs,
HeartbeatEvery: s.cfg.HeartbeatInterval, HeartbeatEvery: s.cfg.HeartbeatInterval,
Active: active, Active: active,
History: history, History: history,
RestartPending: s.restartPending,
} }
} }
// SchedulerStatus is the shape calendar_status returns. // ErrNoActiveSlot is returned when an agent calls calendar_complete /
type SchedulerStatus struct { // abort / pause / resume but has no slot active.
Enabled bool `json:"enabled"`
LastHeartbeat time.Time `json:"last_heartbeat"`
HeartbeatEvery time.Duration `json:"heartbeat_every"`
Active []ActiveSlot `json:"active"`
History []HistoryEntry `json:"history"`
RestartPending bool `json:"restart_pending"`
}
// ErrNoActiveSlot is returned by calendar_complete/abort/pause/resume
// when the agent has no slot in progress.
var ErrNoActiveSlot = errors.New("calendar: no active slot for agent") var ErrNoActiveSlot = errors.New("calendar: no active slot for agent")


@@ -1,106 +1,154 @@
// Package calendar talks to the HarborForge backend's Calendar API // Types matching HarborForge.Backend's actual calendar API contract
// (heartbeat, slot fetch, status update, restart-pending check) and // (verified via /openapi.json on a running backend). Aligns 1:1 with
// drives a scheduler loop that fires Plexum wake events when slots // HarborForge.OpenclawPlugin/plugin/calendar/types.ts so the two
// come due. Types mirror HarborForge.OpenclawPlugin's calendar/types.ts // plugins can hit the same backend interchangeably.
// so the backend doesn't need to know which plugin is reporting.
package calendar package calendar
import "time" import "time"
// SlotStatus enumerates the slot lifecycle. // SlotStatus enumerates the lifecycle. String values match backend's
// SlotStatus enum verbatim (snake_case — verified via heartbeat
// response shape against running harborforge-backend).
type SlotStatus string type SlotStatus string
const ( const (
SlotNotStarted SlotStatus = "not_started" SlotNotStarted SlotStatus = "not_started"
SlotInProgress SlotStatus = "in_progress" SlotOngoing SlotStatus = "ongoing"
SlotCompleted SlotStatus = "completed" SlotFinished SlotStatus = "finished"
SlotAborted SlotStatus = "aborted" SlotAborted SlotStatus = "aborted"
SlotPaused SlotStatus = "paused"
SlotDeferred SlotStatus = "deferred" SlotDeferred SlotStatus = "deferred"
SlotPaused SlotStatus = "paused"
SlotSkipped SlotStatus = "skipped"
) )
// AgentStatusValue mirrors the backend AgentStatus enum used in // SlotType: work vs on_call. Affects whether the agent flips to busy.
// heartbeat responses (a hint about what the backend thinks the type SlotType string
// agent is doing).
const (
SlotTypeWork SlotType = "work"
SlotTypeOnCall SlotType = "on_call"
)
// EventType categorises what the slot represents.
type EventType string
const (
EventTypeJob EventType = "job"
EventTypeSystemEvent EventType = "system_event"
EventTypeEntertainment EventType = "entertainment"
)
// AgentStatusValue mirrors the backend AgentStatus enum.
type AgentStatusValue string type AgentStatusValue string
const ( const (
AgentStatusUnknown AgentStatusValue = "unknown" AgentStatusIdle AgentStatusValue = "idle"
AgentStatusIdle AgentStatusValue = "idle" AgentStatusBusy AgentStatusValue = "busy"
AgentStatusBusy AgentStatusValue = "busy" AgentStatusOffline AgentStatusValue = "offline"
AgentStatusOffline AgentStatusValue = "offline" AgentStatusOnCall AgentStatusValue = "on_call"
AgentStatusOnCall AgentStatusValue = "on_call" AgentStatusExhausted AgentStatusValue = "exhausted"
AgentStatusPaused AgentStatusValue = "paused"
) )
// SlotKind is "work" vs "on_call" — affects how the scheduler treats // HeartbeatRequest is the POST /calendar/agent/heartbeat body.
// the slot (on_call slots don't move the agent into busy). type HeartbeatRequest struct {
type SlotKind string ClawIdentifier string `json:"claw_identifier"`
AgentID string `json:"agent_id"`
const (
SlotKindWork SlotKind = "work"
SlotKindOnCall SlotKind = "on_call"
)
// Slot is one Calendar TimeSlot the backend serves.
type Slot struct {
ID string `json:"id"`
VirtualID string `json:"virtual_id,omitempty"`
AgentID string `json:"agent_id"`
ClawID string `json:"claw_identifier,omitempty"`
Kind SlotKind `json:"slot_type"`
Title string `json:"title,omitempty"`
Description string `json:"description,omitempty"`
ScheduledAt time.Time `json:"scheduled_at"`
ExpiresAt *time.Time `json:"expires_at,omitempty"`
Status SlotStatus `json:"status"`
PromptText string `json:"prompt,omitempty"`
WakeOptions WakeOpts `json:"wake_options,omitempty"`
} }
// WakeOpts customise how the scheduler should drive the agent. v1 // HeartbeatResponse is the backend's reply.
// honours only Force; the rest pass through as audit trail.
type WakeOpts struct {
Force bool `json:"force,omitempty"`
OverrideMessage string `json:"override_message,omitempty"`
ScopeSessionID string `json:"scope_session_id,omitempty"`
}
// HeartbeatPayload is what the plugin POSTs every interval.
type HeartbeatPayload struct {
Identifier string `json:"identifier"`
APIKey string `json:"api_key,omitempty"`
AgentList []AgentReport `json:"agents"`
PluginInfo PluginInfoTag `json:"plugin"`
CapturedAt time.Time `json:"captured_at"`
}
// AgentReport is one entry in HeartbeatPayload.AgentList.
type AgentReport struct {
ID string `json:"agent_id"`
Status AgentStatusValue `json:"status"`
Model string `json:"model,omitempty"`
}
// PluginInfoTag identifies which plugin / version is heartbeating.
type PluginInfoTag struct {
Name string `json:"name"` // "harbor-forge"
Version string `json:"version"` // e.g. 0.1.0
Backend string `json:"backend"` // "plexum"
}
// HeartbeatResponse is the backend's reply. SlotsToFire are slots
// the scheduler should attempt to start.
type HeartbeatResponse struct { type HeartbeatResponse struct {
SlotsToFire []Slot `json:"slots_to_fire,omitempty"` Slots []Slot `json:"slots"`
RestartPending bool `json:"restart_pending,omitempty"` AgentStatus AgentStatusValue `json:"agent_status"`
ServerTime time.Time `json:"server_time"` Message string `json:"message,omitempty"`
} }
// SlotUpdate is the body of POST /calendar/slot/<id>/status. // Slot is one calendar TimeSlot — real (has ID) or virtual
type SlotUpdate struct { // (has VirtualID). Field names mirror the backend's
Status SlotStatus `json:"status"` // CalendarSlotResponse schema.
Summary string `json:"summary,omitempty"` type Slot struct {
Reason string `json:"reason,omitempty"` ID *int64 `json:"id"` // real slot db id; null for virtual
VirtualID *string `json:"virtual_id"` // plan-{plan_id}-{date}; null for real
UserID int64 `json:"user_id"`
Date string `json:"date"` // YYYY-MM-DD
SlotType SlotType `json:"slot_type"`
EstimatedDuration int `json:"estimated_duration"` // minutes
ScheduledAt string `json:"scheduled_at"` // HH:MM:SS
StartedAt *string `json:"started_at"`
Attended bool `json:"attended"`
ActualDuration *int `json:"actual_duration"`
EventType *EventType `json:"event_type"`
EventData EventData `json:"event_data"`
Priority int `json:"priority"`
Status SlotStatus `json:"status"`
PlanID *int64 `json:"plan_id"`
}
// EventData is loosely typed because the backend stores it as JSONB and
// its shape varies by event_type. Plugin code does a best-effort
// unmarshal into JobData / SystemEventData when needed.
type EventData map[string]any
// JobData is the event_data shape when event_type=="job".
type JobData struct {
Type string `json:"type"` // Task|Support|Meeting|Essential
Code string `json:"code"` // e.g. "TASK-42"
WorkingSessions []string `json:"working_sessions"` // arbitrary session ids
}
// SystemEventData is the event_data shape when event_type=="system_event".
type SystemEventData struct {
Event string `json:"event"` // ScheduleToday | SummaryToday | ScheduledGatewayRestart
}
// SlotAgentUpdate is the body of PATCH /calendar/slots/{id}/agent-update
// (and the virtual variant). started_at + actual_duration are set
// depending on which status transition the agent is reporting.
type SlotAgentUpdate struct {
Status SlotStatus `json:"status"`
StartedAt string `json:"started_at,omitempty"` // HH:MM:SS
ActualDuration int `json:"actual_duration,omitempty"` // minutes
}
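Because `started_at` and `actual_duration` carry `omitempty`, each transition sends only the field it needs. The short sketch below copies the struct's tags into a local `slotAgentUpdate` type (with a hypothetical `encode` helper) and marshals Ongoing, Finished and Paused updates.

```go
package main

import "encoding/json"

// slotAgentUpdate copies the SlotAgentUpdate field tags.
type slotAgentUpdate struct {
	Status         string `json:"status"`
	StartedAt      string `json:"started_at,omitempty"`      // HH:MM:SS
	ActualDuration int    `json:"actual_duration,omitempty"` // minutes
}

// encode marshals an update; an empty StartedAt and a zero
// ActualDuration are left out of the JSON body.
func encode(u slotAgentUpdate) string {
	b, err := json.Marshal(u)
	if err != nil {
		return ""
	}
	return string(b)
}
```

A side effect of `omitempty` is that a 0-minute duration would be dropped entirely, which the clamp to a minimum of 1 in CompleteForAgent avoids.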
// AgentStatusPush is the body of POST /calendar/agent/status.
type AgentStatusPush struct {
ClawIdentifier string `json:"claw_identifier"`
AgentID string `json:"agent_id"`
Status AgentStatusValue `json:"status"`
}
// HasRealID reports whether a Slot is the materialized (DB row) flavor.
func (s Slot) HasRealID() bool { return s.ID != nil && *s.ID > 0 }
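// SlotIdent returns a string identifier for log and map keys:
// "real:<id>" for materialized slots, "virtual:<vid>" for virtual ones.
// A slot with neither id gets a timestamped "unknown:" ident, which is
// NOT stable across calls, so it cannot be used to look the slot up again.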
func (s Slot) SlotIdent() string {
if s.HasRealID() {
return formatInt("real", *s.ID)
}
if s.VirtualID != nil {
return "virtual:" + *s.VirtualID
}
return "unknown:" + time.Now().UTC().Format(time.RFC3339Nano)
}
func formatInt(prefix string, n int64) string {
// avoid pulling fmt for one call
const digits = "0123456789"
if n == 0 {
return prefix + ":0"
}
neg := n < 0
if neg {
n = -n
}
buf := make([]byte, 0, 20)
for n > 0 {
buf = append([]byte{digits[n%10]}, buf...)
n /= 10
}
if neg {
buf = append([]byte{'-'}, buf...)
}
return prefix + ":" + string(buf)
} }
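Real and virtual slots differ only in which of `id` / `virtual_id` is non-null, which is why both fields are pointers. A self-contained sketch of decoding both flavours and deriving the ident; `miniSlot` and `decodeIdents` are hypothetical, the struct keeps only the two id fields, and `strconv.FormatInt` stands in for the hand-rolled formatInt.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// miniSlot keeps only the two identity fields of the calendar Slot.
// JSON null (or a missing key) decodes to a nil pointer.
type miniSlot struct {
	ID        *int64  `json:"id"`
	VirtualID *string `json:"virtual_id"`
}

// ident mirrors SlotIdent's "real:<id>" / "virtual:<vid>" scheme, but
// returns "" when neither id is set instead of a timestamped fallback.
func (s miniSlot) ident() string {
	if s.ID != nil && *s.ID > 0 {
		return "real:" + strconv.FormatInt(*s.ID, 10)
	}
	if s.VirtualID != nil {
		return "virtual:" + *s.VirtualID
	}
	return ""
}

// decodeIdents parses a JSON array of slots and returns their idents.
func decodeIdents(raw string) ([]string, error) {
	var slots []miniSlot
	if err := json.Unmarshal([]byte(raw), &slots); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(slots))
	for _, s := range slots {
		out = append(out, s.ident())
	}
	return out, nil
}
```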


@@ -35,6 +35,17 @@ type Config struct {
// server listens on. Zero/missing disables the bridge entirely. // server listens on. Zero/missing disables the bridge entirely.
MonitorPort int `json:"monitor_port,omitempty"` MonitorPort int `json:"monitor_port,omitempty"`
// MonitorPushEnabled toggles the active push loop that uploads
// system telemetry to BackendURL /monitor/server/heartbeat. Lets the
// HF plugin replace the standalone harborforge-monitor container.
// nil (unset) defaults to false; operators must opt in explicitly
// since they need to provision APIKey too.
MonitorPushEnabled *bool `json:"monitor_push_enabled,omitempty"`
// MonitorPushIntervalSeconds — defaults to 30s when ≤0. Mirrors
// the standalone monitor's HF_MONITER_REPORT_INTERVAL knob.
MonitorPushIntervalSeconds int `json:"monitor_push_interval_seconds,omitempty"`
// CalendarHeartbeatIntervalSeconds — defaults to 30s when ≤0. // CalendarHeartbeatIntervalSeconds — defaults to 30s when ≤0.
CalendarHeartbeatIntervalSeconds int `json:"calendar_heartbeat_interval_seconds,omitempty"` CalendarHeartbeatIntervalSeconds int `json:"calendar_heartbeat_interval_seconds,omitempty"`
@@ -57,6 +68,8 @@ type Resolved struct {
Identifier string Identifier string
APIKey string APIKey string
MonitorPort int MonitorPort int
MonitorPushEnabled bool
MonitorPushIntervalSeconds int
CalendarEnabled bool CalendarEnabled bool
CalendarHeartbeatIntervalSeconds int CalendarHeartbeatIntervalSeconds int
CalendarBackendURL string CalendarBackendURL string
@@ -104,6 +117,8 @@ func Resolve(c Config) Resolved {
Identifier: c.Identifier, Identifier: c.Identifier,
APIKey: c.APIKey, APIKey: c.APIKey,
MonitorPort: c.MonitorPort, MonitorPort: c.MonitorPort,
MonitorPushEnabled: false,
MonitorPushIntervalSeconds: 30,
CalendarEnabled: true, CalendarEnabled: true,
CalendarHeartbeatIntervalSeconds: 30, CalendarHeartbeatIntervalSeconds: 30,
CalendarBackendURL: c.CalendarBackendURL, CalendarBackendURL: c.CalendarBackendURL,
@@ -127,6 +142,12 @@ func Resolve(c Config) Resolved {
if c.CalendarHeartbeatIntervalSeconds > 0 { if c.CalendarHeartbeatIntervalSeconds > 0 {
out.CalendarHeartbeatIntervalSeconds = c.CalendarHeartbeatIntervalSeconds out.CalendarHeartbeatIntervalSeconds = c.CalendarHeartbeatIntervalSeconds
} }
if c.MonitorPushEnabled != nil {
out.MonitorPushEnabled = *c.MonitorPushEnabled
}
if c.MonitorPushIntervalSeconds > 0 {
out.MonitorPushIntervalSeconds = c.MonitorPushIntervalSeconds
}
if c.RestartPollIntervalSeconds > 0 { if c.RestartPollIntervalSeconds > 0 {
out.RestartPollIntervalSeconds = c.RestartPollIntervalSeconds out.RestartPollIntervalSeconds = c.RestartPollIntervalSeconds
} }
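Using `*bool` for `monitor_push_enabled` lets Resolve tell "unset" apart from an explicit `false`, while the interval falls back to 30 seconds when ≤0. A minimal sketch of that defaulting pattern; `pushConfig`, `resolvedPush` and `resolvePush` are hypothetical names, not the plugin's types.

```go
package main

import "encoding/json"

// pushConfig holds the two push knobs as they arrive in JSON.
type pushConfig struct {
	MonitorPushEnabled         *bool `json:"monitor_push_enabled,omitempty"`
	MonitorPushIntervalSeconds int   `json:"monitor_push_interval_seconds,omitempty"`
}

// resolvedPush is the defaulted form the rest of the plugin would read.
type resolvedPush struct {
	Enabled         bool
	IntervalSeconds int
}

// resolvePush applies the defaults: disabled unless explicitly true,
// 30-second interval unless a positive value is supplied.
func resolvePush(raw string) (resolvedPush, error) {
	var c pushConfig
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return resolvedPush{}, err
	}
	out := resolvedPush{Enabled: false, IntervalSeconds: 30}
	if c.MonitorPushEnabled != nil {
		out.Enabled = *c.MonitorPushEnabled
	}
	if c.MonitorPushIntervalSeconds > 0 {
		out.IntervalSeconds = c.MonitorPushIntervalSeconds
	}
	return out, nil
}
```

With a default of false, a plain bool would resolve the same way; the pointer only matters if the default is ever flipped to true, at which point an explicit `false` must still win.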

internal/monitor/pusher.go

@@ -0,0 +1,238 @@
// Pusher periodically uploads system telemetry to the HarborForge
// backend's /monitor/server/heartbeat endpoint. Replaces the standalone
// `harborforge-monitor` daemon — the plugin's lifecycle (host gateway
// start/stop) bounds the heartbeat loop, so no separate process need
// supervise it.
//
// Wire shape mirrors HarborForge.Monitor's `telemetry.Payload`
// (flat `cpu_pct/mem_pct/...` fields + `X-API-Key` header). The
// translation from internal `telemetry.Snapshot` to that shape lives
// in buildPayload; HF backend stays unchanged.
package monitor
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"strings"
"sync"
"time"
"git.hangman-lab.top/zhi/HarborForge.PlexumPlugin/internal/telemetry"
)
// PushPayload is the wire shape POSTed to /monitor/server/heartbeat —
// 1:1 with HarborForge.Monitor's `telemetry.Payload`.
type PushPayload struct {
Identifier string `json:"identifier"`
PluginVersion string `json:"plugin_version,omitempty"`
Agents []any `json:"agents"`
NginxInstalled bool `json:"nginx_installed"`
NginxSites []string `json:"nginx_sites"`
CPUPct float64 `json:"cpu_pct,omitempty"`
MemPct float64 `json:"mem_pct,omitempty"`
DiskPct float64 `json:"disk_pct,omitempty"`
SwapPct float64 `json:"swap_pct,omitempty"`
LoadAvg []float64 `json:"load_avg,omitempty"`
UptimeSeconds uint64 `json:"uptime_seconds,omitempty"`
}
// PusherConfig is the operator-supplied tuning.
type PusherConfig struct {
BackendURL string // e.g. https://hf-api.hangman-lab.top
APIKey string // sent as X-API-Key
Interval time.Duration // default 30s when <=0
}
// Pusher runs the periodic POST loop. One per plugin process.
type Pusher struct {
cfg PusherConfig
collect func() telemetry.Snapshot
log LogFunc
http *http.Client
// Push statistics, surfaced by the status and monitor_telemetry tools.
mu sync.RWMutex
lastSentAt time.Time
lastStatus int
lastErr string
successHits uint64
errHits uint64
}
// NewPusher constructs the loop runner. collect must be a snapshot
// producer (caller usually wires it to telemetry.Collect with
// SampleCPU=true).
func NewPusher(cfg PusherConfig, collect func() telemetry.Snapshot, log LogFunc) *Pusher {
if cfg.Interval <= 0 {
cfg.Interval = 30 * time.Second
}
if log == nil {
log = func(string, string, map[string]any) {}
}
return &Pusher{
cfg: cfg,
collect: collect,
log: log,
http: &http.Client{Timeout: 15 * time.Second},
}
}
// Run drives the push loop until ctx is cancelled. Returns ctx.Err().
// First push happens immediately so the backend sees this claw alive
// without waiting an interval.
func (p *Pusher) Run(ctx context.Context) error {
if p.cfg.BackendURL == "" {
p.log("warn", "monitor push disabled (empty backendURL)", nil)
return nil
}
if p.cfg.APIKey == "" {
p.log("warn", "monitor push disabled (empty apiKey)", nil)
return nil
}
url := strings.TrimRight(p.cfg.BackendURL, "/") + "/monitor/server/heartbeat"
tick := time.NewTicker(p.cfg.Interval)
defer tick.Stop()
p.pushOnce(ctx, url)
for {
select {
case <-ctx.Done():
return ctx.Err()
case <-tick.C:
p.pushOnce(ctx, url)
}
}
}
func (p *Pusher) pushOnce(ctx context.Context, url string) {
snap := p.collect()
body, err := json.Marshal(buildPayload(snap))
if err != nil {
p.recordErr("marshal: " + err.Error())
return
}
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
if err != nil {
p.recordErr("build req: " + err.Error())
return
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-API-Key", p.cfg.APIKey)
res, err := p.http.Do(req)
if err != nil {
p.recordErr("send: " + err.Error())
p.log("warn", "monitor push failed", map[string]any{"err": err.Error()})
return
}
defer res.Body.Close()
raw, _ := io.ReadAll(res.Body)
if res.StatusCode < 200 || res.StatusCode >= 300 {
p.recordErr(fmt.Sprintf("%d: %s", res.StatusCode, truncate(raw, 200)))
p.log("warn", "monitor push non-2xx", map[string]any{
"status": res.StatusCode, "body": truncate(raw, 200),
})
return
}
p.recordOK(res.StatusCode)
}
// PushStats is a copy of the latest push state for diagnostics
// (the harborforge_monitor_telemetry tool surfaces it).
type PushStats struct {
LastSentAt time.Time
LastStatus int
LastErr string
SuccessCount uint64
ErrorCount uint64
}
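// Stats returns a snapshot of the latest push state. Safe for
// concurrent use with the push loop.
func (p *Pusher) Stats() PushStats {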
p.mu.RLock()
defer p.mu.RUnlock()
return PushStats{
LastSentAt: p.lastSentAt,
LastStatus: p.lastStatus,
LastErr: p.lastErr,
SuccessCount: p.successHits,
ErrorCount: p.errHits,
}
}
func (p *Pusher) recordOK(status int) {
p.mu.Lock()
wasFirst := p.successHits == 0
p.lastSentAt = time.Now().UTC()
p.lastStatus = status
p.lastErr = ""
p.successHits++
count := p.successHits
p.mu.Unlock()
// The first success is the operator's signal that the push loop is
// live, so log it at info level to leave proof in the journal. Later
// successes log on a slow heartbeat (every 60th success) so the journal
// stays quiet but still shows the loop hasn't drifted into "0 successes
// but no errors either" territory.
if wasFirst {
p.log("info", "monitor push started", map[string]any{"status": status})
} else if count%60 == 0 {
p.log("info", "monitor push heartbeat", map[string]any{
"successes": count, "status": status,
})
}
}
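The logging cadence in recordOK reduces to a pure function of the running success count (which recordErr never resets). A minimal sketch; `successLogLine` is a hypothetical helper that returns the message a given count would trigger, with the source's 60-success period made a parameter.

```go
package main

// successLogLine returns the message recordOK would emit after the
// count-th success (count starts at 1), or "" for a quiet cycle.
func successLogLine(count, every uint64) string {
	switch {
	case count == 1:
		return "monitor push started"
	case every > 0 && count%every == 0:
		return "monitor push heartbeat"
	default:
		return ""
	}
}
```

At the default 30-second interval with no failures, a period of 60 works out to one heartbeat line every 30 minutes.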
func (p *Pusher) recordErr(msg string) {
p.mu.Lock()
defer p.mu.Unlock()
p.lastSentAt = time.Now().UTC()
p.lastErr = msg
p.errHits++
}
// buildPayload translates the internal Snapshot into the flat
// PushPayload shape the backend expects. agents is passed through as
// []any (one entry per agent — id/model/state preserved).
func buildPayload(snap telemetry.Snapshot) PushPayload {
agents := make([]any, 0, len(snap.Agents))
for _, a := range snap.Agents {
agents = append(agents, map[string]any{
"id": a.ID,
"model": a.Model,
"state": a.State,
})
}
return PushPayload{
Identifier: snap.Identifier,
PluginVersion: snap.PluginInfo.Version,
Agents: agents,
// nginx detection remains the standalone monitor's responsibility
// today; the HF plugin leaves it blank rather than rediscovering nginx
// state. Operators who need it can keep the standalone monitor running
// alongside, or wait for a follow-up commit.
NginxInstalled: false,
NginxSites: []string{},
CPUPct: round1(snap.CPU.UsedPercent),
MemPct: round1(snap.Memory.UsedPercent),
DiskPct: round1(snap.Disk.UsedPercent),
SwapPct: round1(snap.Swap.UsedPercent),
LoadAvg: []float64{round2(snap.Load.One), round2(snap.Load.Five), round2(snap.Load.Fifteen)},
UptimeSeconds: snap.UptimeSecs,
}
}
func round1(v float64) float64 { return float64(int64(v*10+0.5)) / 10 }
func round2(v float64) float64 { return float64(int64(v*100+0.5)) / 100 }
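Two details of buildPayload are easy to check in isolation: the rounding helpers, and the `omitempty` tags on the percentage fields, which drop a value that is exactly 0 (for example `cpu_pct` when the snapshot was collected with SampleCPU=false). This sketch uses `math.Round` in place of the int64 truncation trick; the two behave the same for ordinary non-negative telemetry values. `pctFields` is a hypothetical two-field excerpt of PushPayload.

```go
package main

import (
	"encoding/json"
	"math"
)

// round1 and round2 round half away from zero to 1 and 2 decimals.
func round1(v float64) float64 { return math.Round(v*10) / 10 }
func round2(v float64) float64 { return math.Round(v*100) / 100 }

// pctFields is an excerpt of PushPayload's percentage fields.
type pctFields struct {
	CPUPct float64 `json:"cpu_pct,omitempty"`
	MemPct float64 `json:"mem_pct,omitempty"`
}

// encodePct marshals the excerpt; zero-valued fields are omitted.
func encodePct(p pctFields) string {
	b, _ := json.Marshal(p)
	return string(b)
}
```

Whether the backend reads a missing `cpu_pct` as 0 or as "not measured" is not stated in this change.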
func truncate(b []byte, n int) string {
if len(b) <= n {
return string(b)
}
return string(b[:n]) + "…"
}


@@ -29,14 +29,30 @@ type Snapshot struct {
Hostname string `json:"hostname"` Hostname string `json:"hostname"`
UptimeSecs uint64 `json:"uptime"` UptimeSecs uint64 `json:"uptime"`
Memory MemoryInfo `json:"memory"` Memory MemoryInfo `json:"memory"`
Swap SwapInfo `json:"swap"`
Load LoadInfo `json:"load"` Load LoadInfo `json:"load"`
Disk DiskInfo `json:"disk"` Disk DiskInfo `json:"disk"`
CPU CPUInfo `json:"cpu"`
Agents []AgentInfo `json:"agents"` Agents []AgentInfo `json:"agents"`
PluginInfo PluginInfo `json:"plugin"` PluginInfo PluginInfo `json:"plugin"`
CapturedAt time.Time `json:"captured_at"` CapturedAt time.Time `json:"captured_at"`
HostMetadata map[string]string `json:"host_metadata,omitempty"` HostMetadata map[string]string `json:"host_metadata,omitempty"`
} }
// SwapInfo is the system swap usage. Zeroes when swap isn't configured.
type SwapInfo struct {
Total uint64 `json:"total"`
Free uint64 `json:"free"`
Used uint64 `json:"used"`
UsedPercent float64 `json:"used_percent"`
}
// CPUInfo holds the most recent CPU usage estimate. UsedPercent is
// computed across one sample interval (see Collect's cpu helper).
type CPUInfo struct {
UsedPercent float64 `json:"used_percent"`
}
// MemoryInfo mirrors OpenclawPlugin's memory shape. // MemoryInfo mirrors OpenclawPlugin's memory shape.
type MemoryInfo struct { type MemoryInfo struct {
Total uint64 `json:"total"` // bytes Total uint64 `json:"total"` // bytes
@@ -84,15 +100,27 @@ type CollectOpts struct {
Identifier string Identifier string
Version string Version string
AgentLister func() []AgentInfo // resolved by the caller (plugin uses HostAPI to walk agents) AgentLister func() []AgentInfo // resolved by the caller (plugin uses HostAPI to walk agents)
// SampleCPU asks Collect to take a 1-second CPU sample. Off-path
// callers (status endpoint, bridge serve) leave it false to keep calls
// cheap; the slow push loop sets it true.
SampleCPU bool
} }
// Collect produces a fresh snapshot from /proc + the supplied AgentLister. // Collect produces a fresh snapshot from /proc + the supplied AgentLister.
// SampleCPU=true takes a 1-second CPU sample (two reads of /proc/stat
// with a sleep between); otherwise CPU usage stays zero. Set true on
// the slow push loop, false on the cheap on-demand status endpoint.
func Collect(opts CollectOpts) Snapshot { func Collect(opts CollectOpts) Snapshot {
now := time.Now().UTC() now := time.Now().UTC()
host, _ := os.Hostname() host, _ := os.Hostname()
mem := readMemInfo() mem, swap := readMemAndSwap()
load := readLoadAvg() load := readLoadAvg()
disk := readDiskRoot() disk := readDiskRoot()
cpu := CPUInfo{}
if opts.SampleCPU {
cpu.UsedPercent = sampleCPUPercent(time.Second)
}
var agents []AgentInfo var agents []AgentInfo
if opts.AgentLister != nil { if opts.AgentLister != nil {
agents = opts.AgentLister() agents = opts.AgentLister()
@@ -103,8 +131,10 @@ func Collect(opts CollectOpts) Snapshot {
Hostname: host, Hostname: host,
UptimeSecs: readUptime(), UptimeSecs: readUptime(),
Memory: mem, Memory: mem,
Swap: swap,
Load: load, Load: load,
Disk: disk, Disk: disk,
CPU: cpu,
Agents: agents, Agents: agents,
PluginInfo: PluginInfo{ PluginInfo: PluginInfo{
Name: "harbor-forge", Name: "harbor-forge",
@@ -117,10 +147,10 @@ func Collect(opts CollectOpts) Snapshot {
// ---- /proc helpers ---- // ---- /proc helpers ----
func readMemInfo() MemoryInfo { func readMemAndSwap() (MemoryInfo, SwapInfo) {
f, err := os.Open("/proc/meminfo") f, err := os.Open("/proc/meminfo")
if err != nil { if err != nil {
return MemoryInfo{} return MemoryInfo{}, SwapInfo{}
} }
defer f.Close() defer f.Close()
fields := map[string]uint64{} fields := map[string]uint64{}
@@ -145,6 +175,12 @@ func readMemInfo() MemoryInfo {
// All MemInfo values are in KB; convert to bytes. // All MemInfo values are in KB; convert to bytes.
fields[key] = v * 1024 fields[key] = v * 1024
} }
mem := buildMemInfo(fields)
swap := buildSwapInfo(fields)
return mem, swap
}
func buildMemInfo(fields map[string]uint64) MemoryInfo {
total := fields["MemTotal"] total := fields["MemTotal"]
free := fields["MemAvailable"] free := fields["MemAvailable"]
if free == 0 { if free == 0 {
@@ -158,6 +194,67 @@ func readMemInfo() MemoryInfo {
return MemoryInfo{Total: total, Free: free, Used: used, UsedPercent: pct} return MemoryInfo{Total: total, Free: free, Used: used, UsedPercent: pct}
} }
func buildSwapInfo(fields map[string]uint64) SwapInfo {
total := fields["SwapTotal"]
free := fields["SwapFree"]
if total == 0 {
return SwapInfo{}
}
used := total - free
pct := float64(used) / float64(total) * 100
return SwapInfo{Total: total, Free: free, Used: used, UsedPercent: pct}
}
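buildSwapInfo works on the key/value map built from /proc/meminfo (kB values converted to bytes). A self-contained sketch that parses meminfo-formatted text instead of the live file, so it runs anywhere; `parseMeminfo` and `swapPercent` are hypothetical names, and the parsing only approximates readMemAndSwap, whose loop is partly outside this hunk.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// parseMeminfo turns lines like "SwapTotal:  2097148 kB" into a map of
// byte counts keyed by field name. Malformed lines are skipped.
func parseMeminfo(text string) map[string]uint64 {
	fields := map[string]uint64{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		parts := strings.Fields(sc.Text())
		if len(parts) < 2 {
			continue
		}
		key := strings.TrimSuffix(parts[0], ":")
		v, err := strconv.ParseUint(parts[1], 10, 64)
		if err != nil {
			continue
		}
		fields[key] = v * 1024 // meminfo reports kB
	}
	return fields
}

// swapPercent returns used swap as a percentage, or 0 when swap is
// absent or the counters are inconsistent (free > total).
func swapPercent(fields map[string]uint64) float64 {
	total, free := fields["SwapTotal"], fields["SwapFree"]
	if total == 0 || free > total {
		return 0
	}
	return float64(total-free) / float64(total) * 100
}
```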
// sampleCPUPercent computes overall CPU usage across one sample
// interval: it reads /proc/stat's aggregate "cpu" line twice and
// returns (1 - idleDelta/totalDelta) * 100. Returns 0 on read failure
// or if the counters did not advance.
func sampleCPUPercent(interval time.Duration) float64 {
total1, idle1, ok := readCPUStat()
if !ok {
return 0
}
time.Sleep(interval)
total2, idle2, ok := readCPUStat()
if !ok || total2 <= total1 {
return 0
}
totalDelta := total2 - total1
idleDelta := idle2 - idle1
if idleDelta > totalDelta {
return 0
}
return float64(totalDelta-idleDelta) / float64(totalDelta) * 100
}
func readCPUStat() (total, idle uint64, ok bool) {
f, err := os.Open("/proc/stat")
if err != nil {
return 0, 0, false
}
defer f.Close()
sc := bufio.NewScanner(f)
if !sc.Scan() {
return 0, 0, false
}
parts := strings.Fields(sc.Text())
if len(parts) < 5 || parts[0] != "cpu" {
return 0, 0, false
}
for i := 1; i < len(parts); i++ {
v, err := strconv.ParseUint(parts[i], 10, 64)
if err != nil {
return 0, 0, false
}
total += v
// idle is the 4th column (parts[4]); iowait (parts[5]) is also
// idle-ish but we count it as busy to match gopsutil's default.
if i == 4 {
idle = v
}
}
return total, idle, true
}
func readLoadAvg() LoadInfo { func readLoadAvg() LoadInfo {
raw, err := os.ReadFile("/proc/loadavg") raw, err := os.ReadFile("/proc/loadavg")
if err != nil { if err != nil {
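The CPU sampling in `sampleCPUPercent` and `readCPUStat` reduces to arithmetic over two snapshots of the aggregate `cpu` line. The following is a minimal sketch that splits parsing from the delta math, so the logic can be checked on fixed strings instead of live `/proc` reads with a sleep. `parseCPULine` and `cpuBusyPercent` are hypothetical names, not part of the plugin. The parsing rules mirror `readCPUStat`: every column is summed into the total, and only `parts[4]` counts as idle. In the worked sample, 200 jiffies elapse and 100 of them are idle, so usage is 50%.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPULine (hypothetical helper) sums every counter on the aggregate
// "cpu" line of /proc/stat and returns that total plus the idle column
// (parts[4]), following the same rules as readCPUStat in the diff.
func parseCPULine(line string) (total, idle uint64, ok bool) {
	parts := strings.Fields(line)
	if len(parts) < 5 || parts[0] != "cpu" {
		return 0, 0, false
	}
	for i := 1; i < len(parts); i++ {
		v, err := strconv.ParseUint(parts[i], 10, 64)
		if err != nil {
			return 0, 0, false
		}
		total += v
		if i == 4 { // idle; iowait (i == 5) stays in the busy share
			idle = v
		}
	}
	return total, idle, true
}

// cpuBusyPercent (hypothetical helper) converts two samples into a busy
// percentage. It returns 0 when the counters did not advance or are
// inconsistent, which also avoids uint64 underflow on the idle delta.
func cpuBusyPercent(total1, idle1, total2, idle2 uint64) float64 {
	if total2 <= total1 || idle2 < idle1 {
		return 0
	}
	totalDelta := total2 - total1
	idleDelta := idle2 - idle1
	if idleDelta > totalDelta {
		return 0
	}
	return float64(totalDelta-idleDelta) / float64(totalDelta) * 100
}

func main() {
	t1, i1, _ := parseCPULine("cpu  100 0 100 800 0 0 0 0 0 0")
	t2, i2, _ := parseCPULine("cpu  150 0 150 900 0 0 0 0 0 0")
	fmt.Printf("%.1f\n", cpuBusyPercent(t1, i1, t2, i2)) // prints 50.0
}
```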

@@ -27,6 +27,7 @@ type Deps struct {
 	Version string
 	Collect func() telemetry.Snapshot
 	Bridge *monitor.Bridge
+	Pusher *monitor.Pusher
 	Scheduler *calendar.Scheduler
 	Host sdkplugin.HostAPI
@@ -89,11 +90,32 @@ func toolStatus(deps Deps) (sdkplugin.ToolResult, error) {
 			"queries": bs.Queries,
 			"last_query": bs.LastQuery,
 		},
-		"calendar": sch,
+		"monitor_push": monitorPushSummary(deps),
+		"calendar": sch,
 	}
 	return jsonResult(out)
 }
+
+// monitorPushSummary returns the pusher's last-known state in the same
+// JSON layout the status/monitor_telemetry tools surface. Nil-safe: if
+// no pusher is wired (testing, push disabled), reports enabled=false.
+func monitorPushSummary(deps Deps) map[string]any {
+	out := map[string]any{
+		"enabled": deps.Config.MonitorPushEnabled,
+		"interval_seconds": deps.Config.MonitorPushIntervalSeconds,
+		"endpoint": deps.Config.BackendURL + "/monitor/server/heartbeat",
+	}
+	if deps.Pusher != nil {
+		st := deps.Pusher.Stats()
+		out["last_sent_at"] = st.LastSentAt
+		out["last_status"] = st.LastStatus
+		out["last_err"] = st.LastErr
+		out["success_count"] = st.SuccessCount
+		out["error_count"] = st.ErrorCount
+	}
+	return out
+}
 
 func toolTelemetry(deps Deps) (sdkplugin.ToolResult, error) {
 	return jsonResult(deps.Collect())
 }
@@ -101,11 +123,14 @@ func toolTelemetry(deps Deps) (sdkplugin.ToolResult, error) {
 func toolMonitorTelemetry(deps Deps) (sdkplugin.ToolResult, error) {
 	bs := deps.Bridge.Stats()
 	return jsonResult(map[string]any{
-		"port": bs.Port,
-		"listening": bs.Listening,
-		"queries": bs.Queries,
-		"last_query": bs.LastQuery,
-		"last_snapshot": bs.LastSnap,
+		"bridge": map[string]any{
+			"port": bs.Port,
+			"listening": bs.Listening,
+			"queries": bs.Queries,
+			"last_query": bs.LastQuery,
+			"last_snapshot": bs.LastSnap,
+		},
+		"push": monitorPushSummary(deps),
 	})
 }
@@ -176,11 +201,15 @@ func toolCalendarResume(ctx context.Context, deps Deps) (sdkplugin.ToolResult, e
 }
 
 func toolRestartStatus(deps Deps) (sdkplugin.ToolResult, error) {
+	// HarborForge backend doesn't expose a restart-pending endpoint
+	// (verified via /openapi.json) so we report the most recent
+	// heartbeat freshness instead. Useful for operators sanity-
+	// checking that the plugin's calendar loop is still alive.
 	sch := deps.Scheduler.Status()
 	return jsonResult(map[string]any{
-		"pending": sch.RestartPending,
-		"last_heartbeat": sch.LastHeartbeat,
+		"pending": false,
+		"last_heartbeats": sch.LastHeartbeats,
 		"observed_at": time.Now().UTC(),
 	})
 }

@@ -37,12 +37,16 @@ Next steps:
 "harbor-forge"
 2. Write ${PLUGIN_DIR}/config.json — sample:
 {
-"backendUrl": "https://monitor.hangman-lab.top",
+"backendUrl": "https://hf-api.hangman-lab.top",
 "identifier": "server-t3",
-"apiKey": "g1_xxx",
-"monitor_port": 9100,
+"apiKey": "<copy from HF_MONITER_API_KEY>",
+"monitor_push_enabled": true,
+"monitor_push_interval_seconds": 30,
+"monitor_port": 0,
 "calendar_enabled": true,
 "calendar_heartbeat_interval_seconds": 30
 }
 3. Restart the host: systemctl --user restart plexum
+4. Verify push is landing (DB last_seen_at advancing) and then
+remove the standalone harborforge-monitor container.
 EOF