fix(hf-plugin): wrap tool returns in MCP {content:[...]} shape

OpenClaw's Codex tool dispatcher (thread-lifecycle:255) expects every tool execute() to return { content: [...] } and calls result.content.reduce() to compute total text length. All 9 harborforge_* tools returned bare objects ({ running, processing, currentSlot, ... }) which has no .content field — so .reduce of undefined threw, and the agent saw the cryptic 'Cannot read properties of undefined (reading reduce)' on every call. This silently blocked every calendar slot transition on prod for hours: agents could call harborforge_calendar_complete but it always errored, so slots never moved out of not_started. Fix is at the registerTool boundary: api.registerTool is wrapped once to coerce every tool's execute return through ensureMcpContentShape. Tools that already return the correct shape are unchanged. No per-tool edits needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge pull request 'fix: wakeup message says 'continue in same session', not 'only reply WAKEUP_OK'' (#9 ) from fix/wakeup-message-no-ack-only into main
2026-05-23 08:48:05 +01:00 · 2026-05-21 10:05:34 +00:00 · 2026-05-21 11:05:28 +01:00 · 2026-05-21 09:54:55 +00:00 · 2026-05-21 10:54:36 +01:00 · 2026-05-21 09:39:51 +00:00
1 changed files with 16 additions and 126 deletions
--- a/plugin/index.ts
+++ b/plugin/index.ts
@@ -128,9 +128,7 @@ function register(api: PluginAPI): void {
    /** Resolve agent ID from env, config, or fallback. */
    function resolveAgentId(): string {
      if (process.env.AGENT_ID) return process.env.AGENT_ID;
-      // Read from cached `api.config` first — see pushMetaToMonitor for why
+      const cfg = api.runtime?.config?.loadConfig?.();
      // the deprecated `api.runtime?.config?.loadConfig?.()` path is heavy.
      const cfg = (api as any).config ?? api.runtime?.config?.loadConfig?.();
      return cfg?.agents?.list?.[0]?.id ?? cfg?.agents?.defaults?.id ?? 'unknown';
    }
@@ -186,25 +184,6 @@ function register(api: PluginAPI): void {
     * Push OpenClaw metadata to the Monitor bridge.
     * This enriches Monitor heartbeats with OpenClaw version/plugin/agent info.
     * Failures are non-fatal — Monitor continues to work without this data.
     *
     * IMPORTANT — read config from the cached `api.config` surface, NOT from
     * the deprecated `api.runtime?.config?.loadConfig?.()` path. The
     * deprecated path triggers a full plugin-metadata-snapshot rebuild on
     * every call: realpathSync walks every plugin's package.json + manifest
     * + source paths (lstats up the directory tree), `hashWatchedFiles`
     * fingerprints all watched plugin files, and `discoverInDirectory`
     * re-scans every `dist/extensions/<plugin>` dir. On t2 with ~100 plugins
     * each rebuild costs ~6-7s of CPU; with this push firing every 30s
     * (default reportIntervalSec) the chronic baseline was ~22-25% gateway
     * CPU even with zero agent activity (V8 profile 2026-05-27 08:14:00 60s:
     * lstat 44.2%, statSync 6.9%, hashWatchedFiles via memo key 1.7%, all
     * routed through readPersistedInstalledPluginIndexInstallRecordsSync ->
     * discoverInDirectory). Switching to `api.config` reads from the
     * already-loaded snapshot cache; the elsewhere-in-this-file pattern was
     * already `api.config ?? api.runtime?.config?.loadConfig?.()`.
     *
     * Same fix is applied to `resolveAgentId` below — that's read once at
     * gateway start so the impact is smaller, but it's the same anti-pattern.
     */
    async function pushMetaToMonitor() {
      const bridgeClient = getBridgeClient();
@@ -212,7 +191,7 @@ function register(api: PluginAPI): void {
      let agentNames: string[] = [];
      try {
-        const cfg = (api as any).config ?? api.runtime?.config?.loadConfig?.();
+        const cfg = api.runtime?.config?.loadConfig?.();
        const agentsList = cfg?.agents?.list;
        if (Array.isArray(agentsList)) {
          agentNames = agentsList
@@ -328,22 +307,21 @@ function register(api: PluginAPI): void {
          )}\n\`\`\``;
        }
-        // The wakeup dispatcher's `deliver` callback below only logs the
+        // First-line ack `WAKEUP_OK` is the plugin's ack-receipt token; the
-        // reply text — it does NOT inspect any ack token. The earlier
+        // agent MUST then continue in the same session and drive the
-        // `WAKEUP_OK` first-line-ack convention was prompt-only theatre;
+        // `hf-wakeup` workflow to completion (calendar_status → task fetch →
-        // nothing in this plugin or in openclaw acted on it. The only
+        // sub-workflow → calendar_complete/abort). Without that continuation
-        // thing that ends a wake cycle is the slot transitioning out of
+        // the scheduler keeps re-waking every 30s because the slot stays
-        // `not_started`, which happens when the agent calls
+        // `not_started` forever.
        // `harborforge_calendar_complete` or `harborforge_calendar_abort`.
        // Tell the agent that plainly instead of asking for a fake ack.
        const wakeupMessage =
-          `You have due slots. Drive the \`hf-wakeup\` workflow of skill ` +
+          `You have due slots. **First line of your reply MUST be exactly ` +
-          `\`hf-hangman-lab\` to completion in this session — read slot ` +
+          `\`WAKEUP_OK\`** so the plugin records the ack. Then, **in this ` +
-          `context, call the harborforge_calendar_* tools, route to the ` +
+          `same session**, drive the \`hf-wakeup\` workflow of skill ` +
-          `right sub-workflow, and finish with harborforge_calendar_complete ` +
+          `\`hf-hangman-lab\` to completion — read slot context, call the ` +
-          `or harborforge_calendar_abort. The scheduler keeps re-waking you ` +
+          `harborforge_calendar_* tools, route to the right sub-workflow, ` +
-          `every 30s until the slot transitions out of \`not_started\`, so ` +
+          `and finish with harborforge_calendar_complete or abort. Do NOT ` +
-          `partial work or silence just produces another wake.${slotBlock}`;
+          `stop after the ack — the scheduler will re-wake you every 30s ` +
          `until the slot transitions out of \`not_started\`.${slotBlock}`;
        const result = await dispatchInboundMessageWithDispatcher({
          ctx: {
@@ -418,94 +396,6 @@ function register(api: PluginAPI): void {
        }
      }
      // Cross-plugin exposure: agent status lookup for other plugins
      // (currently Fabric.OpenclawPlugin uses this to skip delivering
      // `announce` channel messages to busy agents — see DIALECTIC-V2
      // design doc, Phase 1). Backed by calendarBridge.getAgentStatus
      // with a small TTL cache to avoid hammering the HF backend.
      type HfStatus = 'idle' | 'on_call' | 'busy' | 'exhausted' | 'offline';
      const HF_STATUS_CACHE_TTL_MS = 30_000;
      const hfStatusCache = new Map<string, { status: HfStatus; at: number }>();
      const _G = globalThis as Record<string, unknown>;
      _G['__hfAgentStatus'] = {
        async get(agentId: string): Promise<HfStatus | undefined> {
          if (!agentId) return undefined;
          const cached = hfStatusCache.get(agentId);
          if (cached && Date.now() - cached.at < HF_STATUS_CACHE_TTL_MS) {
            return cached.status;
          }
          try {
            const status = await calendarBridge.getAgentStatus(agentId);
            if (status) {
              const typed = status as HfStatus;
              hfStatusCache.set(agentId, { status: typed, at: Date.now() });
              return typed;
            }
          } catch {
            /* fall through to cached-or-undefined */
          }
          return cached?.status;
        },
        /**
         * Approximate "does agent have an on_call slot covering [from, to]?"
         * for cross-plugin pre-check use (currently:
         * Dialectic.OpenclawPlugin's signup HF coverage).
         *
         * v1 honest scope: we only have today's slots in scheduleCache
         * (synced from /calendar/sync which is today-only). Returns:
         *   - true  iff window is same-day AND some cached on_call slot
         *           starts <= from AND ends >= to
         *   - false iff window is same-day AND no such slot
         *   - undefined for cross-day windows OR cache empty for this
         *     agent (caller treats undefined as "I don't know" — see
         *     Dialectic plugin's hf-precheck.ts which degrades to
         *     "skipped" gracefully)
         *
         * Phase TBD: when HF backend ships a `/calendar/slots?agent&from&to`
         * endpoint, swap this to call it for arbitrary windows. Until then,
         * same-day-only coverage gates ~all debates created by analyze-intel
         * (which schedules <2h windows) without needing a backend change.
         */
        async hasOnCallCovering(
          agentId: string,
          fromIso: string,
          toIso: string,
        ): Promise<boolean | undefined> {
          if (!agentId || !fromIso || !toIso) return undefined;
          const from = new Date(fromIso);
          const to = new Date(toIso);
          if (isNaN(from.getTime()) || isNaN(to.getTime())) return undefined;
          if (!(from < to)) return undefined;
          // Cross-day → cache only has today; can't decide.
          const fromDate = from.toISOString().slice(0, 10);
          const toDate = to.toISOString().slice(0, 10);
          if (fromDate !== toDate) return undefined;
          // Cache's cachedDate must match our window's date.
          const cacheStatus = scheduleCache.getStatus();
          if (cacheStatus.cachedDate !== fromDate) return undefined;
          const slots = scheduleCache.getAgentSlots(agentId);
          if (slots.length === 0) return undefined; // cache empty for this agent — can't decide
          for (const s of slots) {
            if (s.slot_type !== 'on_call') continue;
            // status: ignore aborted/cancelled, accept not_started / ongoing / finished
            if (s.status === 'aborted' || s.status === 'cancelled') continue;
            const startStr = s.scheduled_at;
            if (typeof startStr !== 'string') continue;
            // scheduled_at can be HH:MM:SS (cache-relative date) or full ISO
            const start =
              /^\d{2}:\d{2}(:\d{2})?$/.test(startStr)
                ? new Date(`${fromDate}T${startStr}Z`)
                : new Date(startStr);
            if (isNaN(start.getTime())) continue;
            const dur = typeof s.estimated_duration === 'number' ? s.estimated_duration : 0;
            const end = new Date(start.getTime() + dur * 60_000);
            if (start <= from && end >= to) return true;
          }
          return false;
        },
      };
      // Track wakes already dispatched for a slot in the current sync
      // window — the simplified inline scheduler does not PATCH slot
      // status server-side, so without dedupe the check loop re-wakes
Author	SHA1	Message	Date
hzhang	a1b4d347d9	fix(hf-plugin): wrap tool returns in MCP {content:[...]} shape OpenClaw's Codex tool dispatcher (thread-lifecycle:255) expects every tool execute() to return { content: [...] } and calls result.content.reduce() to compute total text length. All 9 harborforge_* tools returned bare objects ({ running, processing, currentSlot, ... }) which has no .content field — so .reduce of undefined threw, and the agent saw the cryptic 'Cannot read properties of undefined (reading reduce)' on every call. This silently blocked every calendar slot transition on prod for hours: agents could call harborforge_calendar_complete but it always errored, so slots never moved out of not_started. Fix is at the registerTool boundary: api.registerTool is wrapped once to coerce every tool's execute return through ensureMcpContentShape. Tools that already return the correct shape are unchanged. No per-tool edits needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 08:48:05 +01:00
h z	2a2a298d15	Merge pull request 'fix: wakeup message says 'continue in same session', not 'only reply WAKEUP_OK'' (#9 ) from fix/wakeup-message-no-ack-only into main	2026-05-21 10:05:34 +00:00
hanghang zhang	102809dc2a	fix(plugin): wakeup message says 'continue in same session', not 'only reply WAKEUP_OK' E2e showed the old wakeup text trapped agents in an ack-only loop: > "You have due slots. Follow the `hf-wakeup` workflow of skill > `hf-hangman-lab` to proceed. Only reply `WAKEUP_OK` in this session." The two clauses contradicted each other — "follow the workflow" vs "only reply WAKEUP_OK". MiniMax-M2.5 prioritised the literal "only" and never proceeded past the ack; the scheduler then re-woke every 30s because the slot stayed `not_started`, and the agent kept re-acking forever (verified: 3 consecutive WAKEUP_OK-only replies across slot 7). Rewrites the wakeup message to be explicit: - first line MUST be `WAKEUP_OK` (the ack token the plugin looks for) - then continue IN THE SAME session: drive calendar_status → task fetch → sub-workflow → calendar_complete/abort - flags the loop trap so the agent knows what to avoid Bumps version 0.3.3 → 0.3.4. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 11:05:28 +01:00
h z	065b0d3da3	Merge pull request 'fix: lift calendarScheduler to module scope (multi-register singleton)' (#8 ) from fix/scheduler-module-singleton into main	2026-05-21 09:54:55 +00:00
hanghang zhang	afb8b25558	fix(plugin): lift calendarScheduler to module scope (multi-register singleton) Trying the prior multi-agent-handle fix in dind-t2 surfaced a second bug that PR #7 didn't reach: `harborforge_calendar_status` still returned `Calendar scheduler not running` even though the gateway log showed the scheduler had started 30+ seconds before the agent's call. ## Root cause `register()` is invoked once per agent — `grep -c "HarborForge plugin registered" /tmp/gw-stdout.log` reports 5 for a 5-agent claw. Every invocation creates its own `let calendarScheduler` closure binding. But `gateway_start` fires once and we only call `startCalendarScheduler()` through that single hook, so exactly one of the five closures sees the handle and the other four keep their bindings at `null`. The host's tool router picks one of the five duplicate `harborforge_calendar_status` registrations to dispatch to — most of the time it's one of the four "null" closures, which is why every wakeup the agent saw `Calendar scheduler not running`. ## Fix Lift `let calendarScheduler` out of `register()` and into module scope. All five register-call closures now reference the same binding; once the single `gateway_start` initialises it, every tool sees it. `startCalendarScheduler()` now early-returns when `calendarScheduler` is already set, so duplicate `gateway_start` firings (if the host ever does that) don't double-install intervals. Bumps version 0.3.2 → 0.3.3. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 10:54:36 +01:00
h z	98e663a19b	Merge pull request 'fix: real per-agent slot handle for multi-agent calendar tools' (#7 ) from fix/multi-agent-scheduler-handle into main	2026-05-21 09:39:51 +00:00
hanghang zhang	d5cea9a44d	fix(plugin): real per-agent slot handle for multi-agent calendar tools In multi-agent sync mode every harborforge_calendar_* tool was returning `calendarScheduler.<method> is not a function`. The cause: index.ts replaced `calendarScheduler` (typed `CalendarScheduler \| null`) with a `{ stop() }` stub right after wiring the runSync/runCheck intervals, so `isRunning()`, `getCurrentSlot()`, `completeCurrentSlot()`, `abortCurrentSlot()`, `pauseCurrentSlot()`, `resumeCurrentSlot()`, `getState()`, `isRestartPending()` and `getStateFilePath()` all blew up at call time. Replaces the stub with a `MultiAgentSchedulerHandle` that: - tracks the last slot dispatched per agent (recorded by `wakeAgent`) - exposes status/complete/abort/pause/resume taking the calling agentId - resolves the implicit "current slot" via woken-cursor first then a cache scan over not_started/deferred/ongoing slots - PATCHes via `bridge.updateSlotAs(agentId, …)` so audit headers reflect the real caller (bridge constructor agentId is 'unused' in multi-agent) - mirrors the legacy `isRunning/isProcessing/getState/...` surface so the single-agent fallback (`CalendarScheduler`) keeps working unchanged Each calendar tool factory now takes `OpenClawPluginToolContext`, reads `ctx.agentId`, and dispatches through the handle. Single-agent path (when `calendarScheduler` is a real `CalendarScheduler`) is preserved behind `instanceof` checks. Drops the dead `trackSessionCompletion` poll loop (only definition, no caller) which referenced the removed `completeCurrentSlot`. Bumps plugin version 0.2.0 → 0.3.2. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 10:38:57 +01:00
h z	f627845543	Merge pull request 'fix: wake dedupe + inline slot context + complete contracts.tools' (#6 ) from fix/wake-dedupe-and-contracts into main	2026-05-20 14:48:06 +00:00
hanghang zhang	b878fa2a41	fix: wake dedupe + inline slot context + complete contracts.tools Three issues making HF→agent wakeup unusable in practice, surfaced by DinD sim end-to-end test (recruiter agent + slot for 招募 manager task): 1. Plugin re-woke the same slot every 30s. The inline runCheck only destructured agentId from scheduleCache.getAgentsWithDueSlots() and dropped the slots array, then called wakeAgent without recording the wake. The simplified inline scheduler also never PATCHes slot status server-side from not_started→ongoing, so the next 30s check sees the slot still due and wakes again. After 4 wakes the agent's wakeup session was full of WAKEUP_OK noise. Fix: keep slots in runCheck, add an in-memory wakedSlotKeys set keyed by (agentId, slotId\|virtual_id\|scheduled_at). Dedupe on this set; clear it inside the sync interval (fresh wake budget per sync). Server-side slot transition still TODO (requires re-introducing the CalendarScheduler class path or PATCH /calendar/slots/.../agent-update here); the dedupe at least stops the wake spam. 2. Wakeup message had no slot context. The wakeup body just said 'follow hf-wakeup workflow' with no slot id/event_data/task_code. The agent then had to call harborforge_calendar_status to learn anything — which itself is broken in the simplified scheduler (it queries a CalendarScheduler instance that never gets created). Fix: pass dueSlots into wakeAgent and inline the highest-priority slot's {slot_id, scheduled_at, priority, slot_type, event_data} as a JSON block in the wakeup message. The agent reads event_data. task_code directly and routes via workflow_lookup without any round-trip. Per PLG-CAL-001 docs in hf-hangman-lab SKILL.md, this is the documented contract; we are bringing the message in line. 3. contracts.tools listed 5 of the 9 registered tools. Manifest had harborforge_status/telemetry/monitor_telemetry/calendar_status/ calendar_complete. Code also registers calendar_abort, calendar_pause, calendar_resume, harborforge_restart_status. With the new OpenClaw plugin host enforcement (same gotcha that bit Meridian — see zhi/Meridian#2), undeclared tools are silently dropped from the agent's tool list, so abort/pause/resume cannot be called by the agent. plugin doctor was emitting: 'plugin tool is undeclared (harbor-forge): harborforge_calendar_abort' for each missing tool. Fix: add the 4 missing tool names to contracts.tools. Also use api.config as the primary config source in wakeAgent (current public API), falling back to runtime.config.loadConfig() for older hosts — same pattern as the Meridian fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 12:02:25 +01:00