7 Commits

Author SHA1 Message Date
2977ab369e refactor(install): clone HarborForge.Cli to /tmp instead of fixed path
`installCli()` used to look for the CLI source at a fixed path relative
to the plugin checkout: either `./HarborForge.Cli` or `../HarborForge.Cli`.
That breaks any install layout where the plugin lives on its own — the
script just logs "Skipping CLI installation" and returns. Same anti-
pattern installManagedMonitor had already fixed for the monitor binary.

Mirror the monitor flow:

  1. git clone --depth 1 --branch <cliBranch> CLI_REPO_URL → /tmp/<dir>
  2. go build -ldflags Version=<date>+<branch>-<sha> -o $openclaw/bin/hf
  3. chmod 755 + delete tmp dir on success or failure

Adds `--cli-branch <name>` (default: main) for parity with --monitor-branch.

Also stamps the binary with a real version string (was 'dev' before this
patch) so `hf version` is informative for debugging.
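
A minimal sketch of how the version stamp is assembled, assuming the `<date>+<branch>-<sha>` shape described above and the date-only `+install` fallback used when the SHA lookup yields nothing. The helper name `buildVersionLabel` and the SHA `abc1234` are illustrative placeholders, not part of the patch.

```typescript
// Hypothetical helper: name and signature are illustrative only.
function buildVersionLabel(now: Date, branch: string, shortSha: string | undefined): string {
  const date = now.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  // Without a SHA, fall back to a date-only label.
  return shortSha ? `${date}+${branch}-${shortSha}` : `${date}+install`;
}

// "abc1234" is a placeholder SHA, not a real commit.
const label = buildVersionLabel(new Date("2026-05-29T07:52:14Z"), "main", "abc1234");
// label === "2026-05-29+main-abc1234"
```

The resulting string is what gets passed through `-ldflags -X ...Version=<label>` at build time.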
2026-05-29 08:52:14 +01:00
zhi
c8998c6b0d Merge pull request 'perf(meta-push): use cached api.config instead of deprecated loadConfig() — kills ~25% chronic baseline CPU' (#11) from fix/meta-push-use-cached-api-config into main 2026-05-27 08:25:34 +00:00
686f2c7cb0 perf(meta-push): use cached api.config instead of deprecated loadConfig()
`pushMetaToMonitor` and `resolveAgentId` were both calling
`api.runtime?.config?.loadConfig?.()` to read the agent list. That
deprecated path (openclaw warns at gateway start:
"plugin runtime config.loadConfig() is deprecated; use config.current()")
synchronously rebuilds the full plugin-metadata snapshot — realpathSync
walks every plugin's package.json + manifest + source up the directory
tree, hashWatchedFiles fingerprints every watched plugin file, and
discoverInDirectory re-scans every `dist/extensions/<plugin>` (~100 of
them on prod t2). Each rebuild costs ~6-7s of gateway CPU.

`pushMetaToMonitor` fires every `reportIntervalSec` (default 30s)
from `hooks/gateway-start.js`. With 100 plugins that put the gateway
into a chronic ~22-30% CPU baseline even with zero agent activity. V8
profile 2026-05-27 08:14:00 60s window (0 turns, 2 metadata pushes
during): lstat 44.2%, statSync(buildInstalledManifestRegistryIndexKey)
6.9%, hashWatchedFiles via memo key 1.7%, all routed through
`readPersistedInstalledPluginIndexInstallRecordsSync` -> per-plugin
`discoverInDirectory`.

Switching to `(api as any).config ?? api.runtime?.config?.loadConfig?.()`
reads from the snapshot cache the gateway already maintains — the same
pattern already used elsewhere in this file (e.g. the calendar wakeAgent
dispatcher at line 284). Same change applied to `resolveAgentId` (only
runs once at start, but same anti-pattern).
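
The effect of the `??` ordering can be sketched with a stand-in API object. The `FakeApi` and `AgentsConfig` types below are assumptions made for illustration; the real `PluginAPI` shape is not shown in this change. The point is only that the deprecated loader is never invoked while a cached `config` is present.

```typescript
// Stand-in types; the real PluginAPI shape is an assumption here.
interface AgentsConfig { agents?: { list?: { id: string }[] } }
interface FakeApi {
  config?: AgentsConfig;
  runtime?: { config?: { loadConfig?: () => AgentsConfig } };
}

function readConfig(api: FakeApi): AgentsConfig | undefined {
  // `??` short-circuits: the loader runs only when `api.config` is null/undefined.
  return api.config ?? api.runtime?.config?.loadConfig?.();
}

let loadCalls = 0;
const fakeApi: FakeApi = {
  config: { agents: { list: [{ id: "agent-a" }] } },
  runtime: { config: { loadConfig: () => { loadCalls += 1; return {}; } } },
};

const firstAgentId = readConfig(fakeApi)?.agents?.list?.[0]?.id;
// firstAgentId === "agent-a", and loadCalls is still 0
```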

This is a plugin-side perf workaround. The underlying openclaw bug is
that `loadConfig()` rebuilds the snapshot rather than returning the
cached one — a chronic 'all sync cache validity checks pay the full
discovery cost' design issue worth pushing upstream separately (the
per-call walk cost we measured here is independent of, and amplifies, any
agent-turn-triggered walk path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 09:17:39 +01:00
h z
81d40ae63d fix(wakeup): drop WAKEUP_OK ack-token theatre (#10) 2026-05-26 08:13:42 +00:00
65a3fb8d2d fix(wakeup): drop WAKEUP_OK ack-token theatre from wakeup message
The wakeup dispatcher's `deliver` callback only does
`logger.info(reply.slice(0,100))` — no token detection, no scheduler
state change. The "first line of your reply MUST be exactly WAKEUP_OK
so the plugin records the ack" instruction was prompt theatre that
nothing in this plugin (or in openclaw) acted on. Confirmed by
reading openclaw/dist/plugin-sdk/src/auto-reply/tokens.d.ts which
declares HEARTBEAT_OK and SILENT_REPLY tokens but nothing for wakeup.

Symptom in the wild: agents would replay WAKEUP_OK every turn for no
gain — costing model budget on a no-op token — and the workflow doc
(`ClawSkills/workflows/hf-wakeup/flow.md`) carried a wandering
appendix explaining the ack "doesn't actually do anything anyway".

Rewrite the wakeup message to tell the agent the truth: drive the
hf-wakeup workflow to completion; the scheduler keeps re-waking
every 30s until the slot transitions out of `not_started` via
harborforge_calendar_complete or _abort. No ack token expected.

ClawSkills companion change (lyn/ClawSkills d0109f3) removes
WAKEUP_OK from skills/hf-hangman-lab/SKILL.md and
workflows/hf-wakeup/flow.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 09:10:47 +01:00
c2d00c18a7 feat(hf-plugin): __hfAgentStatus.hasOnCallCovering(agentId, from, to)
Cross-plugin accessor for "does agent have on_call slot covering this
window?" — first consumer is Dialectic.OpenclawPlugin signup pre-check
(its hf-precheck.ts has been degrading to "skipped" since Phase 3
shipped, pending this accessor).

v1 honest scope: same-day windows only (scheduleCache is today-only
from /calendar/sync). Cross-day or empty-cache windows return undefined
which the caller treats as "skipped" (Dialectic backend stores
pre_validated:false as audit signal — same as before, just now we
actually validate when we can).

Logic: for each cached slot where slot_type=on_call AND status not in
{aborted,cancelled}, parse scheduled_at (HH:MM:SS or full ISO) and
estimated_duration to compute end; return true iff start<=from AND
end>=to. Returns false (not undefined) when cache has slots for the
agent on this date but none covers — that means "actually no coverage"
vs "I don't know".

Pairs with Dialectic.OpenclawPlugin/src/hf-precheck.ts which already
calls hf.hasOnCallCovering and handles all 3 return shapes.
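
The three return shapes can be illustrated with a pure-function sketch of the same logic. The `Slot` interface and the `coversWindow` name are illustrative; the cache is passed in as plain arguments rather than read from `scheduleCache`, and `estimated_duration` is taken to be in minutes.

```typescript
// Simplified slot shape; field names follow the description above.
interface Slot {
  slot_type: string;
  status: string;
  scheduled_at: string;       // "HH:MM[:SS]" or a full ISO timestamp
  estimated_duration: number; // minutes (assumption)
}

function coversWindow(
  slots: Slot[], cachedDate: string, fromIso: string, toIso: string,
): boolean | undefined {
  const from = new Date(fromIso).getTime();
  const to = new Date(toIso).getTime();
  if (isNaN(from) || isNaN(to) || !(from < to)) return undefined;
  const day = new Date(from).toISOString().slice(0, 10);
  // Cross-day window, cache for another date, or no slots: cannot decide.
  if (day !== new Date(to).toISOString().slice(0, 10)) return undefined;
  if (day !== cachedDate || slots.length === 0) return undefined;
  for (const s of slots) {
    if (s.slot_type !== "on_call") continue;
    if (s.status === "aborted" || s.status === "cancelled") continue;
    const start = /^\d{2}:\d{2}(:\d{2})?$/.test(s.scheduled_at)
      ? new Date(`${day}T${s.scheduled_at}Z`).getTime()
      : new Date(s.scheduled_at).getTime();
    if (isNaN(start)) continue;
    const end = start + s.estimated_duration * 60_000;
    if (start <= from && end >= to) return true;
  }
  return false; // slots exist for this day, but none covers the window
}

const onCall: Slot[] = [
  { slot_type: "on_call", status: "not_started", scheduled_at: "09:00:00", estimated_duration: 120 },
];
const covered = coversWindow(onCall, "2026-05-23", "2026-05-23T09:30:00Z", "2026-05-23T10:30:00Z");   // true
const uncovered = coversWindow(onCall, "2026-05-23", "2026-05-23T10:30:00Z", "2026-05-23T11:30:00Z"); // false
const unknown = coversWindow(onCall, "2026-05-23", "2026-05-23T23:00:00Z", "2026-05-24T00:30:00Z");   // undefined
```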

No backend change required.
2026-05-23 14:58:37 +01:00
709f7e09ab feat(hf-plugin): expose globalThis.__hfAgentStatus.get(agentId)
Cross-plugin agent-status accessor for use by Fabric.OpenclawPlugin's
presence-sync loop (and any future plugin needing 'is agent X busy
right now'). Backed by CalendarBridgeClient.getAgentStatus() with a
30s in-memory TTL cache to avoid hammering the HF backend.

Returns one of 'idle' | 'on_call' | 'busy' | 'exhausted' | 'offline'
or undefined when the agent isn't known to HF. Cache miss + bridge
failure returns the last cached value (stale-data better than no
data for delivery-decision use cases).
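
The cache behaviour described above can be sketched as follows. This is a simplified, synchronous illustration with an injected clock and fetcher so the timeline is easy to follow; the real accessor is async and calls the bridge client. `createStatusCache` and `Fetcher` are hypothetical names.

```typescript
type HfStatus = "idle" | "on_call" | "busy" | "exhausted" | "offline";
type Fetcher = (agentId: string) => HfStatus | undefined;

// Hypothetical factory: returns a lookup function backed by a TTL cache.
function createStatusCache(ttlMs: number) {
  const entries = new Map<string, { status: HfStatus; at: number }>();
  return function get(agentId: string, now: number, fetchStatus: Fetcher): HfStatus | undefined {
    const cached = entries.get(agentId);
    if (cached && now - cached.at < ttlMs) return cached.status; // fresh hit, no fetch
    try {
      const fresh = fetchStatus(agentId);
      if (fresh) {
        entries.set(agentId, { status: fresh, at: now });
        return fresh;
      }
    } catch {
      // Fetch failed: fall through to the stale value, if any.
    }
    return cached?.status;
  };
}

const getStatus = createStatusCache(30_000);
const atStart = getStatus("agent-a", 0, () => "busy");       // fetched: "busy"
const withinTtl = getStatus("agent-a", 10_000, () => "idle"); // cache hit: "busy"
const afterFailure = getStatus("agent-a", 40_000, () => {
  throw new Error("bridge down");
}); // expired + failed fetch: stale "busy"
```

An agent that has never been seen and cannot be fetched yields `undefined`, which callers can treat as "not known to HF".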

Part of DIALECTIC-V2 Phase 1 (Fabric announce channel + busy-discard).
See /home/hzhang/arch/DIALECTIC-V2-DESIGN.md sections 7+8.
2026-05-23 11:31:27 +01:00
2 changed files with 159 additions and 41 deletions


@@ -128,7 +128,9 @@ function register(api: PluginAPI): void {
/** Resolve agent ID from env, config, or fallback. */
function resolveAgentId(): string {
if (process.env.AGENT_ID) return process.env.AGENT_ID;
- const cfg = api.runtime?.config?.loadConfig?.();
+ // Read from cached `api.config` first — see pushMetaToMonitor for why
+ // the deprecated `api.runtime?.config?.loadConfig?.()` path is heavy.
+ const cfg = (api as any).config ?? api.runtime?.config?.loadConfig?.();
return cfg?.agents?.list?.[0]?.id ?? cfg?.agents?.defaults?.id ?? 'unknown';
}
@@ -184,6 +186,25 @@ function register(api: PluginAPI): void {
* Push OpenClaw metadata to the Monitor bridge.
* This enriches Monitor heartbeats with OpenClaw version/plugin/agent info.
* Failures are non-fatal — Monitor continues to work without this data.
*
* IMPORTANT — read config from the cached `api.config` surface, NOT from
* the deprecated `api.runtime?.config?.loadConfig?.()` path. The
* deprecated path triggers a full plugin-metadata-snapshot rebuild on
* every call: realpathSync walks every plugin's package.json + manifest
* + source paths (lstats up the directory tree), `hashWatchedFiles`
* fingerprints all watched plugin files, and `discoverInDirectory`
* re-scans every `dist/extensions/<plugin>` dir. On t2 with ~100 plugins
* each rebuild costs ~6-7s of CPU; with this push firing every 30s
* (default reportIntervalSec) the chronic baseline was ~22-25% gateway
* CPU even with zero agent activity (V8 profile 2026-05-27 08:14:00 60s:
* lstat 44.2%, statSync 6.9%, hashWatchedFiles via memo key 1.7%, all
* routed through readPersistedInstalledPluginIndexInstallRecordsSync ->
* discoverInDirectory). Switching to `api.config` reads from the
* already-loaded snapshot cache; the elsewhere-in-this-file pattern was
* already `api.config ?? api.runtime?.config?.loadConfig?.()`.
*
* Same fix is applied to `resolveAgentId` below — that's read once at
* gateway start so the impact is smaller, but it's the same anti-pattern.
*/
async function pushMetaToMonitor() {
const bridgeClient = getBridgeClient();
@@ -191,7 +212,7 @@ function register(api: PluginAPI): void {
let agentNames: string[] = [];
try {
- const cfg = api.runtime?.config?.loadConfig?.();
+ const cfg = (api as any).config ?? api.runtime?.config?.loadConfig?.();
const agentsList = cfg?.agents?.list;
if (Array.isArray(agentsList)) {
agentNames = agentsList
@@ -307,21 +328,22 @@ function register(api: PluginAPI): void {
)}\n\`\`\``;
}
- // First-line ack `WAKEUP_OK` is the plugin's ack-receipt token; the
- // agent MUST then continue in the same session and drive the
- // `hf-wakeup` workflow to completion (calendar_status → task fetch →
- // sub-workflow → calendar_complete/abort). Without that continuation
- // the scheduler keeps re-waking every 30s because the slot stays
- // `not_started` forever.
+ // The wakeup dispatcher's `deliver` callback below only logs the
+ // reply text — it does NOT inspect any ack token. The earlier
+ // `WAKEUP_OK` first-line-ack convention was prompt-only theatre;
+ // nothing in this plugin or in openclaw acted on it. The only
+ // thing that ends a wake cycle is the slot transitioning out of
+ // `not_started`, which happens when the agent calls
+ // `harborforge_calendar_complete` or `harborforge_calendar_abort`.
+ // Tell the agent that plainly instead of asking for a fake ack.
const wakeupMessage =
- `You have due slots. **First line of your reply MUST be exactly ` +
- `\`WAKEUP_OK\`** so the plugin records the ack. Then, **in this ` +
- `same session**, drive the \`hf-wakeup\` workflow of skill ` +
- `\`hf-hangman-lab\` to completion — read slot context, call the ` +
- `harborforge_calendar_* tools, route to the right sub-workflow, ` +
- `and finish with harborforge_calendar_complete or abort. Do NOT ` +
- `stop after the ack — the scheduler will re-wake you every 30s ` +
- `until the slot transitions out of \`not_started\`.${slotBlock}`;
+ `You have due slots. Drive the \`hf-wakeup\` workflow of skill ` +
+ `\`hf-hangman-lab\` to completion in this session — read slot ` +
+ `context, call the harborforge_calendar_* tools, route to the ` +
+ `right sub-workflow, and finish with harborforge_calendar_complete ` +
+ `or harborforge_calendar_abort. The scheduler keeps re-waking you ` +
+ `every 30s until the slot transitions out of \`not_started\`, so ` +
+ `partial work or silence just produces another wake.${slotBlock}`;
const result = await dispatchInboundMessageWithDispatcher({
ctx: {
@@ -396,6 +418,94 @@ function register(api: PluginAPI): void {
}
}
// Cross-plugin exposure: agent status lookup for other plugins
// (currently Fabric.OpenclawPlugin uses this to skip delivering
// `announce` channel messages to busy agents — see DIALECTIC-V2
// design doc, Phase 1). Backed by calendarBridge.getAgentStatus
// with a small TTL cache to avoid hammering the HF backend.
type HfStatus = 'idle' | 'on_call' | 'busy' | 'exhausted' | 'offline';
const HF_STATUS_CACHE_TTL_MS = 30_000;
const hfStatusCache = new Map<string, { status: HfStatus; at: number }>();
const _G = globalThis as Record<string, unknown>;
_G['__hfAgentStatus'] = {
async get(agentId: string): Promise<HfStatus | undefined> {
if (!agentId) return undefined;
const cached = hfStatusCache.get(agentId);
if (cached && Date.now() - cached.at < HF_STATUS_CACHE_TTL_MS) {
return cached.status;
}
try {
const status = await calendarBridge.getAgentStatus(agentId);
if (status) {
const typed = status as HfStatus;
hfStatusCache.set(agentId, { status: typed, at: Date.now() });
return typed;
}
} catch {
/* fall through to cached-or-undefined */
}
return cached?.status;
},
/**
* Approximate "does agent have an on_call slot covering [from, to]?"
* for cross-plugin pre-check use (currently:
* Dialectic.OpenclawPlugin's signup HF coverage).
*
* v1 honest scope: we only have today's slots in scheduleCache
* (synced from /calendar/sync which is today-only). Returns:
* - true iff window is same-day AND some cached on_call slot
* starts <= from AND ends >= to
* - false iff window is same-day AND no such slot
* - undefined for cross-day windows OR cache empty for this
* agent (caller treats undefined as "I don't know" — see
* Dialectic plugin's hf-precheck.ts which degrades to
* "skipped" gracefully)
*
* Phase TBD: when HF backend ships a `/calendar/slots?agent&from&to`
* endpoint, swap this to call it for arbitrary windows. Until then,
* same-day-only coverage gates ~all debates created by analyze-intel
* (which schedules <2h windows) without needing a backend change.
*/
async hasOnCallCovering(
agentId: string,
fromIso: string,
toIso: string,
): Promise<boolean | undefined> {
if (!agentId || !fromIso || !toIso) return undefined;
const from = new Date(fromIso);
const to = new Date(toIso);
if (isNaN(from.getTime()) || isNaN(to.getTime())) return undefined;
if (!(from < to)) return undefined;
// Cross-day → cache only has today; can't decide.
const fromDate = from.toISOString().slice(0, 10);
const toDate = to.toISOString().slice(0, 10);
if (fromDate !== toDate) return undefined;
// Cache's cachedDate must match our window's date.
const cacheStatus = scheduleCache.getStatus();
if (cacheStatus.cachedDate !== fromDate) return undefined;
const slots = scheduleCache.getAgentSlots(agentId);
if (slots.length === 0) return undefined; // cache empty for this agent — can't decide
for (const s of slots) {
if (s.slot_type !== 'on_call') continue;
// status: ignore aborted/cancelled, accept not_started / ongoing / finished
if (s.status === 'aborted' || s.status === 'cancelled') continue;
const startStr = s.scheduled_at;
if (typeof startStr !== 'string') continue;
// scheduled_at can be HH:MM:SS (cache-relative date) or full ISO
const start =
/^\d{2}:\d{2}(:\d{2})?$/.test(startStr)
? new Date(`${fromDate}T${startStr}Z`)
: new Date(startStr);
if (isNaN(start.getTime())) continue;
const dur = typeof s.estimated_duration === 'number' ? s.estimated_duration : 0;
const end = new Date(start.getTime() + dur * 60_000);
if (start <= from && end >= to) return true;
}
return false;
},
};
// Track wakes already dispatched for a slot in the current sync
// window — the simplified inline scheduler does not PATCH slot
// status server-side, so without dedupe the check loop re-wakes


@@ -31,6 +31,7 @@ const OLD_PLUGIN_NAME = 'harborforge-monitor';
const PLUGIN_SRC_DIR = join(__dirname, 'plugin');
const SKILLS_SRC_DIR = join(__dirname, 'skills');
const MONITOR_REPO_URL = 'https://git.hangman-lab.top/zhi/HarborForge.Monitor.git';
const CLI_REPO_URL = 'https://git.hangman-lab.top/zhi/HarborForge.Cli.git';
const args = process.argv.slice(2);
const options = {
@@ -43,6 +44,7 @@ const options = {
installCli: args.includes('--install-cli'),
installMonitor: 'no',
monitorBranch: 'main',
cliBranch: 'main',
};
const profileIdx = args.indexOf('--openclaw-profile-path');
@@ -60,6 +62,11 @@ if (monitorBranchIdx !== -1 && args[monitorBranchIdx + 1]) {
options.monitorBranch = String(args[monitorBranchIdx + 1]);
}
const cliBranchIdx = args.indexOf('--cli-branch');
if (cliBranchIdx !== -1 && args[cliBranchIdx + 1]) {
options.cliBranch = String(args[cliBranchIdx + 1]);
}
function resolveOpenclawPath() {
if (options.openclawProfilePath) return options.openclawProfilePath;
if (process.env.OPENCLAW_PATH) return resolve(process.env.OPENCLAW_PATH);
@@ -316,39 +323,40 @@ async function installCli() {
if (!options.installCli) return;
const totalSteps = 6;
logStep(5, totalSteps, 'Building and installing hf CLI...');
const openclawPath = resolveOpenclawPath();
const binDir = join(openclawPath, 'bin');
mkdirSync(binDir, { recursive: true });
- // Find CLI source — look for HarborForge.Cli relative to project root
- const projectRoot = resolve(__dirname, '..');
- const cliDir = join(projectRoot, 'HarborForge.Cli');
- if (!existsSync(cliDir)) {
- // Try parent directory (monorepo layout)
- const monoCliDir = resolve(projectRoot, '..', 'HarborForge.Cli');
- if (!existsSync(monoCliDir)) {
- logErr(`Cannot find HarborForge.Cli at ${cliDir} or ${monoCliDir}`);
- logWarn('Skipping CLI installation');
- return;
- }
- }
- const effectiveCliDir = existsSync(cliDir)
- ? cliDir
- : resolve(projectRoot, '..', 'HarborForge.Cli');
- log(` Building hf from ${effectiveCliDir}...`, 'blue');
+ // Clone CLI repo to /tmp, build there, copy artifact out. Mirrors
+ // installManagedMonitor so the install never depends on a checked-out
+ // sibling repo at a fixed path.
+ const tmpDir = join('/tmp', `harborforge-cli-${Date.now()}`);
+ const hfBinary = join(binDir, 'hf');
try {
- const hfBinary = join(binDir, 'hf');
- exec(`go build -o ${hfBinary} ./cmd/hf`, { cwd: effectiveCliDir, silent: !options.verbose });
+ log(` Cloning ${CLI_REPO_URL} (branch ${options.cliBranch}) → ${tmpDir}...`, 'blue');
+ exec(`git clone --branch ${shellEscape(options.cliBranch)} --depth 1 ${shellEscape(CLI_REPO_URL)} ${shellEscape(tmpDir)}`, { silent: !options.verbose });
+ // Stamp the binary with the version string the prod CLI surfaces in
+ // `hf version`. Fall back to a date-only label if rev-parse fails for
+ // any reason (shallow clone shouldn't, but be defensive).
+ let versionLabel = `${new Date().toISOString().slice(0, 10)}+install`;
+ try {
+ const sha = exec(`git rev-parse --short HEAD`, { cwd: tmpDir, silent: true }).trim();
+ if (sha) versionLabel = `${new Date().toISOString().slice(0, 10)}+${options.cliBranch}-${sha}`;
+ } catch { /* keep fallback */ }
+ log(` Building hf (version=${versionLabel})...`, 'blue');
+ const ldflags = `-X git.hangman-lab.top/zhi/HarborForge.Cli/internal/commands.Version=${versionLabel}`;
+ exec(`go build -ldflags ${shellEscape(ldflags)} -o ${shellEscape(hfBinary)} ./cmd/hf`, { cwd: tmpDir, silent: !options.verbose });
chmodSync(hfBinary, 0o755);
- logOk(`hf binary → ${hfBinary}`);
+ logOk(`hf binary → ${hfBinary} (branch hint: ${options.cliBranch})`);
} catch (err) {
logErr(`Failed to build hf CLI: ${err.message}`);
logWarn('CLI installation failed, plugin still installed');
+ } finally {
+ rmSync(tmpDir, { recursive: true, force: true });
}
}