perf(meta-push): use cached api.config instead of deprecated loadConfig() — kills ~25% chronic baseline CPU #11

Merged
zhi merged 1 commits from fix/meta-push-use-cached-api-config into main 2026-05-27 08:25:34 +00:00
Owner

Why

t2 gateway sustains 22-30% CPU baseline even with zero agent activity / zero turn / zero queued message. V8 profile 2026-05-27 08:14:00 (60s window, 0 session turns, 2 metadata pushes during):

23323 44.2% lstat               <- realpathSync <- lstatSync
 3634  6.9% stat                <- statSync(buildInstalledManifestRegistryIndexKey)
 2542  4.8% tryReadJsonSync     <- discoverInDirectory <- readPersistedInstalledPluginIndexInstallRecordsSync
 2351  4.5% existsSync          <- detectBundleManifestFormat
  906  1.7% hashJson            <- hashWatchedFiles <- computePluginMetadataSnapshotMemoKey

Tracked back to pushMetaToMonitor in this plugin: it called api.runtime?.config?.loadConfig?.() to read the agent name list. That deprecated path (plugin runtime config.loadConfig() is deprecated; use config.current() — emitted on every gateway start) synchronously rebuilds the full plugin-metadata snapshot every call: realpathSync walks every plugin's package.json + manifest + source up the directory tree, hashWatchedFiles fingerprints every watched plugin file, discoverInDirectory re-scans every dist/extensions/<plugin> (~100 of them on prod t2). ~6-7s of CPU per rebuild.

pushMetaToMonitor fires every reportIntervalSec (default 30s) from hooks/gateway-start.js. 100 plugins × ~7s walk / 30s = ~23% sustained = matches the measured baseline almost exactly.

What

pushMetaToMonitor (and resolveAgentId, same anti-pattern, only runs once at gateway-start) now read from the cached (api as any).config ?? api.runtime?.config?.loadConfig?.(). The cached api.config is the snapshot the gateway maintains internally; the deprecated path was a fallback for older host versions. Other code in this same file (line 284, the calendar wakeAgent dispatcher) already uses this patternpushMetaToMonitor was just the only place that hadn't been updated.

Verification plan

Deploy to prod t2, wait 2-3 minutes for the deprecated path to stop firing, take a fresh 60s V8 profile during a zero-turn window. Expected: lstat % drops from ~44% → near 0%, gateway idle baseline back to ~99% idle. Will validate before merging.

Upstream issue (separate)

The underlying openclaw bug is that loadConfig() rebuilds the snapshot rather than returning the cached one. Even the api.config path has its own cache-validity check that walks hashWatchedFiles (~800 statx per call) every time loadPluginMetadataSnapshot runs — the "fast" path is still O(N watched files). That's a chronic baseline overhead per agent turn (~13.5% lstat in single-turn local repro). Worth pushing upstream — separate from this plugin-side fix which just stops the every-30s with no agent activity baseline.

🤖 Generated with Claude Code

## Why t2 gateway sustains 22-30% CPU baseline even with **zero agent activity / zero turn / zero queued message**. V8 profile 2026-05-27 08:14:00 (60s window, 0 session turns, 2 metadata pushes during): ``` 23323 44.2% lstat <- realpathSync <- lstatSync 3634 6.9% stat <- statSync(buildInstalledManifestRegistryIndexKey) 2542 4.8% tryReadJsonSync <- discoverInDirectory <- readPersistedInstalledPluginIndexInstallRecordsSync 2351 4.5% existsSync <- detectBundleManifestFormat 906 1.7% hashJson <- hashWatchedFiles <- computePluginMetadataSnapshotMemoKey ``` Tracked back to `pushMetaToMonitor` in this plugin: it called `api.runtime?.config?.loadConfig?.()` to read the agent name list. That deprecated path (`plugin runtime config.loadConfig() is deprecated; use config.current()` — emitted on every gateway start) **synchronously rebuilds the full plugin-metadata snapshot every call**: realpathSync walks every plugin's package.json + manifest + source up the directory tree, hashWatchedFiles fingerprints every watched plugin file, discoverInDirectory re-scans every `dist/extensions/<plugin>` (~100 of them on prod t2). ~6-7s of CPU per rebuild. `pushMetaToMonitor` fires every `reportIntervalSec` (default 30s) from `hooks/gateway-start.js`. 100 plugins × ~7s walk / 30s = ~23% sustained = matches the measured baseline almost exactly. ## What `pushMetaToMonitor` (and `resolveAgentId`, same anti-pattern, only runs once at gateway-start) now read from the cached `(api as any).config ?? api.runtime?.config?.loadConfig?.()`. The cached `api.config` is the snapshot the gateway maintains internally; the deprecated path was a fallback for older host versions. **Other code in this same file (line 284, the calendar wakeAgent dispatcher) already uses this pattern** — `pushMetaToMonitor` was just the only place that hadn't been updated. ## Verification plan Deploy to prod t2, wait 2-3 minutes for the deprecated path to stop firing, take a fresh 60s V8 profile during a zero-turn window. Expected: lstat % drops from ~44% → near 0%, gateway idle baseline back to ~99% idle. Will validate before merging. ## Upstream issue (separate) The underlying openclaw bug is that `loadConfig()` rebuilds the snapshot rather than returning the cached one. Even the `api.config` path has its own cache-validity check that walks `hashWatchedFiles` (~800 statx per call) every time `loadPluginMetadataSnapshot` runs — the "fast" path is still O(N watched files). That's a chronic baseline overhead per agent turn (~13.5% lstat in single-turn local repro). Worth pushing upstream — separate from this plugin-side fix which just stops the *every-30s with no agent activity* baseline. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
zhi added 1 commit 2026-05-27 08:18:08 +00:00
`pushMetaToMonitor` and `resolveAgentId` were both calling
`api.runtime?.config?.loadConfig?.()` to read the agent list. That
deprecated path (openclaw warns at gateway start:
"plugin runtime config.loadConfig() is deprecated; use config.current()")
synchronously rebuilds the full plugin-metadata snapshot — realpathSync
walks every plugin's package.json + manifest + source up the directory
tree, hashWatchedFiles fingerprints every watched plugin file, and
discoverInDirectory re-scans every `dist/extensions/<plugin>` (~100 of
them on prod t2). Each rebuild costs ~6-7s of gateway CPU.

`pushMetaToMonitor` fires every `reportIntervalSec` (default 30s)
from `hooks/gateway-start.js`. With 100 plugins that put the gateway
into a chronic ~22-30% CPU baseline even with zero agent activity. V8
profile 2026-05-27 08:14:00 60s window (0 turns, 2 metadata pushes
during): lstat 44.2%, statSync(buildInstalledManifestRegistryIndexKey)
6.9%, hashWatchedFiles via memo key 1.7%, all routed through
`readPersistedInstalledPluginIndexInstallRecordsSync` -> per-plugin
`discoverInDirectory`.

Switching to `(api as any).config ?? api.runtime?.config?.loadConfig?.()`
reads from the snapshot cache the gateway already maintains — the same
pattern already used elsewhere in this file (e.g. the calendar wakeAgent
dispatcher at line 284). Same change applied to `resolveAgentId` (only
runs once at start, but same anti-pattern).

This is a plugin-side perf workaround. The underlying openclaw bug is
that `loadConfig()` rebuilds the snapshot rather than returning the
cached one — a chronic 'all sync cache validity checks pay the full
discovery cost' design issue worth pushing upstream separately (the
walks per-call cost we measured here is unrelated to and amplifies any
agent-turn-triggered walk path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zhi merged commit c8998c6b0d into main 2026-05-27 08:25:34 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: zhi/HarborForge.OpenclawPlugin#11