OpenclawPerfCache/plugin/index.ts
hzhang 49bcde41ec init: OpenClaw Perf Cache — fs.{stat,lstat,realpath}{,Sync} TTL memo for plugin-tree paths
Wraps the global fs functions with a 1s TTL memo, scoped via path
whitelist to plugin-discovery paths only. Workaround for upstream
openclaw issue #86791: `loadPluginMetadataSnapshot()`'s cache-validity
check re-runs `hashWatchedFiles` on every lookup, which walks every
plugin's package.json + manifest + source via realpathSync ->
ancestor lstat chain. On prod t2 with ~100 plugins, one cache-check
pass is ~6400 lstat + ~400 stat (~6-7s CPU per call). Fires on every
agent turn, every loadConfig() call, every channel routing decision.
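
The call counts above can be reproduced with a back-of-envelope estimate. The per-plugin factors below are the rough figures from the header comment in plugin/index.ts, not measurements, and the variable names are illustrative only:

```typescript
// Rough factors from the header comment in plugin/index.ts (estimates, not measurements).
const plugins = 100;              // ~100 installed plugins on prod t2
const watchedFilesPerPlugin = 4;  // package.json, manifest, source, setupSource
const realpathsPerFile = 2;       // realpath resolutions per watched file
const lstatsPerRealpath = 8;      // ~8 ancestor path segments lstat'ed per realpath

// One stat (fingerprint) per watched file.
const statCalls = plugins * watchedFilesPerPlugin;

// Each realpath walks the ancestor chain with lstat.
const lstatCalls = statCalls * realpathsPerFile * lstatsPerRealpath;

console.log(`~${lstatCalls} lstat + ~${statCalls} stat per cache-check pass`);
```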

This plugin doesn't fix the upstream design; it just absorbs the
repeated stats within a 1s window so the same paths aren't re-statted
6× per second during a discovery walk.
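
The idea can be illustrated in isolation. This is a minimal sketch of a TTL memo, not the plugin's actual wrapper: `ttlMemo`, `fakeStat`, and the injected clock are hypothetical names introduced here so the behaviour can be shown without touching the real fs module.

```typescript
// Hypothetical helper: memoize a one-argument function for `ttlMs` milliseconds.
type Clock = () => number;

function ttlMemo<T>(
  fn: (path: string) => T,
  ttlMs: number,
  now: Clock = Date.now,
): (path: string) => T {
  const cache = new Map<string, { value: T; expiresAt: number }>();
  return (path: string): T => {
    const t = now();
    const hit = cache.get(path);
    // Serve from cache only while the entry is still inside its TTL window.
    if (hit !== undefined && hit.expiresAt > t) return hit.value;
    const value = fn(path);
    cache.set(path, { value, expiresAt: t + ttlMs });
    return value;
  };
}

// Stand-in for an expensive stat call; counts how often it really runs.
let underlyingCalls = 0;
const fakeStat = (path: string): { path: string; size: number } => {
  underlyingCalls++;
  return { path, size: 42 };
};

// A fake clock makes the expiry deterministic.
let fakeNow = 0;
const memoStat = ttlMemo(fakeStat, 1000, () => fakeNow);

// Six lookups of the same (made-up) path inside one window: one real call.
for (let i = 0; i < 6; i++) memoStat('/opt/demo/plugins/a/package.json');
const callsWithinWindow = underlyingCalls;

// Once the TTL has elapsed, the next lookup goes to the underlying function again.
fakeNow = 1000;
memoStat('/opt/demo/plugins/a/package.json');
const callsAfterExpiry = underlyingCalls;
```

The trade-off is the one noted for the plugin itself: a result can be stale for up to one TTL window.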

Verified on prod t2 (2026-05-27):
  - Cache hit ratio: 92.1-98.2% (stable across windows)
  - Idle baseline (0 turn, 0 push): 0.6-3.7% CPU (was 25%+ pre-fix)
  - Per-turn cost: notably reduced; previously 100% sustained per turn
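
For reference, the hit ratio is computed per 60s logging window as hits / (hits + misses); pass-through calls for non-whitelisted paths are not counted. A minimal sketch of that calculation, using made-up counter values and a hypothetical function name:

```typescript
// Hypothetical helper mirroring the ratio calculation in the periodic log line.
function hitRatioPercent(hits: number, misses: number): string {
  const total = hits + misses;
  // An empty window reports 0.0 rather than dividing by zero.
  return total > 0 ? ((hits / total) * 100).toFixed(1) : '0.0';
}

// Illustrative counter values only, not taken from prod logs.
const busyWindow = hitRatioPercent(921, 79);
const idleWindow = hitRatioPercent(0, 0);
```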

Path whitelist:
  - /openclaw/dist/extensions/
  - /.openclaw/plugins/
  - /node_modules/@openclaw/
  - /openclaw/plugin-sdk/

All other paths pass through to original fs functions unchanged.
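
The whitelist is a substring match, so a path qualifies wherever the needle appears in it, and the trailing slash is significant. A small sketch of the check (the function name and the `demo` sample paths are hypothetical):

```typescript
// Same four needles as the whitelist above.
const needles: string[] = [
  '/openclaw/dist/extensions/',
  '/.openclaw/plugins/',
  '/node_modules/@openclaw/',
  '/openclaw/plugin-sdk/',
];

// Hypothetical stand-in for the plugin's path check.
function isMemoized(path: string): boolean {
  return needles.some((needle) => path.includes(needle));
}

// Made-up sample paths for illustration.
const samples: string[] = [
  '/usr/lib/node_modules/openclaw/dist/extensions/demo/package.json', // bundled extension
  '/home/user/.openclaw/plugins/demo/openclaw.plugin.json',           // user-installed plugin
  '/var/log/openclaw/gateway.log',                                     // unrelated file
  '/usr/lib/node_modules/openclaw/dist/extensions',                    // no trailing slash
];

const results = samples.map(isMemoized);
```

The last sample passes through uncached because the directory path itself lacks the trailing slash that the needle requires.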

Manifest requires `activation.onStartup: true` so openclaw register()s
the plugin even though it exposes no tools/contracts (otherwise jiti
caches the module without ever calling register).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 10:17:54 +01:00

278 lines
10 KiB
TypeScript
/**
* OpenClaw Perf Cache — fs.stat/lstat/realpath TTL memo for plugin-tree paths.
*
* Why this exists
* ===============
*
* Upstream openclaw's `loadPluginMetadataSnapshot()` (in
* `dist/plugin-metadata-snapshot-*.js`) maintains a memo cache of the plugin
* registry, but the cache-validity check it runs on every lookup itself does
* O(N) filesystem work:
*
*   resolvePersistedRegistryMemoStateForLookup(params, memo):
*     ...
*     if (registryState && contextHash matches && fastHash matches
*         && hashWatchedFiles(registryState.watchedFiles) === registryState.watchedFilesHash)
*       return registryState;  // ← `hashWatchedFiles` re-fingerprints every watched file
*
* `hashWatchedFiles` calls `fileFingerprint(path)` (statx) for every plugin
* package.json + openclaw.plugin.json + source + setupSource, and the
* watched-file collection is built by `persistedPluginFileFingerprint` which
* in turn calls `resolvePluginFilePath` -> `tryRealpath` (which is
* `fs.realpathSync`, walking the ancestor chain via `lstat` for each path
* segment).
*
* On prod t2 (~100 installed plugins, mostly bundled extensions under
* `/usr/lib/node_modules/openclaw/dist/extensions/<id>/`), one cache-check
* call costs roughly:
*
*   100 plugins × 4 watched files/plugin × 2 realpath/file × ~8 lstat/realpath
*     ≈ 6400 lstat + ~400 stat per call (~6-7s of CPU per call)
*
* The deprecated `loadConfig()` path was firing one of these every 30s from
* HF plugin's `pushMetaToMonitor` (separate fix: zhi/HarborForge.OpenclawPlugin#11),
* and every agent turn fires one too (per the tool middleware loader). The
* push-driven baseline is gone; the per-turn cost is the remaining chronic
* load.
*
* Two upstream tickets have closed without a fix for this same hot path:
* - https://github.com/openclaw/openclaw/issues/67040 (closed as not planned)
* - https://github.com/openclaw/openclaw/issues/75297 (closed, no fix; rollback to 2026.4.23 was the workaround)
*
* What this plugin does
* =====================
*
* On `register()` (which runs before any agent turn), wrap the global
* `fs.statSync`, `fs.lstatSync`, `fs.realpathSync` and their `fs.promises`
* counterparts with a small TTL memo. The wrapper is a no-op (pass-through)
* for any path that is NOT under a plugin tree, so general fs use elsewhere
* is unaffected.
*
* Path whitelist (anything else falls through to the original):
* - `/openclaw/dist/extensions/` (bundled openclaw channel SDKs)
* - `/.openclaw/plugins/` (user-installed plugins)
* - `/node_modules/@openclaw/` (managed npm plugin packages)
* - `/openclaw/plugin-sdk/` (SDK module imports)
*
* TTL: 1000ms. Within that window, repeated stats of the same path return the
* cached result. Two cache-check calls back-to-back (which is what the
* snapshot lookup does on each invocation) now cost ~0 instead of ~7s.
*
* Safety
* ======
*
* - The wrappers invoke the original functions with the caller's `this` and
* the full argument list (including `options` like `{ bigint: true }`).
* - Cache key includes a JSON of the trailing args so different option shapes
* for the same path don't collide (e.g. `statSync(p)` vs `statSync(p,{bigint:true})`).
* - Pass-through for non-plugin paths: business code (logs, session files,
* skills/, secrets/, anything outside the whitelist) sees the unmodified fs.
* - 1s TTL: plugin manifest mtime resolution is ms-level, but a cached result
* (including a cached ENOENT) can be up to ~1s stale, so a manifest change
* becomes visible at most ~1s later. Dev-loop impact is negligible.
* - Bounded memory: cache.clear() when >4000 entries (~few hundred KB max).
* - Idempotent: a sentinel flag prevents double-wrapping across plugin reloads.
* - Counts are tracked and logged every minute so we can see hit ratio in
* journalctl and validate the workaround is actually firing.
*
* If openclaw ever fixes the upstream cache-validity check (described under
* "Why this exists" above), this plugin can be uninstalled without consequence.
*/
import fs from 'node:fs';
import type { Stats, BigIntStats } from 'node:fs';
const TTL_MS = 1000;
const SOFT_CAP = 4000;

// Substring match (not a strict prefix match): any path containing one of
// these needles is memoized; everything else passes through to the original.
// Keep the list short and only add patterns where the same path is statted
// many times per second by the plugin-discovery hot path.
const HOT_PATH_NEEDLES = [
  '/openclaw/dist/extensions/',
  '/.openclaw/plugins/',
  '/node_modules/@openclaw/',
  '/openclaw/plugin-sdk/',
];

function isHotPath(p: unknown): p is string {
  if (typeof p !== 'string') return false;
  for (const needle of HOT_PATH_NEEDLES) {
    if (p.includes(needle)) return true;
  }
  return false;
}

interface CacheEntry {
  result: unknown;
  isError: boolean;
  expiresAt: number;
}

const cache = new Map<string, CacheEntry>();
const counters = { hits: 0, misses: 0, passthrough: 0, errors: 0 };

function evictIfFull() {
  if (cache.size > SOFT_CAP) cache.clear();
}

function buildKey(name: string, path: string, args: unknown[]): string {
  // `args` holds the trailing arguments after the path: typically an encoding
  // string or an options object such as { bigint: true }. Keep them in the
  // key so different option shapes for the same path don't collide.
  let opts = '';
  if (args.length > 0) {
    try {
      opts = JSON.stringify(args);
    } catch {
      // Not serializable (e.g. a BigInt or circular value): fall back to arity.
      opts = String(args.length);
    }
  }
  return `${name}\x00${path}\x00${opts}`;
}

function wrapSync<F extends (...args: any[]) => any>(
  name: string,
  orig: F,
): F {
  const wrapped = function (this: unknown, path: unknown, ...rest: any[]): any {
    if (!isHotPath(path)) {
      counters.passthrough++;
      return orig.call(this, path, ...rest);
    }
    const now = Date.now();
    const key = buildKey(name, path, rest);
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now) {
      counters.hits++;
      if (hit.isError) throw hit.result;
      return hit.result;
    }
    counters.misses++;
    evictIfFull();
    let result: unknown;
    let isError = false;
    try {
      result = orig.call(this, path, ...rest);
    } catch (err) {
      // ENOENT and friends: cache the error too so repeated "does this file
      // exist" probes don't hit the kernel again. Same TTL applies.
      result = err;
      isError = true;
      counters.errors++;
    }
    cache.set(key, { result, isError, expiresAt: now + TTL_MS });
    if (isError) throw result;
    return result;
  };
  // Carry over own properties of the original (notably `fs.realpathSync.native`)
  // so callers that reach for them keep working. Those variants are not memoized.
  Object.assign(wrapped, orig);
  return wrapped as unknown as F;
}

function wrapAsync<F extends (...args: any[]) => Promise<any>>(
  name: string,
  orig: F,
): F {
  const wrapped = async function (this: unknown, path: unknown, ...rest: any[]): Promise<any> {
    if (!isHotPath(path)) {
      counters.passthrough++;
      return orig.call(this, path, ...rest);
    }
    const now = Date.now();
    const key = buildKey(name, path, rest);
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now) {
      counters.hits++;
      if (hit.isError) throw hit.result;
      return hit.result;
    }
    counters.misses++;
    evictIfFull();
    try {
      const result = await orig.call(this, path, ...rest);
      cache.set(key, { result, isError: false, expiresAt: now + TTL_MS });
      return result;
    } catch (err) {
      counters.errors++;
      cache.set(key, { result: err, isError: true, expiresAt: now + TTL_MS });
      throw err;
    }
  };
  return wrapped as unknown as F;
}

interface PluginAPI {
  logger?: {
    info?: (...args: unknown[]) => void;
    warn?: (...args: unknown[]) => void;
    debug?: (...args: unknown[]) => void;
  };
  on?: (event: string, handler: () => void) => void;
}
const SENTINEL = '__openclawPerfCacheInstalled' as const;
const _G = globalThis as Record<string, unknown>;
let statsTimer: ReturnType<typeof setInterval> | null = null;

function install(logger: PluginAPI['logger']): void {
  if (_G[SENTINEL]) {
    logger?.debug?.('[perf-cache] already installed; skipping');
    return;
  }
  _G[SENTINEL] = true;

  // Sync versions
  fs.statSync = wrapSync('statSync', fs.statSync) as typeof fs.statSync;
  fs.lstatSync = wrapSync('lstatSync', fs.lstatSync) as typeof fs.lstatSync;
  fs.realpathSync = wrapSync('realpathSync', fs.realpathSync) as typeof fs.realpathSync;

  // Async (promises) versions: `lstat` was hot on prod profile (~38% even
  // after the push-driven baseline was killed)
  const fsp = fs.promises;
  fsp.stat = wrapAsync('stat', fsp.stat.bind(fsp)) as typeof fsp.stat;
  fsp.lstat = wrapAsync('lstat', fsp.lstat.bind(fsp)) as typeof fsp.lstat;
  fsp.realpath = wrapAsync('realpath', fsp.realpath.bind(fsp)) as typeof fsp.realpath;

  logger?.info?.(
    '[perf-cache] installed fs.{stat,lstat,realpath}Sync + fs.promises.{stat,lstat,realpath} ' +
      `with ${TTL_MS}ms TTL for plugin-tree paths only`,
  );

  // Periodic counter log so we can see hit ratio in journalctl. Reset after
  // logging so each line is "since last log".
  statsTimer = setInterval(() => {
    const total = counters.hits + counters.misses;
    if (total === 0 && counters.passthrough === 0) return;
    const hitRatio = total > 0 ? ((counters.hits / total) * 100).toFixed(1) : '0.0';
    logger?.info?.(
      `[perf-cache] last 60s: hits=${counters.hits} misses=${counters.misses} ` +
        `(hit-ratio ${hitRatio}%) passthrough=${counters.passthrough} ` +
        `errors=${counters.errors} cache_size=${cache.size}`,
    );
    counters.hits = 0;
    counters.misses = 0;
    counters.passthrough = 0;
    counters.errors = 0;
  }, 60_000);
  // Don't keep the process alive just for this timer.
  statsTimer.unref?.();
}

export default {
  id: 'openclaw-perf-cache',
  name: 'OpenClaw Perf Cache',
  register(api: PluginAPI): void {
    // Install *immediately* on register(): the snapshot lookups happen during
    // plugin loading and per agent turn, both of which need the wrapper in
    // place before they run. There's no `gateway_start` hook on the critical
    // path that fires before those.
    install(api.logger);
    api.on?.('gateway_stop', () => {
      if (statsTimer) {
        clearInterval(statsTimer);
        statsTimer = null;
      }
      // Note: we intentionally DON'T uninstall the fs wrappers on gateway_stop.
      // jiti caches module instances, and uninstalling at stop with reinstall
      // at next start would double-wrap on reload. Leaving the wrappers in
      // place is harmless: the sentinel check in install() guarantees we
      // never wrap twice.
    });
  },
};