fix(channel): add describeAccount so health-monitor sees real configured state #10
Why
`[fabric:default] health-monitor: restarting (reason: stopped)` has been logged every ~10 minutes on prod t2 (and sim) since forever. Root cause traced today.

openclaw's `channelManager.getRuntimeSnapshot()` (what `channel-health-monitor` reads) runs each account through `applyDescribedAccountFields(next, plugin.config.describeAccount?.(...))`, which defaults to `configured: true` when the callback is missing. Fabric never defined `describeAccount`, so every cycle the snapshot came out as `{ enabled: true, configured: true, running: false }`.

For the synthetic `default` account (`listFabricAccountIds` falls back to `[default]` when `channels.fabric.accounts` is empty, which is the prod shape), `running` is permanently false because fabric's `gateway.startAccount` is absent, so `startChannelInternal` returns early. The restart action is a no-op, but the log noise wasted real triage time today because it was chased as a real failure.

(The `status` path, `openclaw status --json`'s account snapshot, goes through `buildChannelAccountSnapshotFromAccount`, which DOES call `isConfigured(account)` and so reported `configured: false` correctly. That's why the CLI displayed `Fabric default: enabled, not configured` while health-monitor saw the opposite.)

What
Add `describeAccount` mirroring `isConfigured`, so the snapshot reports `configured: false` for any account without a `fabricApiKey`.

Real per-agent accounts (managed via `~/.openclaw/fabric-identity.json` on prod) still go through `gateway_start` → `FabricInbound.start()` as before. The framework just no longer thinks `default` (or any keyless account) is something it should restart.

Verification (sim)
Temporary `console.log` instrumentation in `channel-health-policy-D_eDwUBm.js` confirmed:

- Pre-patch: `evaluateChannelHealth` got `{enabled:true, configured:true, accountId:"default"}` every 10min → restart fired
- With the patch: `{enabled:true, configured:false, accountId:"default"}` every cycle → `isManagedAccount=false` → unmanaged → no restart

Sim gateway up 8+ minutes with the patch, 0 restart events (pre-patch sim with same config restarted at 5min mark).
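The before/after behaviour can be modelled with a small standalone sketch. It is not openclaw's or fabric's actual code: the names `isConfigured`, `describeAccount` and `isManagedAccount` come from this PR, but their bodies here are assumptions, and `FabricAccount`, `AccountSnapshot`, `buildSnapshot` and `shouldRestart` are hypothetical helpers introduced only for illustration.

```typescript
// Illustrative model only; every type and function body is an assumption.

interface FabricAccount {
  accountId: string;
  enabled: boolean;
  fabricApiKey?: string;
}

interface AccountSnapshot {
  accountId: string;
  enabled: boolean;
  configured: boolean;
  running: boolean;
}

type DescribeAccount = (account: FabricAccount) => { configured: boolean };

// Assumed rule: an account is configured only if it carries an API key.
function isConfigured(account: FabricAccount): boolean {
  return typeof account.fabricApiKey === "string" && account.fabricApiKey.length > 0;
}

// The shape of the fix: describeAccount mirrors isConfigured.
const describeAccount: DescribeAccount = (account) => ({
  configured: isConfigured(account),
});

// Hypothetical stand-in for the snapshot path: when no describeAccount
// callback exists, `configured` falls back to true.
function buildSnapshot(account: FabricAccount, describe?: DescribeAccount): AccountSnapshot {
  const described = describe?.(account);
  return {
    accountId: account.accountId,
    enabled: account.enabled,
    configured: described?.configured ?? true,
    // fabric has no gateway.startAccount, so the framework never sees it running.
    running: false,
  };
}

// Assumed policy: only enabled and configured accounts are managed.
function isManagedAccount(snapshot: AccountSnapshot): boolean {
  return snapshot.enabled && snapshot.configured;
}

// A managed account that is not running is treated as stopped and restarted.
function shouldRestart(snapshot: AccountSnapshot): boolean {
  return isManagedAccount(snapshot) && !snapshot.running;
}

const syntheticDefault: FabricAccount = { accountId: "default", enabled: true };

const prePatch = buildSnapshot(syntheticDefault);                 // no callback
const patched = buildSnapshot(syntheticDefault, describeAccount); // with the fix

console.log(shouldRestart(prePatch)); // true
console.log(shouldRestart(patched));  // false
```

Under these assumptions, an account that does carry a key would still be reported as configured and not running, so it would still be restarted. This matches the limitation listed under "Pending separately".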
Pending separately
Not covered here:
- `_fabricInboundStarted` global guard prevents plugin-only reload from re-binding inbound (needs full gateway restart to pick up plugin code changes). Out of scope.
- If accounts with keys are configured under `channels.fabric.accounts`, those accounts get `running:false` for the same reason `default` does: fabric doesn't implement the `gateway.startAccount` lifecycle. That would still restart. The current prod doesn't use that shape (everything goes through the identity registry), so this PR makes the existing prod silent. Refactoring to the framework lifecycle is a bigger separate piece.

🤖 Generated with Claude Code
openclaw's `channelManager.getRuntimeSnapshot()` (called every minute by the channel-health-monitor) runs accounts through `applyDescribedAccountFields(next, plugin.config.describeAccount?.(...))`. When the callback is missing it defaults `configured: true`. Fabric never defined it, so every health-monitor cycle:

    snapshot = { enabled: true, configured: true, running: false }

For fabric's synthetic 'default' account (returned by `listFabricAccountIds` when `channels.fabric.accounts` is empty, which is the prod shape, where per-agent api-keys live in `~/.openclaw/fabric-identity.json` and the channel framework never runs `startAccount` so `running` stays false):

    isManagedAccount({enabled:true, configured:true}) === true
      -> not-running -> 'stopped' -> restart every ~10 min, logging
      '[fabric:default] health-monitor: restarting (reason: stopped)'

The restart is a no-op (fabric's `gateway.startAccount` is absent so `startChannelInternal` returns early), but the log is loud and operators chasing real outages keep wasting time on it.

Mirror `isConfigured` in `describeAccount` so the snapshot truthfully reports configured:false for any account without a fabricApiKey. The fabric plugin still self-manages real agents via `gateway_start` -> `FabricInbound.start()`; the framework just no longer thinks 'default' is something it should restart.

Verified in sim (this patch alone, no debug instrumentation):

- gateway up 8+ minutes, 0 restart events
- pre-patch sim with same config restarted at 5min mark
- evaluateChannelHealth snapshot for both 'default' and 'recruiter' accountId reads configured:false (instrumented with temporary console.log in channel-health-policy, since reverted)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>