feat: Phase F-3 + F-4 — exp backoff + agent tool surface (batch 1)
F-3 refinements:
- internal/inbound: replace fixed 3s reconnect wait with exponential
backoff (1s → 60s, ×2, reset when the prior session lasted >30s). Session
length is the proxy for "healthy" vs "flapping"; this avoids hot reconnect loops when the
server is sick
F-4 agent tool surface (port of openclaw plugin's tools.ts):
- internal/tools/tools.go (~370 LOC): Registry binds Deps {Client,
Tokens, Identities} and exposes 8 agent-facing tools:
    fabric-send-message         post a normal message to any channel
    fabric-send-sys-msg         post a kind=sys message (bypasses turn engine)
    fabric-channel-list         list channels visible in a guild
    fabric-guild-list           list guilds the agent is in
    fabric-message-history      paginate channel messages by seq
    fabric-channel-set-purpose  PATCH the channel's purpose
    fabric-channel              fetch metadata + members for one channel
    fabric-canvas               get/share/update/remove channel canvas
- internal/tools/contracts.go: static ToolContract list — kept in sync
with install.sh's manifest emitter
- Every agent-scoped tool requires agent_id in input args (Plexum SDK
doesn't propagate calling agent id through CallTool today)
- guild_node_id defaults to agent's first guild for fabric-send-message
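
The two argument rules above (agent_id is mandatory, guild_node_id falls back to the agent's first guild) might look roughly like the sketch below. Everything here is assumed for illustration: sendArgs, resolveSendArgs and the guildsFor lookup are hypothetical names, and the real tools.go may structure this differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sendArgs is a hypothetical subset of the fabric-send-message input.
type sendArgs struct {
	AgentID     string `json:"agent_id"`
	GuildNodeID string `json:"guild_node_id"`
}

// resolveSendArgs decodes the raw tool input, rejects calls without
// agent_id, and defaults guild_node_id to the agent's first guild.
// guildsFor is an assumed lookup returning guild node ids for an agent.
func resolveSendArgs(raw []byte, guildsFor func(agentID string) []string) (sendArgs, error) {
	var a sendArgs
	if err := json.Unmarshal(raw, &a); err != nil {
		return a, fmt.Errorf("invalid args: %w", err)
	}
	if a.AgentID == "" {
		return a, errors.New("agent_id is required")
	}
	if a.GuildNodeID == "" {
		guilds := guildsFor(a.AgentID)
		if len(guilds) == 0 {
			return a, fmt.Errorf("agent %q is in no guilds", a.AgentID)
		}
		a.GuildNodeID = guilds[0]
	}
	return a, nil
}

func main() {
	guilds := func(string) []string { return []string{"test-guild2"} }
	a, err := resolveSendArgs([]byte(`{"agent_id":"fabrictester"}`), guilds)
	fmt.Println(a.GuildNodeID, err)
}
```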
internal/fabric/client.go: new REST methods needed by tools —
PostSystemMessage, CreateChannel, CloseChannel, JoinChannel,
LeaveChannel, SetChannelPurpose, GetCanvas, ShareCanvas, UpdateCanvas,
RemoveCanvas, SyncCommands.
cmd/plexum-fabric-channel-plugin/main.go:
- Manifest declares the tool surface via tools.New(...).Contracts()
- CallTool dispatches "send" to handleSend (outbound for channel
manager), everything else to tools.Registry.Handler(name)
scripts/install.sh:
- Manifest tools[] now lists all 9 tools with schemas — matches what
internal/tools/contracts.go advertises
Live verified against running Fabric stack:
$ plexum plugin-call fabric-guild-list '{"agent_id":"fabrictester"}'
→ "guilds for agent fabrictester (1): test-guild2 @ http://localhost:7003"
$ plexum plugin-call fabric-channel-list '{...,"guild_node_id":"test-guild2"}'
→ 2 channels listed
$ plexum plugin-call fabric-message-history '{...,"limit":5}'
→ 5 messages with timestamps + authors
F-5+ deferred:
- create-{chat,work,report,discussion}-channel (batch 2)
- sub-discussion family (state store + 3 tools)
- presence-sync + command-sync
- attachments
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -32,10 +32,19 @@ import (
 // that arrived without a push event. Matches openclaw's 60s.
 const ChannelSyncInterval = 60 * time.Second
 
-// ReconnectBackoff is the wait between reconnect attempts when a
-// socket drops + Connect returns. Keep small for snappier recovery;
-// exponential backoff is a future improvement.
-const ReconnectBackoff = 3 * time.Second
+// ReconnectBackoffInitial / Max / Factor drive exponential backoff
+// between reconnect attempts. Starts at 1s, doubles up to 60s; resets
+// on successful connect (signalled by connectOnce returning after at
+// least one event was received — proxied via wall-clock duration).
+const (
+	ReconnectBackoffInitial = 1 * time.Second
+	ReconnectBackoffMax     = 60 * time.Second
+	ReconnectBackoffFactor  = 2.0
+	// ReconnectResetAfter: if the previous connection survived at least
+	// this long, reset the backoff to Initial (we made meaningful
+	// progress; transient drop, not a flapping-failure loop).
+	ReconnectResetAfter = 30 * time.Second
+)
 
 // Notifier pushes one inbound message to the Plexum host. The plugin
 // main wires this to HostAPI.EmitNotification.
@@ -110,27 +119,41 @@ func (s *Supervisor) Run(ctx context.Context) error {
 }
 
 // runAgentGuild keeps one socket.io connection alive for (agent, guild)
-// until ctx cancels. Reconnects with fresh auth on every drop.
+// until ctx cancels. Reconnects with fresh auth on every drop using
+// exponential backoff (resets if the previous session survived long
+// enough to look healthy).
 func (s *Supervisor) runAgentGuild(ctx context.Context, agentID string, guild fabric.GuildInfo) {
 	logger := s.Logger.With("agent", agentID, "guild", guild.NodeID)
+	backoff := ReconnectBackoffInitial
 	for {
 		if err := ctx.Err(); err != nil {
 			return
 		}
+		startedAt := time.Now()
 		err := s.connectOnce(ctx, agentID, guild, logger)
 		if ctx.Err() != nil {
 			return
 		}
+		// Reset backoff if the prior session was meaningfully long
+		// (proxy for "we connected + did work" — not a flap loop).
+		if time.Since(startedAt) > ReconnectResetAfter {
+			backoff = ReconnectBackoffInitial
+		}
 		errStr := "(nil)"
 		if err != nil {
 			errStr = err.Error()
 		}
-		logger.Warn("inbound: socket connection ended; reconnecting", "err", errStr)
+		logger.Warn("inbound: socket ended; reconnecting",
+			"err", errStr, "backoff", backoff.String())
 		select {
-		case <-time.After(ReconnectBackoff):
+		case <-time.After(backoff):
 		case <-ctx.Done():
 			return
 		}
+		backoff = time.Duration(float64(backoff) * ReconnectBackoffFactor)
+		if backoff > ReconnectBackoffMax {
+			backoff = ReconnectBackoffMax
+		}
 	}
 }
 