Summary
Bayesian per-agent reputation scoring for robust multi-agent coordination, derived from the RAPS architecture. Directly extends Zeph's Thompson Sampling AgentRouter with a principled malice/degradation detection layer.
Source: arXiv 2602.08009 — "Towards Adaptive, Scalable, and Robust Coordination of LLM Agents: A Dynamic Ad-Hoc Networking Perspective" (Li et al., 2026)
Technique (reputation component only)
Each agent/model maintains a Beta-distribution reputation score per peer based on observed behavior (task success, output quality, response time vs. expectation). Agents with degraded reputation are automatically down-weighted in routing without central coordination. Reputation updates are Bayesian: success → α += 1, failure → β += 1 on a Beta(α, β) prior.
This is structurally identical to Thompson Sampling's Beta distribution already used in zeph-llm's router — it is essentially adding a quality/reliability dimension alongside the latency EMA.
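The update rule above can be sketched in a few lines of Rust. This is a minimal illustration, not Zeph code: the `Reputation` type, its method names, and the uniform Beta(1, 1) starting prior are all assumptions made for the example.

```rust
/// Beta(alpha, beta) reputation for one peer.
/// Hypothetical type for illustration; not part of Zeph or the paper.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Reputation {
    pub alpha: f64,
    pub beta: f64,
}

impl Reputation {
    /// Uniform prior Beta(1, 1): no evidence either way (assumed default).
    pub fn uniform() -> Self {
        Self { alpha: 1.0, beta: 1.0 }
    }

    /// Bayesian update: success -> alpha += 1, failure -> beta += 1.
    pub fn record(&mut self, success: bool) {
        if success {
            self.alpha += 1.0;
        } else {
            self.beta += 1.0;
        }
    }

    /// Posterior mean alpha / (alpha + beta), i.e. the expected success rate.
    pub fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }
}
```

Starting from the uniform prior, two successes and one failure give Beta(3, 2), whose mean is 3 / 5 = 0.6.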
Applicability to Zeph
MEDIUM-HIGH. Zeph already uses Thompson Sampling (Beta distribution) in AgentRouter for model selection exploration. The Bayesian reputation layer is an extension of the same math applied to output quality:
- Extend ModelStats in the router with a reputation: Beta field tracking quality outcomes (not just latency)
- Feed tool execution errors, LLM parse failures, and plan failures back as reputation signals
- Route away from degraded models/agents proportionally to reputation score decay
- Integrates with #1841 (Agent Stability Index): ASI provides the coherence signal; reputation tracks the cumulative outcome history
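One way the failure signals listed above might be collapsed into the binary outcomes a Beta update expects is sketched below. The `QualitySignal` enum, its variant names, and the `tally` helper are hypothetical, and treating every failure kind as equally severe is an assumption; the proposal does not say how signals are weighted.

```rust
/// Hypothetical outcome signals; variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QualitySignal {
    StepSucceeded,
    ToolExecutionError,
    LlmParseFailure,
    PlanFailure,
}

impl QualitySignal {
    /// Collapse a signal into the success/failure bit a Beta update needs.
    /// Assumption: all failure kinds count the same.
    pub fn is_success(self) -> bool {
        matches!(self, QualitySignal::StepSucceeded)
    }
}

/// Fold a batch of signals into (successes, failures) counts.
pub fn tally(signals: &[QualitySignal]) -> (u32, u32) {
    signals.iter().fold((0, 0), |(ok, bad), sig| {
        if sig.is_success() {
            (ok + 1, bad)
        } else {
            (ok, bad + 1)
        }
    })
}
```

The resulting counts can be added directly to α and β respectively.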
Implementation sketch
- Extend ModelStats with quality_alpha: f64, quality_beta: f64 (Beta parameters)
- On tool execution / plan step: record success/failure → update quality params
- Routing score: combine EMA latency weight with reputation sample Beta(α,β).sample()
- New config key: [routing.reputation] enabled = false, decay_factor = 0.95
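The steps above could look roughly like the following, using only the standard library. Only `quality_alpha`, `quality_beta`, and `decay_factor` come from the proposal. The `latency_ema_ms` field, the rule that decays evidence toward a Beta(1, 1) prior, and the multiplicative score formula are assumptions chosen for illustration. The reputation draw is passed in as a parameter so the sketch stays dependency-free; a real router would presumably draw it from Beta(α, β), for example with a crate such as `rand_distr`.

```rust
/// Hypothetical shape of the extended stats. Only `quality_alpha` and
/// `quality_beta` are named in the proposal; the rest is assumed.
#[derive(Debug, Clone)]
pub struct ModelStats {
    pub latency_ema_ms: f64,
    pub quality_alpha: f64,
    pub quality_beta: f64,
}

impl ModelStats {
    /// Start from a uniform Beta(1, 1) quality prior (assumed default).
    pub fn new(latency_ema_ms: f64) -> Self {
        Self { latency_ema_ms, quality_alpha: 1.0, quality_beta: 1.0 }
    }

    /// Assumed decay rule: shrink accumulated evidence toward the
    /// Beta(1, 1) prior by `decay_factor`, then add the new observation.
    /// Both parameters stay >= 1, so the distribution remains valid.
    pub fn record_outcome(&mut self, success: bool, decay_factor: f64) {
        self.quality_alpha = 1.0 + decay_factor * (self.quality_alpha - 1.0);
        self.quality_beta = 1.0 + decay_factor * (self.quality_beta - 1.0);
        if success {
            self.quality_alpha += 1.0;
        } else {
            self.quality_beta += 1.0;
        }
    }

    /// Posterior mean; usable as a deterministic stand-in for a sample.
    pub fn quality_mean(&self) -> f64 {
        self.quality_alpha / (self.quality_alpha + self.quality_beta)
    }

    /// Assumed combination: a latency weight 1 / (1 + seconds) multiplied
    /// by a reputation draw in [0, 1].
    pub fn routing_score(&self, reputation_sample: f64) -> f64 {
        let latency_weight = 1.0 / (1.0 + self.latency_ema_ms / 1000.0);
        latency_weight * reputation_sample.clamp(0.0, 1.0)
    }
}
```

Under these assumptions, a fast model that fails repeatedly ends up scoring below a slower model that keeps succeeding, which is the down-weighting behavior the proposal describes. With `decay_factor = 1.0` the update reduces to plain success and failure counting.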