
[codex] Refactor LLM reply continuation into agent runs #597

Merged
eanzhao merged 3 commits into feature/lark-bot from refactor/2026-05-08_agent-run-continuation-phase-a on May 8, 2026

Conversation

@eanzhao (Contributor) commented May 8, 2026

Summary

  • Implements Phase A of issue #596 (consolidate ChatRuntime / ChannelLlmReplyInboxRuntime into actor-owned agent continuations) by replacing the channel LLM reply hosted inbox singleton with run-scoped AgentRunGAgent continuation actors.
  • Adds IChannelLlmReplyRunDispatcher and the NyxidChat dispatcher that creates/gets AgentRunGAgent by correlationId and dispatches typed AgentRunStartRequested commands.
  • Moves the existing LLM reply gates, metadata enrichment, fallback timeout, streaming sink, reply generation, ready/drop dispatch, and relay token echo into the run actor while keeping reply tokens out of persisted run state/events.
  • Updates ConversationGAgent and tests to use the run dispatcher path and retires the old inbox interface/hosted service path.

Validation

  • dotnet build test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo
  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo
  • dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/Aevatar.GAgents.Channel.Protocol.Tests.csproj --nologo
  • bash tools/ci/test_stability_guards.sh
  • bash tools/ci/architecture_guards.sh

Refs #596

@codecov codecov (Bot) commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.92%. Comparing base (59fd77a) to head (2e4a1d1).
⚠️ Report is 1 commit behind head on feature/lark-bot.

@@                 Coverage Diff                  @@
##           feature/lark-bot     #597      +/-   ##
====================================================
- Coverage             71.95%   71.92%   -0.04%     
====================================================
  Files                  1258     1258              
  Lines                 91512    91512              
  Branches              11998    11998              
====================================================
- Hits                  65851    65818      -33     
- Misses                20926    20961      +35     
+ Partials               4735     4733       -2     
| Flag | Coverage Δ |
|------|------------|
| ci | 71.92% <ø> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| ...r.AI.ToolProviders.NyxId/Tools/NyxIdSshExecTool.cs | 99.41% <ø> (ø) |

... and 7 files with indirect coverage changes


```csharp
    },
};

await _actorDispatchPort.DispatchAsync(actor.Id, envelope, ct);
```
eanzhao (Contributor, Author) commented:
[codex] severity=blocker, category=concurrency

This uses synchronous IActorDispatchPort.DispatchAsync to start the run actor. ConversationGAgent awaits this while handling the inbound turn, and AgentRunGAgent later dispatches LlmReplyReadyEvent back to that same conversation actor, so the real Local/Orleans dispatch implementations can form an A -> B -> A wait cycle and also block the webhook for the full LLM run. This violates CLAUDE.md: cross-actor waits must be continuationized. Start the run through a one-way enqueue/publish mechanism that returns after acceptance, then let the run actor deliver the ready/drop signal independently.
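
The continuationized start suggested here could be sketched roughly as follows (a minimal illustration; `_streamProvider`, `AgentRunActorId`, and the command shape are assumptions based on names in this thread, not the exact repository API):

```csharp
// Sketch only: start the run via a one-way enqueue that returns after
// acceptance, instead of awaiting a cross-actor DispatchAsync.
public async Task DispatchAsync(AgentRunStartRequested command, CancellationToken ct)
{
    var actorId = AgentRunActorId(command.CorrelationId); // hypothetical helper

    // Returns once the stream accepts the command, without awaiting the run
    // actor's (potentially 300s) LLM work, so no A -> B -> A wait cycle forms.
    await _streamProvider.GetStream(actorId).ProduceAsync(command, ct);
}
```

The run actor then delivers its ready/drop signal back to the conversation actor on its own turn.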

```csharp
.Match(current, evt)
.On<AgentRunStartedEvent>(ApplyStarted)
.On<AgentRunReplyProducedEvent>(ApplyReplyProduced)
.On<AgentRunDroppedEvent>(ApplyDropped)
```
eanzhao (Contributor, Author) commented:

[glm-5.1] severity=major, category=arch

AgentRunGAgent reaches terminal states (ReplyProduced, Dropped, Failed) but has no self-deactivation or cleanup mechanism. Per AGENTS.md: "默认优先 run/session/task-scoped actor" and "长期 actor 只保留给事实拥有者". Each LLM reply creates a persistent actor that accumulates indefinitely. Consider persisting the terminal event then scheduling a self-deactivation (e.g. IActorRuntime.DeactivateAsync or a delayed cleanup self-message) so the actor does not remain alive after its single-shot purpose is served.
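
A hedged sketch of the suggested cleanup (the `IActorRuntime.DeactivateAsync` call is the mechanism named above; the surrounding method shape is illustrative):

```csharp
// Sketch: persist the terminal event, then release the run-scoped actor so
// single-shot runs do not accumulate indefinitely.
private async Task MarkTerminalAsync(IDomainEvent terminalEvent, CancellationToken ct)
{
    await PersistDomainEventAsync(terminalEvent, ct); // terminal state is durable first
    await _actorRuntime.DeactivateAsync(this.Id, ct); // self-deactivate once done
}
```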

```csharp
}

internal const long MaxInboxRequestAgeMs = 5 * 60 * 1000;
if (State.Status is AgentRunStatus.ReplyProduced or AgentRunStatus.Dropped or AgentRunStatus.Failed)
```
eanzhao (Contributor, Author) commented:

[Consensus: 2 models] severity=major, category=bug

The terminal-state dedup guard at this line silently drops undelivered ready/drop notifications when conversation dispatch fails. The actor persists AgentRunReplyProducedEvent / AgentRunFailedEvent / AgentRunDroppedEvent before dispatching to the conversation actor; if that dispatch throws, a retry of the start command will hit this guard and return without re-emitting the signal — ConversationGAgent.PendingLlmReplyRequests stays stuck. The guard also has zero test coverage. Either move the guard to after a confirmed-delivered phase (dispatch retry as a persisted phase), or only mark the run terminal once the conversation acknowledges. Add a test covering both the legitimate-duplicate case and the failed-dispatch retry case.

Per-model verbatim
  • kimi: HandleStartAsync introduces a new terminal-state dedup guard (ignoring duplicate start commands when the run is already ReplyProduced, Dropped, or Failed). This idempotency behavior is not covered by any test in AgentRunGAgentTests.cs. Add a test that sends a second AgentRunStartRequested after the run reaches a terminal state and asserts it is ignored.
  • codex: This terminal duplicate guard can drop undelivered results. The run persists AgentRunReplyProducedEvent / AgentRunFailedEvent before DispatchReadyEventAsync, and persists AgentRunDroppedEvent before the drop notification; if that conversation dispatch fails, a retry of the start command sees terminal state here and returns without re-emitting the ready/drop signal, leaving ConversationGAgent.PendingLlmReplyRequests stuck. Make delivery a retryable persisted phase, or only mark the run terminal after the conversation has accepted/acknowledged the notification.
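
One way to restructure the guard along the lines both reviewers suggest (a hedged sketch; the `OutputDelivered` flag and `AgentRunOutputDeliveredEvent` are illustrative names, not existing repository types):

```csharp
// Sketch: make delivery its own persisted phase, so a duplicate start after a
// terminal result re-attempts the notification instead of silently returning.
if (State.Status is AgentRunStatus.ReplyProduced or AgentRunStatus.Dropped or AgentRunStatus.Failed)
{
    if (!State.OutputDelivered) // hypothetical flag set by AgentRunOutputDeliveredEvent
    {
        await DispatchReadyOrDropAsync(State, ct); // retry the undelivered signal
        await PersistDomainEventAsync(new AgentRunOutputDeliveredEvent(State.RunId), ct);
    }
    return; // legitimate duplicate: result exists and has been delivered
}
```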


```csharp
_logger.LogInformation("Started channel LLM reply inbox on {StreamId}", InboxStreamId);
ArgumentException.ThrowIfNullOrWhiteSpace(correlationId);
return ActorIdPrefix + correlationId.Trim();
```
eanzhao (Contributor, Author) commented:

[glm-5.1] severity=minor, category=arch

HandleStartAsync calls PersistDomainEventAsync(AgentRunStartedEvent) then synchronously calls ProcessAsync which runs the full LLM generation (up to 300s). If PersistDomainEventAsync for the started event succeeds but ProcessAsync throws an unguarded exception (e.g. from EnsureTargetActorAsync or a subsequent PersistDomainEventAsync call), the actor is left in Started state with no terminal event and no retry path. Wrap the ProcessAsync call in a try/catch that persists AgentRunFailedEvent on any unexpected exception to guarantee terminal state is always reached.
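
A minimal sketch of the suggested guard (names mirror this comment, not necessarily the repository API):

```csharp
// Sketch: guarantee a terminal event even when ProcessAsync throws unexpectedly.
await PersistDomainEventAsync(new AgentRunStartedEvent(runId), ct);
try
{
    await ProcessAsync(request, ct); // full LLM generation, up to 300s
}
catch (Exception ex)
{
    _logger.LogError(ex, "Agent run failed unexpectedly: runId={RunId}", runId);
    await PersistDomainEventAsync(new AgentRunFailedEvent(runId, ex.Message), ct);
}
```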

```diff
@@ -712,6 +755,8 @@ private sealed class DispatchingActorRuntime(params (string Id, IActor Actor)[]
 static pair => pair.Actor,
```
eanzhao (Contributor, Author) commented:

[glm-5.1] severity=minor, category=test

NoOpEventSourcing.TransitionState returns current unchanged, so AgentRunGAgent.State never advances past AGENT_RUN_STATUS_UNSPECIFIED in any test. This means the duplicate-terminal-start guard at HandleStartAsync line ~78 (State.Status is AgentRunStatus.ReplyProduced or AgentRunStatus.Dropped or AgentRunStatus.Failed) is never exercised. Either wire a real IEventSourcingBehavior that calls TransitionState, or add a dedicated test that pre-populates the state to a terminal status and asserts the guard fires.
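
The dedicated test suggested here could look roughly like this (a sketch; the harness helpers `CreateAgentRunGAgent`, `SomeStartRequest`, and `PersistedEventsOf` are assumed, not existing test utilities):

```csharp
// Sketch: pre-populate a terminal status and assert the duplicate-start guard fires.
[Fact]
public async Task Start_IsIgnored_WhenRunAlreadyTerminal()
{
    var agent = CreateAgentRunGAgent(initialState: new AgentRunState
    {
        Status = AgentRunStatus.ReplyProduced, // terminal before the start arrives
    });

    await agent.HandleStartAsync(SomeStartRequest(), CancellationToken.None);

    Assert.Empty(PersistedEventsOf(agent)); // no new events, no second reply
}
```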

@eanzhao eanzhao marked this pull request as ready for review May 8, 2026 13:39

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c108ef4ab4


Comment on lines +129 to +136
```csharp
// The run has not entered a terminal state yet. Leaving it Started lets the
// durable dispatcher retry the start command and re-emit the ready/drop signal.
_logger.LogWarning(
    ex,
    "Agent run output notification was not accepted; run remains retryable: runId={RunId} correlation={CorrelationId}",
    runId,
    request.CorrelationId);
}
```

P2: Re-queue failed output notifications

When the run actor generates a reply/drop but SendToAsync fails, this catch logs and returns successfully, so the consumed AgentRunStartRequested is acknowledged and nothing schedules another attempt. ConversationGAgent.DispatchPendingLlmReplyAsync only schedules its durable retry when the initial dispatcher call throws. This path leaves the run merely Started and neither persists the request nor enqueues a retry, so a transient target dispatch failure can leave the pending LLM reply stuck until some unrelated reactivation or stale cleanup runs, instead of notifying the user.


@eanzhao (Contributor, Author) commented May 8, 2026

Multi-model fix-check on PR #597 (head 2e4a1d1)

Ran 5 reviewers (codex, deepseek-v4-pro, glm-5.1, mimo-v2.5-pro, kimi) over each existing inline comment to judge whether the post-fix code resolves the original concern.

All 6 comments judged resolved.

| # | Comment | Location | Verdict |
|---|---------|----------|---------|
| 1 | codex blocker concurrency (sync cross-actor wait) | AgentRunDispatcher.cs:58 | ✅ 4/5 resolved* |
| 2 | glm-5.1 minor arch (try/catch around ProcessAsync) | AgentRunGAgent.cs:78 | ✅ 5/5 resolved |
| 3 | glm-5.1 major arch (run-actor cleanup) | AgentRunGAgent.cs:86 | ✅ 5/5 resolved |
| 4 | consensus major bug (terminal guard / dispatch ordering) | AgentRunGAgent.cs:104 | ✅ 3/5 resolved* |
| 5 | codex bot P2 (re-queue failed output) | AgentRunGAgent.cs:136 | ✅ 5/5 resolved |
| 6 | glm-5.1 minor test (NoOpEventSourcing) | AgentRunGAgentTests.cs:1028 | ✅ 5/5 resolved |

*Divided rows verified against current source:

  • Row 1: AgentRunDispatcher.DispatchAsync now uses _streamProvider.GetStream(actor.Id).ProduceAsync(envelope, ct), a one-way enqueue. The single unresolved dissent claimed the file still uses IActorDispatchPort.DispatchAsync, which is factually incorrect at the current head.
  • Row 4: ProcessAsync lines 316–317 invoke DispatchReadyEventAsync before PersistReplyProducedAsync, so a failed dispatch leaves the run in Started and TryHandleOutputDispatchFailureAsync schedules a durable retry. The 2 partial dissents either misread the order or raise a future-refactoring concern, not a current defect.

Fixes landed in c108ef4ab4 ("Address agent run review comments") and 2e4a1d1bcb ("Harden agent run output retry handling").

Reviewer note: gemini failed both runs due to interactive auth prompt; remaining 5 reviewers were unanimous on 4/6 and majority-correct on 2/6.

Generated by /opencode-pr-fix-check. Threads not auto-resolved — please mark each thread resolved manually if you agree.

@eanzhao eanzhao merged commit c2f1969 into feature/lark-bot May 8, 2026
13 checks passed