
[codex] Refactor LLM reply continuation into agent runs #597

Merged
eanzhao merged 3 commits into feature/lark-bot from refactor/2026-05-08_agent-run-continuation-phase-a on May 8, 2026

Conversation

@eanzhao (Contributor) commented May 8, 2026

Summary

  • Implements Phase A of issue #596 (consolidate ChatRuntime / ChannelLlmReplyInboxRuntime into actor-owned agent continuations) by replacing the channel LLM reply hosted inbox singleton with run-scoped AgentRunGAgent continuation actors.
  • Adds IChannelLlmReplyRunDispatcher and the NyxidChat dispatcher that creates/gets AgentRunGAgent by correlationId and dispatches typed AgentRunStartRequested commands.
  • Moves the existing LLM reply gates, metadata enrichment, fallback timeout, streaming sink, reply generation, ready/drop dispatch, and relay token echo into the run actor while keeping reply tokens out of persisted run state/events.
  • Updates ConversationGAgent and tests to use the run dispatcher path and retires the old inbox interface/hosted service path.

Validation

  • dotnet build test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo
  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo
  • dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/Aevatar.GAgents.Channel.Protocol.Tests.csproj --nologo
  • bash tools/ci/test_stability_guards.sh
  • bash tools/ci/architecture_guards.sh

Refs #596

@codecov codecov (Bot) commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.92%. Comparing base (59fd77a) to head (2e4a1d1).
⚠️ Report is 1 commit behind head on feature/lark-bot.

@@                 Coverage Diff                  @@
##           feature/lark-bot     #597      +/-   ##
====================================================
- Coverage             71.95%   71.92%   -0.04%     
====================================================
  Files                  1258     1258              
  Lines                 91512    91512              
  Branches              11998    11998              
====================================================
- Hits                  65851    65818      -33     
- Misses                20926    20961      +35     
+ Partials               4735     4733       -2     
| Flag | Coverage Δ |
|------|------------|
| ci | 71.92% <ø> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| ...r.AI.ToolProviders.NyxId/Tools/NyxIdSshExecTool.cs | 99.41% <ø> (ø) |

... and 7 files with indirect coverage changes


```csharp
    },
};

await _actorDispatchPort.DispatchAsync(actor.Id, envelope, ct);
```
eanzhao (Contributor, Author) commented:
[codex] severity=blocker, category=concurrency

This uses synchronous IActorDispatchPort.DispatchAsync to start the run actor. ConversationGAgent awaits this while handling the inbound turn, and AgentRunGAgent later dispatches LlmReplyReadyEvent back to that same conversation actor, so the real Local/Orleans dispatch implementations can form an A -> B -> A wait cycle and also block the webhook for the full LLM run. This violates CLAUDE.md: cross-actor waits must be continuationized. Start the run through a one-way enqueue/publish mechanism that returns after acceptance, then let the run actor deliver the ready/drop signal independently.
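
The continuationized start suggested here could be sketched roughly as follows (a minimal illustration; `_streamProvider`, `AgentRunActorId`, and the command shape are assumptions based on names in this thread, not the exact repository API):

```csharp
// Sketch only: start the run via a one-way enqueue that returns after
// acceptance, instead of awaiting a cross-actor DispatchAsync.
public async Task DispatchAsync(AgentRunStartRequested command, CancellationToken ct)
{
    var actorId = AgentRunActorId(command.CorrelationId); // hypothetical helper

    // Returns once the stream accepts the command, without awaiting the run
    // actor's (potentially 300s) LLM work, so no A -> B -> A wait cycle forms.
    await _streamProvider.GetStream(actorId).ProduceAsync(command, ct);
}
```

The run actor then delivers its ready/drop signal back to the conversation actor on its own turn.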

```csharp
.Match(current, evt)
.On<AgentRunStartedEvent>(ApplyStarted)
.On<AgentRunReplyProducedEvent>(ApplyReplyProduced)
.On<AgentRunDroppedEvent>(ApplyDropped)
```
eanzhao (Contributor, Author) commented:

[glm-5.1] severity=major, category=arch

AgentRunGAgent reaches terminal states (ReplyProduced, Dropped, Failed) but has no self-deactivation or cleanup mechanism. Per AGENTS.md: "默认优先 run/session/task-scoped actor" and "长期 actor 只保留给事实拥有者". Each LLM reply creates a persistent actor that accumulates indefinitely. Consider persisting the terminal event then scheduling a self-deactivation (e.g. IActorRuntime.DeactivateAsync or a delayed cleanup self-message) so the actor does not remain alive after its single-shot purpose is served.
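
A hedged sketch of the suggested cleanup (the `IActorRuntime.DeactivateAsync` call is the mechanism named above; the surrounding method shape is illustrative):

```csharp
// Sketch: persist the terminal event, then release the run-scoped actor so
// single-shot runs do not accumulate indefinitely.
private async Task MarkTerminalAsync(IDomainEvent terminalEvent, CancellationToken ct)
{
    await PersistDomainEventAsync(terminalEvent, ct); // terminal state is durable first
    await _actorRuntime.DeactivateAsync(this.Id, ct); // self-deactivate once done
}
```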

```csharp
}

internal const long MaxInboxRequestAgeMs = 5 * 60 * 1000;
if (State.Status is AgentRunStatus.ReplyProduced or AgentRunStatus.Dropped or AgentRunStatus.Failed)
```
eanzhao (Contributor, Author) commented:

[Consensus: 2 models] severity=major, category=bug

The terminal-state dedup guard at this line silently drops undelivered ready/drop notifications when conversation dispatch fails. The actor persists AgentRunReplyProducedEvent / AgentRunFailedEvent / AgentRunDroppedEvent before dispatching to the conversation actor; if that dispatch throws, a retry of the start command will hit this guard and return without re-emitting the signal — ConversationGAgent.PendingLlmReplyRequests stays stuck. The guard also has zero test coverage. Either move the guard to after a confirmed-delivered phase (dispatch retry as a persisted phase), or only mark the run terminal once the conversation acknowledges. Add a test covering both the legitimate-duplicate case and the failed-dispatch retry case.

Per-model verbatim
  • kimi: HandleStartAsync introduces a new terminal-state dedup guard (ignoring duplicate start commands when the run is already ReplyProduced, Dropped, or Failed). This idempotency behavior is not covered by any test in AgentRunGAgentTests.cs. Add a test that sends a second AgentRunStartRequested after the run reaches a terminal state and asserts it is ignored.
  • codex: This terminal duplicate guard can drop undelivered results. The run persists AgentRunReplyProducedEvent / AgentRunFailedEvent before DispatchReadyEventAsync, and persists AgentRunDroppedEvent before the drop notification; if that conversation dispatch fails, a retry of the start command sees terminal state here and returns without re-emitting the ready/drop signal, leaving ConversationGAgent.PendingLlmReplyRequests stuck. Make delivery a retryable persisted phase, or only mark the run terminal after the conversation has accepted/acknowledged the notification.
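
One way to restructure the guard along the lines both reviewers suggest (a hedged sketch; the `OutputDelivered` flag and `AgentRunOutputDeliveredEvent` are illustrative names, not existing repository types):

```csharp
// Sketch: make delivery its own persisted phase, so a duplicate start after a
// terminal result re-attempts the notification instead of silently returning.
if (State.Status is AgentRunStatus.ReplyProduced or AgentRunStatus.Dropped or AgentRunStatus.Failed)
{
    if (!State.OutputDelivered) // hypothetical flag set by AgentRunOutputDeliveredEvent
    {
        await DispatchReadyOrDropAsync(State, ct); // retry the undelivered signal
        await PersistDomainEventAsync(new AgentRunOutputDeliveredEvent(State.RunId), ct);
    }
    return; // legitimate duplicate: result exists and has been delivered
}
```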


```csharp
_logger.LogInformation("Started channel LLM reply inbox on {StreamId}", InboxStreamId);
ArgumentException.ThrowIfNullOrWhiteSpace(correlationId);
return ActorIdPrefix + correlationId.Trim();
```
eanzhao (Contributor, Author) commented:

[glm-5.1] severity=minor, category=arch

HandleStartAsync calls PersistDomainEventAsync(AgentRunStartedEvent) then synchronously calls ProcessAsync which runs the full LLM generation (up to 300s). If PersistDomainEventAsync for the started event succeeds but ProcessAsync throws an unguarded exception (e.g. from EnsureTargetActorAsync or a subsequent PersistDomainEventAsync call), the actor is left in Started state with no terminal event and no retry path. Wrap the ProcessAsync call in a try/catch that persists AgentRunFailedEvent on any unexpected exception to guarantee terminal state is always reached.
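
A minimal sketch of the suggested guard (names mirror this comment, not necessarily the repository API):

```csharp
// Sketch: guarantee a terminal event even when ProcessAsync throws unexpectedly.
await PersistDomainEventAsync(new AgentRunStartedEvent(runId), ct);
try
{
    await ProcessAsync(request, ct); // full LLM generation, up to 300s
}
catch (Exception ex)
{
    _logger.LogError(ex, "Agent run failed unexpectedly: runId={RunId}", runId);
    await PersistDomainEventAsync(new AgentRunFailedEvent(runId, ex.Message), ct);
}
```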

```diff
@@ -712,6 +755,8 @@ private sealed class DispatchingActorRuntime(params (string Id, IActor Actor)[]
 static pair => pair.Actor,
```
eanzhao (Contributor, Author) commented:

[glm-5.1] severity=minor, category=test

NoOpEventSourcing.TransitionState returns current unchanged, so AgentRunGAgent.State never advances past AGENT_RUN_STATUS_UNSPECIFIED in any test. This means the duplicate-terminal-start guard at HandleStartAsync line ~78 (State.Status is AgentRunStatus.ReplyProduced or AgentRunStatus.Dropped or AgentRunStatus.Failed) is never exercised. Either wire a real IEventSourcingBehavior that calls TransitionState, or add a dedicated test that pre-populates the state to a terminal status and asserts the guard fires.
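
The dedicated test suggested here could look roughly like this (a sketch; the harness helpers `CreateAgentRunGAgent`, `SomeStartRequest`, and `PersistedEventsOf` are assumed, not existing test utilities):

```csharp
// Sketch: pre-populate a terminal status and assert the duplicate-start guard fires.
[Fact]
public async Task Start_IsIgnored_WhenRunAlreadyTerminal()
{
    var agent = CreateAgentRunGAgent(initialState: new AgentRunState
    {
        Status = AgentRunStatus.ReplyProduced, // terminal before the start arrives
    });

    await agent.HandleStartAsync(SomeStartRequest(), CancellationToken.None);

    Assert.Empty(PersistedEventsOf(agent)); // no new events, no second reply
}
```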

@eanzhao eanzhao marked this pull request as ready for review May 8, 2026 13:39

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c108ef4ab4


Comment on lines +129 to +136
```csharp
// The run has not entered a terminal state yet. Leaving it Started lets the
// durable dispatcher retry the start command and re-emit the ready/drop signal.
_logger.LogWarning(
    ex,
    "Agent run output notification was not accepted; run remains retryable: runId={RunId} correlation={CorrelationId}",
    runId,
    request.CorrelationId);
}
```

P2: Re-queue failed output notifications

When the run actor generates a reply/drop but SendToAsync fails, this catch logs and returns successfully, so the consumed AgentRunStartRequested is acknowledged and nothing schedules another attempt. ConversationGAgent.DispatchPendingLlmReplyAsync only schedules its durable retry when the initial dispatcher call throws. This path leaves the run merely Started and neither persists the request nor enqueues a retry, so a transient target dispatch failure can leave the pending LLM reply stuck until some unrelated reactivation or stale cleanup runs, instead of notifying the user.


@eanzhao (Contributor, Author) commented May 8, 2026

Multi-model fix-check on PR #597 (head 2e4a1d1)

Ran 5 reviewers (codex, deepseek-v4-pro, glm-5.1, mimo-v2.5-pro, kimi) over each existing inline comment to judge whether the post-fix code resolves the original concern.

All 6 comments judged resolved.

| # | Comment | Location | Verdict |
|---|---------|----------|---------|
| 1 | codex blocker concurrency (sync cross-actor wait) | AgentRunDispatcher.cs:58 | ✅ 4/5 resolved* |
| 2 | glm-5.1 minor arch (try/catch around ProcessAsync) | AgentRunGAgent.cs:78 | ✅ 5/5 resolved |
| 3 | glm-5.1 major arch (run-actor cleanup) | AgentRunGAgent.cs:86 | ✅ 5/5 resolved |
| 4 | consensus major bug (terminal guard / dispatch ordering) | AgentRunGAgent.cs:104 | ✅ 3/5 resolved* |
| 5 | codex bot P2 (re-queue failed output) | AgentRunGAgent.cs:136 | ✅ 5/5 resolved |
| 6 | glm-5.1 minor test (NoOpEventSourcing) | AgentRunGAgentTests.cs:1028 | ✅ 5/5 resolved |

*Divided rows verified against current source:

  • Row 1: AgentRunDispatcher.DispatchAsync now uses _streamProvider.GetStream(actor.Id).ProduceAsync(envelope, ct), a one-way enqueue. The single unresolved dissent claimed the file still uses IActorDispatchPort.DispatchAsync, which is factually incorrect at the current head.
  • Row 4: ProcessAsync lines 316–317 invoke DispatchReadyEventAsync before PersistReplyProducedAsync, so a failed dispatch leaves the run in Started and TryHandleOutputDispatchFailureAsync schedules a durable retry. The 2 partial dissents either misread the order or raise a future-refactoring concern, not a current defect.

Fixes landed in c108ef4ab4 ("Address agent run review comments") and 2e4a1d1bcb ("Harden agent run output retry handling").

Reviewer note: gemini failed both runs due to interactive auth prompt; remaining 5 reviewers were unanimous on 4/6 and majority-correct on 2/6.

Generated by /opencode-pr-fix-check. Threads not auto-resolved — please mark each thread resolved manually if you agree.

@eanzhao eanzhao merged commit c2f1969 into feature/lark-bot May 8, 2026
13 checks passed