
ServiceConnect v7: clean-architecture rewrite — DI-first, async-first, STJ-on-the-wire, operator-grade telemetry#67

Open
twatson83 wants to merge 1507 commits into master from v7-clean-architecture

Overview

This PR merges v7-clean-architecture into master as the v7 line of ServiceConnect.

v7 is a clean rewrite, not an upgrade: the static Bus is gone, every public path is async with CancellationToken, the wire format is System.Text.Json, the public surface is shrunk to extension contracts only, and the solution is consolidated from 17 production projects to seven shipping NuGet packages. Reliability, security, and observability work landed in the same drop — header trust boundaries, cardinality and amplification caps, persistence-lease correctness, OTel semconv 1.x messaging attributes, and a ServiceConnect.HealthChecks package.

This is not source- or binary-compatible with v6. See the v7 entry in website/src/content/docs/releases.mdx for the full audience-oriented changelog (breaking changes, new features, hardening, removals, migration steps). A short summary follows.

Highlights

Architecture and API

  • AddServiceConnect(...) DI extension + fluent ServiceConnectBuilder replace the static Bus. IBus is IAsyncDisposable and single-use (latched on stop/dispose).
  • IBus, IProducer, IConsumer, IConsumeContext, IProcessManagerFinder, ITimeoutStore, IAggregatorPersistor, middleware, and filters are all async with CancellationToken end-to-end. All sync-over-async paths removed from the RabbitMQ transport.
  • Handler signatures take the context/stream as a parameter (HandleAsync(T msg, IConsumeContext ctx, CancellationToken ct)); the unsafe settable Context/Stream properties are gone.
  • Public surface tightened: Bus, MessageDispatcher, concrete persistors, concrete RabbitMQ classes, every *Configuration type, all hosted services are now internal. Public API = interfaces in ServiceConnect.Interfaces + the builder/extension methods on the bus configuration root.
  • Typed pipeline builders: AddOutgoingFilter<T>, AddBeforeConsumingFilter<T>, AddOnConsumedSuccessfullyFilter<T> (new), AddAfterConsumingFilter<T>, AddSendMessageMiddleware<T>, AddMessageProcessingMiddleware<T>.
  • New processor chain (HandlerProcessor, ReplyProcessor, ProcessManagerProcessor, AggregatorProcessor, StreamProcessor) with pluggable registries — no per-message reflection.
  • Polymorphic dispatch with type-hierarchy walking. Native streaming via bus.CreateStream<T>(endpoint) → IMessageBusWriteStream.
  • Request fan-out is explicit: SendToManyAsync / PublishRequestAsync replace SendOptions.EndPoints / RequestOptions.EndPoints. SendRequestMultiAsync throws RequestTimeoutException (with PartialReplies) on under-delivery instead of returning silently.
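As a rough illustration of the handler shape described above, the sketch below passes the context as a method parameter instead of reading it from a settable property. The `IConsumeContext` and `IMessageHandler<T>` declarations are local stand-ins written for this example (assumptions, not the library's actual definitions), and the `Headers` member is hypothetical; only the `HandleAsync(T msg, IConsumeContext ctx, CancellationToken ct)` parameter list comes from the summary.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Local stand-in for illustration only; the real contract lives in ServiceConnect.Interfaces.
public interface IConsumeContext
{
    // Hypothetical member, included so the handler has something to read.
    IReadOnlyDictionary<string, string> Headers { get; }
}

// Hypothetical handler contract mirroring the parameter list quoted in the summary.
public interface IMessageHandler<in T>
{
    Task HandleAsync(T msg, IConsumeContext ctx, CancellationToken ct);
}

public sealed record OrderPlaced(Guid OrderId);

public sealed class OrderPlacedHandler : IMessageHandler<OrderPlaced>
{
    // Demo-only state; a real handler would avoid unsynchronised shared state.
    public List<Guid> Seen { get; } = new();

    public Task HandleAsync(OrderPlaced msg, IConsumeContext ctx, CancellationToken ct)
    {
        // The context arrives per call, so nothing is stored on the handler instance.
        ct.ThrowIfCancellationRequested();
        Seen.Add(msg.OrderId);
        return Task.CompletedTask;
    }
}
```

Because the context is an argument, two concurrent deliveries cannot overwrite each other's context the way a settable property could.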

Transport (RabbitMQ)

  • Rewritten async-first with lazy Producer connection, safe dispose, nack on retry failure, async Reply/Route.
  • Consumer split into a Client facade + RabbitMqConsumerHost; audit/retry concerns extracted (MessageAuditPublisher, MessageRetryHandler, HeaderHelpers).
  • TLS on by default (SslEnabled = true, port 5671); plaintext outside loopback emits a Warning.
  • Publisher confirms on by default; configuring PublisherAcknowledgements=false with a finite PublishTimeout is a startup error.
  • PublishOptions.RoutingKey honoured (IProducer.SupportsRoutingKey capability flag for third-party transports).
  • MessageId survives publish retries; TimeoutException is retriable under the at-least-once contract (channel reset + same BasicProperties). RabbitMqOptions.MaxPublishWaitTime caps the retry-loop budget.
  • Retry/error publishes use mandatory:true — topology gaps surface as PublishException rather than silent drops.
  • MessageTypeExchangeName is now version-stable (type.FullName + assembly.GetName().Name); first v7 deploy on a broker shared with v6 will see new exchange names.

Persistence

  • ServiceConnect.Persistence.MongoDb: bumped to MongoDB.Driver 3.8.0; bundled persistor handles GuidRepresentationMode = V3 and RenderArgs<T>. Startup guards on WriteConcern.Unacknowledged, conflicting Guid-serializer registrations, and the legacy (Locked, Time) index. IProcessManagerTypeRegistry enables pre-startup CorrelationId unique index creation. Aggregator Name partition value changed (Aggregator<T> → concrete type); generic-saga collection names sanitized — both require rename scripts before deploy.
  • ServiceConnect.Persistence.InMemory: no more 2-day TTL drop; deep-clones on read; types are IDisposable. IAggregatorPersistor parameter/return types tightened to IHasCorrelationId / IReadOnlyList<IHasCorrelationId>. Added CountResolvedAsync and ReleaseSnapshotAsync (additive default-interface-methods).
  • IKeyValueStore / ICacheProvider switch from Get<> to TryGet<>.
  • ServiceConnect.Persistence.SqlServer and ServiceConnect.Persistence.Redis dropped — no replacement.

Serialization

  • System.Text.Json across all shipping packages — Newtonsoft.Json is no longer transitive.
  • IMessageSerializer reduced to four methods (Serialize<T>(T, IBufferWriter<byte>), Deserialize<T>(ReadOnlyMemory<byte>), Deserialize(ReadOnlyMemory<byte>, Type), default ReadOnlySequence<byte> overload).
  • IProducer body is ReadOnlyMemory<byte>; streaming WriteAsync cascades to ReadOnlyMemory<byte>.
  • ServiceConnect.SerializationCompatTests enforces v6↔v7 round-trip on every PR.

Telemetry and operations

  • New ServiceConnect.HealthChecks package: BusConsumingHealthCheck, ConsumerConnectionHealthCheck, ProducerConnectionHealthCheck. Configurable recoveryGraceWindow (default 30s) absorbs transient broker flap; permanent broker-cancel bypasses grace.
  • One ActivitySource ("ServiceConnect.Bus"); telemetry static state removed (options + attributes are method parameters now). Per-direction toggles.
  • Operator metrics: OTel messaging.* instruments + ServiceConnect-specific counters (retry.attempts, retry.drops, publish.confirm_timeouts, audit.drops, outgoing_filters.blocked) + in-flight UpDownCounter.
  • OTel semconv 1.x: messaging.operation.type / messaging.operation.name; messaging.destination.name now reflects broker exchange/routing key (not CLR FullName). Dashboards filtering on the legacy attributes must update.
  • W3C traceparent/tracestate injected on outbound.
  • MaxTagValueLength (default 256) + ExceptionMessageSanitiser for cardinality / PII safety.
  • Connection-lifecycle Info logs (ConnectionOpened / ProducerConnectionOpened / ConnectionRecovered / ConnectionLost) carry the connection's MessageId.

Security

  • Server-authoritative inbound headers (DestinationAddress, MessageId, MessageType, TypeName, FullTypeName); reply routing resolves through registered handlers only (no Type.GetType on caller-supplied wire data).
  • Cardinality/DoS caps: message size, header count/size (recursive byte budget on AMQP table/array nested values), routing-slip hops, active streams, in-flight requests, per-tag length. Gzip magic-byte check + decompression size cap.
  • DeepClone switched off Newtonsoft.Json TypeNameHandling.Auto — removes the deserialisation-gadget surface and MongoDB.Bson coupling.
  • Random.Shared for jitter; exponential backoff with overflow cap.
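The last bullet can be sketched as follows. This is a minimal illustration of exponential backoff with an overflow cap and `Random.Shared` jitter, not the library's implementation; the `Backoff` class, its parameter names, the exponent limit of 30, and the half-range jitter policy are all assumptions made for the example.

```csharp
using System;

public static class Backoff
{
    // Hypothetical helper: returns a jittered delay for the given zero-based attempt.
    public static TimeSpan Compute(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so the shift below can never overflow a long.
        int exponent = Math.Min(attempt, 30);
        double raw = baseDelay.TotalMilliseconds * (1L << exponent);

        // Cap the delay before applying jitter so it stays bounded by maxDelay.
        double capped = Math.Min(raw, maxDelay.TotalMilliseconds);

        // Jitter over the upper half of the range: [capped / 2, capped).
        double jittered = capped / 2 + Random.Shared.NextDouble() * (capped / 2);
        return TimeSpan.FromMilliseconds(jittered);
    }
}
```

For example, `Backoff.Compute(1000, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(30))` stays between 15 and 30 seconds even though the attempt count is far beyond what an unclamped shift could represent.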

Reliability

  • Audit-publish failures no longer trigger redelivery. Malformed RetryCount headers route to error queue. Unresolved message types are terminal. Reply-shaped messages route safely without IRequestReplyManager registered.
  • Mongo timeout-store: candidate sort tie-broken by Id; cancellation between UpdateMany and FindAsync performs best-effort lease cleanup; $facet aggregation eliminates the second round-trip.
  • Aggregator: RemoveDataAsync distinguishes KeyNotFoundException from ConcurrencyException; batch-size flush gates on resolved-count; OnTimerFired register-before-recheck closes disposal-snapshot race.
  • Consumer-host: auto-recovery refreshes consumer tag; stale UnregisteredAsync events ignored; queues bound before BasicConsume; parallel host disposal.
  • Bus lifecycle is single-use and DI-owned; Bus no longer disposes transport singletons; DisposeAsync is bounded by DisposeTimeout (default 30s).
  • Telemetry activity leak on enricher OCE closed; body copy gated on listener presence.

Build, packaging, license

  • Target frameworks: net8.0 and net10.0 on every package. net8.0 will be dropped in the first major after Microsoft's EoL (2026-11-10).
  • 17 production projects collapsed to 7 NuGet packages, all v7.0.0: ServiceConnect, ServiceConnect.Interfaces, ServiceConnect.Client.RabbitMQ, ServiceConnect.Persistence.MongoDb, ServiceConnect.Persistence.InMemory, ServiceConnect.Telemetry, ServiceConnect.HealthChecks (new).
  • csproj-driven dotnet pack; metadata centralised in src/Directory.Build.props. SourceLink + .snupkg sidecar on every package.
  • TreatWarningsAsErrors=true, EnforceCodeStyleInBuild=true, NetAnalyzers + VSTHRD + Meziantou repo-wide.
  • CI runs build + unit + E2E + pack-validate on Linux and Windows; release workflow runs unit + E2E before pushing to NuGet.org.
  • License: MIT (relicensed from the prior more-restrictive license).

Removed

  • ServiceConnect.Filters.MessageDeduplication (silently dropped legitimate redeliveries; replacement pattern in examples/CustomFilterAndMiddleware uses the new OnConsumedSuccessfully stage).
  • ServiceConnect.Persistence.SqlServer, ServiceConnect.Persistence.Redis (no replacement).
  • Static Bus, IProducer.DisconnectAsync, IProcessMessageMiddleware, SendOptions.EndPoints / RequestOptions.EndPoints, SendEventArgs.EndPoints (plural), byte[] / ReadOnlySpan<byte> overloads on IMessageSerializer, Get<TKey,TValue> on IKeyValueStore / ICacheProvider, pre-1.x messaging.operation attribute, three separate ActivitySources, the seven .nuspec files, the DocFX site.
  • Newtonsoft.Json from all production packages.

Docs and examples

  • New Astro + Starlight docs site at website/, deployed to GitHub Pages from this branch. Replaces the prior DocFX site.
  • Runnable examples in examples/: Aggregator, CompetingConsumers, ContentBasedRouting, CustomFilterAndMiddleware, Filters, PointToPoint, PolymorphicMessages, ProcessManager, PublishSubscribe, RequestReply, RoutingSlip, ScatterGather, Streaming, StressHarness, Telemetry. Each ships docker-compose + run.sh/.ps1.
  • Operations docs cover clustering + quorum queues.

twatson83 added 30 commits May 16, 2026 23:49

Disabling library topology recovery (the previous v-major decision)
broke HA cluster failover: the application only redeclares topology
during Consumer.StartConsumingAsync at startup, so an auto-recovered
connection to a fresh broker node found no exchanges, queues, or
bindings. The application has no listener on
IConnection.RecoverySucceededAsync, so consumers silently died and
producers' first publish 404'd until the retry path bumped the
generation.

RabbitMQ.Client's topology recovery is idempotent for our declarations
(durable, no passive calls, no mismatched arguments at runtime), and
the application's per-connection generation cache on the producer side
prevents duplicate cached entries when the library redeclares.

Empty-destination rows previously incremented sentCount, driving the
catch-up loop to MaxCatchUpIterationsPerTick redundant polls per tick.
Move the increment inside the destination guard so only real sends
count toward the catch-up signal.

CloseAsync's drain-spin loop throws TimeoutException when in-flight writes
don't drain within _closeDrainTimeout. DisposeAsync caught only
OperationCanceledException, so the TimeoutException leaked through
'await using'. Add the matching TimeoutException catch so best-effort
dispose holds under all CloseAsync failure modes.
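The dispose fix described above can be sketched like this. `DrainingWriter` is a hypothetical class written for the example; only the `CloseAsync` / `DisposeAsync` relationship and the two caught exception types come from the commit message.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical writer whose CloseAsync spins until in-flight writes drain.
public sealed class DrainingWriter : IAsyncDisposable
{
    private readonly TimeSpan _closeDrainTimeout;
    private int _inFlight;

    public DrainingWriter(TimeSpan closeDrainTimeout, int inFlight)
    {
        _closeDrainTimeout = closeDrainTimeout;
        _inFlight = inFlight;
    }

    public async Task CloseAsync(CancellationToken ct = default)
    {
        var elapsed = Stopwatch.StartNew();
        while (Volatile.Read(ref _inFlight) > 0)
        {
            if (elapsed.Elapsed >= _closeDrainTimeout)
                throw new TimeoutException("In-flight writes did not drain in time.");
            await Task.Delay(10, ct).ConfigureAwait(false);
        }
    }

    public async ValueTask DisposeAsync()
    {
        try
        {
            await CloseAsync().ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // Best-effort dispose: cancellation is not surfaced to the caller.
        }
        catch (TimeoutException)
        {
            // Without this catch, a drain timeout would escape 'await using'.
        }
    }
}
```

With both catches in place, `await using` over a writer that never drains completes after the drain timeout instead of throwing.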
The v-major bundle changed the aggregator persistor `Name` partition value
from the closed-generic base type FullName (with embedded assembly version)
to the concrete user-subclass FullName. Existing MongoDB rows under the
old name are invisible to v8 code. Add a copy-pasteable migration script
to the v8 release notes and cross-reference from the persistor reference.

Plain abstract { get; set; } is a source-breaking interface addition
for users with custom ITransportConfiguration implementations. Add a
default interface body matching ServerAddress / ServerPort's pattern
so existing implementors do not need to add the property to compile.

The earlier fix for the consumer-channel broker-cancel flag was
asymmetric — publish-channel close events still flipped
_consumerCancelledByBroker during graceful stop if the broker happened
to close the publish channel in that window (e.g. retry/error/audit
publish hitting a topology error).

The ResolvedIds.Count == 0 early-exit leaked the lease on all-unresolved
batches. Unresolved rows had been claimed by GetSnapshotAsync but were
never released until TTL. The release filter is sessionId-gated and is
a server-side no-op when nothing was claimed, so always running it is safe.

Previously only handlerType.BaseType was inspected; multi-level
subclass hierarchies (e.g. Concrete : Abstract : Aggregator<T>) were
silently dropped from the registry — no error, no log, handler never
fired. Walk the chain to typeof(object) so any depth of subclassing
resolves the closed-generic base correctly.
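A minimal sketch of the base-chain walk described above. The `Aggregator<T>` declared here is a local stand-in, and `HandlerTypeInspector`, `AbstractOrders`, and `ConcreteOrders` are hypothetical names; the point is only that the loop continues up to `typeof(object)` rather than inspecting a single `BaseType`.

```csharp
#nullable enable
using System;

// Local stand-ins mirroring the Concrete : Abstract : Aggregator<T> shape from the message.
public abstract class Aggregator<T> { }
public abstract class AbstractOrders : Aggregator<string> { }
public sealed class ConcreteOrders : AbstractOrders { }

public static class HandlerTypeInspector
{
    // Walks the inheritance chain and returns the closed form of openGenericBase, or null.
    public static Type? FindClosedGenericBase(Type handlerType, Type openGenericBase)
    {
        for (Type? current = handlerType.BaseType;
             current is not null && current != typeof(object);
             current = current.BaseType)
        {
            if (current.IsGenericType && current.GetGenericTypeDefinition() == openGenericBase)
                return current;
        }

        return null;
    }
}
```

`HandlerTypeInspector.FindClosedGenericBase(typeof(ConcreteOrders), typeof(Aggregator<>))` returns `typeof(Aggregator<string>)`, whereas checking only `typeof(ConcreteOrders).BaseType` would see `AbstractOrders` and find nothing.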
IBus and BusHostedService permanently latch _stopped after Stop, and
a subsequent Start throws. Pin the behaviour with a test and document
the contract on the interface and the hosted-service xmldoc so
container/orchestrator paths reusing the instance get a clear signal.

The xmldoc remarks for IBus.StartConsumingAsync state the stopped flag
is latched permanently after StopConsumingAsync. The hosted-service
pinning test uses a mock to simulate that throw; this test asserts
the actual Bus implementation itself surfaces the InvalidOperationException
so a refactor that removed the latch would not slip through.
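The latch contract described in the two messages above can be sketched as below. `SingleUseLifecycle` is a hypothetical, simplified stand-in (synchronous, no transport), intended only to show the "stop latches permanently, later start throws" behaviour.

```csharp
using System;
using System.Threading;

// Hypothetical single-use lifecycle: once stopped, it can never be started again.
public sealed class SingleUseLifecycle
{
    private int _stopped; // 0 = not stopped, 1 = latched

    public void Start()
    {
        if (Volatile.Read(ref _stopped) == 1)
            throw new InvalidOperationException("This instance was stopped and cannot be restarted.");

        // Real start-up work would go here.
    }

    // Latches permanently; there is deliberately no way to reset the flag.
    public void Stop() => Interlocked.Exchange(ref _stopped, 1);
}
```

A test pinning this behaviour would call `Start`, then `Stop`, and assert that a second `Start` throws `InvalidOperationException`.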
Publish and Send truncate the destination first and append the operation
suffix; Consume concatenated first then truncated, so " process" got
sliced off when destination was at MaxTagValueLength. Match the
Publish/Send order so the operation suffix is always preserved on
the DisplayName.
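The ordering issue can be illustrated with a small sketch. `ActivityNames.Build` is a hypothetical helper, and the single-space separator is an assumption based on the " process" suffix quoted above.

```csharp
using System;

public static class ActivityNames
{
    // Truncates the destination first, then appends the operation suffix,
    // so the suffix survives even when the destination is at the length limit.
    public static string Build(string destination, string operation, int maxTagValueLength)
    {
        string truncated = destination.Length > maxTagValueLength
            ? destination.Substring(0, maxTagValueLength)
            : destination;

        return truncated + " " + operation;
    }
}
```

With a 300-character destination and a limit of 256, the result is the first 256 characters followed by " process". Concatenating first and truncating afterwards would return only the destination prefix and lose the operation.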
The remarks block previously claimed the framework invokes ConfigureMapper
twice per delivery. The mapper instance is built once in
ProcessManagerProcessor and reused for both the initial saga lookup and
the post-handler persistence find. The purity advice remains correct
but the framing is wrong.

CLAUDE.md prohibits ticket/phase/task identifiers and 'before/after
the fix' framing in source comments. Preserve the technical content;
strip the meta-references. Affects 13 files across production and
test code.

Two leftover doc gaps surfaced by the final full review:

1. The telemetry reference page still documented
   ServiceConnectActivitySource.Shutdown() as a public API and instructed
   callers to invoke it from ALC unloading code. The method became
   internal in the follow-up bundle (test-only reset helper). Remove
   the entry and the corresponding line from the public-surface stub.

2. The observability log-event table claimed
   "TopologyRecoveryEnabled = false; the application owns topology" —
   accurate for the original v-major bundle but reversed by the follow-up.
   Update the ConnectionRecovered row to describe the current behaviour:
   library auto-recovery replays the topology on the new channel.

…ckKey

SagaLockKey.Equals(KeyValue, other.KeyValue) falls back to object.Equals
when the mapped correlation property is a byte[] or reference type that
doesn't override Equals(object). Two deliveries of the same business key
from different deserialisation passes produce different boxed instances,
miss the per-key SemaphoreSlim, and run handlers concurrently — defeating
the per-saga serialisation the lock is meant to provide.

Validate at first dispatch per (DataType, MessageType) pair, cache the
verdict, and throw InvalidOperationException naming the offending type
when the key falls back to reference equality. Registry-build-time
validation isn't viable because ConfigureMapper is invoked per-delivery
with the live handler instance and may legally read per-instance state.

…e, MessageType)

The mapper expression is Expression<Func<TMessage, object>>, so the runtime
type of the resolved value can legitimately vary across messages of the same
MessageType (e.g. an IFoo-typed property returning Foo1 then Foo2). Keying
the cache on (DataType, MessageType) pinned the first-seen verdict for all
subsequent values — either silently allowing a later reference-equality type
through, or rejecting a later value-equal type. The verdict is purely a
function of the value's runtime type; key the cache by valueType alone.

Tighten the surrounding catch so the validation's InvalidOperationException
no longer needs a rethrow stair to escape the mapping-misconfiguration
catch. Backfill the missing edge cases: enum, nullable value type,
TimeSpan, generic reference type, and a class that inherits Equals from
its base.
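One way to express the check described in the two messages above is sketched here. `LockKeyValidator` is a hypothetical name, and using reflection on `Equals(object)` to detect reference equality is an assumption about how such a check could work, not a description of the actual code. The verdict cache is keyed by the value's runtime type alone, as the second message describes.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

public static class LockKeyValidator
{
    // Verdict per runtime type: true when the type supplies its own Equals(object).
    private static readonly ConcurrentDictionary<Type, bool> Verdicts = new();

    public static void EnsureValueEquality(object keyValue)
    {
        Type valueType = keyValue.GetType();

        bool hasValueEquality = Verdicts.GetOrAdd(valueType, static type =>
        {
            // If Equals(object) is still declared on System.Object, comparisons
            // fall back to reference equality (true for byte[] and plain classes).
            var equals = type.GetMethod("Equals", new[] { typeof(object) });
            return equals is not null && equals.DeclaringType != typeof(object);
        });

        if (!hasValueEquality)
        {
            throw new InvalidOperationException(
                $"Key type '{valueType.FullName}' does not override Equals(object); " +
                "two equal business keys would not share a lock.");
        }
    }
}
```

Under this check, `Guid`, `string`, enums, and structs pass (their `Equals` is declared below `object`), a class inheriting an overridden `Equals` from its base passes, and `byte[]` is rejected.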
Bus.RouteAsync stamped HeaderKeys.RoutingSlipHopsCompleted into the header
dictionary BEFORE the send middleware ran. ISendMessageMiddleware has full
mutation access to context.Headers, so a buggy or hostile middleware could
reset the counter to 0/1 and silently disable the cross-service amplification
cap (MaxRoutingSlipHops) that the per-hop validator relies on.

Carry the outbound count on SendContext.RoutingSlipHopsCompleted (init-only)
and stamp it into headers inside OutboundHeaderBuilder, after middleware runs.
Add RoutingSlipHopsCompleted to OutboundHeaderBuilder.OverwrittenHeaderKeys so
caller-supplied entries are dropped with a warning like the other reserved
framework headers.

Thread the value from SendContext through SendMessagePipeline's terminal
function via a new IProducer.SendAsync(string, Type, ReadOnlyMemory<byte>,
int?, ...) DIM overload, which Producer overrides to forward routingSlipHopsCompleted
into BuildHeaders. Third-party IProducer implementations fall back to the
existing 4-arg SendAsync path via the DIM, matching the same pattern used
for the routing-key overload.

The default-method overload of IProducer.SendAsync(..., int? routingSlipHopsCompleted, ...)
forwarded to the 4-arg path and discarded the hop counter. Third-party transports
that hadn't recompiled against the new overload would silently drop the cross-service
amplification cap — the first-party RabbitMQ Producer overrides the DIM, but the
safety net was missing for everyone else.

Inject HeaderKeys.RoutingSlipHopsCompleted into a copy of the caller headers when
the DIM body runs and a hop counter is supplied, so third-party transports
preserve the wire stamp even without recompiling.

Pin the cross-pipeline invariant with an integration test: a SendMessagePipeline
running an ISendMessageMiddleware that mutates Headers[RoutingSlipHopsCompleted]
results in the framework value (not the middleware's value) reaching the producer.

Also bump StampedHeaderCount to 12 (RoutingSlipHopsCompleted is conditional but
can push past 11), and reword SendContext.RoutingSlipHopsCompleted xmldoc to
accurately describe the middleware contract instead of overstating the invariant.
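The "stamp after middleware" idea from the two messages above can be sketched in isolation. `OutboundHeaders.Build` and the literal key string are hypothetical; the sketch only shows that a framework-owned value written into a copy of the headers after middleware has run wins over whatever the middleware left behind.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class OutboundHeaders
{
    // Hypothetical wire key; the real constant lives on HeaderKeys.
    public const string HopsKey = "RoutingSlipHopsCompleted";

    // Copies caller headers, drops any caller-supplied hop counter,
    // then stamps the framework's value when one is supplied.
    public static Dictionary<string, object> Build(
        IDictionary<string, object> callerHeaders, int? hopsCompleted)
    {
        var result = new Dictionary<string, object>(callerHeaders);

        // Reserved key: whatever middleware wrote here is discarded.
        result.Remove(HopsKey);

        if (hopsCompleted is int hops)
            result[HopsKey] = hops;

        return result;
    }
}
```

If middleware sets the header to 0 and the framework count is 7, the built headers carry 7, and the caller's dictionary is left unmodified because the stamp goes into a copy.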
StreamProcessor.InvokeHandlerAsync caught every OperationCanceledException
and rethrew. An OCE bound to the user's own linked CTS (a handler that
times out an internal subcall) would propagate up with that handler-CT
attached; downstream metrics gate cancelled-classification on
cancellationToken.IsCancellationRequested, so an unrelated OCE escaping
a stream handler produced a misleading 'error.type=cancelled' tag on the
consume span — a graceful-shutdown false alarm in dashboards.

Mirror the HandlerProcessor fix: split the catch into a caller-CT branch
(ThrowIfCancellationRequested with the caller's token) and an
unrelated-OCE branch that logs, clears the dispatch flag, and rethrows
with the caller's CT so the dispatch pipeline's when-filter evaluates
correctly and classifies it as a handler error.
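The split catch can be sketched as follows. `HandlerInvoker` is a hypothetical wrapper (logging and the dispatch flag are omitted); the two branches follow the description above, and wrapping the original exception as the inner exception is an assumption consistent with the companion test described in the next message.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HandlerInvoker
{
    public static async Task InvokeAsync(
        Func<CancellationToken, Task> handler, CancellationToken callerToken)
    {
        try
        {
            await handler(callerToken).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (callerToken.IsCancellationRequested)
        {
            // Caller-driven cancellation: surface an OCE bound to the caller's token.
            callerToken.ThrowIfCancellationRequested();
        }
        catch (OperationCanceledException ex)
        {
            // The handler cancelled something of its own. Rethrow carrying the caller's
            // (non-cancelled) token so downstream filters that test
            // callerToken.IsCancellationRequested classify this as a handler error.
            throw new OperationCanceledException(
                "Handler threw OperationCanceledException without caller cancellation.",
                ex,
                callerToken);
        }
    }
}
```

In the unrelated case, the escaping exception's `CancellationToken` equals the caller's token and its `InnerException` is the original handler-thrown exception with the unrelated token.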
…anion

The original Task-3 test asserted only Assert.NotEqual(unrelatedCts.Token,
ex.CancellationToken) — the negative form of the contract. Tighten to pin
the positive invariants: the escaping OCE carries the caller's CT, the
inner exception is the original handler-thrown OCE with the unrelated CT,
and the caller CT is not in a cancelled state on the no-cancel path.

Add a companion test for the first-catch branch (caller IS cancelled +
handler throws OCE with an unrelated token): the escaping OCE must carry
the caller's token, not the handler's.

… window

StopAsync awaited bus.StopConsumingAsync(cancellationToken) without enforcing
ITransportConfiguration.GracefulShutdownTimeoutMilliseconds. A non-cooperative
transport (third-party IConsumer, or a RabbitMQ host with a wedged channel)
could block host shutdown until the host's outer kill-CT fired, leaving the
grace window silently ignored at the host boundary.

Race the stop against Task.Delay(GracefulShutdownTimeoutMs, cancellationToken);
if the grace window elapses first, cancel the linked CTS, log a warning, and
return so the host can continue shutting down.

…xhaustion

The grace-bound StopAsync logged "did not complete within GracefulShutdownTimeoutMilliseconds"
even when the real reason for the cancellation was the host's outer CT firing
(operator Ctrl+C, container kill, host shutdown grace expired). Production
diagnostics would attribute every outer-CT-driven shutdown to grace exhaustion
incorrectly.

Check cancellationToken.IsCancellationRequested inside the grace-task-winner
branch; if the outer CT fired, fall through to await stopTask so the OCE
propagates naturally without the misleading warning. Strengthen the original
regression test to capture the linked CTS state, and add companion tests for
outer-CT cancellation, GracefulShutdownTimeoutMilliseconds=0, and a cooperative
consumer finishing inside the grace window.
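The race described in the two messages above might look roughly like this. `GracefulStop.StopWithGraceAsync` is a hypothetical helper: the boolean return value stands in for "log a warning or not", and the delegate stands in for `bus.StopConsumingAsync`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class GracefulStop
{
    // Returns true when the stop finished (or the host token fired),
    // false when the grace window elapsed first.
    public static async Task<bool> StopWithGraceAsync(
        Func<CancellationToken, Task> stopConsuming,
        TimeSpan graceWindow,
        CancellationToken hostToken)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(hostToken);

        Task stopTask = stopConsuming(linked.Token);
        Task graceTask = Task.Delay(graceWindow, hostToken);

        Task winner = await Task.WhenAny(stopTask, graceTask).ConfigureAwait(false);

        if (winner == stopTask || hostToken.IsCancellationRequested)
        {
            // Either the stop completed, or the host's own token fired. In the second
            // case, await the stop so its cancellation propagates without being
            // misreported as grace exhaustion.
            await stopTask.ConfigureAwait(false);
            return true;
        }

        // Grace window elapsed first: cancel the stop and let the host move on.
        // A real implementation would log a warning here and observe stopTask later.
        linked.Cancel();
        return false;
    }
}
```

A stop delegate that never completes yields `false` after the grace window, while a cooperative one that completes immediately yields `true`.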
twatson83 added 26 commits May 20, 2026 22:44

…o-cycle chaos

A scatter-gather flow needs BOTH replies. Each reply travels through the
handler-dispatch chain that can stack one publisher retry budget (~30s
ack-wait + ~10s inter-attempt delay) plus the consumer-side retry-queue
delay. A single ill-aligned chaos cycle is absorbed by the publish retry;
two consecutive cycles bracketing the same request-reply window can push
a single reply past the per-flow timeout. The framework's
RequestReplyManager then evicts the correlation entry and any
late-arriving reply is silently dropped (no entry to match against).

Doubling the request-reply timeout gives the second reply enough wall-
clock to ride out a two-cycle alignment. Observed in a 30-minute chaos
soak with 25 kills: 4 of 284,536 flows hit the alignment; doubling the
budget removes the alignment window entirely.

verify-all.sh runs unit tests, E2E tests, a 5-minute harness soak, and a
5-minute chaos soak in sequence. Each stage is skippable via SKIP_*
env vars. Uses docker compose up --wait against the harness's existing
docker-compose.yml so the broker is reachable before the harness starts.

out/ and docs/ are now ignored: out/ is regenerated on every harness
run, and docs/ holds working specs/plans that are committed deliberately
during agent sessions (not auto-staged from the working tree). Tracked
files in docs/ remain tracked.

Roslyn's IDE0031 (null propagation) rewrite for `if (x is not null) x.E += h`
is `x?.E += h`, which requires C# 14 null-conditional compound assignment.
ServiceConnect.Client.RabbitMQ multi-targets net8.0 (LangVersion=12) and
net10.0 (LangVersion=14), so the suggested rewrite is invalid under the
net8.0 compile. Combined with TreatWarningsAsErrors + EnforceCodeStyleInBuild,
the warning escalated to a build error in the net10.0 pass on CI.

Wrap the four event subscribe/unsubscribe blocks in
`#pragma warning disable IDE0031` with a comment explaining the TFM split,
rather than duplicating the bodies behind `#if NET10_0_OR_GREATER`.
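A minimal sketch of the suppression pattern described above. `FakeConnection` and `Subscriber` are hypothetical types standing in for the real connection and its event subscribers.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for a connection exposing an event.
public sealed class FakeConnection
{
    public event EventHandler? Shutdown;
    public void RaiseShutdown() => Shutdown?.Invoke(this, EventArgs.Empty);
}

public sealed class Subscriber
{
    public int ShutdownCount { get; private set; }

    public void Attach(FakeConnection? connection)
    {
        // IDE0031 suggests `connection?.Shutdown += OnShutdown;`, which needs C# 14
        // and therefore does not compile for the net8.0 target.
#pragma warning disable IDE0031
        if (connection is not null)
        {
            connection.Shutdown += OnShutdown;
        }
#pragma warning restore IDE0031
    }

    private void OnShutdown(object? sender, EventArgs e) => ShutdownCount++;
}
```

The explicit null check compiles under both language versions, and the pragma keeps the analyzer suggestion from becoming a build error when warnings are treated as errors.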
A tag can point at any commit, and published NuGet packages can only be
delisted, not unpublished, so the prior flow (restore -> build -> pack ->
push) had no safety net if a non-master commit was tagged or master had
regressed since CI last ran. Re-run both test projects against the exact
tagged tree before pack, and upload the trx files as a 90-day audit
artifact alongside the nupkgs.

New page under learn/operations covering RabbitMQ cluster connectivity:
host-list format (including the no-trim and same-port-per-host
constraints from the naive Host.Split(',')), automatic-recovery
behaviour cross-linked to the observability log table, recovery-tuning
knobs, and a quorum-queues recipe that wires x-queue-type=quorum
through Arguments / RetryQueueArguments / UtilityQueueArguments. Also
adds a sidebar entry under Operations and a cross-link from the
Transport section of the Configuration page so existing readers find
the new content.

Anchors the .gitignore "docs/" rule to "/docs/" so it only matches the
repo-root agent-spec folder, not website/src/content/docs/. Without
that, every new docs page would be silently ignored.

These files were tracked before /docs/ was added to .gitignore. Remove
from the repo so the gitignore rule actually takes effect — keep them
locally.

Repo Pages was previously in legacy mode serving /docs (which has been
removed). Switched Pages to GitHub Actions build_type via API; update
the workflow trigger and deploy gate to match the active branch.

Astro 6 requires Node >=22.12.0; runner was on 20.20.2.

ILeaseAwareTimeoutStore, IServiceConnectConnection, and
IRegistryInitializer are in the sidebar but don't exist as interfaces
in the codebase and have no corresponding content pages, so the links
404. Drop the nav entries.

… deadline helper

CancelHelperPublishesAtDeadlineAsync raced the drain-deadline catch in
DisposeAsync: both paths cancel shutdownPublishCts when the grace
window expires, but the helper omitted the Volatile.Write that the
drain-catch performs first. With FakeTimeProvider the helper's Task.Delay
fires first, so the stalled retry-publish unblocked with OCE while
_shutdownTimedOut was still 0, and AckOrNackAsync fell through to
BasicNackAsync(requeue:true) instead of leaving the message unacked.

The audit-stall sibling test happened to survive because the OCE is
swallowed and ProcessAsync's later `return !_shutdownTimedOut()` had
enough scheduling slack for the drain-catch to set the flag; the retry
path rethrows so there is no slack.

releases.mdx: replace the deleted v7 sections with a single audience-
action structure (Breaking / What's new / Hardening / Removed /
Migration), summary depth with grouped highlights. ~150 lines vs the
~700-line scaffold; ticket IDs / phase labels / commit hashes stripped
per the project's comment-style rule.

migrating-v6-to-v7.mdx: new focused guide covering the mechanical
conversions (DI, handlers, pipeline, fan-out, async) and the silent
runtime bear traps (STJ strictness, Mongo aggregator + saga renames,
TLS default, publisher confirms, exchange-name hash shift, reserved-
header trust boundary, OTel semconv 1.x attribute rename). Linked from
the top-level sidebar.

twatson83 added 2 commits May 22, 2026 09:40

Cross-checked all 64 mdx pages against the v7 solution and patched the
findings — 12 BLOCKERs (broken code, dead links, wrong signatures), 13
MAJORs (API drift, missing defaults, missing overloads), and 14 MINORs
(wording, missing nuance).

Highlights:
- migrating-v6-to-v7: fix IFilter signature, IProcessHandler.HandleAsync
  signature, IStreamHandler.ExecuteAsync signature, SendToManyAsync ct
  param drop, SendRequestMultiAsync no-endpoints-array, branch URL,
  ActivitySource name.
- learn/operations/clustering: HeartbeatTime default 60s -> 120s, fix
  GitHub org link.
- learn/operations/error-handling: RequestSendCancelledException does
  NOT derive from ServiceConnectException; catch it separately.
- learn/operations/hosting: BusHostedService races StopConsumingAsync
  against Task.Delay(GracefulShutdownTimeoutMilliseconds) rather than
  passing the timeout in.
- learn/messaging-patterns/aggregator: no persistor = warning + drop
  per message, not registration failure.
- learn/messaging-patterns/filters: ISendMessageMiddleware MUST be
  Singleton; IMessageProcessingMiddleware has no lifetime restriction.
- learn/messaging-patterns/process-manager: RequestTimeoutAsync(ct)
  cancels the insert, not the delivery — pass CancellationToken.None
  when the schedule must survive shutdown.
- reference/bus/ibus: RequestOptions.Timeout is int; sentinel is
  Timeout.Infinite (-1), not Timeout.InfiniteTimeSpan. RequestTimeoutAsync
  only needs ITimeoutStore, not EnableProcessManagerTimeouts=true.
- reference/messages/options: add missing ReplyOptions section.
- reference/handlers/event-args: OutgoingEventArgs is abstract with a
  protected constructor.
- reference/healthchecks: document the 4th AddServiceConnectBus and
  AddServiceConnectConsumer overloads (configurable grace + TimeProvider);
  ProducerConnectionHealthCheck calls GetHealthSnapshot, not direct
  IsHealthy/HasAttemptedConnection reads.
- reference/extension-points/transport/iconsumer: BrokerConsumer and
  KafkaConsumer skeletons now implement IsCancelledByBroker (abstract,
  no DIM default — examples wouldn't compile without it).
- reference/extension-points/transport/iproducer: BrokerProducer and
  KafkaProducer skeletons reference 'body'/'packet' (param names), not
  the undefined 'message'; KafkaProducer.BuildMessage accepts
  ReadOnlyMemory<byte> instead of byte[].
- reference/extension-points/persistence/iprocessmanagerfinder:
  IIdentified lives in ServiceConnect.Interfaces, not
  ServiceConnect.Interfaces.Persistence.

Plus MINORs across getting-started, request-reply, streaming,
ibusconfiguration, itransportconfiguration, ifilter, ipropertymapper,
telemetry, imessagedispatcher, imessageprocessor.

Astro build verified locally (all 64 pages emit).

Cross-doc link-integrity sweep found four anchors that don't resolve
against the slugs Starlight emits for the target headings:

- pub-sub -> messages/#separate-commands-from-events: no such heading.
  The commands-vs-events discussion lives as a bolded paragraph inside
  "Designing contracts that age well"; repoint to that section.

- request-reply -> ibus/#sendrequestasynct-treply (and the multi variant):
  github-slugger strips angle brackets and commas but preserves the
  space between type-parameter words, so the heading
  `SendRequestAsync<TRequest, TReply>` slugifies to
  `sendrequestasynctrequest-treply`, not `sendrequestasynct-treply`.

- process-manager -> iprocessmanagerfinder/#findDataAsync: CamelCase
  link to a lowercase anchor, and missing the <T> typeparam letter.
  Correct slug is `finddataasynct`.

Verified against the rendered HTML in website/dist/ and via a
slug-aware link checker over every internal /ServiceConnect-CSharp/
link in the docs (67 unique links, 0 failures after this fix).
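The slug behaviour described above can be approximated with a short sketch. This is a simplified stand-in for github-slugger, not a port: it lowercases, keeps letters, digits, hyphens, and underscores, turns spaces into hyphens, and drops everything else. It ignores details such as duplicate-heading suffixes. `HeadingSlug` is a hypothetical name.

```csharp
using System.Text;

public static class HeadingSlug
{
    // Simplified approximation of heading-to-anchor slugging.
    public static string Slugify(string heading)
    {
        var slug = new StringBuilder(heading.Length);

        foreach (char c in heading.ToLowerInvariant())
        {
            if (char.IsLetterOrDigit(c) || c == '-' || c == '_')
                slug.Append(c);
            else if (c == ' ')
                slug.Append('-');
            // Angle brackets, commas, and other punctuation are dropped.
        }

        return slug.ToString();
    }
}
```

Under these rules, `SendRequestAsync<TRequest, TReply>` becomes `sendrequestasynctrequest-treply` and `FindDataAsync<T>` becomes `finddataasynct`, matching the corrected anchors listed above.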