Refactor jsRpcSession span parenting and ownership#6704
Refactor jsRpcSession span parenting and ownership#6704
Conversation
|
UnknownError: ProviderInitError |
|
@Ankcorn Bonk workflow failed. Check the logs for details. View workflow run · To retry, trigger Bonk again. |
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This lifts the jsRpcSession span out of TraceContext and onto JsRpcSessionCustomEvent, so the span is reachable from the event and available for enrichment in the followup PR. The span lifetime and ownership model looks correct.
One minor note (low severity):
|
Review posted successfully. Here's a summary of my analysis: Overall assessment: The PR is well-structured. The span lifetime model is correct — One minor issue flagged: The new |
04e17f6 to
b28449b
Compare
|
@ask-bonk review this PR |
There was a problem hiding this comment.
The featureFlagsForFl issue from my previous review has been addressed — the uint channel case now populates it via the metadataExtra callback.
On re-review, the span ownership model is sound and the code is correct. One minor nit below on a comment that looks like it has placeholder text left over from editing.
This review was generated by an AI assistant (Bonk) and may contain mistakes.
|
Review posted successfully. Here's a summary of the re-review: Previous feedback status: The Re-review findings: The code is correct. The span ownership model is sound:
One minor nit posted: A comment at line 2414 in |
fdc5681 to
04aa1ca
Compare
|
@ask-bonk review this PR |
There was a problem hiding this comment.
Both issues from prior reviews (missing featureFlagsForFl, truncated comment) are addressed. On re-review the refactoring is correct — no new issues found.
Summary of verification:
- Span lifetime.
SpanBuilderis move-only with RAIIend()in its destructor. Storing it onJsRpcSessionCustomEventgives it session-scoped lifetime, matching the logical span duration. The event is kept alive by.attach()on thecustomEvent()promise. Correct. SpanParentref-counting.SpanParent(SpanBuilder&)callsmapAddRefon the builder's observer (akj::Refcounted).addRef()does the same. Both thesessionSpanbuilder and theSubrequestMetadata::userSpanParenthold independentkj::Ownreferences to the same observer. Correct.- Capture-by-reference safety. All
[&]lambdas inwithSessionSpan/getSubrequestare called synchronously (kj::FunctionParamis non-owning by design). No dangling references. - Scratch path vector.
callImpluses a separatesessionPathvector fortryGetJsRpcSessionClientand onlykj::mvs it intopathon success.JsRpcPropertyappending unconditionally to the scratch vector is harmless since it's discarded on thekj::nonefallback. TheStringPtrs point toJsRpcProperty::namemembers that outlive the synchronousmethodPathconstruction. Correct. getCap()before event move.getCap()moves the promise-side cap out of the event; the cap then lives independently via capnp's refcounting. Moving the event intocustomEvent()afterward is safe — same pattern as the original code.- OutgoingFactory variants. Returning
SpanBuilder(nullptr)for DO/cross-process actor factories correctly skips thejsRpcSessionspan. These factories produce their own outer span (e.g.durable_object_subrequest), so no redundant span is created. Matches prior behavior. - Test. The new
jsRpcInsideEnterSpantest validates thatgetCurrentUserTraceSpan().newChild()picks up theenterSpan-pushedAsyncContextFrame, mirroring the existingfetchInsideEnterSpancase for HTTP. The.wd-testcorrectly adds therpccompat flag and the entrypoint binding.
This review was generated by an AI assistant (Bonk) and may contain mistakes.
|
Review posted successfully. Here's a summary: Both issues from my prior reviews have been addressed ( I verified seven aspects of correctness:
|
685b7fa to
b367ed8
Compare
Merging this PR will not alter performance
Comparing Footnotes
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6704 +/- ##
=======================================
Coverage 66.61% 66.61%
=======================================
Files 402 402
Lines 115803 115841 +38
Branches 19395 19401 +6
=======================================
+ Hits 77139 77166 +27
- Misses 27094 27099 +5
- Partials 11570 11576 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // If this isn't the root object (i.e. this is a JsRpcProperty), the property path starting from | ||
| // the root object will be appended to `path`. | ||
| // | ||
| // For cap-holding providers (JsRpcStub, JsRpcPromise), this returns the live cap. Session- |
There was a problem hiding this comment.
This seems to be breaking the abstraction that this interface is meant to provide. What's the point of having a shared interface between Fetcher and JsRpcStub if they implement different methods and have to be called differently?
The right answer here is to push the Fetcher-specific logic into getClientForOneCall. It's not immediately obvious to me why this can't be done here.
There was a problem hiding this comment.
If the point here is that #6695 needs access to the SpanBuilder for the session, we can extend getClientForOneCall() with an extra output parameter to let it fill that in when it's a top-level request.
|
I wonder if this should even be JSRPC-specific at all, or if it should apply to all event types (e.g. including fetch events)? Seems like you could benefit from this applying to all cross-worker subrequests. Perhaps, then, it makes more sense to extend |
…sionCustomEvent
Per Dan's spec, the user-visible jsRpcSession span belongs in
JsRpcSessionCustomEvent (the membrane event itself), not at the
session-dispatch site in Fetcher::getClient. Move it accordingly:
- Server side, in run(): wraps the membrane lifetime, from delivered()
through to all caps being dropped. Parented to the caller's enclosing
user span via USER_SPAN_CONTEXT_PROPAGATION (the metadata.userSpanParent
set on the client side flows here as the IoContext's current user span).
- Client side, in sendRpc(): only fires for cross-process dispatch
(in-process service bindings reach the callee via WorkerEntrypoint::
customEvent -> event->run() and never enter sendRpc). Wraps the wire
round-trip. Parented to the caller's enclosing user span when an
IoContext is in scope; constructs as unobserved otherwise so capnp
dispatch contexts that lack an IoContext don't crash.
Fetcher::getClientForOneCall stops going through getClient(... "jsRpcSession")
and instead inlines the channel switch to:
- Construct an internal trace span (still needed for SubrequestMetadata
.parentSpan ID propagation to the callee's nested internal subrequests).
- Pass getCurrentUserTraceSpan() as metadata.userSpanParent so the new
server-side jsRpcSession span lands as a direct child of the caller's
enclosing user span.
- Skip user-span synthesis at the dispatch site -- the user-visible span
is now emitted by JsRpcSessionCustomEvent itself.
Internal trace span layout is unchanged: it stays at the dispatch site
because its ID has to be available when SubrequestMetadata is assembled,
which happens before the event runs. Internal-trace asymmetry vs the user
trace is a deliberate trade-off (see worker-rpc.c++ comment).
Tail-worker-test fixtures updated:
- jsrpcDoSubrequest (caller): loses the client-side jsRpcSession user
spans; only durable_object_subrequest remains as a client-side user
span.
- Callee fixtures (myActorJsrpc, jsrpcNonFunction, jsrpcGetCounter):
wrapped with spanOpen/spanClose for the new server-side jsRpcSession.
- expectedWithPropagation: callees no longer attach as direct children
of jsrpcDoSubrequest. The buildTree heuristic uses sequential
per-invocation span IDs that collide across invocations in the same
trace, so it can attach a callee under whichever invocation it last
indexed at the colliding ID. The shape reflects the heuristic's
limitation, not actual misparenting -- the underlying trigger
contexts and traceIds remain correct.
Adds Case 7 to tracing-hierarchy-instrumentation-test: an RPC call made
inside enterSpan() must produce a jsRpcSession user span whose parent
context is the enterSpan, verifying the USER_SPAN_CONTEXT_PROPAGATION
flow across the membrane after moving jsRpcSession to the server-side
JsRpcSessionCustomEvent::run.
The server-side jsRpcSession lives on the *callee*'s invocation, so
direct parentSpanId equality with the caller's enterSpan doesn't work:
each invocation's SequentialSpanSubmitter mints span IDs starting at 1,
while cross-invocation references in the streaming-tail (the trigger
context on the callee's onset) carry the real 64-bit spanId. Instead
this test asserts the structural propagation we actually care about:
- jsRpcSession lives on a separate callee invocation,
- and, when USER_SPAN_CONTEXT_PROPAGATION is enabled (@all-autogates),
the callee shares the caller's traceId and onset's trigger context
references a span on the caller's invocation.
The instrumentation-test-helper collector is extended to capture
per-invocation traceId and triggerSpanId from onset events, accessible
to assertions via state.invocations.
The wd-test mock service now also streams tails so callee-side spans
are observable in the same tail stream the test inspects.
a1dafc5 to
63ac6bd
Compare
|
Thanks for the review - I was working through some stuff with Dan and this PR was kinda dealing with 2 slightly different intentions at the same time so I got mega confused. I'm trying to break down more so closing this in favour of #6734 |
That would be really nice but not all of our subrequests are over http, for example, some subrequests happen to be transported over http, imagine a worker-to-worker cached subrequest that runs over channel tokens over http over cache. How would we then extend this implementation idea to also work for fetch entrypoints? Building this into the jsrpc session mechanism is clear as we must have bidirectional RPC for jsrpc. |
Prep for native jsRpcSession span enrichment.
Background
jsRpcSessionis the user span for one RPC session. Today it's created byFetcher::getClientForOneCall(HTTP) and kept alive via.attach()on theWorkerInterface. AIG enrichment needs to write to this span fromcallImpl's response handler — but.attach()makes it unreachable, andFetcherconstructing anRPC-layer event blurs HTTP/RPC layering.
Commit 1 — Test:
jsRpcSessionparented to enclosingenterSpanRPC analog of
fetchInsideEnterSpan. Locks in the contract before moving the span.Commit 2 — Move span ownership into
JsRpcSessionCustomEvent.attach()ordering (side effect); now correct by construction..attach()hides the object; AIG needs a named path.event->jsRpcSessionSpanis that path."jsRpcSession"string moves out of generic HTTP plumbing intoFetcher::getJsRpcClientreturning{worker, span}.OutgoingFactoryvariants pass
SpanBuilder(nullptr)— they already make adurable_object_subrequestspan.Commit 3 — Move event construction into
callImplFetcherproduces{worker, span};callImplnames and uses them.&event->jsRpcSessionSpanincallImpl's response lambda; constructing the event there keeps it in-scope. Otherwisethe AIG PR would need a
ClientForOneCall { client, SpanBuilder* }struct on the provider API.JsRpcClientProviderhad one virtual hiding two shapes. Now two, each defaulted:getClientForOneCall— cap-holders (JsRpcStub,JsRpcPromise).tryGetJsRpcSessionClient— session-creators (Fetcher).Note. Both virtuals append to a shared
pathvector.callImplgives the session attempt a scratch vector and only commits on success, so forwarders canappend unconditionally. (Caught early — 4 pipelining tests failed before the fix.)
Net effect
.attach()-ed toWorkerInterfaceJsRpcSessionCustomEventFetcher(HTTP)callImpl(RPC)JsRpcClientProviderevent->jsRpcSessionSpan