-
Notifications
You must be signed in to change notification settings - Fork 19
Introduce topSpanId to disambiguate incoming requests that share traceid #352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Generate changelog in
|
07f9af5 to
67b55a6
Compare
|
Followups are to include this new property in request and service logs |
67b55a6 to
50ad80c
Compare
…raceid Logs from concurrent incoming requests with the same traceid are currently difficult to tell apart (one can look at thread names etc, but not across executors). This introduces a new optional property on a Trace, localTraceId, that we set to a new unique identifier if an incoming request already has a traceid assigned.
50ad80c to
20aae2c
Compare
|
I am not super comfortable about diverging from the concepts that zipkin already define: https://zipkin.io/pages/architecture.html.. we've already got james' Lines 75 to 78 in 2150bd2
Edit nvm seems like these would be the same |
|
Yeah; However what we're trying to do here is tell apart two requests from the same caller, so we need to generate this id within the current service. (or rely on caller to never call us with the same |
|
|
|
I'm not sure this is quite what we want, in the past we have discussed service-local request identifiers, which we can use to associate request data with all service logging for that request, not necessarily connected to tracing, though there are definitely parallels. |
|
Well, if we don't connect this to tracing then it has to be propagated manually when you say submit something to an executor while processing a request, right? I don't think there's a point to having multiple thread-locals that we carry around to mark 'logically still the same unit of work'. Or is that not what you meant?
This seems like a different question - you want to see what the request path for a service log line was? Sure, seems possibly useful, but unrelated to my problem :P |
Wouldn't it allow us to associate a span ID to the request, and the request ID to all request logging, solving your problem? |
|
That assumes that the incoming span id is unique, which is the case for conjure clients currently but that seems like an implementation detail. |
|
The server generates a new span for the requests parented to the span sent by the client, so that one will always be unique. |
Yeah, but the Trace can't really know that the second-to-oldest span in the stack is the special local-unique one. Also, it's totally possible to close the request serving span (return a value) but still want to log things related to that request with the same traceid / localTraceId. For example you might be submitting an asynchronous cleanup task to run after your request completes, which is related to the trace/request, but not contained within it. |
|
You can parent a span to one that's already closed. |
|
Hm, actually, I thought that the originating span id was the id of the top span in the stack, but actually it's the parent of the top span in the stack. So this can be done without the extra field / constructors. Will change |
|
e57846e - we no longer maintain a separate id for the local trace, and instead expose the id of the topmost span in the stack. |
e57846e to
c72b86f
Compare
|
(check failures are just the TODOs, which I'm planning to remove when I know if the current behaviour is a bug or not) |
|
All in all I think we should wire up the propagating the originatingSpanId (and your topSpanId) through the Trace, and remove the originatingSpanId from the OpenSpan as it doesn't belong there (as I explained in more detail in #352 (comment)) |
Previously cloning an unsampled trace would lose track of originatingSpanId, which mainly mattered when recovering the original trace after a `withTrace` call.
d785629 to
bbb2376
Compare
|
I just noticed this, but since |
carterkozak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like this uses the top-span identifier. If we implement this approach, how do we take advantage of the value without requiring all products to make code changes?
| startSpan(Optional.of(parentSpanId)); | ||
| if (numberOfSpans == 0) { | ||
| originatingSpanId = Optional.of(parentSpanId); | ||
| topSpanId = Optional.of(Tracers.randomId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ever want to generate a new random ID if the SpanType is not SERVER_INCOMING? In all other cases I think the traceId we already have should be sufficient.
|
It doesn't look like this uses the top-span identifier. If we implement this approach, how do we take advantage of the value without requiring all products to make code changes?
Followup PR is to update sls-logging to inject the identifier as an extra
param to service logs. (that's what the jersey interceptor change
facilitates)
No opinion on filtering to only the SERVER_INCOMING span types; I assume
CLIENT_OUTGOING spans should ~never be the root span, so the question is
whether this might be useful for local spans? Which would require someone
starting multiple span stacks with the same traceid.
…On Tue, 19 Nov 2019 at 21:44, Carter Kozak ***@***.***> wrote:
***@***.**** commented on this pull request.
It doesn't look like this uses the top-span identifier. If we implement
this approach, how do we take advantage of the value without requiring all
products to make code changes?
------------------------------
In tracing/src/main/java/com/palantir/tracing/Trace.java
<#352 (comment)>:
> }
@OverRide
void fastStartSpan(String _operation, String parentSpanId, SpanType _type) {
- startSpan(Optional.of(parentSpanId));
+ if (numberOfSpans == 0) {
+ originatingSpanId = Optional.of(parentSpanId);
+ topSpanId = Optional.of(Tracers.randomId());
Do we ever want to generate a new random ID if the SpanType is not
SERVER_INCOMING? In all other cases I think the traceId we already have
should be sufficient.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#352?email_source=notifications&email_token=AAJHSGI55CPVOD6PJIDG6B3QURM2ZA5CNFSM4JNO2FWKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCMESXQI#pullrequestreview-319368129>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJHSGKY4UUDSLJXCSLKTRLQURM2ZANCNFSM4JNO2FWA>
.
|
|
Notes from a sync with @JacekLach @iamdanfox @dansanduleac and myself, tagging @ferozco for context as we chatted previously: This is a divergence from standard zipkin, so we should be careful that we don't implement it in a way that prevents us from future features or refactors. However, tracing state is correctly propagated through most applications, and adding additional state would require a heavy investment. In particular utilities for asynchronous tracing (executors, detached traces, etc) would need to be duplicated across every codebase to additionally handle requestId, which is clearly not desirable. We should implement this without any change to public API, this way we're not locked into supporting request identifiers in the tracing library for perpetuity, only that we maintain the functionality to associate logging with a request. We can achieve this by setting a known MDC key, |
|
linking to the in-progress PR #364 |
Before this PR
Logs from concurrent incoming requests with the same traceid are currently difficult
to tell apart (one can look at thread names etc, but not across executors).
After this PR
==COMMIT_MSG==
This introduces a new optional property on a Trace, topSpanId, that can be read to disambiguate span stacks that belong to the same traceid bug had different entry points.
Use that in tracing-undertow and tracing-jersey to tell apart requests with the same traceid.
==COMMIT_MSG==
Possible downsides?
The unsampled span has to keep a bit more state (not just the parent of the top span as per originating span id, but also the top span id).
We have to generate a new span id in unsampled spans if the stack is empty.