Skip to content

Conversation

@JacekLach
Copy link

@JacekLach JacekLach commented Nov 14, 2019

Before this PR

Logs from concurrent incoming requests with the same traceid are currently difficult
to tell apart (one can look at thread names etc, but not across executors).

After this PR

==COMMIT_MSG==
This introduces a new optional property on a Trace, topSpanId, that can be read to disambiguate span stacks that belong to the same traceid bug had different entry points.

Use that in tracing-undertow and tracing-jersey to tell apart requests with the same traceid.
==COMMIT_MSG==

Possible downsides?

The unsampled span has to keep a bit more state (not just the parent of the top span as per originating span id, but also the top span id).
We have to generate a new span id in unsampled spans if the stack is empty.

@changelog-app
Copy link

changelog-app bot commented Nov 14, 2019

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Introduce a new property on a Trace, topSpanId, that can be read to disambiguate span stacks that belong to the same traceid but had different entry points.

Check the box to generate changelog(s)

  • Generate changelog entry

@JacekLach
Copy link
Author

Followups are to include this new property in request and service logs

…raceid

Logs from concurrent incoming requests with the same traceid are currently difficult
to tell apart (one can look at thread names etc, but not across executors).

This introduces a new optional property on a Trace, localTraceId, that we set to
a new unique identifier if an incoming request already has a traceid assigned.
@iamdanfox
Copy link
Contributor

iamdanfox commented Nov 14, 2019

I am not super comfortable about diverging from the concepts that zipkin already define: https://zipkin.io/pages/architecture.html.. we've already got james' X-B3-Originating-Span-Id which we already put on the wire:

assertThat(intercepted.headers(TraceHttpHeaders.SPAN_ID)).hasSize(1);
assertThat(intercepted.headers(TraceHttpHeaders.TRACE_ID)).hasSize(1);
assertThat(intercepted.headers(TraceHttpHeaders.ORIGINATING_SPAN_ID)).isEmpty();
assertThat(intercepted.headers(TraceHttpHeaders.PARENT_SPAN_ID)).isEmpty();

if normal service logs had james' originating span id as a param somehow, wouldn't you be able to achieve the desired outcome (of differentiating different concurrent requests to one service from the same caller even if they aren't sampled?)

Edit nvm seems like these would be the same

@JacekLach
Copy link
Author

Yeah; ORIGINATING_SPAN_ID relates together requests made from the same caller within a traceid (and disambiguates them from requests hitting current service from a different caller, within the traceid).

However what we're trying to do here is tell apart two requests from the same caller, so we need to generate this id within the current service. (or rely on caller to never call us with the same x-b3-spanid)

@JacekLach
Copy link
Author

tracing-java is pretty much the only place we can do this in, because we want all the usual trace propagation (executors, thread changes, etc) to apply to this property.

@carterkozak
Copy link
Contributor

I'm not sure this is quite what we want, in the past we have discussed service-local request identifiers, which we can use to associate request data with all service logging for that request, not necessarily connected to tracing, though there are definitely parallels.
That way we could build queries for service logging from product Foo for requests to endpoint GET /api/bar.

@JacekLach
Copy link
Author

Well, if we don't connect this to tracing then it has to be propagated manually when you say submit something to an executor while processing a request, right?

I don't think there's a point to having multiple thread-locals that we carry around to mark 'logically still the same unit of work'. Or is that not what you meant?

That way we could build queries for service logging from product Foo for requests to endpoint GET /api/bar.

This seems like a different question - you want to see what the request path for a service log line was? Sure, seems possibly useful, but unrelated to my problem :P

@carterkozak
Copy link
Contributor

unrelated to my problem :P

Wouldn't it allow us to associate a span ID to the request, and the request ID to all request logging, solving your problem?

@JacekLach
Copy link
Author

That assumes that the incoming span id is unique, which is the case for conjure clients currently but that seems like an implementation detail.
(unless you mean we'd generate a fresh id anyway? but then that seems to reduce to this pr, just bigger)

@sfackler
Copy link
Member

The server generates a new span for the requests parented to the span sent by the client, so that one will always be unique.

@JacekLach
Copy link
Author

The server generates a new span for the requests parented to the span sent by the client, so that one will always be unique.

Yeah, but the Trace can't really know that the second-to-oldest span in the stack is the special local-unique one.

Also, it's totally possible to close the request serving span (return a value) but still want to log things related to that request with the same traceid / localTraceId. For example you might be submitting an asynchronous cleanup task to run after your request completes, which is related to the trace/request, but not contained within it.

@sfackler
Copy link
Member

You can parent a span to one that's already closed.

@JacekLach
Copy link
Author

Hm, actually, I thought that the originating span id was the id of the top span in the stack, but actually it's the parent of the top span in the stack. So this can be done without the extra field / constructors. Will change

@JacekLach
Copy link
Author

e57846e - we no longer maintain a separate id for the local trace, and instead expose the id of the topmost span in the stack.

@JacekLach
Copy link
Author

(check failures are just the TODOs, which I'm planning to remove when I know if the current behaviour is a bug or not)

@iamdanfox iamdanfox changed the title Introduce localTraceId to disambiguate incoming requests that share traceid Introduce topSpanId to disambiguate incoming requests that share traceid Nov 18, 2019
@dansanduleac
Copy link
Contributor

All in all I think we should wire up the propagating the originatingSpanId (and your topSpanId) through the Trace, and remove the originatingSpanId from the OpenSpan as it doesn't belong there (as I explained in more detail in #352 (comment))

Previously cloning an unsampled trace would lose track of originatingSpanId,
which mainly mattered when recovering the original trace after a `withTrace`
call.
@dansanduleac
Copy link
Contributor

I just noticed this, but since Trace.top() already exists and gets you the innermost span, which is the opposite of what you're introducing, we should probably call your topSpanId something else - maybe outermostSpanId?

Copy link
Contributor

@carterkozak carterkozak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this uses the top-span identifier. If we implement this approach, how do we take advantage of the value without requiring all products to make code changes?

startSpan(Optional.of(parentSpanId));
if (numberOfSpans == 0) {
originatingSpanId = Optional.of(parentSpanId);
topSpanId = Optional.of(Tracers.randomId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever want to generate a new random ID if the SpanType is not SERVER_INCOMING? In all other cases I think the traceId we already have should be sufficient.

@JacekLach
Copy link
Author

JacekLach commented Nov 19, 2019 via email

@carterkozak
Copy link
Contributor

Notes from a sync with @JacekLach @iamdanfox @dansanduleac and myself, tagging @ferozco for context as we chatted previously:

This is a divergence from standard zipkin, so we should be careful that we don't implement it in a way that prevents us from future features or refactors. However, tracing state is correctly propagated through most applications, and adding additional state would require a heavy investment. In particular utilities for asynchronous tracing (executors, detached traces, etc) would need to be duplicated across every codebase to additionally handle requestId, which is clearly not desirable.

We should implement this without any change to public API, this way we're not locked into supporting request identifiers in the tracing library for perpetuity, only that we maintain the functionality to associate logging with a request.

We can achieve this by setting a known MDC key, _requestId, the same way we set traceId and _sampled. This value should be initialized only when a new root span is created with span type SERVER_INCOMING.

@iamdanfox
Copy link
Contributor

linking to the in-progress PR #364

@iamdanfox iamdanfox closed this Nov 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants