Skip to content

Conversation

@fhanau
Copy link
Contributor

@fhanau fhanau commented Nov 13, 2025

  • Introduce a new WorkerTracer parameter indicating whether any BTWs are
    present. In a follow-up, this will be used to optimize memory management,
    but for now it helps us assert that if we have a tracer with logLevel none,
    we have BTWs (otherwise the tracer would be redundant, indicating waste).
  • Log an error if a WorkerTracer is destructed without getting the Outcome event
    even when logLevel == none

@fhanau fhanau force-pushed the felix/110925-stw-2 branch from d8a2a9c to 3d6898e Compare November 20, 2025 03:21
@fhanau fhanau changed the title [DRAFT][o11y] Additional hardening against missing outcome event, STW optimizations [o11y] Additional hardening against missing outcome event, STW optimizations Nov 20, 2025
@fhanau fhanau marked this pull request as ready for review November 20, 2025 03:22
@fhanau fhanau requested review from a team as code owners November 20, 2025 03:22
auto& tailStreamWriter = KJ_UNWRAP_OR_RETURN(maybeTailStreamWriter);
// This is where we'll actually encode the span. This function should never be invoked if STW is
// inactive as span tracing is only used in STW.
auto& tailStreamWriter = KJ_ASSERT_NONNULL(maybeTailStreamWriter);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is intended as additional hardening, using KJ_UNWRAP_OR_RETURN could cause us to miss out on span tracing being set up when it is not being used – I'm highly confident that we're already avoiding this though.


KJ_IF_SOME(writer, maybeTailStreamWriter) {
auto& spanContext = KJ_UNWRAP_OR_RETURN(topLevelInvocationSpanContext);
auto& spanContext = KJ_ASSERT_NONNULL(topLevelInvocationSpanContext);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This serves as additional hardening too, I'm very confident that we always provide the Onset event when doing tracing.

@fhanau
Copy link
Contributor Author

fhanau commented Nov 20, 2025

Downstream PR is still WIP, but this is ready for review.

@fhanau fhanau requested a review from mar-cf November 20, 2025 03:26
@fhanau fhanau force-pushed the felix/110925-stw-2 branch from 3d6898e to 5dc1403 Compare November 24, 2025 02:06
@github-actions
Copy link

github-actions bot commented Nov 24, 2025

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

@fhanau fhanau force-pushed the felix/110925-stw-2 branch from 5dc1403 to b329282 Compare November 24, 2025 02:20
Copy link
Contributor

@mar-cf mar-cf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are changes here that shouldn't be bundled into the same commit.

  1. hasBufferedTailWorkers flag (currently buffered). Used for an optimization to skip buffering when no BTWs exist, and to confirm efficiently creating tracers with valid consumers.
  2. Onset detection + stricter assertions. How do the stricter assertion manifest? If we detect a tracing misconfiguration, we can report that but still function degraded.

It's not from this change, but its not clear that topLevelInvocationSpanContext is being used for two purposes, one of which is as an implicit flag if the onset was reported.

// WorkerInterfaces and then ends up not using it due to an error/incorrect parameters; such
// error checking should be done beforehand to avoid unused allocations). Report such cases.
LOG_ERROR_PERIODICALLY(
"destructed WorkerTracer with STW without reporting Onset event", kj::getStackTrace());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now "destructed WorkerTracer without reporting Onset event"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, will update this

kj::Maybe<kj::Own<tracing::TailStreamWriter>> maybeTailStreamWriter);
explicit WorkerTracer(PipelineLogLevel pipelineLogLevel, ExecutionModel executionModel);
kj::Maybe<kj::Own<tracing::TailStreamWriter>> maybeTailStreamWriter,
bool buffered);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not a fan of bool arguments. Can this be a strong-bool or an enum?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do this.

@fhanau
Copy link
Contributor Author

fhanau commented Dec 2, 2025

There are changes here that shouldn't be bundled into the same commit.

  1. hasBufferedTailWorkers flag (currently buffered). Used for an optimization to skip buffering when no BTWs exist, and to confirm efficiently creating tracers with valid consumers.
  2. Onset detection + stricter assertions. How do the stricter assertion manifest? If we detect a tracing misconfiguration, we can report that but still function degraded.

It's not from this change, but its not clear that topLevelInvocationSpanContext is being used for two purposes, one of which is as an implicit flag if the onset was reported.

Here's the intent of this PR, why I think this PR contains the right amount of changes:
1.) We are not actually doing such an optimization in this PR (as you also mention in #5527 (comment)), that would be a larger change. However, we do use the buffered variable to perform optimizations: Not constructing a WorkerTracer in some cases, please see the downstream PR for that. Based on that, we are adding the variable and making use of it for optimizations in the same change.
2) Note that these assertions do not catch a misconfiguration in which we could still provide some valid trace data, but serve to verify that the optimization that we use the buffered variable for is being applied correctly. If an assertion is violated, that indicates that we constructed a WorkerTracer or tail stream writer that will be wholly unused:
If logLevel == None, we do not perform any STW tracing. Therefore, any tail stream writer being present would be unused, and if we have buffered == false we won't have BTW tracing either and won't be doing tracing at all – we should have not constructed a WorkerTracer in the first place.

As for changes that are not needed here, I think we can get rid of hasBuffered() and the buffered class variable – at this stage, it is only used in the constructor. I'll update the PR accordingly.

@fhanau fhanau force-pushed the felix/110925-stw-2 branch from b329282 to baef58c Compare December 2, 2025 19:31
@codspeed-hq
Copy link

codspeed-hq bot commented Dec 2, 2025

CodSpeed Performance Report

Merging #5527 will not alter performance

Comparing felix/110925-stw-2 (60ea4d1) with main (04fb23c)

Summary

✅ 140 untouched
⏩ 38 skipped1

Footnotes

  1. 38 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@mar-cf
Copy link
Contributor

mar-cf commented Dec 5, 2025

Here's the intent of this PR, why I think this PR contains the right amount of changes:

My point was this PR looks to be two mostly independent changes in one. See the comment for how I would split this up.

we are adding the variable

It should be possible to do without adding this with a query that will check parents. I'm not extremely opposed to adding it, but maybe you could (or already have) consider it and provide a reason why this is better?

@fhanau fhanau force-pushed the felix/110925-stw-2 branch 2 times, most recently from 2d6e151 to bda6a55 Compare December 11, 2025 16:43
…zations

- Introduce a new WorkerTracer parameter indicating whether any BTWs are
  present. In a follow-up, this will be used to optimize memory management,
  but for now it helps us assert that if we have a tracer with logLevel none,
  we have BTWs (otherwise the tracer would be redundant, indicating waste).
- Log an error if a WorkerTracer is destructed without getting the Outcome event
  even when logLevel == none
@fhanau fhanau force-pushed the felix/110925-stw-2 branch from bda6a55 to 60ea4d1 Compare December 30, 2025 04:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants