Skip to content

Bound OTel traces emitted by long-running subscription handlers#63

Open
danielorbach wants to merge 12 commits into
mainfrom
issue-56
Open

Bound OTel traces emitted by long-running subscription handlers#63
danielorbach wants to merge 12 commits into
mainfrom
issue-56

Conversation

@danielorbach
Copy link
Copy Markdown
Owner

@danielorbach danielorbach commented May 31, 2026

A component built with this framework produced one OTel trace per process rather than per message: the lifecycle held a single span open across Procedure.Exec and every handler-derived span became its child. Tempo on the receiving side accumulated tens of thousands of spans under one trace ID, hit per-tenant live-trace limits, and dropped exports for unrelated services on the same pipeline. Tail-based sampling did not help because the trace never completed.

The framework now opens no span across Procedure.Exec. Spans started from l.Context() are roots of independent traces, bounded by the work done in one handler invocation. Lifecycle identity travels as attributes (component.name, footprint.identifier, footprint.revision, footprint.solution) propagated through a context value and stamped onto every span by a SpanProcessor users register on their TracerProvider. The pair (footprint.identifier, component.name) is the effective anchor operators query when filtering trace data for one component, and a process-stable process.nonce attribute closes the gap between two replicas loading the same footprint.

The cost of dropping the long-lived span is the loss of clickable link navigation between a handler span and "its" lifecycle. A synthesized anchor SpanContext, and a perpetually-zero-duration anchor span, were considered and rejected: backends without good link resolution would see dangling links, the value-add over attribute filtering was marginal, and both would have reintroduced retention-drift fragility. Operators pivot via attribute filters; the loader emits a WARNING log at Load carrying the nonce so log-to-trace correlation works without changing exporter setup.

L.Error, L.Errorf, L.Fatal, and L.Fatalf become log-only and are deprecated; they previously recorded onto the long-lived span and have no replacement target without a call-site context. ErrorContext and FatalContext variants take ctx so they can record onto whatever span the caller has active. WithSpan and WithForkSpanName similarly deprecate to no-ops since their consumer is gone. All in-repo callers migrate; staticcheck reports zero SA1019 warnings after the final commit.

Three things this PR explicitly does not address. Optional clickable linking from handler spans to a bootstrap-or-anchor span can land later via the same ctx value the SpanProcessor already reads, with no API breakage. The deprecated method bodies stay log-only for v1 compatibility; removal is a future major. Merge semantics on nested WithForkAttributes intentionally match plain context.WithValue with no special-case code, so callers that need to extend an inherited slice include the inherited values explicitly.

The earlier approach — detach the long-lived span from l.Context() but keep it open off-context — is preserved at #64 for reference. The stronger no-long-lived-spans invariant landed after that approach was evaluated.

Resolves #56.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the long-lived per-lifecycle OTel span (issue #56) so that spans started from L.Context() are roots of independent traces, and propagates lifecycle identity via context-stored attributes that a new NewSpanProcessor stamps onto every span. Adds bounded framework spans for runCleanup and Component.Bootstrap, a process-stable process.nonce, and Context-suffixed variants of Error/Fatal so per-handler spans can be the recording target.

Changes:

  • New tracing.go (tracer, processNonce, WithAttributes/WithForkAttributes, NewSpanProcessor) plus package doc; lifecycle.go no longer opens a lifecycle span and wraps cleanup in a bounded <name>.cleanup span.
  • Error/Fatal/WithSpan/WithForkSpanName deprecated; new ErrorContext/FatalContext (+f variants) record onto caller-supplied span; in-repo callers migrated.
  • loader back-links Claim.Footprint, forks components with identity attributes via claimIdentity, emits a bounded <component>.bootstrap span, and warns at Load with the process nonce; kafkaloader/fileloader updated accordingly.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tracing.go New tracer, process nonce, identity-attribute context plumbing, and NewSpanProcessor.
tracing_test.go Coverage for attribute propagation/replacement, nonce stamping, cleanup span, root spans, and context error/fatal recording.
lifecycle.go Drops lifecycle span, stashes identity attrs on ctx, wraps cleanup in bounded span, deprecates Error/Fatal, adds *Context variants.
procedure.go Removes span field and SpanName, deprecates WithSpan/WithForkSpanName, adds attrs to lifecycleOptions.
doc.go New package doc describing the tracing contract, identity propagation, and operator anchor.
loader/loader.go Back-links Claim.Footprint, logs process nonce, forks with WithForkAttributes, bounded <component>.bootstrap span, FatalfContext for bootstrap errors.
loader/example_test.go Migrates ErrorfErrorfContext.
fileloader/jsonloader.go FatalfFatalfContext.
fileloader/yamlloader.go FatalFatalfContext.
fileloader/fileloader.go WithSpan("loader")WithName("loader").
kafkaloader/kafkaloader.go WithSpanWithName, error/fatal migrated to *Context variants, span ctx threaded into handler.
examples/direct/{main,ping,pong}.go Migrate to *Context variants; pong.go adds bounded pong.echo consumer span.
examples/embed/{ping,pong,probe}.go Migrate to ErrorfContext.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread loader/loader.go
Comment thread procedure.go
Comment thread procedure.go Outdated
Comment thread loader/loader.go
The framework's plan for issue #56 is to drop the long-lived lifecycle
span and identify spans by attributes propagated through l.Context()
instead. This commit lands the propagation primitive: WithAttributes
and WithForkAttributes stash a slice in the context; NewSpanProcessor's
OnStart reads it and applies the attributes to every span started
under that context. No caller invokes the new options yet; subsequent
commits rewire the loader and reference examples onto this primitive.

The slice-replacement semantics fall out of plain context.WithValue:
a fork that sets its own WithForkAttributes shadows the parent's value
via the most-nested-wins rule, and a fork without the option inherits
naturally. The framework does not merge; callers that want to extend
the inherited slice must include the inherited values explicitly.
OpenTelemetry Resource is not used because lifecycle identity has to
be per-context (multiple components per process need distinct
identifiers); SpanProcessor reading context values is the in-process
equivalent.
Adds ErrorContext, ErrorfContext, FatalContext, and FatalfContext on L,
mirroring slog.Logger.ErrorContext. Each variant logs via the lifecycle
and, if ctx carries a recording span, records the error onto that span
(Fatal additionally sets the span status to Error and exits the
lifecycle goroutine).

The motivation is the next commit, which detaches l.Context() from the
long-lived lifecycle span. After that change, L.Error's
trace.SpanFromContext(l.ctx) call has nothing recording to record onto,
and the value is in the call site's handler span — which L itself
cannot see. The ctx parameter is how the call site surfaces it.

Existing Error and Fatal remain operational against the long-lived
lifecycle span; their deprecation lands with the long-lived span's
removal, where their behavior change is the immediate consequence.
Issue #56 is rooted in the framework opening one OTel span per L and
holding it open for the lifecycle's full duration. A subscription
Procedure that handled tens of thousands of messages per run produced
a single trace whose span count grew with uptime; downstream Tempo
ingesters dropped batches once per-tenant live-trace limits were hit,
causing collateral failures for unrelated services on the same trace
pipeline.

This commit removes the lifecycle span from execute and drops the
asynchronous reaper goroutine that closed it. Spans started from
l.Context() are now roots of independent traces, bounded per
invocation; the SpanProcessor introduced earlier ties them to the
lifecycle via attribute filtering rather than parent-child nesting.

The old Error/Errorf/Fatal/Fatalf methods used trace.SpanFromContext
on the lifecycle context to record errors; with no span there to
record onto, the SpanFromContext call returns a no-op span. Rather
than silently dropping the trace integration, the methods are now
log-only and marked Deprecated, pointing callers to the *Context
variants that can record onto a handler-scope span at the call site.

The package-level tracer var moves out of procedure.go in this commit
because no caller in component remains. The next commit reintroduces
it for the bounded cleanup span around runCleanup.
The previous commit dropped the long-lived lifecycle span. Cleanup
timings and panics during runCleanup were observable through that
span; with it gone they no longer surface in trace data. This commit
restores that visibility with a bounded <name>.cleanup span scoped to
runCleanup alone — open at the moment cleanup begins, closed when the
last cleanup func returns or a panic unwinds the deferred frame.

The span is always opened (no option) because the cost of an empty
cleanup span is negligible and the alternative — silent cleanups —
removes the only observability for shutdown-path errors. The wrapping
defer sits between close(l.done) and l.wg.Wait so cleanup runs after
forked children complete but before completion is signalled to
external listeners, matching the existing runCleanup ordering.

The package-level tracer var reintroduced in this commit lives in
tracing.go alongside the SpanProcessor it pairs with; the framework's
own span emission is colocated.

Test setup moves to TestMain so the SpanProcessor binds once for the
entire test binary. OTel's global TracerProvider delegate is set by
sync.Once on the first SetTracerProvider call, so the per-test install
helper used in earlier commits left subsequent tests reading from the
first test's stale provider. A single binary-scoped recorder removes
the stale-delegate gotcha; tests disambiguate by unique span names.
WithSpan and WithForkSpanName existed to label the long-lived
lifecycle span, which is gone after the previous commits. The options
remain in the API surface for v1 source compatibility but their bodies
are no-ops; both are marked Deprecated, pointing callers at WithName /
WithForkName for the L-naming concern the framework still cares about.

The lifecycleOptions.span field is removed alongside the SpanName
helper that read it — the user pushed to remove the field rather than
leave a dangling unused member. The Fork code path no longer sets it.

fileloader and kafkaloader were the two in-repo callers of
WithSpan("loader"). They are migrated to WithName("loader") so the
loader L keeps its log prefix, pprof label, and cleanup span name
under "loader"; without the migration those would have collapsed to
empty strings.
The next commit makes Claim.Exec build the lifecycle's identity
attributes from the footprint it belongs to (footprint.identifier,
footprint.revision, footprint.solution). Today Claim sees only itself
and its Component descriptor; the Footprint that allocated it is not
reachable from inside Exec, so the wiring would either duplicate every
caller's footprint plumbing or change Exec's signature.

A back-pointer set during Load — symmetric to the existing
Component-on-Claim field — gives Exec the footprint it needs without
either disruption. Users that build Claims by hand for tests may leave
Footprint nil; the next commit's Exec degrades to component-only
identity when Footprint is absent.
forkGroup.Fork now attaches the component's identity attributes
(component.name plus footprint.identifier / revision / solution) to
the forked L via WithForkAttributes. The framework's SpanProcessor
reads those values from l.Context() and stamps them onto every span
the Component (or any of its sub-procedures) starts, so operators can
filter trace data by component or footprint without each Component
plumbing attributes through its own instrumentation.

Claim.Exec opens its own short-lived <component>.bootstrap span around
Component.Bootstrap. The span replaces the role of the long-lived
lifecycle span that issue #56 removed: a bounded record of bootstrap
timing and outcome, with bootstrap-specific attributes (binding,
options, locations, footprint name/description) on the span itself
and identity attributes auto-applied by the SpanProcessor. Bootstrap
failures set Error status on the span before FatalfContext exits the
goroutine.

A comment on forkGroup.Fork notes the gap operators need to know
about: the effective lifecycle anchor in trace data is the pair
(footprint.identifier, component.name); two processes loading the
exact same footprint produce attribute-indistinguishable spans. A
later commit closes the gap with a process nonce.
Issue #56 asked for the pong example to demonstrate the right
per-message instrumentation shape, since the silent example was the
shape readers copied when reaching for a reference. This commit adds
a pong.echo span around the per-message send work, started from
l.Context() with SpanKindConsumer.

l.Context() is detached from any long-lived span after the framework
rework, so the span is a root — bounded to one message, regardless
of how long the subscription runs. component.NewSpanProcessor stamps
it with the lifecycle's identity attributes (component.name=pong,
footprint.identifier=..., footprint.revision=..., footprint.solution
=...) by reading them from the context value the framework set when
the lifecycle was created. The receive-error and send-error paths
move to ErrorfContext so the error records onto whatever span is
active at the call site rather than going trace-blind.

Sub-procedures of bootstrapping Components inherit identity through
the ctx-value chain, so the "sub" L this code runs under sees the
same identity attributes that the bootstrap span carries — no
re-declaration here.
A new reader reaching the package needs to understand four things
about tracing before writing a handler: l.Context() is span-free,
spans started from it are roots, identity travels as attributes via
the SpanProcessor, and (footprint.identifier, component.name) is the
effective anchor in trace data. doc.go states each plainly and shows
the SDK setup that activates identity propagation.

The cross-process collision risk is documented as it stands today:
two processes loading the same Footprint produce attribute-
indistinguishable spans, and operators relying on the framework's
identity alone will see them conflated. The doc points at process-
level resource attributes (service.instance.id, host.name) as the
escape hatch operators have today.

The deprecation of L.Error and L.Fatal lands in the doc here too —
the *Context variants are the right call for handler-scope error
recording now that there is no long-lived span for the unsuffixed
methods to record onto.
The framework's identity attributes (component.name, footprint.*) are
not unique across unix processes: two replicas loading the same
Footprint would otherwise produce attribute-indistinguishable spans
in trace data, and operators triaging an incident would have no
attribute pivot to separate the two feeds.

A package-level uuid generated at init becomes the process.nonce
attribute on every span the framework's SpanProcessor sees. The nonce
is stable per-process so attribute filters by process.nonce isolate
one replica's spans across the full trace timeline. ProcessNonce()
exposes it for callers that want to surface the value alongside log
data.

The loader emits a WARNING log line at the start of Load carrying the
footprint identifier, revision, and the nonce, giving operators a
pivot from log lines to trace data.

The doc package overview reflects the new behaviour — the cross-
process collision caveat the previous commit documented is now closed
by the nonce.
The L.Error / L.Errorf / L.Fatal / L.Fatalf methods were deprecated
when the long-lived lifecycle span was removed — they have no span to
record onto and only log. The replacements take a context so they can
record onto whatever span is active at the call site (typically a
per-handler span the caller has opened).

Every in-repo caller migrates to the ctx-bearing form. Where no
handler-scope span is active (most subscription loops, bootstrap
error paths in fileloader / kafkaloader), the call passes l.Context()
— the *Context method still tolerates a span-free context, just
records nothing in that case. Where a handler span is in scope (the
handleFootprint span in kafkaloader, the per-probe ctx in
examples/embed/probe), the call passes that ctx instead so the error
lands on the relevant span.

examples/direct/main.go drops the now-unused "fmt" import that the
migration left behind.

After this commit, golangci-lint reports no SA1019 warnings across
the module.
The YAML footprint() constructor populated Name, Metadata, Identifier,
and Locations but silently dropped Solution and Revision, even though
FootprintYAML carries both and the JSON loader copies them. With the
loader now deriving span identity attributes from these fields, every
YAML-loaded lifecycle stamped footprint.solution="" and
footprint.revision=0, breaking the operator-facing trace anchor.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment thread lifecycle.go
// (bounded by runCleanup) so it does not reintroduce the unbounded
// lifecycle span that issue #56 removed.
defer func() {
_, span := tracer.Start(l.ctx, l.name+".cleanup")
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[High] Cleanup span captures only duration — the documented error/panic observability is not delivered.

The span's context is discarded (_, span :=), so runCleanup and the cleanup funcs never see it. CleanupError now routes failures through the deprecated log-only l.Error (no RecordError), and a cleanup panic unwinds through defer span.End() with no recover()+RecordError, so the span ends with Unset status and zero events.

This contradicts doc.go ("cleanup-time durations and panics remain observable"), the inline comment here ("durations and errors are observable"), and the commit message's claim that the span "restores that visibility". It is also a regression: cleanup errors used to land on the lifecycle span via l.ErrorRecordError; now they land on no span at all.

Right-depth fix: thread the span's ctx into runCleanup(ctx) so CleanupContext/CleanupError can record, and add recover()+RecordError/SetStatus before the re-panic.

Comment thread examples/direct/pong.go
if err := pub.Send(ctx, &pubsub.Message{Body: []byte(echo)}); err != nil {
l.ErrorfContext(ctx, "send pong: %w", err)
}
span.End()
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Medium] Reference pong.echo span: End() not deferred (leak on panic) + no SetStatus on send error.

span.End() is called inline, so if pub.Send panics the span never ends/exports and the long-running loop accumulates un-ended spans. On a non-panic send error, ErrorfContext records an exception event but the span keeps status Unset, so a backend computing error rate from span status reports 0% errors while every echo is dropped.

The framework's own kafkaloader.handleFootprint calls span.SetStatus(codes.Error, ...) on its error path — this example (the shape issue #56 designated as the per-message reference) contradicts it. Use defer span.End() and set Error status in the failure branch.

Comment thread loader/example_test.go
err := pub.Send(l.Context(), &pubsub.Message{Body: options.Message})
if err != nil {
l.Errorf("send: %w", err)
l.ErrorfContext(l.Context(), "send: %w", err)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Medium] The loader-side feature has no executing test coverage. (re: loader/example_test.go:21, Example_loader)

Example_loader has no // Output: line, so go test compiles but never runs it, and loader/ contains zero func Test…. That leaves claimIdentity, the claim.Footprint = &footprint back-link, g.Fork's WithForkAttributes wiring, and Claim.Exec's bootstrap span/attributes entirely unexercised — tracing_test.go only covers the component-package primitives directly.

A regression in an attribute key, the back-link ordering, a dropped fork option, or a nil-deref on Footprint would pass CI green. Adding an // Output: line (or a real func Test… that forks a Claim through Load and asserts on recorded spans) would close the gap.

Comment thread loader/loader.go
// component name plus the footprint identifiers reachable through
// Claim.Footprint. Footprint may be nil for Claims built outside Load (e.g.
// tests); identity falls back to component name alone.
func claimIdentity(c *Claim) []attribute.KeyValue {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Medium] Footprint identity attribute schema is hand-listed at five sites with no shared constructor.

The same footprint.* keys are restated in claimIdentity (here), the slog.Warn startup log using raw string keys (loader.go:109), Claim.Exec's bootstrap span (loader.go:390-400), loadFootprintComponents' skip/load span events (loader.go:211-223), and kafkaloader.handleFootprint (kafkaloader.go:100-107). The key sets and encodings already diverge — e.g. footprint.solution is treated as identity here but as descriptive metadata in kafkaloader.

Renaming or retyping a dimension (or adding footprint.solution to the span events to match) means editing five spots across two packages plus doc.go; miss one and the (footprint.identifier, component.name) anchor the docs advertise silently drifts. A single footprintIdentity(fp) helper would define it once.

Comment thread examples/direct/main.go
sub, err := binding.LinkInterest(l.Context(), PongAspect)
if err != nil {
l.Fatal(err)
l.FatalContext(l.Context(), err)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Medium-low] The example never registers NewSpanProcessor() that its own comments rely on. (re: EntrypointProc at main.go:42 and the comment at pong.go:56)

No non-test file calls otel.SetTracerProvider/NewSpanProcessor(), yet pong.go:56 asserts "component.NewSpanProcessor stamps the span with the lifecycle's identity attributes (component.name=pong, ...)". As shipped, the direct example produces spans with no identity attributes and no process.nonce, so a reader copying it — or following doc.go's "read the loader's process.nonce log line, then pivot to spans" workflow — gets nothing.

Issue #56 asked the example to demonstrate the right instrumentation shape; it shows span creation but not the identity-propagation half. Wiring up a TracerProvider with NewSpanProcessor() in main (even with a stdout exporter) would make the example match its comments.

Comment thread tracing_test.go
// waitForSpan polls sr.Ended() until a span with name appears, accommodating
// the framework's asynchronous span reaper.
func waitForSpan(t *testing.T, sr *tracetest.SpanRecorder, name string) sdktrace.ReadOnlySpan {
t.Helper()
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low-medium] waitForSpan polls for an "asynchronous span reaper" that this PR removed.

The doc comment says it polls "accommodating the framework's asynchronous span reaper" — but the reaper goroutine was deleted in this PR. All spans (cleanup, bootstrap, handler) now end synchronously before RunProc returns, so the poll loop, deadline, and SyncTimeout dependency are dead complexity referencing a mechanism the same PR deleted. A direct sr.Ended() lookup after RunProc returns now suffices and would fail loudly on a real regression instead of timing out.

Comment thread loader/loader.go
// Back-link each Claim to the Footprint it belongs to so Claim.Exec can
// build identity attributes (component.name + footprint.*) without
// threading the Footprint through every call site.
for _, claim := range footprint.Allocations {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low-medium, latent] claim.Footprint = &footprint aliases the by-value Load parameter.

This takes the address of Load's local Footprint copy and writes it into every *Claim in the caller's shared Allocations slice: all claims alias one heap-escaped snapshot (decoupled from the caller's Footprint), and the caller's Claim objects are mutated as a side effect.

Safe today only because loaded.Swap(true) forbids a second Load. The package TODOs plan multi-footprint support; once that guard relaxes (or a caller reuses a []*Claim), a second Load clobbers every claim's Footprint pointer and claimIdentity emits the wrong footprint.identifier/revision/solution.

Comment thread tracing.go
}
// process.nonce disambiguates spans emitted by different processes that
// share the same identity attributes (e.g., two replicas loading the
// same Footprint). The value is stable per-process.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low-medium] OnStart hot path: per-span process.nonce construction + two SetAttributes calls.

OnStart runs on every span start in the process. It builds attribute.String("process.nonce", processNonce) fresh each time and issues a second SetAttributes call. Since the nonce is process-constant, precompute a package-level attribute.KeyValue and fold it into a single SetAttributes.

Also: appending the nonce last makes it the first attribute dropped under a tight SpanLimits AttributeCountLimit, so replica disambiguation silently fails on attribute-heavy bootstrap spans — exactly where it's needed.

Comment thread tracing.go
// every span as the process.nonce attribute. Two unix processes with the same
// component identity (component.name, footprint.identifier, ...) would
// otherwise produce attribute-indistinguishable spans; the nonce gives
// operators a stable per-process dimension to disambiguate them.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low, design] process.nonce is a per-span attribute rather than an OTel Resource attribute.

doc.go itself names service.instance.id/host.name as the standard replica disambiguators, and no sdktrace.Resource is set anywhere. A random per-span UUID duplicates a value OTel models once at Resource scope and can't be correlated to the host/pod the way a Resource detector's host.name/k8s.pod.name can — operators still can't answer "which replica" without grepping the one startup log line. The right altitude is a documented WithResource(service.instance.id) on the caller's TracerProvider.

Comment thread lifecycle.go
// SpanProcessor returned by NewSpanProcessor can apply them to spans started
// under l.Context(). A nil slice leaves any inherited attributes from the
// parent context visible; a non-nil slice replaces them.
if options.attrs != nil {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low, API edge] No way to clear inherited identity via WithForkAttributes().

A zero-arg WithForkAttributes() passes a nil slice (Go variadic semantics), so this if options.attrs != nil guard skips re-stashing and the child inherits the parent's attributes. There's no public way to express "replace inherited identity with nothing", which contradicts a strict reading of the documented "a fork that sets its own WithForkAttributes fully replaces the inherited slice".

Non-triggering for the loader (claimIdentity always returns ≥1 attr) and the nil-inherits branch has no test, but it's a latent trap for future callers.

Comment thread tracing_test.go
var spanRecorder *tracetest.SpanRecorder

func TestMain(m *testing.M) {
spanRecorder = tracetest.NewSpanRecorder()
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low, test isolation] TestMain installs one process-global recorder that is never reset.

Every RunProc across the package (lifecycle_test.go has ~30) now emits a real <name>.cleanup span into this single, ever-growing spanRecorder, and waitForSpan returns the first span matching a name. It works only because names are currently unique — the two WithName("") tests in TestL_Name already each emit a span named .cleanup into the shared recorder. A future or reordered test that reuses a span name, or asserts on span counts/absence, would read an unrelated earlier test's span, giving order-dependent flakiness.

Comment thread tracing_test.go
defer span.End()
l.FatalContext(handlerCtx, fmt.Errorf("handler fatal"))
aborted = false // unreachable
_ = aborted
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low, test smell] TestL_FatalContext/RecordsAndAborts never verifies the abort.

aborted = false // unreachable re-assigns the already-zero value and _ = aborted is never read after RunProc returns, so nothing checks that FatalContext actually called runtime.Goexit(). A regression where FatalContext returned normally instead of aborting would still pass, because RecordError/SetStatus run before Goexit regardless. Set a flag after the FatalContext call and assert it stayed false to actually cover the abort.

@danielorbach
Copy link
Copy Markdown
Owner Author

Could not attach inline (the target line is outside the diff hunk).

procedure.go:53[Low] ProcE.Exec and CleanupError still call the deprecated log-only methods.

ProcE.Exec here calls l.Fatal(err), and CleanupError (lifecycle.go:619) calls l.Error(err) — both now log-only. These are the framework's own internal error funnels (every ForkE/ProcE failure; every CleanupError/Background/Context failure), so the PR's "all in-repo callers migrate" is inaccurate and these paths record onto no span.

ProcE has no ctx to thread; CleanupError could record if the cleanup span's ctx were plumbed into runCleanup (see the lifecycle.go:221 finding). Worth a note either way so the deprecation isn't read as complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bounding OTel traces emitted by long-running subscription handlers

2 participants