[API] Add public API to create a span with caller-provided trace_id while preserving root span (no parent_id) #4963

@prashanthb951

Is your feature request related to a problem?

Yes. There is no public OTel API to create a span that carries a
caller-provided trace_id while remaining a ROOT span (no parent_id).

Service : FastAPI middleware running on GCP Cloud Run
Upstream : API Gateway injects W3C traceparent header on every request
Backends : GCP Cloud Logging + Arize AI (LLM observability)
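For readers unfamiliar with the header format, here is a minimal, standard-library-only sketch of how `caller_trace_id` and `caller_trace_flags` (the names used in the snippets throughout this issue) can be derived from a W3C `traceparent` value. The `parse_traceparent` helper and `CallerTrace` tuple are hypothetical names for illustration, not part of OTel, and the sketch only accepts the four-field layout defined for version 00.

```python
import re
from typing import NamedTuple, Optional

# Four-field layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class CallerTrace(NamedTuple):
    """Hypothetical container for the values carried by traceparent."""
    trace_id: int
    parent_span_id: int
    trace_flags: int


def parse_traceparent(header: str) -> Optional[CallerTrace]:
    """Return the parsed header, or None if it is malformed or all-zero."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id_hex, span_id_hex, flags_hex = match.groups()
    trace_id = int(trace_id_hex, 16)
    span_id = int(span_id_hex, 16)
    # Version "ff" and all-zero ids are invalid per the W3C spec.
    if version == "ff" or trace_id == 0 or span_id == 0:
        return None
    return CallerTrace(trace_id, span_id, int(flags_hex, 16))


caller = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
caller_trace_id = caller.trace_id
caller_trace_flags = caller.trace_flags
```

In a real service the OTel propagator would normally do this parsing; the sketch only makes explicit where the two values come from.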

Two hard requirements that must BOTH be satisfied:

REQ 1 — GCP Log Correlation:
span.trace_id MUST equal caller's trace_id from traceparent header.
GCP Cloud Logging uses this to correlate logs across services
in Cloud Trace UI. If trace_id differs → logs from this service
are completely disconnected from upstream caller's logs.
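To make REQ 1 concrete: Cloud Logging correlates structured log lines through the special JSON fields `logging.googleapis.com/trace` and `logging.googleapis.com/spanId`. The sketch below shows how a log line would be tied to a span's ids. The `gcp_log_entry` helper and the `my-project` project id are assumptions for illustration only.

```python
import json


def gcp_log_entry(message: str, project_id: str, trace_id: int,
                  span_id: int, sampled: bool) -> str:
    """Build one structured log line for stdout on Cloud Run."""
    return json.dumps({
        "message": message,
        "severity": "INFO",
        # Trace resource name: 32 lowercase hex characters for the trace id.
        "logging.googleapis.com/trace":
            f"projects/{project_id}/traces/{trace_id:032x}",
        # Span id: 16 lowercase hex characters.
        "logging.googleapis.com/spanId": f"{span_id:016x}",
        "logging.googleapis.com/trace_sampled": sampled,
    })


line = gcp_log_entry(
    "handled POST /message",
    project_id="my-project",  # hypothetical project id
    trace_id=0x4BF92F3577B34DA6A3CE929D0E0E4736,
    span_id=0x00F067AA0BA902B7,
    sampled=True,
)
print(line)
```

If the span's trace_id is a fresh random value instead of the caller's, the `trace` field points at a different trace and the log line no longer groups with the upstream entries.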

REQ 2 — Arize Root Span:
span.parent_id MUST be None.
Arize AI requires the span to be a ROOT span to correctly group
LLM sessions, display full traces and export custom attributes
(session.id, input.value, output.value).
If parent_id is set → Arize sees an orphaned child of a parent
that does not exist in its system → broken traces, missing
attributes, incorrect session grouping.

OTel public API offers only two paths:

Path 1 — NonRecordingSpan as parent:
→ trace_id inherited ✅ (REQ 1 satisfied)
→ parent_id gets set ❌ (REQ 2 broken)

Path 2 — context=None with no active span (no parent):
→ parent_id = None ✅ (REQ 2 satisfied)
→ fresh trace_id ❌ (REQ 1 broken)

No public API path satisfies both simultaneously.
SpanContext is immutable after creation with no public setter.

Only working solution today is overriding PRIVATE _context attribute:

span = tracer.start_span("POST /message", context=None)

if hasattr(span, "_context"):
    original = span.get_span_context()
    span._context = SpanContext(
        trace_id    = caller_trace_id,
        span_id     = original.span_id,
        is_remote   = False,
        trace_flags = caller_trace_flags,
    )

This works in production BUT:
❌ Uses a private undocumented attribute
❌ No public equivalent exists in OTel spec
❌ Fragile across SDK version upgrades
❌ Cannot be recommended to other teams

This affects any service that:

  • Acts as logical root of its own trace
  • Runs on a platform that injects trace headers
    (GCP, AWS, Azure) for platform-level log correlation
  • Uses a separate observability backend (Arize, Honeycomb,
    Datadog) that requires root spans for correct grouping

Describe the solution you'd like

I'd like a public API to create a span that carries a caller-provided
trace_id while remaining a ROOT span (no parent_id).

Any one of the following options would solve the problem:

─────────────────────────────────────────────────────────────────────
Option A — Add trace_id parameter to start_span()
─────────────────────────────────────────────────────────────────────

span = tracer.start_span(
    name     = "POST /message",
    trace_id = caller_trace_id,   # ← NEW optional parameter
    context  = None,              # ← no parent — stays root
)

This is the most minimal and backwards-compatible change.
trace_id parameter would be optional — existing code unchanged.

─────────────────────────────────────────────────────────────────────
Option B — Public update method on Span after creation
─────────────────────────────────────────────────────────────────────

span = tracer.start_span("POST /message", context=None)

span.update_span_context(
    trace_id = caller_trace_id,   # ← NEW public method
)

Rules:

- Only trace_id can be updated (span_id unchanged)

- parent_id stays None (cannot be added post-creation)

- Can only be called before span.end()

- Raises exception if called after span.end()

─────────────────────────────────────────────────────────────────────
Option C — Dedicated root span factory method on Tracer
─────────────────────────────────────────────────────────────────────

span = tracer.start_root_span(
    name     = "POST /message",
    trace_id = caller_trace_id,   # ← carry upstream trace_id
    kind     = SpanKind.SERVER,
)

Guarantees by contract:

- parent_id = None always (cannot become child)

- is_remote = False always (we are the origin)

- trace_id = caller provided value

─────────────────────────────────────────────────────────────────────
Expected Behaviour (same for all options)
─────────────────────────────────────────────────────────────────────

span.get_span_context().trace_id == caller_trace_id ✅
span.parent == None ✅
span.get_span_context().is_remote == False ✅
span.get_span_context().is_valid == True ✅
span.is_recording() == True ✅

─────────────────────────────────────────────────────────────────────
Why This Is Different From Existing Context Propagation
─────────────────────────────────────────────────────────────────────

Existing propagation (NonRecordingSpan + valid span_id):
→ Inherits trace_id ✅
→ Sets parent_id = fake_span_id ❌
→ Observability backend sees orphaned child span ❌

Proposed API:
→ Carries trace_id ✅
→ parent_id = None (guaranteed root) ✅
→ Observability backend sees clean root span ✅

─────────────────────────────────────────────────────────────────────
Preference Order
─────────────────────────────────────────────────────────────────────

1st choice → Option A (least invasive, backwards compatible)
2nd choice → Option C (clearest intent, purpose-built)
3rd choice → Option B (post-creation mutation, less ideal)

Open to maintainer guidance on which fits best with
the existing API design philosophy.

Describe alternatives you've considered

All alternatives were tested in production on GCP Cloud Run
(FastAPI + Starlette middleware). Each fails to satisfy both
requirements simultaneously via public API alone.

Two requirements that must BOTH be satisfied:
REQ 1: trace_id == caller's trace_id (GCP log correlation)
REQ 2: parent_id == None (Arize root span)

─────────────────────────────────────────────────────────────────────
Alternative 1 — NonRecordingSpan + INVALID_SPAN_ID
─────────────────────────────────────────────────────────────────────

Code:
from opentelemetry.trace import (
    INVALID_SPAN_ID, NonRecordingSpan, SpanContext, set_span_in_context,
)

fake_ctx = SpanContext(
    trace_id    = caller_trace_id,
    span_id     = INVALID_SPAN_ID,   # 0x0000000000000000
    is_remote   = False,
    trace_flags = trace_flags,
)
fake_parent = NonRecordingSpan(fake_ctx)
span = tracer.start_span(
    "POST /message",
    context = set_span_in_context(fake_parent),
)

Why it fails:
SpanContext.is_valid = (trace_id != 0 AND span_id != 0)
INVALID_SPAN_ID = 0x0000000000000000 → is_valid = False
OTel SDK sees is_valid = False → ignores trace_id entirely
→ generates a brand new random trace_id

Result:
REQ 1: ❌ trace_id NOT inherited (OTel generates fresh one)
REQ 2: ✅ parent_id = None (root span)

─────────────────────────────────────────────────────────────────────
Alternative 2 — NonRecordingSpan + valid random span_id + is_remote=True
─────────────────────────────────────────────────────────────────────

Code:
import os

fake_span_id = int.from_bytes(os.urandom(8), "big")
fake_ctx = SpanContext(
    trace_id    = caller_trace_id,
    span_id     = fake_span_id,      # non-zero → is_valid=True
    is_remote   = True,
    trace_flags = trace_flags,
)
fake_parent = NonRecordingSpan(fake_ctx)
span = tracer.start_span(
    "POST /message",
    context = set_span_in_context(fake_parent),
)

Why it fails:
is_valid = True → OTel SDK inherits trace_id ✅
BUT OTel SDK also sets span.parent_id = fake_span_id
NonRecordingSpan is never exported BUT parent_id is
still recorded on the real span.
Observability backend (Arize) sees span as CHILD of
a parent that does not exist in its system.
→ orphaned trace, broken session grouping,
custom attributes not exported correctly.

Result:
REQ 1: ✅ trace_id inherited correctly
REQ 2: ❌ parent_id = fake_span_id (NOT root)

─────────────────────────────────────────────────────────────────────
Alternative 3 — NonRecordingSpan + valid random span_id + is_remote=False
─────────────────────────────────────────────────────────────────────

Code:
import os

fake_span_id = int.from_bytes(os.urandom(8), "big")
fake_ctx = SpanContext(
    trace_id    = caller_trace_id,
    span_id     = fake_span_id,
    is_remote   = False,             # tried False instead of True
    trace_flags = trace_flags,
)
fake_parent = NonRecordingSpan(fake_ctx)
span = tracer.start_span(
    "POST /message",
    context = set_span_in_context(fake_parent),
)

Why it fails:
The is_remote flag on the PARENT SpanContext only records
whether that context was propagated from another process.
It does NOT control whether parent_id gets set on the child
span, and the child's own SpanContext is created locally
(is_remote = False) either way.
parent_id is determined purely by whether parent
SpanContext.is_valid = True.
Since span_id is non-zero → is_valid = True →
parent_id still gets set to fake_span_id.
Same outcome as Alternative 2 regardless of is_remote value.

Result:
REQ 1: ✅ trace_id inherited correctly
REQ 2: ❌ parent_id = fake_span_id (NOT root)
is_remote=False makes no difference to parent_id

─────────────────────────────────────────────────────────────────────
Alternative 4 — span._context private attribute override (CURRENT WORKAROUND)
─────────────────────────────────────────────────────────────────────

Code:

# Step 1: Create span as pure root
span = tracer.start_span(
    name    = "POST /message",
    context = None,                         # no parent → root span
)

# Step 2: Replace trace_id via private attribute
if hasattr(span, "_context"):
    original = span.get_span_context()
    span._context = SpanContext(
        trace_id    = caller_trace_id,      # ← stitched
        span_id     = original.span_id,     # ← unchanged
        is_remote   = False,                # ← we are origin
        trace_flags = caller_trace_flags,
    )

Why it works but is not acceptable long-term:
✅ trace_id = caller's (GCP log correlation works)
✅ parent_id = None (Arize sees root span)
✅ is_recording = True (full attribute export works)
✅ Confirmed working in production logs

❌ _context is a PRIVATE undocumented attribute
❌ No public equivalent exists (SpanContext is immutable
by design — no public setter defined in OTel spec)
❌ Fragile across SDK version upgrades
❌ hasattr guard helps but is not a long-term contract
❌ Not portable across OTel implementations (Java, Go etc.)
❌ Not a pattern that can be recommended to other teams

Result:
REQ 1: ✅ trace_id stitched correctly
REQ 2: ✅ parent_id = None (root span)
BUT : ⚠️ relies entirely on private SDK internals

─────────────────────────────────────────────────────────────────────
Alternative 5 — Accept the limitation, choose one requirement
─────────────────────────────────────────────────────────────────────

Option 5a: Prioritise GCP log correlation, sacrifice Arize root:
→ Use NonRecordingSpan + valid span_id
→ GCP logs correlate correctly ✅
→ Arize shows orphaned child spans ❌
→ LLM session grouping broken in Arize ❌

Option 5b: Prioritise Arize root span, sacrifice GCP correlation:
→ Use context=None (pure root, fresh trace_id)
→ Arize shows clean root spans ✅
→ GCP logs use different trace_id to upstream caller ❌
→ Cannot correlate logs across services in Cloud Trace ❌

Both options unacceptable for production observability.

Result:
REQ 1 + REQ 2 cannot BOTH be satisfied via public API.

─────────────────────────────────────────────────────────────────────
Summary Table
─────────────────────────────────────────────────────────────────────

Alternative                            │ REQ 1             │ REQ 2
                                       │ trace_id stitched │ parent_id = None
───────────────────────────────────────┼───────────────────┼─────────────────
1.  NonRecordingSpan + INVALID_SPAN_ID │ ❌                │ ✅
2.  NonRecordingSpan + is_remote=True  │ ✅                │ ❌
3.  NonRecordingSpan + is_remote=False │ ✅                │ ❌
4.  _context override (workaround)     │ ✅                │ ✅ (private)
5a. Accept — GCP only                  │ ✅                │ ❌
5b. Accept — Arize only                │ ❌                │ ✅
───────────────────────────────────────┼───────────────────┼─────────────────
Proposed API (any option A/B/C)        │ ✅                │ ✅ (public)

Additional Context

There is no public OTel API to create a span that carries a
caller-provided trace_id while remaining a ROOT span (no parent_id).

─────────────────────────────────────────────────────────────────────
Real World Context
─────────────────────────────────────────────────────────────────────

Service : FastAPI middleware running on GCP Cloud Run
Upstream : API Gateway injects W3C traceparent header on every request
Backends : GCP Cloud Logging + Arize AI (LLM observability)

Two hard requirements that must BOTH be satisfied:

REQ 1 — GCP Log Correlation:
span.trace_id MUST equal caller's trace_id from
traceparent header.
GCP Cloud Logging uses this to correlate logs across
services in Cloud Trace UI.
If trace_id differs → logs from this service are
completely disconnected from upstream caller's logs.

REQ 2 — Arize Root Span:
span.parent_id MUST be None.
Arize AI requires the middleware span to be a ROOT span
to correctly group LLM sessions, display full traces,
and export custom attributes (session.id, input.value,
output.value).
If parent_id is set → Arize sees an orphaned child of a
parent that does not exist in its system → broken traces,
missing attributes, incorrect session grouping.

─────────────────────────────────────────────────────────────────────
The Core Problem
─────────────────────────────────────────────────────────────────────

OTel's public API offers only two ways to influence trace_id
on a new span:

Path 1 — Pass a parent context (NonRecordingSpan):
→ trace_id inherited from parent ✅ (REQ 1 satisfied)
→ span.parent_id = parent's span_id ❌ (REQ 2 broken)
→ Arize sees orphaned child, attributes not exported

Path 2 — Pass context=None with no active span (no parent):
→ span.parent_id = None ✅ (REQ 2 satisfied)
→ OTel generates a fresh trace_id ❌ (REQ 1 broken)
→ GCP log correlation lost entirely

There is NO public API path that satisfies both simultaneously.
SpanContext is immutable after creation with no public setter.

The ONLY working solution today is overriding the PRIVATE
_context attribute after span creation:

span = tracer.start_span("POST /message", context=None)

if hasattr(span, "_context"):
    original = span.get_span_context()
    span._context = SpanContext(
        trace_id    = caller_trace_id,     # from traceparent
        span_id     = original.span_id,    # unchanged
        is_remote   = False,               # we are origin
        trace_flags = caller_trace_flags,
    )

This works in production today BUT:
❌ Uses a private undocumented attribute
❌ No public equivalent exists in OTel spec
❌ Fragile across SDK version upgrades
❌ Cannot be recommended to other teams
❌ Not portable across OTel implementations

This is not an edge case. Any service that:

  • Acts as logical root of its own trace
  • Runs on a platform that injects trace headers
    (GCP, AWS, Azure) for platform-level log correlation
  • Uses a separate observability backend (Arize, Honeycomb,
    Datadog) that requires root spans for correct grouping

...will hit this exact same problem.

Would you like to implement a fix?

None
