Skip to content

impl(o11y): record telemetry attributes on LRO span#5695

Open
haphungw wants to merge 9 commits into
googleapis:mainfrom
haphungw:stacked-pr-3-rust-declare-and-record-telemetry
Open

impl(o11y): record telemetry attributes on LRO span#5695
haphungw wants to merge 9 commits into
googleapis:mainfrom
haphungw:stacked-pr-3-rust-declare-and-record-telemetry

Conversation

@haphungw
Copy link
Copy Markdown
Contributor

In our previous iteration (#5694), we introduced an operation_name(&self) query method to the public Poller trait.

To avoid polluting the public API, we utilize thread-local span propagation: because the LRO span is active in the thread context, the internal pollers (PollerImpl and DiscoveryPoller) record the LRO operation name and destination resource ID directly onto the current active span using tracing::Span::current().record().

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances tracing for Long Running Operations (LROs) by recording operation names and resource IDs in the aip151 and discovery modules, and updating the 'LRO Wait' span with additional metadata. Feedback was provided regarding the removal of instrumentation from the into_stream method in src/lro/src/internal/tracing.rs, which would lead to fragmented traces and a loss of parent span context for operations executed within the stream.

Comment thread src/lro/src/internal/tracing.rs
@haphungw haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch 2 times, most recently from 1471b6c to 5978295 Compare May 19, 2026 23:19
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.89%. Comparing base (c24629c) to head (da20176).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5695      +/-   ##
==========================================
- Coverage   97.89%   97.89%   -0.01%     
==========================================
  Files         226      226              
  Lines       55471    55485      +14     
==========================================
+ Hits        54304    54315      +11     
- Misses       1167     1170       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@haphungw haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 5978295 to 7355448 Compare May 19, 2026 23:42
@haphungw haphungw marked this pull request as ready for review May 20, 2026 00:13
@haphungw haphungw requested a review from a team as a code owner May 20, 2026 00:13
query,
);
let p0 = poller.poll().await;
let p0 = poller.poll().instrument(test_span()).await;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we verify the span contents?

Aside: consider writing new tests just for tracing. Often it is quickest to use existing tests to test a new feature... but when you do that enough times the tests get complicated and lose their focus.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. I added unit tests for both aip151 and discovery to verify the 2 specific attributes we populate for LROs.

@haphungw haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 7355448 to a983c67 Compare May 20, 2026 00:37
@haphungw haphungw requested review from coryan and westarle May 20, 2026 13:45
Comment thread src/lro/src/internal/aip151.rs Outdated
Comment on lines +274 to +275
span.record("gcp.longrunning.operation_name", name);
span.record("gcp.resource.destination.id", name);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these ever different?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the final design we decided that they're always the same. we expected that downstream tools (e.g., AppHub) will use Operation ID as the default.

async fn poll(&mut self) -> Option<PollingResult<ResponseType, MetadataType>> {
if let Some(start) = self.start.take() {
let result = start().await;
let (op, poll) = crate::details::handle_start(result);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the operation is marked done inside the start()? If so we might want to extract the name from the result above handle_start?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I fixed the loops to extract the name directly from result (if Ok) before calling handle_start. Now we can make sure we don't miss these attributes.

I added a test for this case.

Comment thread src/lro/src/internal/aip151.rs Outdated
let (op, poll) = crate::details::handle_start(result);
#[cfg(google_cloud_unstable_tracing)]
if let Some(ref name) = op {
let span = tracing::Span::current();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tracing is disabled, this might interfere with the customers' spans. Can we attach the span directly to the PollerImpl?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right. I considered a few approaches for context propagation:

We also used task-local back then for gaxi RequestRecorder, so I guess it's okay to adopt this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding attaching the span directly to PollerImpl: I wanted to avoid adding a tracing field to PollerImpl because that forces our polling struct to actively manage and carry o11y variables even when tracing is disabled.

@haphungw haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 3b62b4a to 5b5d0da Compare May 20, 2026 20:09
@haphungw haphungw closed this May 20, 2026
@haphungw haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 5b5d0da to 27f2c17 Compare May 20, 2026 20:17
@haphungw haphungw reopened this May 20, 2026
@haphungw haphungw requested a review from westarle May 20, 2026 20:56
@haphungw haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 58f49ef to e8ac197 Compare May 21, 2026 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants