Skip to content

Flakiness: fix duplicate Pulsar multi-topic receive spans#18585

Closed
trask wants to merge 1 commit into
open-telemetry:mainfrom
trask:fix-pulsar-multitopic-receive-flake
Closed

Flakiness: fix duplicate Pulsar multi-topic receive spans#18585
trask wants to merge 1 commit into
open-telemetry:mainfrom
trask:fix-pulsar-multitopic-receive-flake

Conversation

@trask
Copy link
Copy Markdown
Member

@trask trask commented May 5, 2026

I've been seeing this issue causing tests to fail completely (even on retry) from time to time, e.g. https://scans.gradle.com/s/rg3ci7zt2glcm

Problem

PulsarClientTest.testConsumeMultiTopics() has been flaky in Develocity. In the last seven days, Develocity reported this method as the top flaky method in PulsarClientTest, with failures split between:

  • extra receive traces, for example Expected size: 4 but was: 5/6
  • missing message id assertions, for example [STRING attribute 'messaging.message.id'] expected: \"32:0:-1\" but was: null

The multi-topic Pulsar client path is a little unusual: MultiTopicsConsumerImpl is a wrapper consumer. It starts real per-topic ConsumerImpl sub-consumers, receives messages from those sub-consumers with receiveAsync(), wraps the received message in TopicMessageImpl, and then dispatches the wrapped message to the listener.

Before this change, the javaagent receive instrumentation matched both ConsumerImpl and MultiTopicsConsumerImpl. That means one logical multi-topic message could be observed at both levels: once when the sub-consumer actually receives the message, and again when the wrapper drains/dispatches the already-received message. This explains the extra receive spans seen in the flaky failures. Those extra receive traces can also make the sorted trace assertions line up against the wrong receive trace, which surfaces as the missing messaging.message.id assertion.

Fix

Only apply ConsumerImplInstrumentation to org.apache.pulsar.client.impl.ConsumerImpl.

For multi-topic consumers, this keeps instrumentation at the real receive point: the underlying per-topic ConsumerImpl sub-consumer. The receive context is still injected into the Pulsar message and recovered by the listener wrapper, so the expected receive/process relationship is preserved without creating a duplicate wrapper-level receive span.

@trask trask marked this pull request as ready for review May 5, 2026 05:02
@trask trask requested a review from a team as a code owner May 5, 2026 05:02
This was referenced May 5, 2026
@trask trask changed the title Fix duplicate Pulsar multi-topic receive spans Flakiness: fix duplicate Pulsar multi-topic receive spans May 9, 2026
@trask trask requested a review from Copilot May 9, 2026 20:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes flakiness in Pulsar multi-topic consumer tests by preventing the javaagent from creating duplicate “receive” spans when MultiTopicsConsumerImpl wraps per-topic ConsumerImpl sub-consumers.

Changes:

  • Restrict ConsumerImplInstrumentation to match only org.apache.pulsar.client.impl.ConsumerImpl (no longer MultiTopicsConsumerImpl).
  • Remove the now-unused namedOneOf matcher import.

@laurit
Copy link
Copy Markdown
Contributor

laurit commented May 14, 2026

From what I understand the receive spans are the ones where messaging.message.id is sometimes missing. Creating these spans is always triggered from the CompletableFuture returned from org.apache.pulsar.client.impl.ConsumerImpl.internalReceiveAsync so it shouldn't really be affected by this PR. Did I misunderstand something?

@trask
Copy link
Copy Markdown
Member Author

trask commented May 17, 2026

I was thinking it's related to extra spans sometimes being captured though that might be a different issue unrelated to the flakiness. I opened a new PR to focus just on the extra span issue outside of the flakiness question (#18771).

@trask trask closed this May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants