Flakiness: fix duplicate Pulsar multi-topic receive spans#18585
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes flakiness in Pulsar multi-topic consumer tests by preventing the javaagent from creating duplicate “receive” spans when MultiTopicsConsumerImpl wraps per-topic ConsumerImpl sub-consumers.
Changes:
- Restrict
ConsumerImplInstrumentationto match onlyorg.apache.pulsar.client.impl.ConsumerImpl(no longerMultiTopicsConsumerImpl). - Remove the now-unused
namedOneOfmatcher import.
|
From what I understand the |
|
I was thinking it's related to extra spans sometimes being captured though that might be a different issue unrelated to the flakiness. I opened a new PR to focus just on the extra span issue outside of the flakiness question (#18771). |
I've been seeing this issue causing tests to fail completely (even on retry) from time to time, e.g. https://scans.gradle.com/s/rg3ci7zt2glcm
Problem
PulsarClientTest.testConsumeMultiTopics()has been flaky in Develocity. In the last seven days, Develocity reported this method as the top flaky method inPulsarClientTest, with failures split between:Expected size: 4 but was: 5/6[STRING attribute 'messaging.message.id'] expected: \"32:0:-1\" but was: nullThe multi-topic Pulsar client path is a little unusual:
MultiTopicsConsumerImplis a wrapper consumer. It starts real per-topicConsumerImplsub-consumers, receives messages from those sub-consumers withreceiveAsync(), wraps the received message inTopicMessageImpl, and then dispatches the wrapped message to the listener.Before this change, the javaagent receive instrumentation matched both
ConsumerImplandMultiTopicsConsumerImpl. That means one logical multi-topic message could be observed at both levels: once when the sub-consumer actually receives the message, and again when the wrapper drains/dispatches the already-received message. This explains the extra receive spans seen in the flaky failures. Those extra receive traces can also make the sorted trace assertions line up against the wrong receive trace, which surfaces as the missingmessaging.message.idassertion.Fix
Only apply
ConsumerImplInstrumentationtoorg.apache.pulsar.client.impl.ConsumerImpl.For multi-topic consumers, this keeps instrumentation at the real receive point: the underlying per-topic
ConsumerImplsub-consumer. The receive context is still injected into the Pulsar message and recovered by the listener wrapper, so the expected receive/process relationship is preserved without creating a duplicate wrapper-level receive span.