CRE-4150: fix race in Test_DataSource/Observe/records_telemetry #22344

Open
aareet wants to merge 1 commit into develop from fix/flaky-CRE-4150-2026-05-07

Conversation

Contributor

@aareet aareet commented May 7, 2026

Summary

Fixes a flake in `Test_DataSource/Observe/records_telemetry` (Trunk: `panic: runtime error: slice bounds out of range [:3] with capacity 0` at `data_source_test.go:345`).

The test captures `ch := tm.ch` after `ds.Observe()` returns. Under fast `loopWakeCh` pacing (microseconds), the next iteration of the production observation loop calls `MakeObservationScopedTelemetryCh` again and overwrites `tm.ch` with a fresh empty channel before the test reads it. The test then ranges over an empty or closed channel, collects a `telems` slice of length 0, and panics on `telems[:3]`.
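The failure mode can be reproduced in miniature. This is a hypothetical sketch, not the actual test code: ranging over a channel that was replaced and closed before it was ever populated yields a zero-length slice, so any fixed-index slice expression panics.

```go
package main

import "fmt"

func main() {
	// Stand-in for the fresh channel the loop's second iteration
	// installs: created, never written to, then closed.
	ch := make(chan any)
	close(ch)

	// The test's range loop sees no values at all.
	var telems []any
	for v := range ch {
		telems = append(telems, v)
	}
	fmt.Println(len(telems)) // prints 0
	// telems[:3] here would panic:
	// "slice bounds out of range [:3] with capacity 0"
}
```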

Fix

Track every channel the mock creates in a new `channels []chan any` slice (guarded by the existing `tm.mu`), and have the test read `channels[0]` instead of the racy `tm.ch`. The first channel is the one populated by the asserted first iteration (stream IDs 21/22/23), and it is provably non-empty by the time `Observe` returns because `loopStartedCh` only closes after `MakeObservationScopedTelemetryCh` has been called at least once.

Test code only — no production-code changes.
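A minimal sketch of the fixed mock, under simplified types — `mockTelemetry`, its fields, and the `firstCh` helper are illustrative stand-ins, not the repository's actual test helpers:

```go
package main

import (
	"fmt"
	"sync"
)

// mockTelemetry is a hypothetical stand-in for the test's tm mock.
type mockTelemetry struct {
	mu       sync.Mutex
	ch       chan any   // latest channel -- racy to read after Observe returns
	channels []chan any // every channel ever created, in creation order
}

func (tm *mockTelemetry) MakeObservationScopedTelemetryCh(capacity int) chan any {
	tm.mu.Lock()
	defer tm.mu.Unlock()
	ch := make(chan any, capacity)
	tm.ch = ch
	tm.channels = append(tm.channels, ch)
	return ch
}

// firstCh returns the first iteration's channel, which stays valid even
// after later iterations have replaced tm.ch.
func (tm *mockTelemetry) firstCh() chan any {
	tm.mu.Lock()
	defer tm.mu.Unlock()
	return tm.channels[0]
}

func main() {
	tm := &mockTelemetry{}

	// First loop iteration populates telemetry for stream IDs 21/22/23.
	ch := tm.MakeObservationScopedTelemetryCh(8)
	for _, id := range []int{21, 22, 23} {
		ch <- id
	}
	close(ch)

	// A fast second iteration overwrites tm.ch with an empty channel.
	tm.MakeObservationScopedTelemetryCh(8)

	// Reading tm.ch now would see the empty channel; channels[0] still
	// holds the asserted first-iteration values.
	var telems []any
	for v := range tm.firstCh() {
		telems = append(telems, v)
	}
	fmt.Println(telems[:3]) // prints [21 22 23]
}
```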

Flaky test fixes

| Issue | Test | Trunk |
| --- | --- | --- |
| CRE-4150 | `Test_DataSource/Observe/records_telemetry` | Trunk test case |

Test plan

  • `go test -race -shuffle=on -run '^Test_DataSource$' ./core/services/llo/observation/...` — 35 consecutive local PASSes after the change (one early flake observed before the run stabilized; 35 clean runs since)
  • Local `golangci-lint run ./core/services/llo/observation/...` — no new violations from this diff (pre-existing issues in unrelated lines untouched)
  • CI passes, including any quarantined-test runs

Flakiness-classification: TEST (user override from AMBIGUOUS — Tier 1 string-match signals did not match the goroutine-shared-state race pattern, but stack trace and code analysis pinpoint a clear test-side race)

🤖 Generated with Claude Code


github-actions Bot commented May 7, 2026

✅ No conflicts with other open PRs targeting develop

trunk-io Bot commented May 7, 2026


@aareet aareet marked this pull request as ready for review May 7, 2026 19:27
@aareet aareet requested review from a team as code owners May 7, 2026 19:27
