Skip to content

eventservice: optimize scanwindow#4950

Open
asddongmen wants to merge 8 commits into
pingcap:masterfrom
asddongmen:0427-scanwindow-v2
Open

eventservice: optimize scanwindow#4950
asddongmen wants to merge 8 commits into
pingcap:masterfrom
asddongmen:0427-scanwindow-v2

Conversation

@asddongmen
Copy link
Copy Markdown
Collaborator

@asddongmen asddongmen commented Apr 29, 2026

What problem does this PR solve?

Issue Number: close #xxx

The old scan window controller overreacted to memory feedback. A release pulse could reset the window, then later reports would shrink it again. Because it is a per-changefeed commit-ts span cap rather than a timer, this created sawtooth batching. Stale dispatchers could also pin the base ts and delay tables blocked by pending DDL.

What is changed and how it works?

This PR replaces the old policy with an adaptive controller while keeping scans event-driven. It combines a sliding usage window, EMAs, a pressure score, and cooldowns. High or critical pressure reduces the window in bounded steps. Low pressure recovers gradually and can leave the default floor faster. Release signals now relieve pressure instead of resetting the interval. The broker also skips stale dispatchers in minSentTs refresh and allows local advance for pending DDL when the global window is pinned.

This smooths scan progress, reduces reset storms, and preserves DDL forward progress.

Before
image

After
image

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Questions

Will it cause performance regression or break compatibility?

No compatibility break is expected. This changes only scan window control behavior and should stabilize throughput and memory pressure.

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

None

Summary by CodeRabbit

  • New Features

    • Added detailed scan-window monitoring metrics for usage, EMAs, pressure, bands, adjustments and memory-release events.
  • Improvements

    • Replaced trend-based scan-interval logic with an adaptive, pressure-aware controller for safer, tiered interval adjustments and recovery.
  • Bug Fixes

    • Scan intervals no longer reset on memory-release events.
  • Tests

    • Expanded test coverage for adaptive behavior, braking/recovery scenarios, metrics, and interval continuity.

Review Change Stack

Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 29, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 29, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hongyunyan for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough

Walkthrough

Replaces trend-based scan-interval tuning with an adaptive scan-window controller: reports normalized usage, maintains fast/slow EMAs and a pressure score, applies tiered reductions and critical/emergency braking with cooldowns, gates conservative recovery, and exposes per-changefeed Prometheus metrics; broker metric lifecycle and tests updated.

Changes

Adaptive Scan Window Controller

Layer / File(s) Summary
Data Model & Configuration
pkg/eventservice/dispatcher_stat.go, pkg/eventservice/scan_window.go
changefeedStatus replaces trend fields with scanWindowController, syncPointInterval, and band-state fields; old trend constant removed; new controller tuning constants added (thresholds, EMA params, cooldowns, band bounds).
Controller Abstraction
pkg/eventservice/scan_window.go
Introduces scanWindowController interface, scanWindowReport, scanWindowDecision, scanWindowDecisionReason, and scanWindowBandState; newAdaptiveScanWindowController constructor added.
Memory Usage Entry Point
pkg/eventservice/scan_window.go
updateMemoryUsage normalizes usage, reports to controller via OnCongestionReport, records controller metrics, and persists new scan interval only when decision differs.
Metrics Init / Delete
pkg/eventservice/scan_window.go, pkg/eventservice/event_broker.go
Adds initializeScanWindowMetrics and deleteScanWindowMetrics; getOrSetChangefeedStatus and removeChangefeedStatus call these centralized helpers.
Prometheus Metrics
pkg/metrics/event_service.go
Adds exported metric vectors: EventServiceScanWindowUsageRatioGaugeVec, EventServiceScanWindowUsageEMAGaugeVec, EventServiceScanWindowTargetBandGaugeVec, EventServiceScanWindowTargetBandCrossCount, EventServiceScanWindowPressureScoreGaugeVec, EventServiceScanWindowMemoryReleaseCount, EventServiceScanWindowAdjustCount; registers them in initEventServiceMetrics.
Metrics Observation & Target-Band Tracking
pkg/eventservice/scan_window.go
Observes usage/EMA/pressure gauges, classifies report/fast/slow values into bands, and increments cross-counts on band transitions.
Core Controller Logic
pkg/eventservice/scan_window.go
adaptiveScanWindowController.OnCongestionReport handles sample ingestion, EMA/pressure maintenance, critical/high/sustained reduction logic, emergency braking, floor-recovery eligibility, and guarded upward adjustments.
Braking & Scaling Helpers
pkg/eventservice/scan_window.go
Interval-scaling helpers for very-low recovery, emergency brake selection, and low-recovery scaling implemented.
Critical Brake & Recovery
pkg/eventservice/scan_window.go
Critical braking includes cooldown deduplication and emergency-min-interval unlock logic; floor-recovery predicates gate recovery before normal increase cooldown.
EMA & Pressure Updates
pkg/eventservice/scan_window.go
Fast/slow EMA updates, pressure-score accumulation/decay, and pressure relief tied to memory-release pulses; test reset hooks included.
Remove Old Trend Logic
pkg/eventservice/scan_window.go
Removes previous adjustScanInterval and associated trend detection/increase-eligibility constants and reset behavior.
Test Imports & Helpers
pkg/eventservice/scan_window_test.go
Expands imports to include metrics test utilities and adds markScanWindowReadyForIncrease / markScanWindowReadyForDecrease helpers.
Existing Test Updates
pkg/eventservice/scan_window_test.go, pkg/eventservice/event_broker_test.go
Updates tests to use new helpers; updates congestion-control v2 memory-release test to assert scan interval preservation.
New Tests & Metrics Assertions
pkg/eventservice/scan_window_test.go
Adds many tests covering low/very-low recovery slowdown, high/critical/emergency pressure behaviors, recovery-from-floor progression, sustained-pressure reduction, and asserts Prometheus metric updates (usage ratio, EMAs, pressure score, adjustment counts, target-band cross-counts).

🎯 4 (Complex) | ⏱️ ~45 minutes

Sequence Diagram(s)

sequenceDiagram
  participant UpdateMemoryUsage
  participant AdaptiveScanWindowController
  participant EventServiceMetrics
  participant ChangefeedStatus
  UpdateMemoryUsage->>AdaptiveScanWindowController: OnCongestionReport(normalizedUsage, releaseCount)
  AdaptiveScanWindowController-->>AdaptiveScanWindowController: update EMAs, compute pressure score, decide interval
  AdaptiveScanWindowController->>EventServiceMetrics: emit usage/EMA/pressure/adjust metrics
  AdaptiveScanWindowController->>ChangefeedStatus: decision(scanInterval)
  ChangefeedStatus->>ChangefeedStatus: persist new scanInterval if changed
Loading

Possibly related PRs

  • pingcap/ticdc#4836: Modifies changefeedStatus in pkg/eventservice/dispatcher_stat.go with additional fields and initialization changes.

Suggested labels

lgtm, approved, size/XL

Suggested reviewers

  • flowbehappy
  • wk989898

Poem

🐰 A window wakes, no trends to test,
EMA whispers, pressures rest,
Brakes apply, then cautiously mend,
Metrics count each careful bend,
Rabbit hops — the feed holds steady, friend.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is missing the required 'Issue Number' line with the issue reference as specified in the template. Replace 'Issue Number: close #xxx' with an actual issue number (e.g., 'Issue Number: close #4949') to complete the required template section.
Docstring Coverage ⚠️ Warning Docstring coverage is 2.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: optimizing the scan window mechanism in the event service.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 29, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces the existing scan interval adjustment logic with a new adaptiveScanWindowController that utilizes Exponential Moving Averages (EMAs) and a pressure score for more stable memory pressure management. The update includes comprehensive simulation tests and enhanced Prometheus metrics for monitoring controller decisions. Review feedback highlights a non-monotonic discontinuity in the emergency brake calculation, potential over-throttling caused by latching peak usage values, the presence of magic numbers, and the use of a redundant maxFloat64 helper that should be replaced by the built-in max function.

Comment thread pkg/eventservice/scan_window.go Outdated
Comment on lines +562 to +567
func scanWindowEmergencyBrakeInterval(current time.Duration) time.Duration {
if current <= 6*defaultScanInterval {
return max(current/2, defaultScanInterval)
}
return max(current/4, minScanInterval)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logic in scanWindowEmergencyBrakeInterval has two significant issues:

  1. Discontinuity: There is a sharp jump at the boundary of 6*defaultScanInterval (30s). For an input of 30s, it returns 15s (current/2), but for 30.1s, it returns ~7.5s (current/4). This non-monotonic behavior can cause unstable oscillations in the scan interval.
  2. Unreachable Minimum: The function floors at defaultScanInterval (5s) for any current <= 30s. This makes the minScanInterval (1s) constant effectively unreachable during emergency pressure (98%+ usage) if the interval has already been reduced to a moderate level. If the goal is to allow the interval to drop to 1s under extreme congestion, the floor should be minScanInterval in both branches.

Comment thread pkg/eventservice/scan_window.go
Comment thread pkg/eventservice/scan_window.go Outdated
Comment thread pkg/eventservice/scan_window.go Outdated
Signed-off-by: dongmen <414110582@qq.com>
@ti-chi-bot ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 11, 2026
@asddongmen asddongmen marked this pull request as ready for review May 11, 2026 03:22
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 11, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
pkg/eventservice/scan_window.go (4)

425-549: 💤 Low value

Consider factoring the repeated scanWindowDecision{...} literal.

OnCongestionReport returns the same scanWindowDecision shape from six branches, each repeating usage, fastUsageEMA, slowUsageEMA, pressureScore. A small builder method on the controller would let the function body focus on policy and reduce the surface for accidental field drift if a new metric/field is added later.

♻️ Sketch
func (c *adaptiveScanWindowController) makeDecisionLocked(
    newInterval, maxInterval time.Duration,
    reason scanWindowDecisionReason,
    usage memoryUsageStats,
) scanWindowDecision {
    return scanWindowDecision{
        newInterval:   newInterval,
        maxInterval:   maxInterval,
        reason:        reason,
        usage:         usage,
        fastUsageEMA:  c.fastUsageEMA,
        slowUsageEMA:  c.slowUsageEMA,
        pressureScore: c.pressureScore,
    }
}

Each branch then becomes return c.makeDecisionLocked(newInterval, maxInterval, reason, usage).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/eventservice/scan_window.go` around lines 425 - 549, The
OnCongestionReport function repeats identical scanWindowDecision literals across
multiple return sites; add a helper method on adaptiveScanWindowController
(e.g., makeDecisionLocked(newInterval time.Duration, maxInterval time.Duration,
reason scanWindowDecisionReason, usage memoryUsageStats) scanWindowDecision)
that constructs and returns the scanWindowDecision using c.fastUsageEMA,
c.slowUsageEMA, c.pressureScore and the supplied usage, newInterval,
maxInterval, reason; then replace each repeated literal in OnCongestionReport
with calls to c.makeDecisionLocked(...) (keep current names: OnCongestionReport,
scanWindowDecision, usage, fastUsageEMA, slowUsageEMA, pressureScore).

388-406: ⚡ Quick win

Race on band state can double-count target-band crossings.

observeScanWindowTargetBandMetrics reads state.Load(), compares to currentState, then state.Store(...). Because handleCongestionControl can be invoked concurrently from different from nodes for the same changefeed (each call iterates changefeedMap.Range), two goroutines can both observe the same previousState, both increment EventServiceScanWindowTargetBandCrossCount, and both store — over-counting transitions for the same actual crossing.

Use Swap to read-and-replace atomically so only one caller observes each prior state:

🔒 Proposed diff
-	previousState := scanWindowBandState(state.Load())
-	if previousState != scanWindowBandUnknown && previousState != currentState {
-		metrics.EventServiceScanWindowTargetBandCrossCount.WithLabelValues(changefeed, metricType).Inc()
-	}
-	state.Store(int32(currentState))
+	previousState := scanWindowBandState(state.Swap(int32(currentState)))
+	if previousState != scanWindowBandUnknown && previousState != currentState {
+		metrics.EventServiceScanWindowTargetBandCrossCount.WithLabelValues(changefeed, metricType).Inc()
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/eventservice/scan_window.go` around lines 388 - 406, The
observeScanWindowTargetBandMetrics function currently uses state.Load() and
state.Store(), which allows concurrent callers (e.g., handleCongestionControl)
to both see the same previous state and double-count transitions; replace the
Load/Store pair with an atomic swap so the read-and-replace is atomic: call
state.Swap(int32(currentState)) (convert the returned int32 to
scanWindowBandState) to obtain the previousState, then, if previousState !=
scanWindowBandUnknown && previousState != currentState, increment
EventServiceScanWindowTargetBandCrossCount; keep the existing gauge
Set(1)/Set(0) behavior and only change how previousState is read/stored.

321-362: 💤 Low value

Minor: deleteScanWindowMetrics also clears EventServiceAvailableMemoryQuotaGaugeVec.

That metric is not part of the scan-window family (registered separately, set in handleCongestionControl), but its lifecycle is naturally tied to the changefeed. Functionally fine — just be aware the helper name slightly under-promises what it deletes. If you keep this coupling, consider a short comment explaining why the available-memory-quota label is wiped here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/eventservice/scan_window.go` around lines 321 - 362,
deleteScanWindowMetrics currently deletes
EventServiceAvailableMemoryQuotaGaugeVec even though that metric is not part of
the scan-window family; either move that deletion to the metric's owner (e.g.
where handleCongestionControl manages the metric) or keep it here but add a
short explanatory comment. Update the function deleteScanWindowMetrics to either
remove the line
metrics.EventServiceAvailableMemoryQuotaGaugeVec.DeleteLabelValues(changefeed)
and place it in the lifecycle code that sets/clears available memory quota, or
add a one-line comment above that DeleteLabelValues call explaining why the
available-memory-quota label is cleared here (tie to changefeed lifecycle),
referencing deleteScanWindowMetrics and handleCongestionControl so future
readers can find the rationale.

796-801: ⚡ Quick win

Drop maxFloat64 in favor of Go's built-in max.

This file already uses the Go 1.21+ built-in min/max for float64 elsewhere (e.g., min(c.pressureScore+2, scanWindowPressureScoreCeiling) at line 673), so maxFloat64 is inconsistent and unnecessary. Replacing all three call sites with the builtin removes one ad-hoc helper and matches the surrounding style.

♻️ Proposed diff
-func maxFloat64(a float64, b float64) float64 {
-	if a > b {
-		return a
-	}
-	return b
-}

Then at the call sites (lines 679, 681, 687):

-	c.pressureScore = maxFloat64(0, c.pressureScore-1.5)
+	c.pressureScore = max(0, c.pressureScore-1.5)
...
-	c.pressureScore = maxFloat64(0, c.pressureScore-0.5)
+	c.pressureScore = max(0, c.pressureScore-0.5)
...
-	c.pressureScore = maxFloat64(0, c.pressureScore-relief)
+	c.pressureScore = max(0, c.pressureScore-relief)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/eventservice/scan_window.go` around lines 796 - 801, Remove the ad-hoc
helper maxFloat64 and replace its call sites with the Go 1.21 built-in max: find
all uses of maxFloat64(a, b) and change them to max(a, b), then delete the
maxFloat64 function definition; no extra imports are needed—just remove the
function maxFloat64 and update callers to use max directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/eventservice/scan_window.go`:
- Around line 425-549: The OnCongestionReport function repeats identical
scanWindowDecision literals across multiple return sites; add a helper method on
adaptiveScanWindowController (e.g., makeDecisionLocked(newInterval
time.Duration, maxInterval time.Duration, reason scanWindowDecisionReason, usage
memoryUsageStats) scanWindowDecision) that constructs and returns the
scanWindowDecision using c.fastUsageEMA, c.slowUsageEMA, c.pressureScore and the
supplied usage, newInterval, maxInterval, reason; then replace each repeated
literal in OnCongestionReport with calls to c.makeDecisionLocked(...) (keep
current names: OnCongestionReport, scanWindowDecision, usage, fastUsageEMA,
slowUsageEMA, pressureScore).
- Around line 388-406: The observeScanWindowTargetBandMetrics function currently
uses state.Load() and state.Store(), which allows concurrent callers (e.g.,
handleCongestionControl) to both see the same previous state and double-count
transitions; replace the Load/Store pair with an atomic swap so the
read-and-replace is atomic: call state.Swap(int32(currentState)) (convert the
returned int32 to scanWindowBandState) to obtain the previousState, then, if
previousState != scanWindowBandUnknown && previousState != currentState,
increment EventServiceScanWindowTargetBandCrossCount; keep the existing gauge
Set(1)/Set(0) behavior and only change how previousState is read/stored.
- Around line 321-362: deleteScanWindowMetrics currently deletes
EventServiceAvailableMemoryQuotaGaugeVec even though that metric is not part of
the scan-window family; either move that deletion to the metric's owner (e.g.
where handleCongestionControl manages the metric) or keep it here but add a
short explanatory comment. Update the function deleteScanWindowMetrics to either
remove the line
metrics.EventServiceAvailableMemoryQuotaGaugeVec.DeleteLabelValues(changefeed)
and place it in the lifecycle code that sets/clears available memory quota, or
add a one-line comment above that DeleteLabelValues call explaining why the
available-memory-quota label is cleared here (tie to changefeed lifecycle),
referencing deleteScanWindowMetrics and handleCongestionControl so future
readers can find the rationale.
- Around line 796-801: Remove the ad-hoc helper maxFloat64 and replace its call
sites with the Go 1.21 built-in max: find all uses of maxFloat64(a, b) and
change them to max(a, b), then delete the maxFloat64 function definition; no
extra imports are needed—just remove the function maxFloat64 and update callers
to use max directly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4190f76c-a953-4e4d-887a-59bfaeef4a9c

📥 Commits

Reviewing files that changed from the base of the PR and between dd501c8 and 274d253.

📒 Files selected for processing (6)
  • pkg/eventservice/dispatcher_stat.go
  • pkg/eventservice/event_broker.go
  • pkg/eventservice/event_broker_test.go
  • pkg/eventservice/scan_window.go
  • pkg/eventservice/scan_window_test.go
  • pkg/metrics/event_service.go

Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
@asddongmen
Copy link
Copy Markdown
Collaborator Author

/test all

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 11, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/eventservice/scan_window_test.go (1)

179-181: ⚡ Quick win

Replace the hardcoded 30 with a duration-derived bound

Line 179 bakes in a window-size assumption. Deriving the loop count from memoryUsageWindowDuration makes this test resilient to future constant tuning.

Suggested change
- for i := 0; i <= 30; i++ {
+ for i := 0; i <= int(memoryUsageWindowDuration/time.Second); i++ {
    status.updateMemoryUsage(start.Add(time.Duration(i)*time.Second), 1, 0)
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/eventservice/scan_window_test.go` around lines 179 - 181, The test
hardcodes 30 iterations when calling status.updateMemoryUsage, which couples it
to a specific window size; change the loop bound to derive from
memoryUsageWindowDuration (e.g., compute n := int(memoryUsageWindowDuration /
time.Second) and use i := 0; i <= n; i++) so the test scales with the actual
memoryUsageWindowDuration constant; update the loop surrounding
status.updateMemoryUsage to use that computed n instead of the literal 30.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/eventservice/scan_window_test.go`:
- Around line 74-75: Replace hardcoded changefeed IDs created by
newChangefeedStatus(common.NewChangefeedID4Test("default", "test"), ...) with a
unique per-test ID using t.Name() (e.g.,
newChangefeedStatus(common.NewChangefeedID4Test("default", t.Name()), ...)) for
all occurrences (including the instances around lines 89-90, 134-135, 157-158,
166-167, 175-176, 189-190) so updateMemoryUsage emits metrics keyed to a
test-unique label; also make the tests deterministic by using testify/require
assertions where applicable instead of non-deterministic checks.

---

Nitpick comments:
In `@pkg/eventservice/scan_window_test.go`:
- Around line 179-181: The test hardcodes 30 iterations when calling
status.updateMemoryUsage, which couples it to a specific window size; change the
loop bound to derive from memoryUsageWindowDuration (e.g., compute n :=
int(memoryUsageWindowDuration / time.Second) and use i := 0; i <= n; i++) so the
test scales with the actual memoryUsageWindowDuration constant; update the loop
surrounding status.updateMemoryUsage to use that computed n instead of the
literal 30.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: de3ec83f-f0f9-438c-87dc-c06631609797

📥 Commits

Reviewing files that changed from the base of the PR and between 274d253 and d8e1a0f.

📒 Files selected for processing (2)
  • pkg/eventservice/scan_window.go
  • pkg/eventservice/scan_window_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/eventservice/scan_window.go

Comment on lines +74 to +75
status := newChangefeedStatus(common.NewChangefeedID4Test("default", "test"), 10*time.Minute)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use unique changefeed IDs in parallel tests to avoid shared metric state

These tests run with t.Parallel() and currently reuse "default","test". Since updateMemoryUsage emits global metrics keyed by changefeed label, this can create hidden cross-test coupling/flakiness. Prefer t.Name() (as done in metric tests) for isolation.

Suggested change
- status := newChangefeedStatus(common.NewChangefeedID4Test("default", "test"), 10*time.Minute)
+ status := newChangefeedStatus(common.NewChangefeedID4Test("default", t.Name()), 10*time.Minute)

As per coding guidelines, "Unit tests should use *_test.go file naming convention and favor deterministic tests using testify/require."

Also applies to: 89-90, 134-135, 157-158, 166-167, 175-176, 189-190

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/eventservice/scan_window_test.go` around lines 74 - 75, Replace hardcoded
changefeed IDs created by
newChangefeedStatus(common.NewChangefeedID4Test("default", "test"), ...) with a
unique per-test ID using t.Name() (e.g.,
newChangefeedStatus(common.NewChangefeedID4Test("default", t.Name()), ...)) for
all occurrences (including the instances around lines 89-90, 134-135, 157-158,
166-167, 175-176, 189-190) so updateMemoryUsage emits metrics keyed to a
test-unique label; also make the tests deterministic by using testify/require
assertions where applicable instead of non-deterministic checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant