
[Bug] Fix checkAndReviseMetrics overwriting RocksDB metrics after timer engine switch #10165

@3424672656

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

ubuntu

RocketMQ version

develop

JDK Version

1.8

Describe the Bug

Motivation

When the timer engine is switched from the file-based engine to the RocksDB engine via switchTimerEngine,
the checkAndReviseMetrics scheduled task in TimerMessageStore keeps running because nothing checks
whether an engine switch has happened. As a result, the RocksDB-side timer metrics are incorrectly overwritten.

Root Cause

  1. Shared TimerMetrics: Both TimerMessageStore (file-based) and TimerMessageRocksDBStore (RocksDB)
    share the same TimerMetrics object.

  2. No switch guard in scheduler: The checkAndReviseMetrics scheduled task registered in
    TimerMessageStore.start() has no check for timerStopEnqueue or timerRocksDBEnable. After
    switchTimerEngine(ROCKSDB_TIMELINE) sets timerStopEnqueue=true, the scheduler still fires.

  3. Overwrite via putAll: checkAndReviseMetrics() only traverses timerLog (file-based data) to
    rebuild metric counts for "small" topics, then calls timerMetrics.getTimingCount().putAll(newSmallOnes).
    Since RocksDB-side data is not in timerLog, any topic with metrics from RocksDB gets overwritten to 0
    (or loses the RocksDB portion for shared topics).
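
The overwrite described in point 3 can be reproduced in isolation. The sketch below is not the RocketMQ code: it assumes the shared timing counts can be modeled as a plain Map<String, Long>, and the class and method names (PutAllOverwriteDemo, reviseFromFileLog) are hypothetical. It only shows how rebuilding counts from a file-only source and then calling putAll discards whatever the other engine contributed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PutAllOverwriteDemo {

    /**
     * Hypothetical stand-in for the revise step: recount every known topic
     * using only the file-based log, then write the result back with putAll.
     * Topics that have no entries in the file-based log are reset to 0.
     */
    static Map<String, Long> reviseFromFileLog(Map<String, Long> sharedCounts,
                                               Map<String, Long> fileLogCounts) {
        Map<String, Long> newSmallOnes = new HashMap<>();
        for (String topic : sharedCounts.keySet()) {
            // Only file-based data is consulted here (assumption of this sketch).
            newSmallOnes.put(topic, fileLogCounts.getOrDefault(topic, 0L));
        }
        // putAll replaces the existing values, including RocksDB-side counts.
        sharedCounts.putAll(newSmallOnes);
        return sharedCounts;
    }

    public static void main(String[] args) {
        // Shared metrics object: counts were produced by the RocksDB engine after the switch.
        Map<String, Long> shared = new ConcurrentHashMap<>();
        shared.put("TopicA", 5L);

        // The file-based log holds nothing for TopicA after the switch.
        Map<String, Long> fileLog = new HashMap<>();

        System.out.println("before revise: " + shared);
        reviseFromFileLog(shared, fileLog);
        System.out.println("after revise:  " + shared);
    }
}
```

In this simplified model, a topic whose messages live only on the RocksDB side ends up with a count of 0 after the revise step, which matches the symptom reported above.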

Timeline

Steps to Reproduce

Fix

Add a storeConfig.isTimerStopEnqueue() guard in the checkAndReviseMetrics scheduled task. When the
file-based engine has stopped enqueuing (indicating a switch to RocksDB), skip checkAndReviseMetrics
to prevent overwriting RocksDB-side metrics.

Why timerStopEnqueue?

  • switchTimerEngine always sets timerStopEnqueue=true when switching to RocksDB
  • When switching back to file-based, it sets timerStopEnqueue=false, so checkAndReviseMetrics resumes
  • The semantics are precise: "file-based engine has stopped, should not revise file-based metrics"
  • Minimal change, no new config flags needed

Changes

store/src/main/java/org/apache/rocketmq/store/timer/TimerMessageStore.java

Added timerStopEnqueue check in the scheduler task before calling checkAndReviseMetrics():
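
The original snippet is not included here, so the following is only a minimal sketch of what such a guard might look like, not the actual patch. The only name taken from this report is isTimerStopEnqueue(); the StoreConfig interface, MutableConfig, and guardedReviseTask are hypothetical stand-ins so that the example compiles on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReviseGuardSketch {

    /** Hypothetical stand-in exposing only the getter the guard reads. */
    interface StoreConfig {
        boolean isTimerStopEnqueue();
    }

    /** Hypothetical mutable config, used to simulate an engine switch. */
    static class MutableConfig implements StoreConfig {
        volatile boolean timerStopEnqueue;

        @Override
        public boolean isTimerStopEnqueue() {
            return timerStopEnqueue;
        }
    }

    /**
     * Wraps the revise action so that it is skipped while the file-based
     * engine has stopped enqueuing (that is, after a switch to RocksDB).
     */
    static Runnable guardedReviseTask(StoreConfig storeConfig, Runnable checkAndReviseMetrics) {
        return () -> {
            if (storeConfig.isTimerStopEnqueue()) {
                // File-based engine is stopped: do not touch the shared metrics.
                return;
            }
            checkAndReviseMetrics.run();
        };
    }

    public static void main(String[] args) {
        MutableConfig config = new MutableConfig();
        AtomicInteger reviseRuns = new AtomicInteger();
        Runnable task = guardedReviseTask(config, reviseRuns::incrementAndGet);

        task.run();                      // file-based engine active: revise runs
        config.timerStopEnqueue = true;  // simulate a switch to RocksDB
        task.run();                      // skipped by the guard
        config.timerStopEnqueue = false; // simulate switching back to file-based
        task.run();                      // revise resumes

        System.out.println("revise executed " + reviseRuns.get() + " times");
    }
}
```

Reading the flag on every tick, rather than cancelling the scheduled task, keeps the change small and lets the revise step resume automatically when the engine is switched back.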

What Did You Expect to See?

After the engine switch, the timer metrics should remain correct: RocksDB-side counts are not overwritten.

What Did You See Instead?

null

Additional Context

No response
