fix: correct inverted condition in DocLevelMonitorQueries causing query index churn#2154
Open
thecodingshrimp wants to merge 1 commit into
Conversation
…ry index churn (opensearch-project#2153) Signed-off-by: thecodingshrimp <leonard.stutzer@sap.com>
55734fa to
289f661
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2153
The condition in
DocLevelMonitorQueries.updateQueryIndexMappingsthat decides when to re-fetch the write index from the query index alias was inverted (!=instead of==). This caused the re-fetch path to fire on every monitor execution rather than only in the backwards-compatibility case it was designed to handle.Root Cause
monitorMetadata.sourceToQueryIndexMappingstores the concrete backing index name (e.g..opensearch-sap-pre-packaged-rules-queries-000001) after the first successful lookup. The condition was intended to re-resolve the alias only when the stored value is the alias name itself — a legacy situation where the metadata was written before this logic existed.With
!=, the condition evaluates totrueon every subsequent run because the stored concrete index name is always different from the alias name. This triggersgetWriteIndexNameForAliasand a metadata rewrite on every run, which cascades into a delete+recreate of the backing query index, generating 6–10MergeSchedulerConfiglog lines per node per cycle.With
==, the condition correctly fires only when the alias name itself is stored, which is the backwards-compat path it was designed for.Fix
The comment block was also updated to document the correct invariant.
Impact
At scale (3 master nodes, many detectors), the broken condition produced 23,000+
MergeSchedulerConfiglog lines per minute. This fix eliminates that churn.Related
This bug is amplified by a separate issue in
opensearch-project/security-analyticswherechained_findingsmonitors are created withdeleteQueryIndexInEveryRun=true, activating the broken code path on every detector run. Both fixes are needed to fully eliminate the log storm. The security-analytics fix prevents the flag from being set unnecessarily; this fix corrects the inverted logic that acts on it.