Skip to content

fix(alerts): compare FQN filters literally in matchAnyEntityFqn#27987

Open
manerow wants to merge 1 commit intomainfrom
fix/match-any-entity-fqn-literal
Open

fix(alerts): compare FQN filters literally in matchAnyEntityFqn#27987
manerow wants to merge 1 commit intomainfrom
fix/match-any-entity-fqn-literal

Conversation

@manerow
Copy link
Copy Markdown
Contributor

@manerow manerow commented May 8, 2026

Fixes open-metadata/openmetadata-collate#4019

Summary

AlertsRuleEvaluator.matchAnyEntityFqn compiles each user-supplied FQN as a Java regex and matches with Matcher.find(). Entity FQNs that contain regex metacharacters ([, ], ., (, ), +, ?, *, \, |, ^, $, {, }) silently never match their own filter — e.g. a name like [TML] Fraud Mart Suite is interpreted as the regex [TML] (a character class meaning one of T, M, L) followed by Fraud Mart Suite, so the literal FQN starting with [ never matches. Affected alerts drop events with no log line.

Change

matchAnyEntityFqn and its testSuiteMatcher helper now compare FQNs literally:

- Pattern pattern = Pattern.compile(name);
- Matcher matcher = pattern.matcher(entity.getFullyQualifiedName());
- if (matcher.find()) { return true; }
+ if (entityFqns.contains(entity.getFullyQualifiedName())) { return true; }

Why this is the right default (and why reverting #14946 is safe)

  • The original contract was literal. The function used String.equals until [Fix] Add Regex for Entity FQn matches #14946 swapped it to Pattern.compile + matcher.find(). That PR added regex matching with no test, no UI surface, and no doc update — the behavior drift was undocumented.
  • Industry standard. Prometheus AlertManager, AWS EventBridge, Datadog, and PagerDuty all default to literal match for identifier filters; regex is an explicit opt-in operator.
  • Consistent with the rest of the file. Every other identifier-matcher in AlertsRuleEvaluator already uses literal equality — matchUpdatedBy, matchAnyEntityId, getTestCaseStatusIfInTestSuite, matchAnyOwnerName, and the direct-domain branch of matchAnyDomain. The same bug class on filterByTableNameTestCaseBelongsTo was already corrected in Fixes #16357: Improve Alerts Filtering with Exact FQN Matching #17034 with Pattern.quote. matchAnyEntityFqn was the lone outlier.
  • Migration risk is bounded. The function has zero direct Java callers in src/main, no seed or default rule hardcodes a regex (every reference is matchAnyEntityFqn(${fqnList}), with fqnList filled from runtime user input), no test pinned regex semantics, and the UI alert builder populates the field from a typeahead picker that emits canonical FQNs. The only surface that could change behavior is hand-crafted PATCH-API rules whose author intentionally exploited the undocumented regex behavior.

Type of change

  • Bug fix

Tests

Parameterized regression test_matchAnyEntityFqn_treatsRegexMetacharsAsLiteral in AlertsRuleEvaluatorResourceIT covers FQNs with [], (), +, ?, *, |. A second test covers the testCase → parent test-suite fallback. Both would fail under the prior regex semantics; they pin the literal contract going forward.

@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels May 8, 2026
@manerow manerow added the To release Will cherry-pick this PR into the release branch label May 8, 2026
@manerow manerow self-assigned this May 8, 2026
@manerow manerow force-pushed the fix/match-any-entity-fqn-literal branch from a1677b6 to 86fc831 Compare May 8, 2026 11:42
@manerow
Copy link
Copy Markdown
Contributor Author

manerow commented May 8, 2026

Thanks for the suggestion @gitar-bot. I considered Set.copyOf(entityFqns) but I'd argue keeping List.contains is the right call here. A few reasons:

Allocation cost dominates at typical n. Set.copyOf builds a HashSet-equivalent immutable map per evaluation: hash each element, allocate buckets, etc. For the typical alert-rule list size (n ≈ 1–5, since the UI typeahead picks one or a handful of entities), List.contains's linear String.equals scan beats the Set's allocation + hashing on a single call. The crossover where Set wins tends to be n ≳ 10, which is well past real-world rule sizes.

Consistency with the rest of the file. Every other identifier-matcher in AlertsRuleEvaluator already uses List.contains or direct equals on a small list — matchUpdatedBy (L355), matchAnyEntityId (L184), getTestCaseStatusIfInTestSuite (L336), matchAnyOwnerName (L635, L642), and the direct-domain branch of matchAnyDomain (L491). Switching just matchAnyEntityFqn to a Set would make it the outlier and tempt someone to "round-trip" the rest, optimizing a path that isn't hot.

Hot path is actually publication, not evaluation. Each subscription evaluates each event once; the per-rule overhead is dwarfed by the I/O cost of the webhook delivery downstream. If a user ever stuffs hundreds of FQNs into a single rule, we can revisit, but at that point the right answer is probably to split the rule or use the entity ID list — not to micro-optimize the matcher.

So I'd prefer to keep List.contains and leave the optimization for the day someone actually reports a hot path here.

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 8, 2026

Thanks for the suggestion @gitar-bot. I considered Set.copyOf(entityFqns) but I'd argue keeping List.contains is the right call here. A few reasons:

Allocation cost dominates at typical n. Set.copyOf builds a HashSet-equivalent immutable map per evaluation: hash each element, allocate buckets, etc. For the typical alert-rule list size (n ≈ 1–5, since the UI typeahead picks one or a handful of entities), List.contains's linear String.equals scan beats the Set's allocation + hashing on a single call. The crossover where Set wins tends to be n ≳ 10, which is well past real-world rule sizes.

Consistency with the rest of the file. Every other identifier-matcher in AlertsRuleEvaluator already uses List.contains or direct equals on a small list — matchUpdatedBy (L355), matchAnyEntityId (L184), getTestCaseStatusIfInTestSuite (L336), matchAnyOwnerName (L635, L642), and the direct-domain branch of matchAnyDomain (L491). Switching just matchAnyEntityFqn to a Set would make it the outlier and tempt someone to "round-trip" the rest, optimizing a path that isn't hot.

Hot path is actually publication, not evaluation. Each subscription evaluates each event once; the per-rule overhead is dwarfed by the I/O cost of the webhook delivery downstream. If a user ever stuffs hundreds of FQNs into a single rule, we can revisit, but at that point the right answer is probably to split the rule or use the entity ID list — not to micro-optimize the matcher.

So I'd prefer to keep List.contains and leave the optimization for the day someone actually reports a hot path here.

That's a well-reasoned case and I agree with the conclusion. The allocation-vs-scan crossover argument is sound for n ≈ 1–5, the consistency rationale with the rest of AlertsRuleEvaluator is compelling, and the observation that the hot path is publication I/O rather than evaluation logic puts this firmly in the "not worth micro-optimizing" bucket. Happy to defer to List.contains here and revisit only if someone surfaces a real performance regression with large FQN lists.

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 8, 2026

Code Review ✅ Approved 1 resolved / 1 findings

Updates matchAnyEntityFqn to use literal string comparison instead of regex, resolving issues where FQNs containing special characters failed to match. Consider converting the entityFqns list to a Set to improve lookup performance from O(n) to O(1).

✅ 1 resolved
Performance: List.contains() is O(n); consider Set for FQN lookup

📄 openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java:146 📄 openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java:601 📄 openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java:606 📄 openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java:137 📄 openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java:148 📄 openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java:601-612 📄 openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java:603 📄 openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java:608
In matchAnyEntityFqn, entityFqns.contains(entity.getFullyQualifiedName()) performs a linear scan. The same list is also passed to testSuiteMatcher where it's called inside nested loops (testSuites × domains). For typical alert rules with a handful of FQNs this is fine, but converting to a Set<String> at the top of matchAnyEntityFqn would make lookups O(1) and future-proof against rules with many FQN entries.

This is minor because real-world alert filter lists are small (UI typeahead picks), but it's a trivial improvement.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔴 Playwright Results — 3 failure(s), 13 flaky

✅ 4012 passed · ❌ 3 failed · 🟡 13 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 299 0 0 4
🟡 Shard 2 752 0 3 8
🔴 Shard 3 752 3 4 7
🟡 Shard 4 789 0 1 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 733 0 5 8

Genuine Failures (failed on all attempts)

Features/Tasks/TaskNavigation.spec.ts › clicking task notification while on entity task tab refreshes the task list (shard 3)
Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoBeVisible�[2m(�[22m�[2m)�[22m failed

Locator: locator('.notification-box').locator('li.ant-list-item.notification-dropdown-list-btn').first()
Expected: visible
Timeout: 15000ms
Error: element(s) not found

Call log:
�[2m  - Expect "toBeVisible" with timeout 15000ms�[22m
�[2m  - waiting for locator('.notification-box').locator('li.ant-list-item.notification-dropdown-list-btn').first()�[22m

Features/Tasks/TaskNavigation.spec.ts › two sessions: admin on Columns tab creates task, assignee sees refresh on notification click (shard 3)
Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoBeVisible�[2m(�[22m�[2m)�[22m failed

Locator: locator('.notification-box').locator('li.ant-list-item.notification-dropdown-list-btn').first()
Expected: visible
Timeout: 15000ms
Error: element(s) not found

Call log:
�[2m  - Expect "toBeVisible" with timeout 15000ms�[22m
�[2m  - waiting for locator('.notification-box').locator('li.ant-list-item.notification-dropdown-list-btn').first()�[22m

Flow/ObservabilityAlerts.spec.ts › Alert operations for a user with and without permissions (shard 3)
Error: Wait for pending events to complete

�[2mexpect(�[22m�[31mreceived�[39m�[2m).�[22mtoEqual�[2m(�[22m�[32mexpected�[39m�[2m) // deep equality�[22m

Expected: �[32mtrue�[39m
Received: �[31mfalse�[39m

Call Log:
- Test timeout of 60000ms exceeded
🟡 13 flaky test(s) (passed on retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/DomainTierCertificationVoting.spec.ts › DataProduct - Tier assign, update, and remove (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
  • Features/UserProfileOnlineStatus.spec.ts › Should show "Active recently" for users active within last hour (shard 3, 1 retry)
  • Flow/IngestionBot.spec.ts › Ingestion bot should be able to access domain specific domain (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Set & Update all CP types on apiCollection (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Delete Database (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Tier Add, Update and Remove (shard 6, 1 retry)
  • Pages/Tag.spec.ts › Verify Owner Add Delete (shard 6, 1 retry)
  • Pages/TasksUIFlow.spec.ts › Create and resolve description task for Topic via UI (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 8, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs To release Will cherry-pick this PR into the release branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant