Skip to content

Feature/add alert filter by database#427

Open
HannahVernon wants to merge 36 commits intoerikdarlingdata:devfrom
HannahVernon:feature/add-alert-filter-by-database
Open

Feature/add alert filter by database#427
HannahVernon wants to merge 36 commits intoerikdarlingdata:devfrom
HannahVernon:feature/add-alert-filter-by-database

Conversation

@HannahVernon
Copy link
Contributor

What does this PR do?

Adds the ability to filter certain alert features by database name:

  • Deadlocks
  • Long Running Queries

Which component(s) does this affect?

  • [✅] Full Dashboard
  • [✅] Lite Dashboard

How was this tested?

Tested deadlock alerts with a classic "lock resources in opposite order" deadlock test in a dedicated database. With the database name included in the exclusion list, the alert is not fired when a deadlock occurs. With the database not present in the exclusion list, the deadlock alert is fired as expected.

Tested long running queries with the execution-database both in the list and not in the list - exclusion works as expected.

Checklist

  • [✅] I have read the contributing guide
  • [✅] My code builds with zero warnings (dotnet build -c Debug)
  • [✅] I have tested my changes against at least one SQL Server version
  • [✅] I have not introduced any hardcoded credentials or server names

HannahVernon and others added 25 commits February 27, 2026 14:26
…_DIAGNOSTIC wait types, and WAITFOR wait types.
…to allow future configurability on the number of long-running queries returned. Added optional parameter to control display of WAITFOR types in future.
…clusions

Feature/long running query exclusions
…c() method. Removed System.Collections.Generic using statement as it is unnecessary.
…KUPTHREAD-to-long-running-queries

Added exclusion for backup-related waits to GetLongRunningQueriesAsync() method.
…dded miscWaitsFilter to exclude XE_LIVE_TARGET_TVF waits. Removed unused parameters. Corrected minimum value for maxLongRunningQueryCount (minimum 1 instead of 5).
…KUPTHREAD-to-long-running-queries

Feature/add exclusion for backupthread to long running queries
…ITFOR, BACKUPTHREAD, BACKUPIO, and XE_LIVE_TARGET_TVF wait types in Lite/Services/LocalDataService.WaitStats.cs
…ries-monitor-changes-to-lite

Added filtering for various wait types to long running query code in Lite.
Adds LongRunningQueryMaxResults to UserPreferences (default 5) and
exposes it in the Settings UI alongside the existing duration threshold.
Threads the value through GetAlertHealthAsync and GetLongRunningQueriesAsync
to replace the hardcoded TOP(5) in the DMV query.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Math.Clamp(1, int.MaxValue) guard to GetLongRunningQueriesAsync in
both Dashboard and Lite, and updates the Dashboard settings validation
to use an explicit range check with a descriptive error message.

Mirrors the LongRunningQueryMaxResults setting across the Lite project:
adds the App property, loads/saves it to settings.json, exposes it in
the Lite Settings UI, and passes it through to GetLongRunningQueriesAsync
to replace the hardcoded LIMIT 5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents an OverflowException if the value in settings.json is outside
the int32 range. The value is read as long, clamped to [1, int.MaxValue],
then cast back to int.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces hardcoded wait type exclusions in GetLongRunningQueriesAsync
with user-configurable booleans for SP_SERVER_DIAGNOSTICS, WAITFOR /
BROKER_RECEIVE_WAITFOR, backup waits, and miscellaneous waits. All
four filters default to true (existing behavior preserved). Settings
are exposed in the Notifications section of both Dashboard and Lite
Settings UIs and persisted to UserPreferences / settings.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…config-settings

Feature/long running queries config settings
…config-settings

Feature/long running queries config settings
Users can now exclude specific databases (e.g. DBAtools, msdb) from
generating alerts. The exclusion list is a comma-separated text box in
Settings > Notifications for both projects.

- Dashboard: filters LRQ at trigger level, blocking via SQL NOT IN clause,
  deadlock context filtered before email is built; list persisted in UserPreferences
- Lite: filters LRQ at trigger level; list persisted to alert_excluded_databases
  JSON array in settings.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The merge with dev reverted the excludedDatabases parameter from
GetAlertHealthAsync and GetBlockingValuesAsync. Re-applies those
changes on top of the dev merge (which also landed parameterised
TOP(@MaxResults) and the 1-1000 clamp in GetLongRunningQueriesAsync).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Owner

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, Hannah! Really nice feature — database-level alert exclusions are something folks have been asking about. Clean implementation across both Dashboard and Lite, and the testing sounds thorough. A few things I noticed:


1. Deadlock alert fires on unfiltered count (Dashboard)

The deadlock tray notification is triggered by health.DeadlockCount, which comes from GetDeadlockCountAsync — that query is never filtered by excluded databases. The BuildDeadlockContextAsync filtering only affects the email context body. So a deadlock in an excluded database will still trigger the tray alert, but the email will correctly omit it. This is inconsistent with how blocking and long-running queries work, where the filtering happens before the threshold check.

2. Lite is missing deadlock and blocking filtering

The Lite side only filters long-running queries. If Lite has deadlock and blocking alert paths, those aren't getting the exclusion filter applied. The Dashboard covers all three.

3. Unrelated change to max results clamp

In Lite/App.xaml.cs, the alert_long_running_query_max_results clamp was changed from Math.Clamp(v.GetInt64(), 1, 1000) to Math.Clamp(v.GetInt64(), 1, int.MaxValue). That removes the safety cap and seems unrelated to this PR.

4. Settings help text is incomplete

Both Dashboard and Lite say "applies to long-running query and blocking alerts" but the Dashboard also applies it to deadlocks (partially). Should probably mention deadlocks too, or clarify the actual scope once the filtering is consistent.

@HannahVernon
Copy link
Contributor Author

Sorry for the partially incorrect PR. Will fix, likely tomorrow, and update the PR.

HannahVernon and others added 3 commits March 5, 2026 11:16
The deadlock tray notification was firing based on health.DeadlockCount,
which comes from GetDeadlockCountAsync. That query reads from
sys.dm_os_performance_counters  a server-wide counter with no
database-level granularity  so deadlocks in excluded databases still
incremented the delta and triggered the alert, even though the email
body correctly omitted them via BuildDeadlockContextAsync.

This was inconsistent with blocking and long-running query alerts, where
filtering by excluded databases happens before the threshold check.

Fix:
- Add FilteredDeadlockCount (nullable long) to AlertHealthResult to carry
  a database-filtered deadlock count alongside the raw counter value.
- Add GetFilteredDeadlockCountAsync to DatabaseService.NocHealth, which
  sums deadlock_count_delta from collect.blocking_deadlock_stats over a
  5-minute window (matching AlertCooldown), excluding the configured
  databases. This mirrors the approach used by GetBlockingValuesAsync.
- In GetAlertHealthAsync, run GetFilteredDeadlockCountAsync in parallel
  when excluded databases are configured and populate FilteredDeadlockCount.
- In EvaluateAlertConditionsAsync, use effectiveDeadlockDelta =
  FilteredDeadlockCount ?? deadlockDelta for the threshold check, tray
  notification count, RecordAlert, and email currentValue.
  The raw counter is still stored in _previousDeadlockCounts unchanged.

Result: a deadlock in an excluded database no longer triggers the tray
alert or email, consistent with how blocking alerts behave.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- In CheckPerformanceAlerts, when AlertExcludedDatabases is configured,
  fetch the raw blocked process reports and deadlock rows then count only
  those not belonging to excluded databases before comparing against the
  threshold  mirroring the existing long-running query pattern.
- Blocking: filter GetRecentBlockedProcessReportsAsync results by
  DatabaseName; use effectiveBlockingCount for threshold check,
  tray notification, and email currentValue.
- Deadlocks: filter GetRecentDeadlocksAsync results via new
  IsDeadlockExcluded helper that parses deadlock_graph_xml and excludes
  a deadlock only when every process node's currentdbname is in the
  excluded list (conservative  cross-database deadlocks still fire);
  use effectiveDeadlockCount throughout.
- Apply the same exclusion filter inside BuildBlockingContextAsync and
  BuildDeadlockContextAsync so email alert bodies also omit events from
  excluded databases.
- Add using System.Xml.Linq for XElement.Parse in the new helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Accidentally changed Math.Clamp upper bound from 1000 to int.MaxValue
in a prior commit. Restores the safety cap to prevent unbounded result
sets from the long-running query alert.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both editions now filter blocking, deadlock, and long-running query
alerts by the excluded databases list. Update the settings hint text
to reflect all three alert types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@HannahVernon
Copy link
Contributor Author

@erikdarlingdata - here's a summary of the changes I made today:

Four issues addressed across three commits:


  1. Dashboard deadlock alert firing on unfiltered count (f41d087)

GetDeadlockCountAsync reads from sys.dm_os_performance_counters, which is server-wide and cannot be filtered by database. Added a new GetFilteredDeadlockCountAsync method that instead queries collect.blocking_deadlock_stats — summing
deadlock_count_delta over a 5-minute window (matching AlertCooldown), excluding configured databases. When excluded databases are configured, EvaluateAlertConditionsAsync uses this filtered count (effectiveDeadlockDelta) for the
threshold check, tray notification, RecordAlert, and email currentValue. The raw perf counter tracking (_previousDeadlockCounts) is left untouched.


  1. Lite missing deadlock and blocking filtering (5e41b5d)

CheckPerformanceAlerts was using summary.BlockingCount and summary.DeadlockCount (both unfiltered server-wide counts) directly against thresholds. Fixed to mirror the existing long-running query pattern:

  • Blocking: when excluded databases are configured and the raw count is at or above threshold, fetches GetRecentBlockedProcessReportsAsync and filters in-memory by DatabaseName. Uses effectiveBlockingCount for the threshold check,
    tray notification, and email value.
  • Deadlocks: same approach using GetRecentDeadlocksAsync, filtered via a new IsDeadlockExcluded helper that parses each deadlock's XML and excludes a deadlock only when every process node's currentdbname is in the excluded list
    (cross-database deadlocks still fire).
  • BuildBlockingContextAsync and BuildDeadlockContextAsync also now filter by excluded databases so email alert bodies are consistent with tray notifications.

  1. Restore safety cap on alert_long_running_query_max_results (746591a)

The Math.Clamp upper bound had been accidentally changed from 1000 to int.MaxValue. Restored to 1000.


  1. Settings help text updated (781069f)

Both Dashboard and Lite SettingsWindow.xaml hint text updated from "applies to long-running query and blocking alerts" to "applies to blocking, deadlock, and long-running query alerts" to accurately reflect all three alert types.

Copy link
Owner

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — 4 Items

1. GetFilteredDeadlockCountAsync catch block should return null, not 0

File: Dashboard/Services/DatabaseService.NocHealth.cs, line ~260 in the new method

Problem: The catch block returns 0, which gets assigned to FilteredDeadlockCount. In MainWindow.xaml.cs, the alert logic does:

var effectiveDeadlockDelta = health.FilteredDeadlockCount ?? deadlockDelta;

When FilteredDeadlockCount is 0 (from the catch), it's not null — so the fallback to deadlockDelta never happens. This means real deadlock alerts get silently suppressed if the query fails (e.g., older schema without collect.blocking_deadlock_stats, permissions issue, timeout).

Fix: Change the return type to Task<long?> and return null from the catch block. This way a failure falls back to the raw performance counter delta and deadlock alerts still work:

catch (Exception ex)
{
    Logger.Warning($"Failed to get filtered deadlock count: {ex.Message}");
    return null;  // Fall back to raw delta
}

Then update the method signature and the assignment in GetAlertHealthAsync accordingly.


2. SQL string interpolation should use parameterized queries

Files:

  • Dashboard/Services/DatabaseService.NocHealth.csGetBlockingValuesAsync (line ~215)
  • Dashboard/Services/DatabaseService.NocHealth.csGetFilteredDeadlockCountAsync (line ~243)

Problem: Both methods build SQL with string interpolation:

var dbFilter = string.Join(", ", excludedDatabases.Select(db => $"N'{db.Replace("'", "''")}'"));

The Replace("'", "''") escaping works, but string-built SQL is fragile and doesn't follow the pattern used everywhere else in the codebase (parameterized SqlCommand with @param placeholders). The values come directly from user input in the Settings text box.

Fix: Use sp_executesql with a comma-separated parameter and STRING_SPLIT, or build individual @db0, @db1, @db2 parameters dynamically:

var dbParams = new List<string>();
for (int i = 0; i < excludedDatabases.Count; i++)
{
    var paramName = $"@exdb{i}";
    dbParams.Add(paramName);
    cmd.Parameters.AddWithValue(paramName, excludedDatabases[i]);
}
var dbFilter = $"AND DB_NAME(s.dbid) NOT IN ({string.Join(", ", dbParams)})";

This matches the parameterized pattern used by every other query in DatabaseService.*.cs.


3. Settings UI placement is misleading

Files:

  • Dashboard/SettingsWindow.xaml (line ~280)
  • Lite/Windows/SettingsWindow.xaml (line ~207)

Problem: The "Database Exclusions" section is placed inside the long-running query options area — between the LRQ exclude checkboxes and the TempDB space checkbox. But this setting applies to blocking, deadlocks, AND long-running queries. Users will either miss it or think it only applies to long-running queries.

Fix: Move the "Database Exclusions" section to its own visually distinct group, either:

  • Above the individual alert type settings (blocking, deadlock, LRQ), since it's a cross-cutting filter
  • Or below all alert type settings as a "Global Alert Filters" section

In both XAML files, move the TextBlock + TextBox + helper text block to a position that makes its scope obvious. The helper text already says "applies to blocking, deadlock, and long-running query alerts" which is good — it just needs to not be visually nested under the LRQ section.


4. Verify Dashboard DeadlockItem.DatabaseName is populated

File: Dashboard/MainWindow.xaml.csBuildDeadlockContextAsync (line ~106 in diff)

Problem: The deadlock filtering in Dashboard does:

deadlocks = deadlocks
    .Where(d => string.IsNullOrEmpty(d.DatabaseName) ||
        !excludedDatabases.Any(e =>
            string.Equals(e, d.DatabaseName, StringComparison.OrdinalIgnoreCase)))
    .ToList();

This assumes DeadlockItem (or whatever model GetDeadlocksAsync returns) has a DatabaseName property that's populated with the database where the deadlock occurred.

Check: Does the GetDeadlocksAsync query actually return a database_name column and map it to this property? If DatabaseName is always null/empty, the string.IsNullOrEmpty(d.DatabaseName) check means every deadlock passes the filter and nothing ever gets excluded.

Compare with the Lite implementation which correctly parses the deadlock graph XML to extract currentdbname from process nodes — that approach works regardless of whether the collector stores a separate database_name column. If Dashboard's model doesn't have this field populated, you'll need the same XML parsing approach (see IsDeadlockExcluded in Lite/MainWindow.xaml.cs).

How to verify: Check the query in DatabaseService.Blocking.cs or wherever GetDeadlocksAsync is defined — look for whether database_name is in the SELECT list and mapped to the model.

HannahVernon and others added 7 commits March 5, 2026 13:56
…null

Returning 0 from the catch block meant FilteredDeadlockCount was set to
a non-null value on failure, so the null-coalescing fallback in
EvaluateAlertConditionsAsync never triggered:

  var effectiveDeadlockDelta = health.FilteredDeadlockCount ?? deadlockDelta;

A 0 count silently suppressed deadlock alerts on schema errors (e.g.
missing collect.blocking_deadlock_stats), permission issues, or
timeouts. Returning null lets the fallback to the raw performance
counter delta kick in so alerts still fire.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…adlockCountAsync

Replace string-interpolated NOT IN lists with dynamically built @exdb0,
@exdb1... parameters via cmd.Parameters.AddWithValue, matching the
parameterized pattern used throughout DatabaseService.*.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The database exclusions TextBox was nested inside the Long Running Query
Filters area, making it look like it only applied to LRQ alerts. Moved
it below all per-alert-type settings into its own 'Global Alert Filters'
section so its cross-cutting scope (blocking, deadlock, and long-running
query alerts) is visually obvious.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The deadlock filter used DB_NAME(bds.dbid) but collect.blocking_deadlock_stats
has no dbid column - it has database_name (sysname). This caused the query
to fail silently, falling back to the unfiltered delta every time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…racy

Replace per-row DatabaseName filtering with deadlock graph XML parsing,
matching Lite's IsDeadlockExcluded approach. A deadlock is only excluded
when ALL process nodes' currentdbname are in the excluded list. This
prevents cross-database deadlocks from being silently suppressed when
only one participant's database is excluded.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…l body

The blocking threshold check already excluded filtered databases via
GetBlockingValuesAsync, but the email context could still include blocking
events from excluded databases. Now matches Lite's BuildBlockingContextAsync
which already filters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@HannahVernon
Copy link
Contributor Author

Thanks for bearing with me through this review, Erik.

Fixes for items 1 through 4 have been implemented.

  1. GetFilteredDeadlockCountAsync catch block returns null, not 0, and the method signature is updated.

  2. SQL string interpolation has been changed out with explicit T-SQL parameterization in the full Dashboard. Checked how the query would look for cases with an empty excludedDatabases list, as well as one containing msdb against both GetFilteredDeadlockCountAsync() and GetBlockingValuesAsync() methods.

  3. Settings UI placement is misleading -> added a section called "Global Alert Filters", and moved the "Database Exclusions" text box into that section. Fired up the Dashboard and inspected the layout of the settings window - looks much better to me.

  4. checking excluded databases in GetDeadlocksAsync() is now filtering via the deadlock graph in BuildDeadlockContextAsync() instead of just the database_name column.


I asked Github Copilot to do a comprehensive review:

The user wants a final comprehensive re-audit. Let me get the complete diff again and systematically review everything. Since we just did a thorough audit and only added two small lines, I should still be thorough but can focus on
confirming the previous findings still hold plus verifying the new changes.

● Final audit: 32 PASS, 2 WARN, 0 FAIL.

The two WARNs are design notes, not bugs:

  1. Deadlock delta window mismatch — FilteredDeadlockCount uses a fixed 5-minute SQL window vs. the polling-interval raw delta. Already documented in the code comments as intentional.
  2. Dashboard vs Lite filtering architecture — Dashboard filters blocking at SQL level; Lite filters client-side. Both correct, just different due to their data access patterns.

No errors found. All SQL queries, null handling, filtering logic, edge cases, and settings persistence are correct across both editions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants