Feature/add alert filter by database#427
Feature/add alert filter by database#427HannahVernon wants to merge 36 commits intoerikdarlingdata:devfrom
Conversation
…_DIAGNOSTIC wait types, and WAITFOR wait types.
…to allow future configurability on the number of long-running queries returned. Added optional parameter to control display of WAITFOR types in future.
…n instead of janky string addition.
…clusions Feature/long running query exclusions
…c() method. Removed System.Collections.Generic using statement as it is unnecessary.
…KUPTHREAD-to-long-running-queries Added exclusion for backup-related waits to GetLongRunningQueriesAsync() method.
…dded miscWaitsFilter to exclude XE_LIVE_TARGET_TVF waits. Removed unused parameters. Corrected minimum value for maxLongRunningQueryCount (minimum 1 instead of 5).
…KUPTHREAD-to-long-running-queries Feature/add exclusion for backupthread to long running queries
…ITFOR, BACKUPTHREAD, BACKUPIO, and XE_LIVE_TARGET_TVF wait types in Lite/Services/LocalDataService.WaitStats.cs
…ries-monitor-changes-to-lite Added filtering for various wait types to long running query code in Lite.
Adds LongRunningQueryMaxResults to UserPreferences (default 5) and exposes it in the Settings UI alongside the existing duration threshold. Threads the value through GetAlertHealthAsync and GetLongRunningQueriesAsync to replace the hardcoded TOP(5) in the DMV query. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Math.Clamp(1, int.MaxValue) guard to GetLongRunningQueriesAsync in both Dashboard and Lite, and updates the Dashboard settings validation to use an explicit range check with a descriptive error message. Mirrors the LongRunningQueryMaxResults setting across the Lite project: adds the App property, loads/saves it to settings.json, exposes it in the Lite Settings UI, and passes it through to GetLongRunningQueriesAsync to replace the hardcoded LIMIT 5. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents an OverflowException if the value in settings.json is outside the int32 range. The value is read as long, clamped to [1, int.MaxValue], then cast back to int. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces hardcoded wait type exclusions in GetLongRunningQueriesAsync with user-configurable booleans for SP_SERVER_DIAGNOSTICS, WAITFOR / BROKER_RECEIVE_WAITFOR, backup waits, and miscellaneous waits. All four filters default to true (existing behavior preserved). Settings are exposed in the Notifications section of both Dashboard and Lite Settings UIs and persisted to UserPreferences / settings.json. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…config-settings Feature/long running queries config settings
…config-settings Feature/long running queries config settings
Users can now exclude specific databases (e.g. DBAtools, msdb) from generating alerts. The exclusion list is a comma-separated text box in Settings > Notifications for both projects. - Dashboard: filters LRQ at trigger level, blocking via SQL NOT IN clause, deadlock context filtered before email is built; list persisted in UserPreferences - Lite: filters LRQ at trigger level; list persisted to alert_excluded_databases JSON array in settings.json Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The merge with dev reverted the excludedDatabases parameter from GetAlertHealthAsync and GetBlockingValuesAsync. Re-applies those changes on top of the dev merge (which also landed parameterised TOP(@MaxResults) and the 1-1000 clamp in GetLongRunningQueriesAsync). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
erikdarlingdata
left a comment
There was a problem hiding this comment.
Thanks for this, Hannah! Really nice feature — database-level alert exclusions are something folks have been asking about. Clean implementation across both Dashboard and Lite, and the testing sounds thorough. A few things I noticed:
1. Deadlock alert fires on unfiltered count (Dashboard)
The deadlock tray notification is triggered by health.DeadlockCount, which comes from GetDeadlockCountAsync — that query is never filtered by excluded databases. The BuildDeadlockContextAsync filtering only affects the email context body. So a deadlock in an excluded database will still trigger the tray alert, but the email will correctly omit it. This is inconsistent with how blocking and long-running queries work, where the filtering happens before the threshold check.
2. Lite is missing deadlock and blocking filtering
The Lite side only filters long-running queries. If Lite has deadlock and blocking alert paths, those aren't getting the exclusion filter applied. The Dashboard covers all three.
3. Unrelated change to max results clamp
In Lite/App.xaml.cs, the alert_long_running_query_max_results clamp was changed from Math.Clamp(v.GetInt64(), 1, 1000) to Math.Clamp(v.GetInt64(), 1, int.MaxValue). That removes the safety cap and seems unrelated to this PR.
4. Settings help text is incomplete
Both Dashboard and Lite say "applies to long-running query and blocking alerts" but the Dashboard also applies it to deadlocks (partially). Should probably mention deadlocks too, or clarify the actual scope once the filtering is consistent.
|
Sorry for the partially incorrect PR. Will fix, likely tomorrow, and update the PR. |
The deadlock tray notification was firing based on health.DeadlockCount, which comes from GetDeadlockCountAsync. That query reads from sys.dm_os_performance_counters a server-wide counter with no database-level granularity so deadlocks in excluded databases still incremented the delta and triggered the alert, even though the email body correctly omitted them via BuildDeadlockContextAsync. This was inconsistent with blocking and long-running query alerts, where filtering by excluded databases happens before the threshold check. Fix: - Add FilteredDeadlockCount (nullable long) to AlertHealthResult to carry a database-filtered deadlock count alongside the raw counter value. - Add GetFilteredDeadlockCountAsync to DatabaseService.NocHealth, which sums deadlock_count_delta from collect.blocking_deadlock_stats over a 5-minute window (matching AlertCooldown), excluding the configured databases. This mirrors the approach used by GetBlockingValuesAsync. - In GetAlertHealthAsync, run GetFilteredDeadlockCountAsync in parallel when excluded databases are configured and populate FilteredDeadlockCount. - In EvaluateAlertConditionsAsync, use effectiveDeadlockDelta = FilteredDeadlockCount ?? deadlockDelta for the threshold check, tray notification count, RecordAlert, and email currentValue. The raw counter is still stored in _previousDeadlockCounts unchanged. Result: a deadlock in an excluded database no longer triggers the tray alert or email, consistent with how blocking alerts behave. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- In CheckPerformanceAlerts, when AlertExcludedDatabases is configured, fetch the raw blocked process reports and deadlock rows then count only those not belonging to excluded databases before comparing against the threshold mirroring the existing long-running query pattern. - Blocking: filter GetRecentBlockedProcessReportsAsync results by DatabaseName; use effectiveBlockingCount for threshold check, tray notification, and email currentValue. - Deadlocks: filter GetRecentDeadlocksAsync results via new IsDeadlockExcluded helper that parses deadlock_graph_xml and excludes a deadlock only when every process node's currentdbname is in the excluded list (conservative cross-database deadlocks still fire); use effectiveDeadlockCount throughout. - Apply the same exclusion filter inside BuildBlockingContextAsync and BuildDeadlockContextAsync so email alert bodies also omit events from excluded databases. - Add using System.Xml.Linq for XElement.Parse in the new helper. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Accidentally changed Math.Clamp upper bound from 1000 to int.MaxValue in a prior commit. Restores the safety cap to prevent unbounded result sets from the long-running query alert. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both editions now filter blocking, deadlock, and long-running query alerts by the excluded databases list. Update the settings hint text to reflect all three alert types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@erikdarlingdata - here's a summary of the changes I made today: Four issues addressed across three commits:
The Math.Clamp upper bound had been accidentally changed from 1000 to int.MaxValue. Restored to 1000.
Both Dashboard and Lite |
erikdarlingdata
left a comment
There was a problem hiding this comment.
Code Review — 4 Items
1. GetFilteredDeadlockCountAsync catch block should return null, not 0
File: Dashboard/Services/DatabaseService.NocHealth.cs, line ~260 in the new method
Problem: The catch block returns 0, which gets assigned to FilteredDeadlockCount. In MainWindow.xaml.cs, the alert logic does:
var effectiveDeadlockDelta = health.FilteredDeadlockCount ?? deadlockDelta;When FilteredDeadlockCount is 0 (from the catch), it's not null — so the fallback to deadlockDelta never happens. This means real deadlock alerts get silently suppressed if the query fails (e.g., older schema without collect.blocking_deadlock_stats, permissions issue, timeout).
Fix: Change the return type to Task<long?> and return null from the catch block. This way a failure falls back to the raw performance counter delta and deadlock alerts still work:
catch (Exception ex)
{
Logger.Warning($"Failed to get filtered deadlock count: {ex.Message}");
return null; // Fall back to raw delta
}Then update the method signature and the assignment in GetAlertHealthAsync accordingly.
2. SQL string interpolation should use parameterized queries
Files:
Dashboard/Services/DatabaseService.NocHealth.cs—GetBlockingValuesAsync(line ~215)Dashboard/Services/DatabaseService.NocHealth.cs—GetFilteredDeadlockCountAsync(line ~243)
Problem: Both methods build SQL with string interpolation:
var dbFilter = string.Join(", ", excludedDatabases.Select(db => $"N'{db.Replace("'", "''")}'"));The Replace("'", "''") escaping works, but string-built SQL is fragile and doesn't follow the pattern used everywhere else in the codebase (parameterized SqlCommand with @param placeholders). The values come directly from user input in the Settings text box.
Fix: Use sp_executesql with a comma-separated parameter and STRING_SPLIT, or build individual @db0, @db1, @db2 parameters dynamically:
var dbParams = new List<string>();
for (int i = 0; i < excludedDatabases.Count; i++)
{
var paramName = $"@exdb{i}";
dbParams.Add(paramName);
cmd.Parameters.AddWithValue(paramName, excludedDatabases[i]);
}
var dbFilter = $"AND DB_NAME(s.dbid) NOT IN ({string.Join(", ", dbParams)})";This matches the parameterized pattern used by every other query in DatabaseService.*.cs.
3. Settings UI placement is misleading
Files:
Dashboard/SettingsWindow.xaml(line ~280)Lite/Windows/SettingsWindow.xaml(line ~207)
Problem: The "Database Exclusions" section is placed inside the long-running query options area — between the LRQ exclude checkboxes and the TempDB space checkbox. But this setting applies to blocking, deadlocks, AND long-running queries. Users will either miss it or think it only applies to long-running queries.
Fix: Move the "Database Exclusions" section to its own visually distinct group, either:
- Above the individual alert type settings (blocking, deadlock, LRQ), since it's a cross-cutting filter
- Or below all alert type settings as a "Global Alert Filters" section
In both XAML files, move the TextBlock + TextBox + helper text block to a position that makes its scope obvious. The helper text already says "applies to blocking, deadlock, and long-running query alerts" which is good — it just needs to not be visually nested under the LRQ section.
4. Verify Dashboard DeadlockItem.DatabaseName is populated
File: Dashboard/MainWindow.xaml.cs — BuildDeadlockContextAsync (line ~106 in diff)
Problem: The deadlock filtering in Dashboard does:
deadlocks = deadlocks
.Where(d => string.IsNullOrEmpty(d.DatabaseName) ||
!excludedDatabases.Any(e =>
string.Equals(e, d.DatabaseName, StringComparison.OrdinalIgnoreCase)))
.ToList();This assumes DeadlockItem (or whatever model GetDeadlocksAsync returns) has a DatabaseName property that's populated with the database where the deadlock occurred.
Check: Does the GetDeadlocksAsync query actually return a database_name column and map it to this property? If DatabaseName is always null/empty, the string.IsNullOrEmpty(d.DatabaseName) check means every deadlock passes the filter and nothing ever gets excluded.
Compare with the Lite implementation which correctly parses the deadlock graph XML to extract currentdbname from process nodes — that approach works regardless of whether the collector stores a separate database_name column. If Dashboard's model doesn't have this field populated, you'll need the same XML parsing approach (see IsDeadlockExcluded in Lite/MainWindow.xaml.cs).
How to verify: Check the query in DatabaseService.Blocking.cs or wherever GetDeadlocksAsync is defined — look for whether database_name is in the SELECT list and mapped to the model.
…null Returning 0 from the catch block meant FilteredDeadlockCount was set to a non-null value on failure, so the null-coalescing fallback in EvaluateAlertConditionsAsync never triggered: var effectiveDeadlockDelta = health.FilteredDeadlockCount ?? deadlockDelta; A 0 count silently suppressed deadlock alerts on schema errors (e.g. missing collect.blocking_deadlock_stats), permission issues, or timeouts. Returning null lets the fallback to the raw performance counter delta kick in so alerts still fire. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…adlockCountAsync Replace string-interpolated NOT IN lists with dynamically built @exdb0, @exdb1... parameters via cmd.Parameters.AddWithValue, matching the parameterized pattern used throughout DatabaseService.*.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The database exclusions TextBox was nested inside the Long Running Query Filters area, making it look like it only applied to LRQ alerts. Moved it below all per-alert-type settings into its own 'Global Alert Filters' section so its cross-cutting scope (blocking, deadlock, and long-running query alerts) is visually obvious. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The deadlock filter used DB_NAME(bds.dbid) but collect.blocking_deadlock_stats has no dbid column - it has database_name (sysname). This caused the query to fail silently, falling back to the unfiltered delta every time. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…racy Replace per-row DatabaseName filtering with deadlock graph XML parsing, matching Lite's IsDeadlockExcluded approach. A deadlock is only excluded when ALL process nodes' currentdbname are in the excluded list. This prevents cross-database deadlocks from being silently suppressed when only one participant's database is excluded. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…l body The blocking threshold check already excluded filtered databases via GetBlockingValuesAsync, but the email context could still include blocking events from excluded databases. Now matches Lite's BuildBlockingContextAsync which already filters. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks for bearing with me through this review, Erik. Fixes for items 1 through 4 have been implemented.
I asked Github Copilot to do a comprehensive review:
● Final audit: 32 PASS, 2 WARN, 0 FAIL. The two WARNs are design notes, not bugs:
No errors found. All SQL queries, null handling, filtering logic, edge cases, and settings persistence are correct across both editions. |
What does this PR do?
Adds the ability to filter certain alert features by database name:
Which component(s) does this affect?
How was this tested?
Tested deadlock alerts with a classic "lock resources in opposite order" deadlock test in a dedicated database. With the database name included in the exclusion list, the alert is not fired when a deadlock occurs. With the database not present in the exclusion list, the deadlock alert is fired as expected.
Tested long running queries with the execution-database both in the list and not in the list - exclusion works as expected.
Checklist
dotnet build -c Debug)