Symptoms
Symptoms include timeouts on api queries, affecting keymanweb.com, help.keyman.com, and keyman.com. keymanweb in particular was failing to load, e.g. with heavy queries such as:
Reported also by @LornaSIL this morning:
Something seems to be broken with displaying the osk on help files: https://help.keyman.com/keyboard/sil_bolivia and https://help.keyman.com/keyboard/sil_bwe_karen/1.0.1/sil_bwe_karen
Diagnostics
Looking at the cluster, the api.keyman.com database pod had been showing 100% CPU utilization for the last 2 days:
It is unclear why the CPU started spiking at that point. There is no evidence of changes to the database, or of a spike in api.keyman.com visits. Memory and disk usage show a spike that starts hours after the high CPU starts. That is a bit odd too, but it may be down to SQL Server resource management.
Mitigation
- Restarted the pod. Resolved the immediate issue.
- Will continue to monitor.
Additional actions
- Monitor
- We should have an alert set up for persistent high CPU (e.g. >10 minutes at >0.9 CPU avg?)
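
The alert condition suggested above could be sketched roughly as follows. This is only an illustration of the rule, not how the cluster's monitoring is actually configured: the function name `sustained_high_cpu`, the sample format (timestamp in seconds, CPU as a fraction of 1.0), and the defaults of 0.9 and 600 seconds are all assumptions taken from the example figures in the bullet.

```python
from typing import List, Tuple


def sustained_high_cpu(
    samples: List[Tuple[float, float]],
    threshold: float = 0.9,       # assumed: CPU fraction, from ">.9 CPU avg"
    window_seconds: float = 600,  # assumed: from ">10 minutes"
) -> bool:
    """Return True if average CPU over the most recent window exceeds threshold.

    `samples` is a hypothetical list of (timestamp_seconds, cpu_fraction)
    pairs sorted by timestamp, e.g. scraped from the pod's metrics.
    """
    if not samples:
        return False

    latest = samples[-1][0]

    # Require data spanning the full window, so a brief burst right after
    # a restart does not trigger the alert.
    if latest - samples[0][0] < window_seconds:
        return False

    # Keep only samples that fall inside the trailing window.
    recent = [cpu for ts, cpu in samples if latest - ts <= window_seconds]
    return sum(recent) / len(recent) > threshold


# Usage: one sample per minute for 15 minutes, pegged at 100% CPU.
pegged = [(minute * 60.0, 1.0) for minute in range(16)]
if sustained_high_cpu(pegged):
    print("ALERT: persistent high CPU on database pod")
```

Averaging over the window, rather than requiring every sample to be above the threshold, follows the "CPU avg" wording in the bullet. In practice a rule like this would more likely live in the cluster's existing alerting system than in a standalone script.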
