bug: cpu at 100% on database server #271

@mcdurdin

Symptoms

Symptoms include timeouts on API queries, affecting keymanweb.com, help.keyman.com, and keyman.com. KeymanWeb in particular was failing to load, e.g. with heavy queries such as:

https://api.keyman.com/cloud/4.0/keyboards?jsonp=keyman.register&languageidtype=bcp47&version=17.0&keyboardid=khmer_angkor,basic_kbdkni

Reported also by @LornaSIL this morning:

Something seems to be broken with displaying the osk on help files: https://help.keyman.com/keyboard/sil_bolivia and https://help.keyman.com/keyboard/sil_bwe_karen/1.0.1/sil_bwe_karen

Diagnostics

Looking at the cluster, the api.keyman.com database pod was showing 100% CPU utilization for the last 2 days:

Image

It is unclear why the CPU would spike at that point. There is no evidence of changes to the database, or of a spike in api.keyman.com visits. Memory and disk usage show a spike that starts hours after the high CPU begins. That's a bit weird too, but may be SQL Server resource management?

Mitigation

  • Restarted the pod. Resolved the immediate issue.
  • Will continue to monitor.

Additional actions

  • Monitor
  • We should have an alert set up for persistent high CPU (e.g. >10 minutes at >0.9 CPU avg?)
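
The alert condition suggested above (average CPU above 0.9 for more than 10 minutes) could be evaluated roughly as follows. This is only a minimal Python sketch and is not tied to whatever monitoring stack the cluster actually uses. `CpuSample` and `sustained_high_cpu` are hypothetical names, and the samples are assumed to express CPU utilization as a fraction of the pod's limit (0.0 to 1.0).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class CpuSample:
    """One CPU reading for the database pod (hypothetical structure)."""
    timestamp: datetime
    utilization: float  # assumed: fraction of the pod's CPU limit, 0.0 to 1.0


def sustained_high_cpu(
    samples: Iterable[CpuSample],
    threshold: float = 0.9,
    window: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True if the average utilization over the trailing window
    exceeds the threshold and the samples cover the whole window."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    if not ordered:
        return False

    end = ordered[-1].timestamp
    start = end - window

    # Without a full window of history, a short spike could not be
    # distinguished from persistent load, so do not alert yet.
    if ordered[0].timestamp > start:
        return False

    recent = [s.utilization for s in ordered if s.timestamp >= start]
    return sum(recent) / len(recent) > threshold
```

Requiring a full window of data before alerting means a brief spike, or a pod that has only just started, would not trigger the alert. The threshold and window defaults simply mirror the values floated in the bullet above and would likely need tuning.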

cc @darcywong00 @tim-eves

Labels

bug: Something isn't working