K3 | Intermediate

Alert Threshold Configuration

30 min | When tuning alerts

Format: Design reasonable alert thresholds.

Common mistakes:

  • Threshold too low -> Alert storm (the "boy who cried wolf" effect: alerts fire so often that real problems get ignored)
  • Threshold too high -> Problems are discovered too late
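
To make the two mistakes concrete, the short sketch below counts how many samples would fire an alert at different thresholds. The response-time samples and the three threshold values are invented for illustration only; they are not recommendations, and `count_alerts` is a hypothetical helper.

```python
# Hypothetical response-time samples in ms (illustrative data, not real measurements).
samples_ms = [120, 135, 110, 180, 150, 900, 140, 125, 160, 1500]


def count_alerts(samples, threshold):
    """Return how many samples are strictly above the threshold."""
    return sum(1 for value in samples if value > threshold)


# Compare a very low, a moderate, and a very high threshold on the same data.
for threshold in (100, 500, 2000):
    alerts = count_alerts(samples_ms, threshold)
    print(f"threshold >{threshold}ms: {alerts} alerts out of {len(samples_ms)} samples")
```

On this made-up data, the lowest threshold fires on every sample (an alert storm), while the highest never fires, so both spikes would go unnoticed.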

Exercise: Set alert thresholds for the following metrics:

                  Warning     Critical
Response time     >___ms      >___ms
Error rate        >___%       >___%
CPU usage         >___%       >___%
Disk usage        >___%       >___%
API failure rate  >___%       >___%
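
Once the table is filled in, each row is a warning/critical pair that can be checked mechanically. A minimal sketch, assuming a strict "greater than" comparison as the `>` in the table suggests; the `Threshold` class, the `classify` function, and the numbers 70 and 90 are placeholders for illustration, not answers to the exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A warning/critical pair for one metric (hypothetical structure)."""
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> str:
    """Map a metric value to 'ok', 'warning', or 'critical'."""
    # Check the critical level first so it takes precedence over warning.
    if value > threshold.critical:
        return "critical"
    if value > threshold.warning:
        return "warning"
    return "ok"


# Placeholder values only; substitute your own numbers from the table.
example = Threshold(warning=70.0, critical=90.0)

for value in (50.0, 80.0, 95.0):
    print(f"value={value}: {classify(value, example)}")
```

Checking the critical level before the warning level keeps a single reading from being reported at both severities.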

My Notes