Wait For

Kubernetes clusters and similar dynamic systems may experience temporary discrepancies between the actual and intended state of resources. For example, a deployment could momentarily appear unhealthy during a scaling operation. If alerts are configured for config.unhealthy events, these transient state fluctuations might lead to an overwhelming number of unnecessary notifications.

To address this issue, you can utilize the waitFor parameter. This feature allows you to define a delay before sending notifications for specific events. After an event occurs, the system rechecks its status following the specified wait period. Only if the undesired state persists does a notification trigger.

info

waitFor is only applicable to notifications of health related events

notify-unhealthy-deployments.yaml
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: deployment-unhealthy-alerts
spec:
  events:
    - config.unhealthy
  waitFor: 2m
  filter: config.type == 'Kubernetes::Deployment'
  to:
    email: alerts@acme.com

Handling Scrape Lag

waitFor re-evaluates the health based on the current state in config-db. However, in some circumstances, there may be a delay between when a change occurs and when it's refelected in config-db, potentially resulting in false positives.

waitForEvalPeriod forces an incremental scrape of the resource before sending a notification. It waits for up to this period for a scrape to complete before sending a notification.

apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: deployment-unhealthy-alerts
spec:
  events:
    - config.unhealthy
  waitFor: 5m
  waitForEvalPeriod: 30s