A non-exhaustive list of the alerts you may see in the #datawarehouse-infra Slack channel and their meanings.
Reports the status of the Alli Data platform. Up is good, Down is not.
Long Running Autoscaling Instances
The number of autoscaling groups has stayed at 3 or more for longer than 14 hours.
There are more than 500 datasource jobs queued for longer than 2 hours.
High Redshift Connections
The number of Redshift connections is higher than 200 for longer than 2 minutes.
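Several alerts on this page have the same shape: a value stays past a threshold for longer than some duration. The sketch below shows one way such a rule could be evaluated, using the connection count as the example. It is illustrative only; the function name `breached_for` and the sample format are assumptions, not the monitoring system's actual implementation.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical sample format: (time of check, observed value).
Sample = Tuple[datetime, float]


def breached_for(samples: Iterable[Sample], threshold: float,
                 min_duration: timedelta) -> bool:
    """Return True if the latest run of samples has stayed above
    `threshold` for longer than `min_duration`."""
    breach_start: Optional[datetime] = None
    latest: Optional[datetime] = None
    for ts, value in sorted(samples):
        latest = ts
        if value > threshold:
            # Start the clock on the first sample of a breach.
            if breach_start is None:
                breach_start = ts
        else:
            # Any sample at or below the threshold resets the clock.
            breach_start = None
    if breach_start is None or latest is None:
        return False
    return latest - breach_start > min_duration
```

For the rule above, this would be called as `breached_for(samples, 200, timedelta(minutes=2))`, where `samples` holds the recent connection counts.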
No Redshift Automated Snapshots
No automated snapshot has completed in a 12-hour period.
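As a rough illustration of this rule, the check can be thought of as counting snapshot completions inside a 12-hour window. The sketch below assumes the completion timestamps have already been collected by the caller. The function name `snapshot_alert` is hypothetical, and how the real alert gathers its data is not described here.

```python
from datetime import datetime, timedelta
from typing import Iterable


def snapshot_alert(completed_at: Iterable[datetime], now: datetime,
                   window: timedelta = timedelta(hours=12)) -> bool:
    """Return True if no snapshot completed within `window` before `now`."""
    cutoff = now - window
    # Keep only the completions that fall inside the window.
    recent = [t for t in completed_at if cutoff <= t <= now]
    return len(recent) < 1
```

All timestamps passed in should be either timezone-aware or naive; Python raises a `TypeError` when the two kinds are compared.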
Redshift reports its status as unhealthy. The status is checked every 5 minutes.
This alert also fires when Redshift is automatically restarted during non-peak hours.
Redshift Out of Space
Redshift disk usage has been above 90% for longer than 30 minutes.
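When this alert fires, it can help to check the usage figure by hand. The sketch below shows one possible way to do that, assuming the block counts come from Redshift's `stv_partitions` system table. The query text and the helper names `disk_used_percent` and `disk_over_limit` are assumptions for illustration, so verify the query against your cluster before relying on it.

```python
# Assumed query: sums used and total 1 MB blocks across the cluster's disks.
# Run it with whatever SQL client you normally use for Redshift.
DISK_USAGE_SQL = """
SELECT SUM(used) AS used_blocks, SUM(capacity) AS capacity_blocks
FROM stv_partitions
WHERE part_begin = 0
"""


def disk_used_percent(used_blocks: int, capacity_blocks: int) -> float:
    """Convert block counts into a percentage of disk space used."""
    if capacity_blocks <= 0:
        raise ValueError("capacity_blocks must be positive")
    return 100.0 * used_blocks / capacity_blocks


def disk_over_limit(used_blocks: int, capacity_blocks: int,
                    limit_percent: float = 90.0) -> bool:
    """Return True if usage is strictly above the alert's 90% limit."""
    return disk_used_percent(used_blocks, capacity_blocks) > limit_percent
```

This only checks a single reading. The alert itself also requires the condition to hold for longer than 30 minutes.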
Redshift Blocking Queries
One or more Redshift queries have been blocking for longer than 90 minutes.