Infrastructure Alerts

A non-exhaustive list of the alerts you may see in the #datawarehouse-infra Slack channel and their meanings.

Pingdom Alerts

Pingdom alerts report the status of the Alli Data platform. Up means the platform is responding; Down means it is not.

AWS Alarms

Alarm

Meaning

Long Running Autoscaling Instances

The number of autoscaling groups has stayed at 3 or more for longer than 14 hours.

Queue Backup

There are more than 500 datasource jobs queued for longer than 2 hours.

High Redshift Connections

The number of Redshift connections is higher than 200 for longer than 2 minutes.

No Redshift Automated Snapshots

No automated snapshot has completed in a 12-hour period.

Redshift Unhealthy

Redshift has reported an unhealthy status. The status is checked every 5 minutes.

This alert also fires when Redshift is automatically restarted during non-peak hours.

Redshift Out of Space

The Redshift disk space is over 90% used for longer than 30 minutes.

Redshift Blocking Queries

One or more Redshift queries have been blocking for longer than 90 minutes.
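
Several of the AWS alarms above share one shape: a metric compared against a threshold, with the breach sustained for some duration. As a rough illustration only, the sketch below restates four of them as data and shows how a "sustained breach" check could work. The `AlarmRule` class, the `is_breaching` helper, and the sample readings are hypothetical and are not taken from the real alarm configuration. Treating each reading as covering one full sampling interval is also an assumption. The snapshot, health, and blocking-query alarms are left out because their descriptions do not map cleanly onto a simple threshold.

```python
from dataclasses import dataclass
from datetime import timedelta
import operator

# Hypothetical restatement of part of the alarm table. The names mirror the
# table, but the structure is illustrative, not the real alarm definitions.
@dataclass(frozen=True)
class AlarmRule:
    name: str
    compare: str              # one of ">", ">=", "<"
    threshold: float
    sustained_for: timedelta  # how long the breach must last

RULES = [
    AlarmRule("Long Running Autoscaling Instances", ">=", 3, timedelta(hours=14)),
    AlarmRule("Queue Backup", ">", 500, timedelta(hours=2)),
    AlarmRule("High Redshift Connections", ">", 200, timedelta(minutes=2)),
    AlarmRule("Redshift Out of Space", ">", 90, timedelta(minutes=30)),
]
RULES_BY_NAME = {rule.name: rule for rule in RULES}

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt}

def is_breaching(rule, samples, sample_interval):
    """Return True if the newest readings have breached the threshold
    continuously for at least rule.sustained_for.

    samples: readings ordered oldest to newest, one per sample_interval.
    Assumption: each reading stands for one full sample_interval.
    """
    breaches = OPS[rule.compare]
    streak = 0
    # Count how many of the most recent readings breach without a gap.
    for value in reversed(samples):
        if not breaches(value, rule.threshold):
            break
        streak += 1
    return streak * sample_interval >= rule.sustained_for

# Made-up readings: connection counts sampled once per minute.
connections = RULES_BY_NAME["High Redshift Connections"]
print(is_breaching(connections, [150, 210, 230], timedelta(minutes=1)))  # True
```

A single reading above the threshold does not trigger the rule in this sketch; only an unbroken run of breaching readings that covers the full duration does, which is why a brief spike in Redshift connections would not raise the alarm.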
