Datasource Source Status Deep Dive

Overview

This document explains the meaning of each source status option for datasources.

There are currently 6 source status options for a datasource.

Inactive

Can:

Be loaded via Datasource page
Be synced via API
Be queried in Reports or directly in BigQuery, Redshift, or Snowflake

Cannot:

Run on an automated schedule

Recommended use case:

Datasources that run infrequently
- For example, a datasource that you want to trigger monthly
Datasources that are no longer loading new data
- For example, if a client is no longer active on a platform, but you still want to query the old data for reporting purposes

Setting an inactive datasource to active:

Inactive datasources can be set to active in the Advanced Settings section on the datasource page, or via API by passing sourceStatus: 1 in the request body of an edit request.
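As a rough sketch of the API route, the snippet below builds the request body for such an edit request. Only the sourceStatus: 1 field comes from this document. The helper name and the option to merge in other fields are assumptions for illustration, and the endpoint and authentication are deliberately left out because they are not described here.

```python
import json

# From this document: sourceStatus 1 sets the datasource to active.
ACTIVE_SOURCE_STATUS = 1


def build_activate_body(other_fields=None):
    """Return a JSON request body that sets a datasource to active.

    `other_fields` is a hypothetical dict holding any other fields
    the edit request may require; it is not defined by this document.
    """
    body = dict(other_fields or {})
    body["sourceStatus"] = ACTIVE_SOURCE_STATUS
    return json.dumps(body)


print(build_activate_body())
```

The resulting string would be sent as the body of the edit request for the datasource you want to reactivate.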

Active

Can:

Be loaded via Datasource page
Be synced via API
Be queried in Reports or directly in BigQuery, Redshift, or Snowflake
Run on an automated schedule

Cannot:

Nothing. An active datasource has no restrictions.

Recommended use case:

Datasources that are loading new data on a daily basis

Deleted

Can:

Nothing. A deleted datasource can no longer be loaded, synced, queried, or scheduled.

Cannot:

Be loaded via Datasource page
Be synced via API
Be queried in Reports or directly in BigQuery, Redshift, or Snowflake
Run on an automated schedule

Recommended use case:

Cleaning up datasources that were never fully set up
Datasources for which the data loaded will never be needed going forward

Deleting a datasource:

Datasources can be deleted via the audit page or by hovering over the three dots on the datasources list page.

Setting a deleted datasource to active:

Deleted datasources cannot be un-deleted. Users can recreate a deleted datasource under the same name as the old datasource.

Archived

In lieu of archiving datasources, if you still care about a datasource’s data but do not want it to load on a schedule, it is recommended that you set it to inactive. If you want the datasource removed completely, it is recommended that you delete it.

Incomplete

Can:

Be queried in Reports or directly in BigQuery, Redshift, or Snowflake

Cannot:

Be loaded via Datasource page
Be synced via API
Run on an automated schedule

Recommended use case:

Users cannot set a datasource to Incomplete. Alli automatically keeps a datasource in the Incomplete status until all setup steps are completed.
If you have a datasource that you do not want to run, it is recommended to set it to Inactive.

Setting an incomplete datasource to active:

Once the last setup step is completed and saved, the datasource will automatically be set from Incomplete to Active. You can also set a completed datasource to inactive if that makes sense for the use case.

Deactivated

Can:

Be loaded via Datasource page
Be queried in Reports or directly in BigQuery, Redshift, or Snowflake

Cannot:

Be synced via API
Run on an automated schedule

Recommended use case:

Users cannot set a datasource to Deactivated. Alli automatically sets a datasource to Deactivated if it is actively loading and failing every time.
If you have a datasource that you do not want to run, it is recommended to set it to Inactive.

Setting a deactivated datasource to active:

Deactivated datasources can be set back to active by making changes to the datasource after reviewing the audit page and fixing the recurring error.
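The Can and Cannot lists above can be summarized as a small lookup table. The sketch below is only an illustration of those lists, not how Alli stores statuses; the class and field names are made up for this example. Archived is omitted because this document does not list its capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    load_via_page: bool  # Be loaded via Datasource page
    sync_via_api: bool   # Be synced via API
    queryable: bool      # Be queried in Reports or the data warehouse
    scheduled: bool      # Run on an automated schedule


# Values transcribed from the Can / Cannot lists in this document.
CAPABILITIES = {
    "Inactive": Capabilities(True, True, True, False),
    "Active": Capabilities(True, True, True, True),
    "Deleted": Capabilities(False, False, False, False),
    "Incomplete": Capabilities(False, False, True, False),
    "Deactivated": Capabilities(True, False, True, False),
}


def statuses_that_can(capability):
    """Return the statuses for which the named capability is allowed."""
    return [name for name, caps in CAPABILITIES.items() if getattr(caps, capability)]


print(statuses_that_can("scheduled"))
print(statuses_that_can("queryable"))
```

Reading the table this way makes the key difference visible: only Active datasources run on a schedule, while every status except Deleted can still be queried.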

Datasource Deactivation

Datasource deactivation tracks repeated datasource errors and eventually deactivates datasources that fail over and over without any success. This gives both engineers and users better visibility into “true” errors (API issues, bugs, etc.) and saves resources for datasources that are actually being used. Many datasources fall out of use or have their authentication expire, and if no one is actively maintaining them, they are left to error endlessly.

There are 3 parameters that can lead to deactivation:

Number of errors without success (Max allowed is currently 100)
Number of duplicate errors without success (Max allowed is currently 30)
Number of days since success (Max allowed is currently 14 days)

Additionally, there is a grace period of 7 days, which means that no matter how many errors occur, a datasource cannot be set to deactivated for at least 7 days from the start of the chain of consecutive errors. There is also a minimum of 10 errors before deactivation. This means that if a datasource is manually triggered monthly, it will not be deactivated after a single failed run, even though the number of days since its last success exceeds the maximum allowed.
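The rules above can be sketched as a single check. This is a minimal illustration under stated assumptions, not Alli's actual implementation: the function and parameter names are hypothetical, and it assumes a threshold is crossed only when a counter exceeds the maximum allowed, and that any one of the three thresholds is enough once the grace period and minimum error count are met.

```python
# Values taken from this document.
MAX_ERRORS = 100
MAX_DUPLICATE_ERRORS = 30
MAX_DAYS_SINCE_SUCCESS = 14
GRACE_PERIOD_DAYS = 7
MIN_ERRORS = 10


def should_deactivate(errors, duplicate_errors, days_since_success, days_since_first_error):
    """Return True if a datasource would be deactivated under the assumed rules."""
    # Grace period: never deactivate within 7 days of the first consecutive error.
    if days_since_first_error < GRACE_PERIOD_DAYS:
        return False
    # Minimum error count: never deactivate with fewer than 10 errors.
    if errors < MIN_ERRORS:
        return False
    # Assumption: exceeding any single threshold is sufficient.
    return (
        errors > MAX_ERRORS
        or duplicate_errors > MAX_DUPLICATE_ERRORS
        or days_since_success > MAX_DAYS_SINCE_SUCCESS
    )


# Monthly datasource that failed once, 30 days after its last success.
print(should_deactivate(1, 1, 30, 0))
# Daily datasource that has failed every day for 15 days.
print(should_deactivate(15, 15, 15, 15))
```

In the first call the minimum of 10 errors protects the monthly datasource even though 30 days have passed since its last success. In the second, the 14-day threshold is exceeded after the grace period, so the check returns True.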

Leading up to deactivation:

The datasource owner will receive an email notifying them their datasource is nearing deactivation.
These warning emails are sent after every failed run when the datasource is within 10 errors of either of the error count thresholds and/or within 3 days of the 14-day threshold for days since success.
Any successful run OR modification to the datasource will completely reset all counters.
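The warning window described above can be sketched the same way. The names below are hypothetical, and the sketch assumes that "within 10 errors" and "within 3 days" are inclusive, which this document does not state explicitly.

```python
# Thresholds and margins taken from this document.
MAX_ERRORS = 100
MAX_DUPLICATE_ERRORS = 30
MAX_DAYS_SINCE_SUCCESS = 14
ERROR_WARNING_MARGIN = 10
DAYS_WARNING_MARGIN = 3


def should_send_warning(errors, duplicate_errors, days_since_success):
    """Return True if a failed run would trigger a warning email (assumed inclusive margins)."""
    return (
        errors >= MAX_ERRORS - ERROR_WARNING_MARGIN
        or duplicate_errors >= MAX_DUPLICATE_ERRORS - ERROR_WARNING_MARGIN
        or days_since_success >= MAX_DAYS_SINCE_SUCCESS - DAYS_WARNING_MARGIN
    )


print(should_send_warning(5, 5, 2))    # far from every threshold
print(should_send_warning(12, 12, 11)) # within 3 days of the 14-day threshold
```

Under these assumptions, warnings would start at 90 errors, 20 duplicate errors, or 11 days since the last success, whichever comes first.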

Upon deactivation:

The datasource owner will receive an email notifying them that their datasource has been deactivated.
If they no longer need this datasource to run, they can ignore this email and leave it deactivated.
If they still need the datasource to run, they can review the audit page for the errors, and reconfigure the datasource to work properly. Upon fixing the datasource, the datasource will be set back to active, and all error counters will be completely reset. This should only be done if the datasource will be providing new data.

Again, any successful run OR modification to the datasource will always reset all counters towards deactivation. This functionality is meant to catch datasources that are not maintained and are never succeeding as a result. Datasources that are actively being monitored should be succeeding regularly and as a result, never get close to any of the deactivation thresholds.

FAQ:

Q: My datasource shows warnings on the audit page, does this mean it will get deactivated?

A: No, only terminal errors that immediately end the execution will be counted towards deactivation.

Q: I don’t need my datasource to run anymore, what can I do to avoid getting the extra deactivation emails?

A: At any point, users can set a datasource to inactive. Starting from the next day, this prevents any scheduled runs from happening, which also prevents errors from stacking up.

Q: When I no longer need a datasource, should I wait for it to be deactivated automatically?

A: No, it is highly recommended to set a datasource to inactive at any time if you don’t expect/need it to load any new data regularly. Inactive datasources can still be queried and can always be set back to active at any time.