Redshift to BigQuery

This Alli App migrates a Redshift object, or the result of a Redshift SQL query, to a BigQuery table.

If you run into issues, please file a service desk ticket.

Get Started

Within the Marketplace, find the Redshift to BigQuery application in the Browse section. Install the application; during setup you will need to configure several fields, which are documented below and within the app details.

Best Practices

  1. Use either REDSHIFT_SQL or REDSHIFT_OBJ; if both are provided, the app will error out.

  2. This app is best used in the following scenarios:

    1. At the end of an analytical workflow, when you need to move the final result to BigQuery.

    2. One-off data load requests.

  3. PARTITION_COL SHOULD be used for all data copies that include a date column:

    1. This allows BigQuery to prune unneeded partitions at query time.
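
Best practice 1 amounts to a small validation step, which can be sketched as below. This is an illustrative sketch, not the app's actual code; in particular, wrapping an object reference in a trivial select is an assumption about how the app treats REDSHIFT_OBJ.

```python
def resolve_source(redshift_sql=None, redshift_obj=None):
    """Enforce best practice 1: exactly one of REDSHIFT_SQL / REDSHIFT_OBJ.

    Illustrative sketch only -- the app's real validation may differ.
    """
    if redshift_sql and redshift_obj:
        raise ValueError("Use either REDSHIFT_SQL or REDSHIFT_OBJ, not both")
    if not (redshift_sql or redshift_obj):
        raise ValueError("One of REDSHIFT_SQL or REDSHIFT_OBJ is required")
    # An object reference can be treated as a trivial query (assumption),
    # so both inputs reduce to SQL for the extract step.
    return redshift_sql or "select * from {}".format(redshift_obj)
```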

Inputs

| Input | Description | Required | Example |
| --- | --- | --- | --- |
| REDSHIFT_SQL | The SQL that generates the result to be migrated to BigQuery | N | select * from gap_core.search_account where date >= '2019-01-01' |
| REDSHIFT_OBJ | The table or view to be migrated to BigQuery | N | surveymonkey.omnichannel_complete |
| BIGQUERY_PROJECT | The BigQuery project in which to create the new table | Y, defaults to pmg-datawarehouse | pmg-datawarehouse |
| BIGQUERY_DATASET | The BigQuery dataset in which to create the new table | Y | surveymonkey |
| BIGQUERY_TBL | The BigQuery table name | Y | omnichannel_complete |
| PARTITION_COL | The column to partition the data by | N, SHOULD be used in most instances | date |
| GOOGLECREDS | The GOOGLE_APPLICATION_CREDENTIALS file | Y, defaults to pmg-datawarehouse creds | |
| BIGQUERY_SCHEMA_FILE | The target schema file. All supported data types are listed at https://cloud.google.com/bigquery/docs/schemas#standard_sql_data_types. NOTE: if the user does not define a schema, the app auto-generates one and prints it to the console. | N | schema.json |
| CLUSTER_COLS | Comma-delimited list of columns to cluster by. For best practices on when to cluster, see the BigQuery documentation: https://cloud.google.com/bigquery/docs/clustered-tables | N | engine, account_name |
| DROP_BIGQUERY_TBL_FLAG | Drops the target table before attempting to load it. Should be used if the schema changes. | Y, defaults to FALSE | |
| REDSHIFT_HOSTNAME | The Redshift host to connect to | N | redshift.data.alliplatform.com |
| REDSHIFT_DATABASE | The Redshift database | N | datawarehouse |
| REDSHIFT_USERNAME | The Redshift user with access to the data | N | stephen.frazier@pmg.com |
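
For reference, a BIGQUERY_SCHEMA_FILE follows BigQuery's standard JSON schema format; the field names and types below are purely illustrative, not taken from any real table:

```json
[
  {"name": "date", "type": "DATE", "mode": "NULLABLE"},
  {"name": "account_name", "type": "STRING", "mode": "NULLABLE"},
  {"name": "spend", "type": "FLOAT64", "mode": "NULLABLE"}
]
```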

If you provide a value for REDSHIFT_HOSTNAME, the other Redshift template variables must also be populated, and you will need to add an environment variable named REDSHIFT_PASSWORD containing the password for the corresponding REDSHIFT_USERNAME.
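
The rule above can be sketched as follows. This is an illustrative Python sketch, not the app's code: the function name, the default port, and the parameter names are assumptions.

```python
import os

def redshift_conn_params(hostname, database, username):
    """Assemble connection settings for a user-specified Redshift cluster.

    Mirrors the rule above: if REDSHIFT_HOSTNAME is set, the other
    Redshift template variables and the REDSHIFT_PASSWORD environment
    variable must be populated too. (Sketch only.)
    """
    password = os.environ.get("REDSHIFT_PASSWORD")
    if not (database and username and password):
        raise ValueError(
            "REDSHIFT_HOSTNAME requires REDSHIFT_DATABASE, "
            "REDSHIFT_USERNAME, and the REDSHIFT_PASSWORD env var"
        )
    return {
        "host": hostname,
        "port": 5439,  # Redshift's default port (assumption)
        "dbname": database,
        "user": username,
        "password": password,
    }
```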

Features Changelog

  • Added the ability to pull data from any Redshift Cluster

  • Added a Cluster by Column(s) option to help improve BigQuery query performance

  • Added Drop Table Option

  • Auto-Generate BigQuery Schema (Only if no schema is provided by the user)

    • Table mapping of pandas DataFrame dtype to BigQuery data type:

| Pandas dtype | BigQuery Data Type |
| --- | --- |
| datetime | TIMESTAMP |
| string | STRING |
| integer | INT64 |
| number | FLOAT64 |
| any | STRING |
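
The mapping above can be sketched as a simple lookup, mirroring what the auto-generated schema presumably does. This is illustrative only; the dtype labels follow the table above, not pandas' exact dtype strings, and the fallback to STRING corresponds to the "any" row.

```python
# dtype labels as given in the table above; anything unmapped falls
# back to STRING per the "any" row.
_DTYPE_TO_BQ = {
    "datetime": "TIMESTAMP",
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
}

def bigquery_type(dtype_label):
    """Map a dtype label from the table above to a BigQuery type."""
    return _DTYPE_TO_BQ.get(dtype_label, "STRING")

def auto_schema(columns):
    """columns: {column_name: dtype label} -> list of BigQuery fields."""
    return [{"name": name, "type": bigquery_type(label)}
            for name, label in columns.items()]
```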


Assumptions

  • Data will always load into BigQuery as Truncate/Insert

  • Data being migrated is fewer than 7 million rows
