Google BigQuery - Upload File to Table from Workflows

Overview

Upload one or more CSV files to any table in BigQuery. The [match type](https://www.shipyardapp.com/docs/reference/blueprint-library/match-type/) selected greatly affects how this Blueprint works. With the file data, you can: - **Append Data** - Add the contents of your file to the end of the table. - **Overwrite Data** - Write over the entire contents of table with the contents of your file. #### Schema Column names are inferred from the header row of your CSV file. If the table already exists, the header values are matched to the table column names. The header names must match the existing column names in your table, otherwise the Vessel will error. The preferred method of providing a schema is an array of JSON, see this example from BigQuery's documentation: ```json [ { "name": string, "type": string, "mode": string, "fields": [ { object (TableFieldSchema) } ], "description": string, "policyTags": { "names": [ string ] }, "maxLength": string, "precision": string, "scale": string, "collation": string, "defaultValueExpression": string, "roundingMode": string }, { "name": string, "type": string, ... } ] ``` For legacy purposes, this blueprint also supports specificying a schema as an array of arrays like so: `[['Column1', 'String'], ['Column2', 'Int64']]` #### Data Loading The data is copied from the provided file to the target endpoint in BigQuery. For larger datasets, we recommend running a [batch loading process](https://cloud.google.com/bigquery/docs/batch-loading-data#python). In all instances, if the table name does not already exist, a new table will be created with datatypes inferred from the CSV contents or specified through the `Schema` input. **Note:** This Vessel cannot be used to upload a local file from your computer. **Recommended Setup:** 1. A Vessel built with this Blueprint should typically run after a Vessel that either downloads a file to Workflows or generates a file with code.

Variables

Name	Reference	Type	Required	Default	Options	Description
Dataset Name	BIGQUERY_DATASET_NAME	Alphanumeric	✅	None	-	Name of the dataset where the BigQuery table lives.
Table Name	BIGQUERY_TABLE_NAME	Alphanumeric	✅	None	-	Name of the BigQuery table to upload the dataset to.
Workflows File Name Match Type	BIGQUERY_SOURCE_FILE_NAME_MATCH_TYPE	Select	✅	exact_match	Exact Match: exact_match Regex Match: regex_match	Determines if the text in "Local File Name" will look for one file with exact match, or multiple files using regex.
Workflows File Name	BIGQUERY_SOURCE_FILE_NAME	Alphanumeric	✅	None	-	Name of the target CSV file on Workflows. Can be regex if "Match Type" is set accordingly.
Workflows Folder Name	BIGQUERY_SOURCE_FOLDER_NAME	Alphanumeric	➖	None	-	Name of the local folder on Workflows to upload the target file from. If left blank, will look in the home directory.
Upload Method	BIGQUERY_UPLOAD_TYPE	Select	✅	append	Append Data: append Overwrite Data: overwrite	Determines how the data in your file(s) will be added to the table.
Service Account	GOOGLE_APPLICATION_CREDENTIALS	Password	✅	None	-	JSON from a Google Cloud Service account key.
Schema	BIGQUERY_SCHEMA	Alphanumeric	➖	None	-	Schema for the uploaded dataset, formatted as JSON or a double-nested list. If left blank, it will be auto-detected.
Header Rows to Skip	BIGQUERY_SKIP_HEADER_ROWS	Integer	➖	0	-	Number of header rows to skip when inserting data. Only required if provided custom schema.
Quoted Newline	BIGQUERY_QUOTED_NEWLINE	Boolean	✅	FALSE	-	Allow quoted data containing newline characters
Project ID	BIGQUERY_PROJECT	Alphanumeric	➖	-	-
Location	BIGQUERY_LOCATION	Alphanumeric	➖	-	-

YAML

Below is the YAML template

YAML


source:
  template: Google BigQuery - Upload File to Table from Workflows
  inputs:
    BIGQUERY_DATASET_NAME:
    BIGQUERY_TABLE_NAME:
    BIGQUERY_SOURCE_FILE_NAME_MATCH_TYPE: exact_match
    BIGQUERY_SOURCE_FILE_NAME:
    BIGQUERY_SOURCE_FOLDER_NAME:
    BIGQUERY_UPLOAD_TYPE: append
    GOOGLE_APPLICATION_CREDENTIALS:
    BIGQUERY_SCHEMA:
    BIGQUERY_SKIP_HEADER_ROWS: '0'
    BIGQUERY_QUOTED_NEWLINE: 'FALSE'
    BIGQUERY_PROJECT:
    BIGQUERY_LOCATION:
  type: TEMPLATE
guardrails:
  retry_count: 1
  retry_wait: 0h0m0s
  runtime_cutoff: 1h0m0s
  exclude_exit_code_ranges:
    - 101
    - 102
    - 103
    - 104
    - 105
    - 106
    - 200
    - 203
    - 204
    - 205
    - 206
    - 207
    - 208
    - 209