
Databricks SQL Warehouse - Upload File to Table

Overview

Quickly upload a file from Workflows to a SQL table in Databricks.

__Recommended Setup__

This Blueprint is best used immediately after downloading data from another source. Although they are not required to connect, it is recommended that you provide the `Catalog` and the `Schema` you will query; otherwise, the connection falls back to the defaults, and the uploaded table will reside there. It is also recommended that you provide the volume where the uploaded file will be staged before it is copied into the target table; Workflows removes the staged file after it has been successfully copied into the target. Using one volume per schema is recommended, though not strictly enforced, and if the volume you provide does not exist, Workflows will create it.

Additionally, the [match type](https://www.shipyardapp.com/docs/reference/blueprint-library/match-type/) you select greatly affects how this Blueprint works, since it determines whether a single file or multiple files are loaded.

**Note:** This Blueprint cannot upload a file from your local machine.
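For instance, a minimal sketch of the connection inputs described above (the values `analytics`, `sales`, and `staging_volume` are hypothetical placeholders, not defaults):

```yaml
DATABRICKS_SQL_CATALOG: analytics      # falls back to the Hive Metastore if omitted
DATABRICKS_SQL_SCHEMA: sales           # falls back to the `default` schema if omitted
DATABRICKS_SQL_VOLUME: staging_volume  # staging location; created by Workflows if it does not exist
```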

Variables

| Name | Reference | Type | Default | Options | Description |
| --- | --- | --- | --- | --- | --- |
| Access Token | `DATABRICKS_SQL_ACCESS_TOKEN` | Password | - | - | The access token generated in Databricks for programmatic access |
| Databricks Server Host | `DATABRICKS_SQL_SERVER_HOST` | Alphanumeric | - | - | The URL of the SQL warehouse |
| Warehouse HTTP Path | `DATABRICKS_SQL_HTTP_PATH` | Alphanumeric | - | - | The extended path for the SQL warehouse |
| Catalog | `DATABRICKS_SQL_CATALOG` | Alphanumeric | - | - | The optional catalog to connect to. If none is provided, this defaults to the Hive Metastore |
| Schema | `DATABRICKS_SQL_SCHEMA` | Alphanumeric | - | - | The optional schema to connect to. If none is provided, the Blueprint connects to the `default` schema |
| Volume | `DATABRICKS_SQL_VOLUME` | Alphanumeric | - | - | The name of the volume in which to stage the file |
| Table Name | `DATABRICKS_SQL_TABLE` | Alphanumeric | - | - | The table in Databricks to write to |
| Data Types | `DATABRICKS_SQL_DATATYPES` | Alphanumeric | - | - | The optional Spark data types to use in Databricks, provided in JSON format (see the example below this table). If none are provided, the data types are inferred |
| Insert Method | `DATABRICKS_SQL_INSERT_METHOD` | Select | append | Append: `append`, Create or Replace: `replace` | Decides whether to append to an existing table or overwrite an existing table |
| File Type | `DATABRICKS_SQL_FILE_TYPE` | Select | csv | CSV: `csv`, Parquet: `parquet` | The file type to load |
| Workflows File Match Type | `DATABRICKS_SQL_MATCH_TYPE` | Select | exact_match | Exact Match: `exact_match`, Glob Match: `glob_match` | Determines whether the text in "Workflows File Name" matches one file exactly or multiple files using a glob pattern (see the sketch below this table) |
| Workflows Folder Name | `DATABRICKS_SQL_FOLDER_NAME` | Alphanumeric | - | - | The optional name of the folder in Workflows where the file is located |
| Workflows File Name | `DATABRICKS_SQL_FILE_NAME` | Alphanumeric | - | - | The name of the file in Workflows to load into Databricks |
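For reference, a hypothetical `Data Types` value (the column names are placeholders, and Spark SQL type names such as `integer`, `string`, `date`, and `double` are assumed here, since the Blueprint expects Spark data types in JSON format):

```yaml
# Hypothetical column-to-type mapping, passed as a JSON string
DATABRICKS_SQL_DATATYPES: '{"order_id": "integer", "customer_name": "string", "order_date": "date", "total": "double"}'
```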
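And a hedged sketch of the two match types (the folder and file names are hypothetical): `exact_match` loads the single named file, while `glob_match` loads every file the pattern matches.

```yaml
# Exact match: load only the file named orders.csv (hypothetical name)
DATABRICKS_SQL_MATCH_TYPE: exact_match
DATABRICKS_SQL_FILE_NAME: orders.csv
---
# Glob match: load every file ending in .csv from the downloads folder (hypothetical names)
DATABRICKS_SQL_MATCH_TYPE: glob_match
DATABRICKS_SQL_FOLDER_NAME: downloads
DATABRICKS_SQL_FILE_NAME: "*.csv"
```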

YAML

Below is the YAML template:

```yaml
source:
  template: Databricks SQL Warehouse - Upload File to Table
  inputs:
    DATABRICKS_SQL_ACCESS_TOKEN:
    DATABRICKS_SQL_SERVER_HOST:
    DATABRICKS_SQL_HTTP_PATH:
    DATABRICKS_SQL_CATALOG:
    DATABRICKS_SQL_SCHEMA:
    DATABRICKS_SQL_VOLUME:
    DATABRICKS_SQL_TABLE:
    DATABRICKS_SQL_DATATYPES:
    DATABRICKS_SQL_INSERT_METHOD: append
    DATABRICKS_SQL_FILE_TYPE: csv
    DATABRICKS_SQL_MATCH_TYPE: exact_match
    DATABRICKS_SQL_FOLDER_NAME:
    DATABRICKS_SQL_FILE_NAME:
  type: TEMPLATE
guardrails:
  retry_count: 1
  retry_wait: 0h0m0s
  runtime_cutoff: 1h0m0s
  exclude_exit_code_ranges:
    - 200
    - 202
    - 203
    - 204
    - 205
    - 206
    - 207
    - 208
    - 209
    - 210
    - 211
    - 249
```
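For illustration only, here is the same template with hypothetical values filled in. Every value below is a placeholder invented for this sketch (host, HTTP path, catalog, schema, volume, table, folder, and file names), and the access token should come from a stored credential rather than being hard-coded:

```yaml
source:
  template: Databricks SQL Warehouse - Upload File to Table
  inputs:
    DATABRICKS_SQL_ACCESS_TOKEN: ${DATABRICKS_TOKEN}  # hypothetical secret reference; never hard-code tokens
    DATABRICKS_SQL_SERVER_HOST: adb-1234567890123456.7.azuredatabricks.net  # placeholder host
    DATABRICKS_SQL_HTTP_PATH: /sql/1.0/warehouses/abc123def456  # placeholder warehouse path
    DATABRICKS_SQL_CATALOG: analytics       # placeholder catalog
    DATABRICKS_SQL_SCHEMA: sales            # placeholder schema
    DATABRICKS_SQL_VOLUME: staging_volume   # placeholder volume; created if it does not exist
    DATABRICKS_SQL_TABLE: orders            # placeholder target table
    DATABRICKS_SQL_DATATYPES: '{"order_id": "integer", "order_date": "date", "total": "double"}'
    DATABRICKS_SQL_INSERT_METHOD: replace   # overwrite the target table on each run
    DATABRICKS_SQL_FILE_TYPE: csv
    DATABRICKS_SQL_MATCH_TYPE: exact_match
    DATABRICKS_SQL_FOLDER_NAME: downloads   # placeholder folder
    DATABRICKS_SQL_FILE_NAME: orders.csv    # placeholder file
  type: TEMPLATE
```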