Data Manipulation - Compare Datasets

Overview

Compare two CSV files and return files containing the rows that are unique to each input and the rows that appear in both. The two files provided are expected to:

  • Provide column names in the first row of the CSV file

  • Have the same number of columns

  • Have the same column names
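
The expectations above can be checked before running a comparison. The following is a minimal sketch using only the Python standard library; the function names `read_header` and `headers_match` are hypothetical, and treating column order as irrelevant is an assumption, since the requirements only mention the number and names of columns.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    # An empty input has no first row, so fall back to an empty header.
    return next(reader, [])


def headers_match(csv_text_1: str, csv_text_2: str) -> bool:
    """True when both inputs have the same column names.

    Column order is ignored here (an assumption). Equal sorted lists also
    imply an equal number of columns.
    """
    return sorted(read_header(csv_text_1)) == sorted(read_header(csv_text_2))
```

For example, `headers_match("id,name\n1,a\n", "name,id\nb,2\n")` is true, while two inputs whose headers are `id,name` and `id,email` do not match.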

After comparing data in the two files, the following files may be generated:

  • {File Name 1}_only.csv (contains only rows found in File Name 1)

  • {File Name 2}_only.csv (contains only rows found in File Name 2)

  • {File Name 1}_overlap.csv (contains rows found in both File Name 1 AND File Name 2)

If a file contains no unique rows, its _only file will not be created. If there are no overlapping rows, the _overlap file will not be created.
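
As an illustration of the behavior described above, here is a rough sketch of how such a comparison could be done with pandas. It is not the Template's actual implementation: the function name `output_files`, the `name_1` and `name_2` parameters, and the choice to drop duplicate rows are all assumptions. It relies on `DataFrame.merge` with `indicator=True`, which labels each row as `left_only`, `right_only`, or `both`.

```python
import pandas as pd


def output_files(df1: pd.DataFrame, df2: pd.DataFrame,
                 name_1: str, name_2: str) -> dict[str, pd.DataFrame]:
    """Map output file names to the rows they would contain.

    Empty results are left out, mirroring the rule that a file is not
    created when it would have no rows.
    """
    columns = list(df1.columns)
    # Assumption: duplicate rows are collapsed so each distinct row counts once.
    left = df1.drop_duplicates()
    right = df2.drop_duplicates()
    # Outer merge on every column; the `_merge` column records each row's origin.
    merged = left.merge(right, how="outer", on=columns, indicator=True)

    def rows(label: str) -> pd.DataFrame:
        selected = merged.loc[merged["_merge"] == label, columns]
        return selected.reset_index(drop=True)

    candidates = {
        f"{name_1}_only.csv": rows("left_only"),
        f"{name_2}_only.csv": rows("right_only"),
        f"{name_1}_overlap.csv": rows("both"),
    }
    return {fname: frame for fname, frame in candidates.items() if not frame.empty}
```

Each returned frame could then be written with `frame.to_csv(fname, index=False)`. The merge requires matching column types in both inputs, so reading both CSV files with the same `dtype` setting is advisable.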

This Template is relatively memory intensive because it loads the contents of both files into memory using pandas. For larger files, we recommend running the comparison directly in a database.
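
To illustrate what a comparison run directly in a database might look like, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names `file_1` and `file_2`, their columns, and the sample rows are made up for the example; the same `EXCEPT` and `INTERSECT` queries are available in most SQL databases, though syntax can vary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the two datasets being compared.
conn.executescript("""
    CREATE TABLE file_1 (id INTEGER, name TEXT);
    CREATE TABLE file_2 (id INTEGER, name TEXT);
    INSERT INTO file_1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO file_2 VALUES (2, 'b'), (3, 'c');
""")

# Rows found only in the first table.
only_1 = conn.execute(
    "SELECT * FROM file_1 EXCEPT SELECT * FROM file_2"
).fetchall()
# Rows found only in the second table.
only_2 = conn.execute(
    "SELECT * FROM file_2 EXCEPT SELECT * FROM file_1"
).fetchall()
# Rows found in both tables.
overlap = conn.execute(
    "SELECT * FROM file_1 INTERSECT SELECT * FROM file_2"
).fetchall()
```

Because the set operations run inside the database engine, the full datasets do not need to be loaded into the memory of the process issuing the queries.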

Variables

| Name | Reference | Type | Required | Default | Options | Description |
|---|---|---|---|---|---|---|
| File Name 1 | MANIPULATION_SOURCE_FILE_NAME_1 | Alphanumeric | | - | - | Name of the target file on Platform. |
| Folder Name 1 | MANIPULATION_SOURCE_FOLDER_NAME_1 | Alphanumeric | | - | - | Name of the local folder on Platform where the target file lives. If left blank, the home directory is used. |
| File Name 2 | MANIPULATION_SOURCE_FILE_NAME_2 | Alphanumeric | | - | - | Name of the second target file on Platform. |
| Folder Name 2 | MANIPULATION_SOURCE_FOLDER_NAME_2 | Alphanumeric | | - | - | Name of the local folder on Platform where the second target file lives. If left blank, the home directory is used. |
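
The sketch below shows one way the file and folder variables could be combined into a path. It assumes the Reference names are exposed as environment variables, which this page does not confirm; the helper `resolve_path` is hypothetical, and a blank folder is treated as the current working directory as a stand-in for the home directory.

```python
import os


def resolve_path(file_var: str, folder_var: str, env=os.environ) -> str:
    """Build a file path from two variable names looked up in `env`.

    Assumption: the variables are available as environment variables.
    """
    file_name = env.get(file_var, "")
    folder_name = env.get(folder_var, "")
    # A blank folder means the file is looked up without a folder prefix.
    return os.path.join(folder_name, file_name) if folder_name else file_name
```

For example, `resolve_path("MANIPULATION_SOURCE_FILE_NAME_1", "MANIPULATION_SOURCE_FOLDER_NAME_1")` returns the folder joined with the file name when both are set, and the bare file name when the folder is blank.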
