
Data Manipulation - Compare Datasets

Overview

Compare two CSV files and generate files containing the rows that differ between them and the rows they have in common. Both files are expected to:

  • Contain the column names in the first row (the header row)

  • Have the same number of columns

  • Have the same column names
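A quick pre-check can confirm these expectations before any comparison runs. The sketch below is illustrative only and is not the Template's own code: `read_header` and `check_headers` are hypothetical helpers built on Python's standard `csv` module, and treating column order as irrelevant is an assumption.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a CSV file ([] if empty)."""
    with open(path, newline="") as f:
        return next(csv.reader(f), [])


def check_headers(path_1, path_2):
    """Raise ValueError unless both files share a matching header row."""
    header_1, header_2 = read_header(path_1), read_header(path_2)
    for path, header in ((path_1, header_1), (path_2, header_2)):
        if not header:
            raise ValueError(f"{path} has no header row")
    # Same number of columns.
    if len(header_1) != len(header_2):
        raise ValueError(f"column counts differ: {len(header_1)} vs {len(header_2)}")
    # Same column names; ignoring their order is an assumption of this sketch.
    if sorted(header_1) != sorted(header_2):
        raise ValueError(f"column names differ: {header_1} vs {header_2}")
    return header_1
```

Calling `check_headers("orders.csv", "orders_new.csv")` returns the first file's column names when the headers match and raises `ValueError` with a short reason otherwise.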

After comparing data in the two files, the following files may be generated:

  • {File Name 1}_only.csv (contains only rows found in File Name 1)

  • {File Name 2}_only.csv (contains only rows found in File Name 2)

  • {File Name 1}_overlap.csv (contains rows found in both File Name 1 AND File Name 2)
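The three result sets can be illustrated with a pandas outer merge on every column, where `indicator=True` tags each row as `left_only`, `right_only`, or `both`. This is a minimal sketch of one way to split the rows, not necessarily how the Template implements it; `compare_frames` is a hypothetical name, and collapsing exact duplicate rows first is an assumption made so the merge cannot multiply repeated rows.

```python
import pandas as pd


def compare_frames(df_1, df_2):
    """Split two DataFrames with the same columns into (only_1, only_2, overlap)."""
    columns = list(df_1.columns)
    # Assumption: exact duplicate rows are collapsed before merging.
    left = df_1.drop_duplicates()
    right = df_2[columns].drop_duplicates()  # align column order with df_1
    # Outer merge on all columns; "_merge" records where each row was found.
    merged = left.merge(right, how="outer", on=columns, indicator=True)
    parts = []
    for tag in ("left_only", "right_only", "both"):
        part = merged.loc[merged["_merge"] == tag, columns]
        parts.append(part.reset_index(drop=True))
    return tuple(parts)
```

Reading both files with `pd.read_csv(path, dtype=str, keep_default_na=False)` before calling `compare_frames` keeps every value as text, so a value such as `007` is not turned into the number 7 and blank cells compare as empty strings.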

If a file has no unique rows, its _only file is not created. If the two files have no rows in common, the _overlap file is not created.
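The naming and skip-if-empty behavior can be sketched as follows. `output_name` and `write_outputs` are hypothetical helpers, and stripping the `.csv` extension from File Name 1 and File Name 2 before adding the suffix is an assumption about how the Template builds its output names.

```python
import os

import pandas as pd


def output_name(file_name, suffix):
    """Hypothetical naming rule: 'orders.csv' + '_only' -> 'orders_only.csv'."""
    stem, _ext = os.path.splitext(os.path.basename(file_name))
    return f"{stem}{suffix}.csv"


def write_outputs(file_name_1, file_name_2, only_1, only_2, overlap, out_dir="."):
    """Write each non-empty DataFrame and return the paths that were created."""
    planned = [
        (output_name(file_name_1, "_only"), only_1),
        (output_name(file_name_2, "_only"), only_2),
        (output_name(file_name_1, "_overlap"), overlap),
    ]
    written = []
    for name, frame in planned:
        if frame.empty:
            continue  # no unique or overlapping rows, so no file is written
        path = os.path.join(out_dir, name)
        frame.to_csv(path, index=False)
        written.append(path)
    return written
```

Returning the list of written paths makes it easy for a later step to see which of the three files actually exist.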

This Template is relatively memory-intensive because it loads the full contents of both files into memory with pandas. For large files, we recommend running the comparison directly in a database instead.
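In a database, the same three result sets map onto SQL's `EXCEPT` and `INTERSECT` set operators, which also remove duplicate rows. The sketch below uses Python's built-in `sqlite3` module purely for illustration; `compare_in_sqlite` is a hypothetical helper, and it assumes both files are already loaded into tables with the same columns in the same order and that the table names are trusted (they are formatted into the SQL, not bound as parameters).

```python
import sqlite3


def compare_in_sqlite(conn, table_1, table_2):
    """Return (only_1, only_2, overlap) as lists of row tuples."""
    queries = [
        f"SELECT * FROM {table_1} EXCEPT SELECT * FROM {table_2}",    # only in table_1
        f"SELECT * FROM {table_2} EXCEPT SELECT * FROM {table_1}",    # only in table_2
        f"SELECT * FROM {table_1} INTERSECT SELECT * FROM {table_2}", # in both
    ]
    return tuple(conn.execute(sql).fetchall() for sql in queries)
```

Here `fetchall()` still pulls each result into memory; for very large inputs, the same queries can be wrapped in `CREATE TABLE ... AS` so the results stay inside the database.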

Variables

File Name 1
  • Reference: MANIPULATION_SOURCE_FILE_NAME_1
  • Type: Alphanumeric
  • Required: Yes
  • Default: -
  • Options: -
  • Description: Name of the first target file on Platform.

Folder Name 1
  • Reference: MANIPULATION_SOURCE_FOLDER_NAME_1
  • Type: Alphanumeric
  • Required: No
  • Default: -
  • Options: -
  • Description: Name of the local folder on Platform where File Name 1 is stored. If left blank, the Template looks in the home directory.

File Name 2
  • Reference: MANIPULATION_SOURCE_FILE_NAME_2
  • Type: Alphanumeric
  • Required: Yes
  • Default: -
  • Options: -
  • Description: Name of the second target file on Platform.

Folder Name 2
  • Reference: MANIPULATION_SOURCE_FOLDER_NAME_2
  • Type: Alphanumeric
  • Required: No
  • Default: -
  • Options: -
  • Description: Name of the local folder on Platform where File Name 2 is stored. If left blank, the Template looks in the home directory.
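Each File Name / Folder Name pair resolves to one input path. The sketch below assumes the values are exposed as environment variables named after the Reference column and treats the "home directory" as the current working directory; both are assumptions, and `resolve_source_path` is a hypothetical helper rather than part of the Template.

```python
import os


def resolve_source_path(file_var, folder_var, env=None, home=None):
    """Build an input path from a File Name / Folder Name variable pair."""
    env = os.environ if env is None else env
    file_name = env.get(file_var, "").strip()
    if not file_name:
        raise ValueError(f"{file_var} is required")
    folder = env.get(folder_var, "").strip()
    # Assumption: the home directory is the current working directory.
    base = home if home is not None else os.getcwd()
    # A blank folder contributes nothing, so the file is looked up in base.
    return os.path.join(base, folder, file_name)
```

For example, `resolve_source_path("MANIPULATION_SOURCE_FILE_NAME_1", "MANIPULATION_SOURCE_FOLDER_NAME_1")` returns the path of the first file, falling back to the working directory when Folder Name 1 is blank.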