Data Manipulation - Compare Datasets
Overview
Compare two CSV files and return files that show the rows with differing and overlapping information. It is expected that the two files provided will:
Provide column names in the first row of the CSV file.
Have the same number of columns.
Have the same column names.
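The expectations above can be checked before running a comparison. The following is a minimal sketch using only the Python standard library; `read_header` and `check_comparable` are hypothetical helper names, not part of this Template, and the sketch assumes column order does not matter.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f), [])


def check_comparable(path_1, path_2):
    """Raise ValueError if the two files do not share the same columns.

    Assumption: column order may differ between files; only the
    number of columns and their names are compared.
    """
    cols_1, cols_2 = read_header(path_1), read_header(path_2)
    if len(cols_1) != len(cols_2):
        raise ValueError(
            f"Column count differs: {len(cols_1)} vs {len(cols_2)}"
        )
    if sorted(cols_1) != sorted(cols_2):
        raise ValueError(f"Column names differ: {cols_1} vs {cols_2}")
    return cols_1
```

Calling `check_comparable("a.csv", "b.csv")` returns the shared column names when the files line up and raises otherwise, so a mismatch is caught before any rows are loaded.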
After comparing data in the two files, the following files may be generated:
{File Name 1}_only.csv (contains only rows found in File Name 1)
{File Name 2}_only.csv (contains only rows found in File Name 2)
{File Name 1}_overlap.csv (contains rows found in both File Name 1 AND File Name 2)
If a file contains no unique rows, its _only file will not be created. If there is no overlapping data, the _overlap file will not be created.
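The comparison described above can be approximated with a pandas outer merge. This is a sketch under stated assumptions, not the Template's actual implementation: `compare_csvs` is a hypothetical name, all values are read as strings, duplicate rows within a file are dropped first, and the output names are built from each file name without its extension.

```python
import os

import pandas as pd


def compare_csvs(path_1, path_2, out_dir="."):
    """Write the _only and _overlap files; return the names written."""
    # Read everything as text so "1" and "1.0" are not silently equated.
    df_1 = pd.read_csv(path_1, dtype=str, keep_default_na=False)
    df_2 = pd.read_csv(path_2, dtype=str, keep_default_na=False)

    # Assumption: repeated rows within one file count once.
    df_1, df_2 = df_1.drop_duplicates(), df_2.drop_duplicates()

    # Merge on every column; indicator=True adds a "_merge" column with
    # the values "left_only", "right_only", or "both".
    merged = df_1.merge(
        df_2, how="outer", on=list(df_1.columns), indicator=True
    )

    # Assumption: output names drop the ".csv" extension of the inputs.
    stem_1 = os.path.splitext(os.path.basename(path_1))[0]
    stem_2 = os.path.splitext(os.path.basename(path_2))[0]
    outputs = {
        f"{stem_1}_only.csv": merged[merged["_merge"] == "left_only"],
        f"{stem_2}_only.csv": merged[merged["_merge"] == "right_only"],
        f"{stem_1}_overlap.csv": merged[merged["_merge"] == "both"],
    }

    written = []
    for name, rows in outputs.items():
        if rows.empty:
            continue  # no file is created for an empty result
        rows.drop(columns="_merge").to_csv(
            os.path.join(out_dir, name), index=False
        )
        written.append(name)
    return written
```

Both DataFrames and the merged result are held in memory at the same time, which is why this style of comparison grows expensive as the inputs get larger.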
This Template is relatively memory intensive because it loads both file contents into memory using Pandas. For larger file sizes, we recommend running a comparison directly in a database.
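As one illustration of a database-side comparison, the sketch below streams each CSV into a SQLite table and lets SQL set operators do the work. SQLite, the table names `file_1` and `file_2`, and the function names are all assumptions made for this example; any database that supports `EXCEPT` and `INTERSECT` could be used in a similar way. Because `SELECT *` compares columns by position, this version also assumes both files list their columns in the same order.

```python
import csv
import sqlite3


def load_csv(conn, table, path):
    """Stream a CSV file into a new table, one row at a time."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Quote column names so spaces or keywords do not break the SQL.
        cols = ", ".join('"' + c.replace('"', '""') + '"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()
    return header


def compare_in_db(db_path, path_1, path_2):
    """Return the rows unique to each file and the rows they share."""
    conn = sqlite3.connect(db_path)
    try:
        load_csv(conn, "file_1", path_1)
        load_csv(conn, "file_2", path_2)
        queries = {
            "file_1_only": "SELECT * FROM file_1 EXCEPT SELECT * FROM file_2",
            "file_2_only": "SELECT * FROM file_2 EXCEPT SELECT * FROM file_1",
            "overlap": "SELECT * FROM file_1 INTERSECT SELECT * FROM file_2",
        }
        # EXCEPT and INTERSECT also remove duplicate rows from the result.
        return {k: conn.execute(q).fetchall() for k, q in queries.items()}
    finally:
        conn.close()
```

Passing a file path as `db_path` keeps the tables on disk rather than in memory; for very large results, iterating over the cursor instead of calling `fetchall()` would avoid loading every row at once.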
Variables
Name | Reference | Type | Required | Default | Options | Description |
---|---|---|---|---|---|---|
File Name 1 | MANIPULATION_SOURCE_FILE_NAME_1 | Alphanumeric | ✅ | - | - | Name of the 1st target file on Platform. |
Folder Name 1 | MANIPULATION_SOURCE_FOLDER_NAME_1 | Alphanumeric | ➖ | - | - | Name of the local folder on Platform where the 1st target file lives. If left blank, the home directory is used. |
File Name 2 | MANIPULATION_SOURCE_FILE_NAME_2 | Alphanumeric | ✅ | - | - | Name of the 2nd target file on Platform. |
Folder Name 2 | MANIPULATION_SOURCE_FOLDER_NAME_2 | Alphanumeric | ➖ | - | - | Name of the local folder on Platform where the 2nd target file lives. If left blank, the home directory is used. |
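The sketch below shows how a script might turn these variables into file paths. It assumes the Reference names are exposed to the script as environment variables and that "home directory" means the user's home directory; neither detail is confirmed here, and `resolve_path` is a hypothetical helper.

```python
import os


def resolve_path(file_var, folder_var, env=None, base_dir=None):
    """Build the full path to a target file from two variables.

    Assumptions: values come from environment variables, and a blank
    folder falls back to base_dir (the user's home directory by default).
    """
    env = os.environ if env is None else env
    base_dir = os.path.expanduser("~") if base_dir is None else base_dir

    file_name = env.get(file_var, "")
    if not file_name:
        # File Name is marked as required in the table above.
        raise ValueError(f"{file_var} is required")

    folder = env.get(folder_var, "")
    if folder:
        return os.path.join(base_dir, folder, file_name)
    return os.path.join(base_dir, file_name)
```

For example, `resolve_path("MANIPULATION_SOURCE_FILE_NAME_1", "MANIPULATION_SOURCE_FOLDER_NAME_1")` would give the location of the first file, with the folder part omitted when that variable is blank.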