SSIS - How to Perform Union Operation in SSIS Package
Scenario:
We have two text files and we want to perform Union operation ( only get distinct records from both files) and load to our Destination. When we look in Data Flow Items, we don't see any Transformation that's name is Union. So how would we do this in SSIS?Solution 1 ( Use Sort Transformation)
In this post we will first use Union All Transformation to union all records. Union All Transformation is going to return us all records, if they are present multiple times, Union All Transformation is going to return us multiple records. Then we will use Sort Transformation to eliminate duplicates and keep only one copy of them. Let's start with step by step approach
Step 1:
Create two text files as shown below. As you can see I have one record ( Aamir,Shahzad,XYZ Address) that is present in both files, rest of records are unique.
Fig 1: Text files for Union Operation in SSIS Package
Step 2:
Create new SSIS Package. Inside the SSIS Package, Bring the Data Flow Task to Control Flow Pane. Inside Data Flow Task, Bring Two Flat File Sources and create connection to TestFile1 and TestFile2.
Bring the Union All Transformation in Data Flow Pane and Connect the Both Flat File Source to it. As my column names in Testfile1 and TestFile2 are same, It will automatically map them. If your columns names are different , double click on Union All Transformation and map the columns from sources.
Fig 2: Union All Transformation in SSIS to merge Data from Two Sources
As Union All is going to return us all records , even duplicates. We want to get only distinct records as Union operation. Let's bring Sort Transformation and configure as shown below
Fig 3: Use Sort Transformation To Remove Duplicate Records
Now we can write these records to destination table or file. In my case just to show you, It worked, I am going to put Multicast Transformation and then add Data Viewer between Sort and Multicast Transformation to show you we performed Union Operation by using Union All and Sort Transformation together. If you want to learn more about Data Viewer, you can check this post.
Fig 4: Performing Union Operation in SSIS Package by using Union All and Sort Together
As we can see in Fig 4, two records are read from each source. Union All Transformation returned us 4 records( Aamir,Shahzad,XYZ) as duplicate record. We used Sort Transformation to eliminate duplicates so we can get output Union would have return us. Sort removed the duplicate copies and returned us three records.
Solution 2 ( Use Aggregate Transformation)
We can use Aggregate Transformation with Union All Transformation to perform Union Operation in SSIS as well.
Instead of using Sort, let's put Aggregate Transformation after Union All Transformation and configure as shown below
Fig 5: Perform Union Operation in SSIS By using Aggregate Transformation
Let's run our SSIS Package and see if this package is performing the Union should.
Fig 6: Performing Union Operation By Using Union All and Aggregate Transformation Together
If you have small number of records and enough memory on machine where you are running the SSIS Package, this can be quick solution. If you have large number of records , These package can take long time to run. As Sort and Aggregate Transformation are blocking transformations. If you have large number of records, You might want to inserted them into database tables and then perform Union operation.