Monday, October 13, 2014

SSIS - How To Use Flat File Or Excel File In Lookup Transformation [Cache Transformation]

Scenario: 

We use Lookup Transformation to join input data with reference data to get required data from Reference Data. By using Lookup Transformation we can connect to databases and get the reference data. But our company has An Excel file that they want to use as reference. The business users can add or remove the records from this Excel. Before we load any data to our destination table we need to verify against this Excel sheet if data matches on required columns.


Solution:  

Once solution to this problem can be, Create the Staging Table and then truncate on each load and load from Flat file and then use in lookup transformation for reference.

In SSIS 2008, Cache Transformation was introduction so we don't have to load the data into Staging/Temp tables to use in Lookup Transformation. We can load the data in memory by using Cache Transformation and then use in Lookup Transformation.

Let's say if I have a source table as shown below
Fig 1: Source Data for Cache Transformation Example

As we can see that I don't have the CountryName. I need to get the CountryName from Flat file by joining the input and reference data on Country Code.


Fig 2: Flat File Reference Data for Cache Transformation


Step 1: 

Create an SSIS Package. Inside SSIS Package bring the Data Flow Task. In this Data Flow Task we will load the Data from Flat File to Cache Transformation. Bring the Flat source and create connection manager for flat file and then bring the Cache Transformation and configure as shown below.
Fig 3: Create Configure Cache Transformation in SSIS Package

Fig 4: Configure Index Position for Mapping Columns in Cache Transformation.

As we are going to use CountryCode for joining with out input column. Set the Index position for this column to 1. If you are using more than one column for joining , you can change the Index position. For first column, It will be 1 and then for second 2 and so on. We don't need to Index the Column/s which we are not using in Join.

Fig 5: Load Reference Data To Cache by using Cache Transformation in First Data Flow Task

In Fig 5, we have loaded the reference data from Flat file source to Cache by using Cache Transformation in Data Flow Task.

Step 2: 

Bring the second Data Flow Task and connect with First Data Flow Task. Inside 2nd Data Flow Task, Read the data from SQL table by using OLE DB Source.Bring the Lookup Transformation and attach OLE DB source to it and configure as shown below.

Fig 6: Configure OLE DB Source


Double Click the Lookup Transformation and then in Connection Type choose Cache Connection Manager as shown below.

I Left Fail Component in How to handle rows with no matching entries. You can configure as per your requirement. You can redirect or fail if there are some record/s that does not match with reference dataset.
Fig 7: Use Cache Connection Manager in Lookup Transformation


Fig 8: Choose Cache Connection Manager in Lookup Transformation

Go to Columns and then connect the columns on which do you want to join. I have connected CountryCode column from Source to CountryCode in Reference Dataset. By using the Check Boxes, Select the columns those you want to retrieve from reference dataset. In our case we want to retrieve Country Name and finally hit Ok. 
Fig 9: Map the Columns from Source to Reference Data Set


Add the destination and then map the columns. Just to show the output, I have used Multicast Transformation as Test Destination and added data viewer to show the records. 

Fig 10: Output from Lookup Transformation


Our final package will look like below fig.
Fig 1: How to Use Cache Transformation in SSIS Package

No comments:

Post a Comment