Friday, October 10, 2014

SSIS - How To Create / Use Temp Table In SSIS Package

Scenario:

We have create a SSIS Package for Upsert(Insert/Update). We get csv file with millions of records with (Id,Name,Address columns). If the record come with new Id , we need to insert that record in dbo.Customer table(id, name, address) and for existing IDs we need to update those records.

After doing some analysis, we got to know that the number of records those need to be updated on daily basis are minimum 100,000 per day. To perform above task we can use Lookup Transformation and find out existing and non existing records. Any non-existing IDs can be directly inserted into dbo.Customer table but for update we have to use OLE DB Command transformation. OLE DB Command transformation is slow, it will update one row at a time and for 100,000 records it will take long time.

How about inserting the records into some staging table and write TSQL Statement to Insert/update records? Good idea! It will be fast and easy to do. But my Architect do not want to create a new table :(

Solution:

Ok, How about we create Temp table and then use it in our package to perform the above task and once done, the Temp table will be gone!

Let's start with step by step approach

Step 1:

Prepare Source.csv file on desktop by using below data
Id,Name,Address
1,Aamir,ABC ADDRESS
2,Raza,Test Address
3,July, 123 River Side CA
4,Robert,540 Rio Rancho NM

Step 2:


Create dbo.Customer Table by using below script

USE TestDB
GOCREATE TABLE dbo.Customer
  (
     ID      INT,
     Name    VARCHAR(100),
     Address VARCHAR(100)
  )


Step 3: 

Create SSIS Package to load csv file into dbo.Customer Table.( Insert new records and update existing)
Create OLE DB Connection to the database where your dbo.Customer table exists. Right Click on Connection and then click properties or Click on Connection and press F4 to go to properties. 
Set RetainSameConnection=True. 
Fig 1: Set RetainSameConnection to True for OLE DB Connection


Step 4: 

Create ##Temp table by using Execute SQL Task as shown below by using 
Create Table ##Temp(ID INT, Name VARCHAR(100),ADDRESS VARCHAR(100))
Fig 2: Create ##Temp table by using Execute SQL Task


Step 5: 

Bring Data Flow Task to Control Flow Surface and then connect Execute SQL task to it. Inside Data Flow task bring Flat File Source and make connection to Source.csv file that you have created in Step 1.
Drag Lookup Transformation and configure as shown below. Our goal is to Insert any record which Id does not exist in dbo.Customer table and if ID exists we want to update that records. Instead of using OLE DB Command Transformation, we will insert records which needs to be update in ##Temp table inside Data Flow Task.
Fig 3: Configure Lookup Transformation ( Redirect rows to no match output)

Fig 4: Choose Id from dbo.Customer for lookup

Fig 5: Map the Source Id to dbo.Customer.ID for lookup

Step 6:

Bring OLE DB Destination Transformation from Data Flow Items as shown. Join No Match Output ( new records) of Lookup to OLE DB Destination and choose destination Table (dbo.Customer).
Fig 6: Insert new records by using No Match Output of Lookup Transformation

As we do not want to use OLE DB Command transformation for update inside Data Flow Task. Let's write all records those need to be update into ##Temp table by using OLE DB Destination. We will not be able to see ##Temp table in drop down in OLE DB Destination. Here are two steps we need to take
i) Create a variable with name ##Temp as shown below
Fig 7: TableName variable holding Temp Table Name

ii) Go to SSMS and create ##Temp table ( if you would not create this table, you will not be able to map the columns in OLE DB Destination)
Create Table ##Temp(ID INT, Name VARCHAR(100),ADDRESS VARCHAR(100))

Bring the OLE DB Destination and map to TableName Variable as shown below.
Fig 8: Configure OLE DB Destination to use TableName variable for Destination Table Name.


Fig 9: Map the Source Columns to ##Temp Table Columns

After all the configuration our Data Flow will look like below figure. I renames the transformation to provide better picture about what we are doing in this Data Flow Task.
Fig 10: Data Flow Task with ##Temp Table Destination.

Step 7:

Go to Control Flow Surface and Drag Execute SQL Task to write update statement.
UPDATE DST 
SET DST.Name=SRC.Name
,DST.ADDRESS=SRC.ADDRESS
FROM  dbo.Customer DST
INNER JOIN ##Temp SRC
ON DST.ID=SRC.ID

Fig 11: Execute SQL Task to Update Dbo.Customer from ##Temp 

Our final SSIS Package will look like below
Fig 12: Insert/Update Package by using Temp Table for Updates

If we try to run the SSIS Package, It might complain that ##Temp does not exists. Go to package properties by right clicking in Control Flow Pane and Set DelayValidation=True. By setting DelayValidation we are asking the package not to validate any objects as ##Temp table does not exist at this point and it will be created later in Package. 
Fig 13: Set Delay Validation=True

Run the Package couple of times and check the data in dbo.Customer table. Data should be loaded. Now let's go to Source.csv file and change some values for Name and Address columns and run the package one more time to make sure, Update logic is working fine.

Here is the data after update.
Id,Name,Address
1,Aamir1,Test  ADDRESS
2,Raza1,Test Address
3,July, 123 River Side CA USA
4,Robert,540 Rio Rancho NM

Fig 14: Package Execution After Updating Records in Source.csv file

As we can see that the records are updated, where ever we made changes in Name and Address values.

Fig 16: dbo.Customer data after Upsert

1 comment:

  1. I think there is a need to look for some more information about SSIS and its different aspects.SSIS always useful for functions such as Update and Insert.

    SSIS Upsert

    ReplyDelete