SSIS - How To Create / Use Temp Table In SSIS Package
Scenario:
We have create a SSIS Package for Upsert(Insert/Update). We get csv file with millions of records with (Id,Name,Address columns). If the record come with new Id , we need to insert that record in dbo.Customer table(id, name, address) and for existing IDs we need to update those records.
After doing some analysis, we got to know that the number of records those need to be updated on daily basis are minimum 100,000 per day. To perform above task we can use Lookup Transformation and find out existing and non existing records. Any non-existing IDs can be directly inserted into dbo.Customer table but for update we have to use OLE DB Command transformation. OLE DB Command transformation is slow, it will update one row at a time and for 100,000 records it will take long time.
How about inserting the records into some staging table and write TSQL Statement to Insert/update records? Good idea! It will be fast and easy to do. But my Architect do not want to create a new table :(
Solution:
Ok, How about we create Temp table and then use it in our package to perform the above task and once done, the Temp table will be gone!
Let's start with step by step approach
Step 1:
Prepare Source.csv file on desktop by using below data
Id,Name,Address
1,Aamir,ABC ADDRESS
2,Raza,Test Address
3,July, 123 River Side CA
4,Robert,540 Rio Rancho NM
Step 2:
Create dbo.Customer Table by using below script
USE TestDB GOCREATE TABLE dbo.Customer ( ID INT, Name VARCHAR(100), Address VARCHAR(100) )
Step 3:
Create SSIS Package to load csv file into dbo.Customer Table.( Insert new records and update existing)
Create OLE DB Connection to the database where your dbo.Customer table exists. Right Click on Connection and then click properties or Click on Connection and press F4 to go to properties.
Set RetainSameConnection=True.
Fig 1: Set RetainSameConnection to True for OLE DB Connection
Step 4:
Create ##Temp table by using Execute SQL Task as shown below by using
Create Table ##Temp(ID INT, Name VARCHAR(100),ADDRESS VARCHAR(100))
Fig 2: Create ##Temp table by using Execute SQL Task
Step 5:
Bring Data Flow Task to Control Flow Surface and then connect Execute SQL task to it. Inside Data Flow task bring Flat File Source and make connection to Source.csv file that you have created in Step 1.
Drag Lookup Transformation and configure as shown below. Our goal is to Insert any record which Id does not exist in dbo.Customer table and if ID exists we want to update that records. Instead of using OLE DB Command Transformation, we will insert records which needs to be update in ##Temp table inside Data Flow Task.
Fig 3: Configure Lookup Transformation ( Redirect rows to no match output)
Fig 4: Choose Id from dbo.Customer for lookup
Fig 5: Map the Source Id to dbo.Customer.ID for lookup
Step 6:
Bring OLE DB Destination Transformation from Data Flow Items as shown. Join No Match Output ( new records) of Lookup to OLE DB Destination and choose destination Table (dbo.Customer).
Fig 6: Insert new records by using No Match Output of Lookup Transformation
As we do not want to use OLE DB Command transformation for update inside Data Flow Task. Let's write all records those need to be update into ##Temp table by using OLE DB Destination. We will not be able to see ##Temp table in drop down in OLE DB Destination. Here are two steps we need to take
i) Create a variable with name ##Temp as shown below
Fig 7: TableName variable holding Temp Table Name
ii) Go to SSMS and create ##Temp table ( if you would not create this table, you will not be able to map the columns in OLE DB Destination)
Create Table ##Temp(ID INT, Name VARCHAR(100),ADDRESS VARCHAR(100))
Bring the OLE DB Destination and map to TableName Variable as shown below.
Fig 8: Configure OLE DB Destination to use TableName variable for Destination Table Name.
Fig 9: Map the Source Columns to ##Temp Table Columns
After all the configuration our Data Flow will look like below figure. I renames the transformation to provide better picture about what we are doing in this Data Flow Task.
Fig 10: Data Flow Task with ##Temp Table Destination.
Step 7:
Go to Control Flow Surface and Drag Execute SQL Task to write update statement.
UPDATE DST
SET DST.Name=SRC.Name
,DST.ADDRESS=SRC.ADDRESS
FROM dbo.Customer DST
INNER JOIN ##Temp SRC
ON DST.ID=SRC.ID
Fig 11: Execute SQL Task to Update Dbo.Customer from ##Temp
Our final SSIS Package will look like below
Fig 12: Insert/Update Package by using Temp Table for Updates
If we try to run the SSIS Package, It might complain that ##Temp does not exists. Go to package properties by right clicking in Control Flow Pane and Set DelayValidation=True. By setting DelayValidation we are asking the package not to validate any objects as ##Temp table does not exist at this point and it will be created later in Package.
Fig 13: Set Delay Validation=True
Run the Package couple of times and check the data in dbo.Customer table. Data should be loaded. Now let's go to Source.csv file and change some values for Name and Address columns and run the package one more time to make sure, Update logic is working fine.
Here is the data after update.
Id,Name,Address
1,Aamir1,Test ADDRESS
2,Raza1,Test Address
3,July, 123 River Side CA USA
4,Robert,540 Rio Rancho NM
Fig 14: Package Execution After Updating Records in Source.csv file
As we can see that the records are updated, where ever we made changes in Name and Address values.
Fig 16: dbo.Customer data after Upsert
I think there is a need to look for some more information about SSIS and its different aspects.SSIS always useful for functions such as Update and Insert.
ReplyDeleteSSIS Upsert