SSIS - How to Delete Top N Rows from CSV or Text File in SSIS by using Script Task
Scenario:
We have received text or csv file. The file has some company information,date created etc. before the data rows start. Our goal is to delete these information rows and regenerate file start from header row.
Fig: Source File With Company Information
Solution:
We will be using Script Task to Delete the Top N rows from text file. We will be reading all the data from the file in string type array and then overwrite the original file after removing Top N rows.
Step 1:
Create two variables as shown in figure.VarFilePath is variable holding the file path. VarDeleteTopNRows is variable that will hold the number of rows you want to delete from starting of file.
Fig 1: Create variables in SSIS Package To Delete Top N Rows from Flat File by using Script Task
Step 2:
Bring the Script Task to Control Flow Pane. Map the above Variables as shown to ReadOnlyVariables in Script Task.
Fig 2: Map Variables in Script Task
Paste the below Code. I have bold the code that I wrote, rest of the code is auto generated by Script Task.
#region Namespaces using System; using System.Data; using Microsoft.SqlServer.Dts.Runtime; using System.Windows.Forms; using System.IO; using System.Linq; #endregion namespace ST_c62f3dcfb0964917aade179aac4edfab { [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute] public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase { public void Main() { // TODO: Add your code here string FilePath = Dts.Variables["User::VarFilePath"].Value.ToString(); Int32 DeleteTopNRows = Convert.ToInt32(Dts.Variables["VarDeleteTopNRows"].Value); string[] lines = System.IO.File.ReadAllLines(FilePath); lines = lines.Skip(DeleteTopNRows).ToArray(); System.IO.StreamWriter file = new System.IO.StreamWriter(FilePath); foreach (string line in lines) { // MessageBox.Show(line.ToString()); file.WriteLine(line); } file.Close(); Dts.TaskResult = (int)ScriptResults.Success; } enum ScriptResults { Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success, Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure }; } }Save the code. Run the script task and once completed. Check the file again.
Fig 3: Top 3 Rows Deleted By SSIS Package.
You can change the value of variable VarDeleteTopNRows according to your requirements.
I feel SSIS is the most useful and important aspect and tools which solves complex database problems on a large scale to find out how important it is in the testing process.
ReplyDeleteSSIS Upsert