Thursday, October 9, 2014

SSIS Nugget: Engine Threads

 
There is a property of each data-flow task called EngineThreads which dictates, quite simply, the number of threads that run in the data-flow pipeline. But what does that mean exactly and how can it affect your data-flow? Well BOL doesn't have much on the subject simply saying "An integer that specifies the number of threads that the data flow task can use during execution". Well that doesn't help much does it? It doesn't tell you what an engine thread actually is so by way of clarification I set about trying to find out more about them.
So I think we can make some simple assumptions. Like, a thread can execute in parallel with other threads and the maximum number of threads that can execute in parallel is determined by the EngineThreads property. Let's show that. here's a data-flow at design-time.
20051002SeperateThreadsDesignTime.JPG
You can see that there are 8 completely independant data paths here right? Each data path has an arbitrary number of rows produced by the source script component so that they all execute for varying lengths of time. So what happens when you run this thing? As you've probably guessed by now the EngineThreads property determines how many of these run in parallel. On the next screenshot you can see what happens at run-time when EngineThreads=5.
20051002SeperateThreadsRunTime.JPG
There are only 5 data paths executing in parallel. Which is what I expected to be honest.
So then I started to wonder if you can have more than 1 thread when the data paths are not independant. In other words, can you have multiple threads in the same execution tree. Here's my first attempt at looking into that (shown at run-time):
20051002CombinedThreadsRunTime.JPG
Here I've got a single source adapter and a multicast component that splits that into 8 seperate data paths. I have EngineThreads=4. As you can see from the annotated rowcounts the value of EngineThreads does not affect the 8 data paths executing in parallel so obviously these are all still in the same thread.
So what do we know at this point? Well, if I have 1 source component that that results in 1 thread, no matter what you do with it downstream. So then I thought, what happens if we have more than 1 source component and downstream we combine them into 1 data-path. How many threads are executing then? Well here's an attempt to investigate exactly that:
20051002MergeThreadsRunTime.JPG
On this data-flow I had EngineThreads=2.
Now OK, there's alot of yellow boxes on here but the important thing to look at is the rowcount annotations. There are 5 source adapters here but only 2 of them are producing rows. 2 you say? Well hey, that's how many EngineThreads I've specified this thing to have!!! From this data-flow we can deduce that each source adapter produces a seperate thread regardless of whether the data paths are combined downstream or not.

So what have we learned here? Well quite simply every source adapter results in a seperate thread, nothing more that that. This is kind of backed up by the following statement that I have just found in BOL as well: "The source threads in an execution plan represent the source components in the data flow". This article actually talks about source threads and work threads as seperate things in the data-flow so I think there's more to be discovered here - but I'll save that for another day.
So the ultimate question to be answered here is "What should I set EngineThreads property to?" Well as usual the answer is "It depends" so the only advice I have is to test and measure, test and measure, test and measure... see what works for you.

No comments:

Post a Comment