Figure 1. Data Flow |
I changed the DefaultBufferMaxRows property for the data flow to 3,000,000 rows for this example. This means that each output is allowed 3 million rows per output buffer. This will be our baseline for performance measurements. The outputs for this script component source will each contain 1 integer field:
Figure 2. Inputs and Outputs |
In the code for our script component we're going to create a class that contains 3 arrays, that contain 65 million integers, for each of the 3 outputs and call these outputs synchronously.
#region Namespaces
using System;
using System.Data;
using System.Threading.Tasks;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
#endregion
/// <summary>
/// This is the class to which to add your code. Do not change the name, attributes, or parent
/// of this class.
/// </summary>
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
public override void CreateNewOutputRows()
{
IDTSComponentMetaData100 compMetadata = this.ComponentMetaData;
ArrayA AClass = new ArrayA();
//Outputs data to output A
OutputA(AClass.A, compMetadata);
//Outputs data to output B
OutputB(AClass.B, compMetadata);
//Outputs data to output B
OutputC(AClass.C, compMetadata);
}
private void OutputA(int[] iAarray, IDTSComponentMetaData100 compMetadata)
{
OutputMessage(String.Format("Sending data to output A [{0}]", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff")), compMetadata);
foreach (int i in iAarray)
{
ABuffer.AddRow();
ABuffer.OUTA = i;
}
OutputMessage(String.Format("Completed sending data to output A [{0}]", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff")), compMetadata);
}
private void OutputB(int[] iBarray, IDTSComponentMetaData100 compMetadata)
{
OutputMessage(String.Format("Sending data to output B [{0}]", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff")), compMetadata);
foreach (int i in iBarray)
{
BBuffer.AddRow();
BBuffer.OUTB = i;
}
OutputMessage(String.Format("Completed data to output B [{0}]", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff")), compMetadata);
}
private void OutputC(int[] iCarray, IDTSComponentMetaData100 compMetadata)
{
OutputMessage(String.Format("Sending data to output C [{0}]", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff")), compMetadata);
foreach (int i in iCarray)
{
CBuffer.AddRow();
CBuffer.OUTC = i;
}
OutputMessage(String.Format("Completed data to output C [{0}]", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff")), compMetadata);
}
public class ArrayA
{
public int[] A = new int[65000000];
public int[] B = new int[65000000];
public int[] C = new int[65000000];
public ArrayA()
{
for (int i = 0; i < A.Length; i++)
{
A[i] = i;
B[i] = i;
C[i] = i;
}
}
}
private void OutputMessage(string message, IDTSComponentMetaData100 compMetadata)
{
{
bool mbool = false;
compMetadata.FireInformation(1, compMetadata.Name, message, "", 0, ref mbool);
}
}
}
Let's run the package and watch the outputs run synchronously one after the other. First A outputs:
After A has completed B kicks off:
Then finishes up the outputs with C:
When viewing the execution results we can see each output ran right after the other and took about 41 seconds to complete total:
Now while 41 seconds is good, we can do better. We can change this code so that all 3 outputs run asynchronously(at the same time) on their own threads. To do this we take advantage of a task factory. We create tasks to run each of the methods that output data to the 3 outputs. The task factory will call these tasks asynchronously. So we change the code to call our methods to:
Task[] tasks = new Task[3];
tasks[0] = Task.Factory.StartNew(() => OutputA(AClass.A, compMetadata));
tasks[1] = Task.Factory.StartNew(() => OutputB(AClass.B, compMetadata));
tasks[2] = Task.Factory.StartNew(() => OutputC(AClass.C, compMetadata));
//Wait for all tasks to finish before going to post execute
Task.WaitAll(tasks);
Let's run the package again and see how it performs with asynchronous outputs. At a first look we can see all outputs pushing out data at the same time:
From viewing the execution results we can verify that they all started pushing out data at the same time, and finished in about 30 seconds:
This means we increased our performance by about 27%. If you have packages that pump out a lot of data, and take a long time to finish, a 27% increase in performance can mean a lot. Now keep in mind your mileage may vary based on memory/number of processors/cores etc.
Figure 7. Outputs Running Asynchronously |
Figure 8. Asynchronous Results |