Subscribe to Windows IT Pro
August 15, 2011 12:33 PM

Designing SSIS Packages for High Performance

How to find and fix performance bottlenecks
SQL Server Pro
InstantDoc ID #136428
Rating: (3)

The process for designing SQL Server Integration Services (SSIS) packages is typically iterative. You start by getting the components working individually or in small sets, then concentrate on ensuring that the components will work in the correct sequence. During later iterations, you add in more components or adjust properties to perform error handling. Then, in a final pass, you might add in abstractions, taking advantage of variables and expressions to enable runtime changes for your package. But your work is not yet complete at this stage.

Before you put your package into production, you need to take some more time to review your package with an eye toward preventing, or at least mitigating, performance problems. Sooner or later, the business environment is likely to change, requiring you to perform the same work within a shorter time span or to process higher data volumes than originally planned. I’ll describe areas within a package that can cause performance problems and offer suggestions for modifying the package design to get better performance.

Bear in mind that there are various factors that can affect the performance of SSIS packages. Several of these factors are completely external to SSIS, such as the source systems that you’re using for data extractions, disk speed and configuration, NICs, network bandwidth, and the amount of memory available on the server executing the packages. I won’t address how to resolve bottlenecks caused by external factors, but rather concentrate on specific strategies you can take within the design of your packages. I assume that you already have a general understanding of SSIS architecture and package development practices.

 

Understanding Control Flow Performance

Every SSIS package has at least one task in the control flow. If you add multiple tasks to the control flow, you can direct the sequence of tasks by adding precedence constraints to connect a series of tasks from beginning to end. You can even group these tasks together in containers. Apart from the Data Flow Task, which I’ll describe in more detail later, the performance of each individual task depends on the external systems with which the task must communicate to get its job done. Therefore, the only way within the package to speed up control flow processing is to have tasks or containers (collectively called executables) run in parallel. Figure 1 illustrates parallelism in Container A and sequential workflow in Container B. In this example, the containers themselves also execute in parallel.

Figure 1: Running executables in parallel to speed up control flow processing
Figure 1: Running executables in parallel to speed up control flow processing

The package property that determines how many executables can run in parallel is MaxConcurrentExecutables, as shown in Figure 2. The default is -1, which means that the control flow engine will use the number of logical processors on the server plus 2. For example, if you’re executing the package with the default setting on a quad-core server, you can have up to six executables running in parallel.

Figure 2: Using the MaxConcurrentExecutables property to configure how many executables can run in parallel
Figure 2: Using the MaxConcurrentExecutables property to configure how many executables can run in parallel

In some cases, you won’t see much change if you try to increase the MaxConcurrentExecutables value. However, one scenario with potential for improvement is when you have a package with tasks that must wait for a response from external systems and your server is dedicated to executing packages. In that case, you might be able to boost parallelism by replacing the MaxConcurrentExecutables value with a higher number. Start by increasing the value to the number of processors plus 3, then test the package in Business Intelligence Development Studio (BIDS) to see whether the number of tasks executing in parallel increased. If so, continue to increment the property value by 1, test, and repeat until no further parallelization is possible, or you have no more tasks to run in parallel.

 

Understanding Data Flow Performance

The Data Flow Task is used to retrieve data from one or more sources, optionally change the structure or content of the data in transit, and send the data to one or more destinations. In business intelligence (BI) applications, this task is used to perform extraction, transformation, and loading (ETL) operations.

SSIS uses a pipeline engine with an in-memory buffer architecture to efficiently manage Data Flow Task operations. Performance of the pipeline engine largely depends on the number of records moving through the pipeline and the number of buffers required to accomplish the transformations and to move the data to the destinations. Therefore, to optimize performance, you need to understand how data flow components and the Data Flow Task properties affect pipeline throughput and buffer requirements.

 

Related Content:

ARTICLE TOOLS

Comments
  • pzhu1968
    9 months ago
    Aug 24, 2011

    The images are not shown.

You must log on before posting a comment.

Are you a new visitor? Register Here

advertisement

advertisement

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.