Subscribe to Windows IT Pro
April 20, 2004 12:00 AM

What's New in DTS?

Just about everything. Take a look at some of the SQL Server 2005 highlights
SQL Server Pro
InstantDoc ID #42141
Rating: (0)

In early 2000, the Microsoft Data Transformation Services (DTS) development team I work on started revising DTS with the goals of building on previous success and of improving the product to support user requests and to provide a richer extraction, transformation, and loading (ETL) platform. We evaluated every aspect of DTS and decided to totally rewrite it. DTS in the upcoming SQL Server 2005 release, formerly code-named Yukon, sports many brand-new features as well as enhanced ones. Because so much of DTS is new in SQL Server 2005, I want to show you some of the most important changes and the new look of the DTS Designer. When I wrote this article, I was working with Beta 1 of SQL Server 2005 DTS, so some features might change in upcoming betas or in the final release. But if you're already familiar with SQL Server 2000 and 7.0 DTS releases, you'll be able to appreciate the coming improvements.

SQL Server 2005 DTS Design Goals
Because comprehending everything about DTS at a glance is difficult, let's just take a quick look at the most important goals and how the goals drove the design and feature decisions the DTS team made in SQL Server 2005. Although these descriptions are brief, they should help you grasp the magnitude of the changes.

Provide true ETL capabilities. Although the data pump in pre-SQL Server 2005 DTS is useful and flexible, most users recognize that it has its limitations and needs to be revamped. For example, the data pump supports only one source and one destination per pump. True enterprise ETL requires fast, flexible, extensible, and dependable data movement. SQL Server 2005 DTS provides this capability through the Data Flow Task—or, as our team calls it, the pipeline. The pipeline supports multiple sources, multiple transforms, and multiple destinations in one fast, flexible data flow. As of Beta 1, SQL Server 2005 DTS includes 26 transforms. The Conditional Split and Derived Column transforms use an expression evaluator to support operations that provide virtually limitless combinations of functionality for processing data. Other transforms such as the Slowly Changing Dimension, Fuzzy Match, Aggregate, File Extractor, File Inserter, Partition Processing, Data Mining Query, Dimension Processing, Lookup, Sort, Unpivot, and Data Conversion transforms provide powerful data-manipulation capabilities that don't require scripting. This change is a real benefit because users can develop transformation solutions faster and manage them easier than hand-coded solutions.

Distinguish between data flow, control flow, and event handling. SQL Server 2005 DTS emphasizes the differences between various kinds of data processing. In current DTS releases, users are sometimes confused when they try to distinguish between data flow and control flow because both appear on the DTS Designer surface. In SQL Server 2005 DTS, the concept of data flow includes all the activities users perform to extract, transform, and load data. Control flow comprises all the processes that set up a given environment to support ETL, including executing the data flow. SQL Server 2005 DTS also has event handlers that allow nonsequential control flow execution based on events that tasks and other objects generate inside a package. SQL Server 2005 DTS clearly distinguishes between data flow, control flow, and event handling in the UI by showing them in separate Designer surfaces.

Minimize disk usage. To make DTS into a screaming fast ETL tool, we needed to eliminate unnecessary disk writes, disk reads, and memory movement. Because ETL solutions can be quite complex, they typically involve some sort of disk caching and lots of memory movement and allocations. In some cases, you can't avoid disk usage—for example, during data extraction, data loading, or aggregation or sorting of data sets that are larger than available memory. But in many cases, moving memory and caching aren't necessary. The pipeline helps eliminate the avoidable cases by optimizing memory usage and being smart about moving memory only when absolutely necessary.

Improve scalability. To be accepted as an enterprise ETL platform, SQL Server 2005 DTS needed the ability to scale. Users in smaller shops might need to run DTS on less-powerful, affordable commodity hardware, and users in enterprise environments want it to scale up to SMP production machines. SQL Server 2005 DTS solves this scalability problem by using multiple threads in one process. This approach is more efficient and uses less memory than using multiple processes. SQL Server uses this scaling approach successfully, so we decided to use the same method for DTS.

Recognize the development-programming connection. Experienced DTS users know that developing packages is much like writing code, but DTS in SQL Server 2000 doesn't support that connection very well. However, SQL Server 2005 DTS provides a professional development environment that includes projects, deployment, configuration, debugging, source control, and sample code. Package writers will have the tools they need to effectively write, troubleshoot, maintain, deploy, configure, and update packages in a fully supported development environment.

Related Content:

ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here

advertisement

advertisement

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.