One of the most dramatic changes in SQL Server 2005 is the redesign of Data Transformation Services (DTS); renamed SQL Server 2005 Integration Services. SSIS is a strong extraction, transformation, and loading (ETL) product that offers insane performance, a large catalog of dynamic components, a sound deployment model, and flexibility and extensibility. Extensibility has been a strong suit for Microsoft products, and the company has concentrated on building robust platforms that meet common customer needs but that customers can extend to meet uncommon needs as well. As you consider the role SQL Server 2005and particularly SSISwill fulfill in your environment, you need to understand what platform extensibility will let you do.
Let's walk through how to create, install, and test a sample custom source component for SSIS that can read and parse log files in an Internet Information Server (IIS) Web log. This custom component lets an SSIS package consume an IIS log file as a data source, then transform and direct the file to a destination component. You'll see how easily you can create a simple custom source component.
Note that we developed and tested this example on SQL Server 2005 Beta 2 October Community Technology Preview (IDW 9) release. At this point in the product's development, the major pieces should be pretty well baked, but some details, such as object names that include the old DTS name, might still change before the product is released. Although we expect that this example will still compile and run on future betas and the released product, Microsoft made some significant changes in SSIS between Beta 1 and Beta 2, and those kinds of large changes could still show up. For background information about the kinds of changes you can expect when you move from DTS to SSIS, see Kirk Haselden's articles in Related Reading.
What Is a Custom Source Component?
A source component is an SSIS adapter that feeds information into a data flow. SSIS lets you develop custom components that you can connect to unique sources or destinations or use to accomplish specific transformation tasks. You can develop a custom source component to connect to data sources that you can't access by using one of the existing source adapters or to consolidate parsing or script logic into the data extraction.
Microsoft has recently made a point of enhancing products by letting users and third parties extend the products with their own innovations and customizations. Although you could create custom components in DTS, it wasn't easy. Instead, most packages relied on extensive script tasks for complex operations. With SSIS, custom components are easier to create, drastically reducing the need for script and improving the performance and robustness of SSIS solutions.
Creating the Component
The example source component that we'll create parses a default IIS Web log into columns for an SSIS output buffer, allowing a package to input the log into a database, an Excel spreadsheet, or any destination. The default settings for an IIS Web log put the various field namesrequest time, requesting IP address, method, Uniform Resource Indicator (URI), and Status Codeon the fourth line of the file. If you've ever attempted to parse one of these Web logs by using a standard text adapter, you know that the format can be cumbersome; it contains space-delimited information and unpredictable information in the URI string. Although you can use other solutions to parse Web logs, this file type works well for our example.
You create a custom component by developing a .NET assembly that inherits the Microsoft.SqlServer.DTS.Pipeline.PipelineComponent base class. This base class defines the methods that SSIS will call in your component to drive the Data Flow task. Your custom component can safely override (or ignore) any methods it needs to accomplish its task. If you don't provide an override for a method, the default implementations in the PipelineComponent base class will handle the calls.
The PipelineComponent interface creates source, transformation, and destination components. Although they have different roles in the Data Flow, the components look similar; only the features they choose to implement differ. A source component has an output, a destination has an input, and a transform has both, with data-modifying logic between the input and output. SSIS asks only that you let it know which section of the toolbox the component belongs in.
Setup. Let's begin by creating a new Class Library project in Visual Studio 2005. The name you choose for the project will be the default name of the assembly you create, so choose a name that describes the component you're building. An unwritten convention is to use Src or Dest at the end of your component name to indicate its function. The project we create in this example is named IisLogFileSrc.
The next step is to add references to your project to tell Visual Studio where to find the SQL Server objects you'll be working with. Select the Add Reference item from the Visual Studio Project menu. On the .NET tab of the resulting dialog box, select four components: Microsoft.SqlServer.DTSPipelineWrap, Microsoft.SQLServer.DTSRuntimeWrap, Microsoft.SQLServer.ManagedDTS, and Microsoft.SqlServer.PipelineHost.
When you start your project, Visual Studio automatically creates an initial class for you (probably named Class1the name of the class isn't important). First, we need to tell the compiler which references we'll be using, which we do by employing the easy-to-remember using statement that the C# code in Listing 1 shows. We need to create using statements for the Pipeline and Runtime components we referenced above, and because we'll be reading from a file, we need to list System.IO as well.
We also need to slightly modify the AssemblyInfo file. As the name suggests, this file gives the compiler additional information about your assembly through attributes. By default, Visual Studio creates a new version of your component every time you build it. But because the registration of your component in SSIS is version-specific, we want to prevent this behavior, so we replace the default value of the AssemblyVersion attribute (1.0.×) with a specific version, such as 1.0.0.0.
Eventually, we want to add this custom source component to the Global Assembly Cache (GAC), so we need to give it a strong name. The easiest way to do this is by using the .NET Framework 2.0 SDK Strong Name Utility (sn.exe). The k option lets you create a keyfile that becomes part of your project and should be copied into your project folder. Use the AssemblyKeyFile attribute to specify the keyfile for the compiler.
Back in Class1, we next need to create a class attribute that tells the Business Intelligence Development Studio about our component. In Listing 1, the attribute contains minimal information: the name of the component and what type of component it is. You can supply additional information, including an icon that will appear in the Business Intelligence Development Studio toolbox.
Finally, Listing 1's code shows that we want our class to inherit from the PipelineComponent class. This relieves us of having to implement every method the interface requires because the base class will receive any calls we don't handle. The base class also lets Visual Studio help us out with IntelliSense, which automatically creates the signatures for methods that we choose to implement.
Discovery. Next, we begin adding to our class methods that will intercept calls from the SSIS Data Flow. First, we use the ProvideComponentProperties method that Listing 2 shows to describe the component to the SSIS package designer. Listing 2 shows a call from the designer to ask the component what inputs, outputs, connections, and other properties it needs to do its job. In the designer, SSIS adds to the component's property pages fields that let a user provide that information. Because our source component is simple, we need only a connection and an output.
Listing 2's code tells SSIS to forget what it already knows about the component. This safety precaution removes any previously added outputs or connections and prevents duplicate information if the component was previously initialized. The code then adds an output object to the output collection to tell SSIS that we intend to produce output. Because our example has only one output object, we can get away with naming it Output. To tell SSIS what our output will look like, the code adds columns to the output object to represent the default columns of the IIS log file we'll provide. For simplicity, this example uses strings for all fields.
Finally, the code tells SSIS that the component needs a connection. We want a file connection to the log we'll be reading but strangely, SSIS doesn't have a way to specify what type of connection the component requires. So we'll just name it File Connection. For a more robust implementation, we'd need to check the connection that the component receives later in the process to make sure that it's the correct type.
Start your engines. The second step in creating our component is to process the SSIS requests to acquire our connections before execution and to release the connections during cleanup after execution. As Listing 3 shows, we acquire the connection by accessing the connection manager and receiving the file connection we requested. The code includes a couple of lines that let it find the object we need, so we get a simple filename that a user has specified in the designer. We use the AcquireConnections method to open the file, and the ReleaseConnections method to close the file, storing the file handle in a private variable so that it's available for the execution step.
Prev. page  
[1]
2
next page