Suppose that the Acme manufacturing company
uses a mechanism such as OLE for
Process Control (OPC) to store records in a
database table. One column in the records contains
a set of comma-delimited values that tells us everything
we need to know about a production line’s
operation during a given hour. We need to split up
the column values and place one of them in a separate
column so that we can take some further action
on it. This capability is built into most modern-day
programming languages but not into T-SQL.
The most common solution to this problem is
to create a user-defined function (UDF) that parses
the individual
values in T-SQL’s
set-based queryexecution
engine.
SQL Server 2005
Common Language Runtime (CLR) integration
gives us a better alternative: employing the .NET
Framework’s Split() method in a UDF as a computed
column expression.
Since SQL Server 7.0, computed columns have
let us populate a field in a table with the result of
an expression. In SQL Server 2000, we gained the
ability to create an index based on computed columns.
In SQL Server 2005, computed columns fully
support creating and updating statistics and can now
be based on a managed, scalar-valued function. Let’s
see how we can use CLR integration and a UDF to
create the computed column that Acme needs.
Why Use CLR?
SQL Server 2005 CLR integration allows developers
to create managed database modules, including
stored procedures, functions, triggers, aggregates,
and types. Without question, the module that stands
to benefit the most when implemented as managed
code is the UDF because UDFs typically contain
some of the most compute-intensive algorithms in
the database and because UDFs are executed on a
row-by-row basis. UDFs take advantage of the two
main benefits afforded by CLR integration: access
to the .NET Framework’s rich Base Class Library
(BCL) and just-in-time (JIT) compilation.
Access to the .NET Framework BCL gives us
access to several of the BCL’s namespaces, including
those in Figure 1. These include the root namespace
System, which has the base CLR types, including
the System.String class and its associated Split()
method. JIT-compiled code provides performance
benefits over T-SQL code. When you’re writing database
routines that are processor-intensive and that
perform little or no data access, managed code will
clearly perform better than its T-SQL counterpart.
For more information about when to use managed
code, see “CLR or Not CLR: Is That the Question?”
April 2006, InstantDoc ID 49429.
Creating the
ProductionHistory Table
Our first step toward demonstrating how
managed code can split a column into its
separate values is to create the database
and table that will contain the column. The
ProductionHistory table will store hourly
snapshots of Acme’s various productionline
information, as mentioned above. For
brevity’s sake, we won’t create related tables
as we would in the real world. Listing 1 shows
the T-SQL script to enter in SQL Server Management
Studio to create AcmeProductionDB and
ProductionHistory.
We also need to insert a few dummy records that
use a fictional convention for the ProductionValues
column. This convention includes values for the
hour start, hour end, units produced, and defects
produced. Based on this standard, a typical entry in this column would look like this: '01,02,100,3'.
Listing 2 shows the scripts for populating the table.
Creating the UDF
To create a new managed UDF, we create a new
database project in either Visual Basic or Visual C#
(Visual C#, in this example). Start up Visual Studio
2005, and select File, New, Project from the editor’s
main menu. (You’ll need the Professional, Tools for
Office, or Team System edition of Visual Studio
2005 to create CLR integration projects.)
In the Project types pane of the New Project window,
expand the Visual C# node, and select the Database
project. In the Name field, enter AcmeProductionDB_
ParsingRoutines as the name for our new C# database
project.
Once you’ve done this, you’ll see the Add
Database Reference dialog box. Click Add New
Reference, and you’ll see the New Database Reference
dialog box. Enter SQL Server 2005’s instance
information, specify our new database (AcmeProductionDB),
and click OK. The Add Database
Reference dialog box now shows our new database
reference. Make sure the new reference is selected,
and click OK. Our new CLR Integration project has
now been created and set up.
The next step is to use Visual Studio’s
Solution Explorer pane to add a new database
UDF. Solution Explorer shows you
two levels of the AcmeProductionDB_
ParsingRoutines project: the project itself, plus its
properties, references, and test scripts. Right-click
the project in Solution Explorer and select User-
Defined Function from the Add menu. Set the
source file’s name in the resulting dialog box to
udf_ParseProductionHistoryValues.cs. By default,
this name will also be given to the initial function
(which will be a public static method).
At this point, you should see the UDF template
code in Visual Studio. We need to replace this template
code with our own custom code that parses the
ProductionValues column. Listing 3 shows this code.
As you can see, the UDF has two input parameters,
one for the ProductionValues content and a second
for specifying the actual value to be returned. The
UDF also sets the optional parameter IsDeterministic
to true, which affords us the option of creating
an index on the computed column that will reference
this UDF (the UnitsProduced column).
Deploying the UDF
We must now deploy our new UDF to our target
SQL Server instance. The target SQL Server instance
is the database reference we added in our Visual
Studio project. To deploy the new UDF, right-click
the project in Solution Explorer and select Deploy
as Figure 2 shows.
Note: To change your project’s current database
reference (target SQL Server instance), right-click
the project in Solution Explorer, and select Properties.
Go to the Database tab and click Browse to
open the Add Database Reference dialog box I
described earlier.
Altering the
ProductionHistory Table
We now must alter our table’s schema to set the
formula for the UnitsProduced column to reference
our new UDF. Before doing so, though, let’s confirm
that the new UDF was deployed to the target database.
Start SQL Server Management Studio, and
select View, Object Explorer. In the Object Explorer
pane, navigate to AcmeProductionDB\Programmability Functions\Scalar-valued Functions and
look for udf_ParseProductionHistoryValues under
that node.
We still need to modify the ProductionHistory
table. As with most operations in SQL Server, we
can do this via T-SQL or graphically by using
Management Studio. We generated the script from
Management Studio, so all we need to do is execute
the script in Listing 4 by using a query pane in Management
Studio.
Confirming UDF Use
To confirm that the UnitsProduced column is using
udf_ParseProductionHistoryValues correctly, we
need to issue a Select statement to the database
engine referencing the ProductionHistory table.
Figure 3 shows this Select statement and its result.
We have now created and deployed a managed
computed column and confirmed that it’s working
as it should.
Optionally, you can create an index based on the
UnitsProduced column. You must adhere to several "rules” when creating
indexes based on computed
columns. I won’t
go into detail on those
rules here; if you wish to
learn more about them,
please see the topic
“Creating Indexes On
Computed Columns” in
SQL Server 2005 Books
Online. Listing 5 shows
the T-SQL script for
creating a basic nonclustered
index on the
UnitsProduced column.
By leveraging managed
functions in computed
columns, you obtain the benefits of CLR
integration in a computed column context. Among
these benefits are a richer programming model
than that offered by T-SQL and the possibility for
improved performance over that of T-SQL code.
The performance improvement can be significant
because computed columns tend to contain complex algorithms and they’re calculated for every row.
I used a simple example in this article so that you
could focus on how managed computed columns
work without being distracted by external logic. If
you wish to learn more about SQL Server 2005’s
CLR integration, please see my recent book Professional
SQL Server 2005 CLR Programming: with Stored Procedures, Functions, Triggers, Aggregates,
and Types (Wrox Press).