DOWNLOAD THE CODE:
Download the Code 22431.zip

Forget cursors and get with set-based programming

Editor's Note: Send your experts-only T-SQL tips to Itzik Ben-Gan at blackbelt@sqlmag.com. If we use your tip in the magazine, you'll receive $100 and an exclusive T-SQL Black Belt shirt.

Programmers who have experience in sequential file processing, for example with COBOL, can encounter difficulty when they start programming in a relational environment such as SQL Server. Such programmers often try to solve T-SQL problems by using cursors, similar to the way they might program in their previous environments—working with one record at a time in sequential order. But this approach doesn't exploit the full power of a relational database engine, which is optimized for nonsequential set-based queries (i.e., queries that manipulate sets of rows nonsequentially). Furthermore, cursor-based solutions usually require considerably more code than their equivalent set-based solutions. If a cursor is the first thing that you think of when you face a T-SQL problem, you have some unlearning to do—you need to start thinking "set-based."

Let's look at a problem that seems to require a cursor-based solution but in fact has several set-based solutions. As I discuss the various set-based solutions (most of which SQL Server MVP Umachandar Jayachandran provided), I compare their performance. The example company I use tracks customer contract information in a table called Contracts. This information includes the customer ID, the date the contract was signed, and the contract's monetary amount. You can run the script that Listing 1 shows to create and populate the Contracts table.

Say you need to write a query that returns all the details of the latest contract (the contract with the most recent date) for each customer and includes the amount of the previous contract in the same result row. Usually, a good approach to solving a T-SQL problem is to break it into steps. For example, the first step here might be to return all the contracts and, for each contract, return the amount of the previous contract. The second step might be to filter the results of the first step so that only the latest contract for each customer remains.

When you're dealing with set-based queries, the term "previous" isn't always well defined. So you need to define a logical expression that determines which is the "previous" contract for a customer. For a given contract, you can define "the previous contract" as "the most recent contract for the same customer, having a contract date earlier than the current contract date." Listing 2 shows a first attempt at implementing this first step.

The code in Listing 2 retrieves all contracts from an instance of the Contracts table called C1. The tricky part here is obtaining the amount of the previous contract for a customer. To obtain this amount, the code in Listing 2 uses a correlated subquery in the SELECT list. The correlated subquery is two levels deep. The first subquery retrieves the amount from another instance of the Contracts table, C2, using the same customer ID as in the outer query (C1). The code then filters the result of the subquery by requesting only those rows in which the contract date is equal to "the previous contract date," a result I obtained by nesting another level with another subquery. The deepest subquery refers to another instance of the Contracts table, C3. This subquery returns the maximum (most recent) contract date for the same customer as in the first subquery, along with a contract date that's earlier than the contract date in the outer query (C1). If the outer query's contract is the first contract for that customer, the subquery returns NULL because no previous contract exists.

Table 1 shows the output of the query from Listing 2. Note that the row order in the output isn't guaranteed. If you want to guarantee that your output is sorted by custid and cdate, you need to add an appropriate ORDER BY clause.

Before moving on to the next step, let's take another look at the innermost subquery in Listing 2. Currently, that subquery finds the appropriate customer ID by using the following filter:

WHERE C3.custid = C2.custid

If you revise this filter to

WHERE C3.custid = C1.custid

the query returns the same results but with considerably less I/O cost. In many cases, SQL Server's optimizer generates the same execution plan for queries that are written differently but that logically perform the same task. Here, the optimizer produced different execution plans for the original query in Listing 2 and the query with the above revision, with the latter performing much better. As a test, issue the following command:

SET STATISTICS IO ON

then run both the code in Listing 2 and the code in Listing 3, which contains the revised query. With SQL Server 2000 Service Pack 1 (SP1), STATISTICS IO shows 112 logical reads for the query in Listing 2 but only 28 logical reads for the query in Listing 3. Even a small change can drastically affect a query's performance. The conclusion from this test should be that when you aim for better performing code, you should always test variations of queries that produce the same results and compare their performance.

The second step is to filter out everything except the rows containing the latest contract for each customer. You can achieve this goal by adding a WHERE clause containing another subquery that checks whether the contract date is equal to the maximum contract date for the customer. Listing 4 shows the code that implements Step 2. Note that the query in Listing 4 is ANSI-compliant.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I recently changed a T-SQL routine that used cursors to one that used five SQL statements, using the set-based concept. I reduced my runtime from 27 hours to 30 minutes. One of my cursors had 5 million rows, which were being compared to 4 million table records. Though the table was indexed, I looped through my cursor one record at a time and performed my comparisons (I'm from the old school). But now, cursors and I have split the sheets. - BW

Dallasmax