Last month I started a two-part series about SQL Server 2005’s APPLY operator. I covered the fundamentals of APPLY, explaining the three modes in which you can use it: CROSS APPLY, OUTER APPLY, and implicit APPLY. Now that I’ve covered the fundamentals, this month I discuss more advanced uses for APPLY, including efficient parallelism of queries, reuse of column aliases, aggregate over columns, unpivoting, and inlining scalar functions.
In the examples in this article, I use sample databases called InsideTSQL2008 and Performance that I originally created for my books. To create and populate those databases, go and download the source code for the book Inside Microsoft SQL Server 2008: T-SQL Querying (Microsoft Press, 2009). Use the file InsideTSQL2008 to create and populate the sample database InsideTSQL2008. Use the first part in the source code file for Chapter 4 (Query Tuning) to create and populate the Performance database.
Efficient Parallelism of Queries
The APPLY operator can be used to address a wide variety of problems—typically, problems that involve a partitioning element (e.g., an employee, a customer, a shipper). One interesting use of APPLY is to utilize parallelism efficiently in a certain class of problems. I initially learned about this technique from Adam Machanic, a SQL Server MVP.
To demonstrate the technique, let’s use the Performance sample database. Run the following code to create an index on the Orders table:
USE Performance
CREATE INDEX idx1 ON dbo.Orders(empid, orderid);
Suppose that you need to calculate for each row in the Orders table two row numbers: one partitioned by empid and ordered by orderid and another partitioned by empid and ordered by orderid DESC (or any other order that’s different from the first function’s order). You can use the following query to achieve this task:
SELECT empid, orderid,
ROW_NUMBER() OVER(PARTITION BY empid
ORDER BY orderid) AS rownum_asc,
ROW_NUMBER() OVER(PARTITION BY empid
ORDER BY orderid DESC) AS rownum_desc
FROM dbo.Orders; Recall that earlier you created an index called idx1 on (empid, orderid). The tricky part here is that the two functions have different ordering specifications, and in such a case, SQL Server can rely on an index ordering to support only one of them. For the other function, SQL Server has to sort the rows. Figure 1 shows the execution plan for this query.

Figure 1: Plan for two window functions without APPLY
SQL Server uses parallelism to sort one of the functions, but it sorts by both the partitioning element (empid) and the ordering element (orderid DESC). I ran this query on a machine with eight logical CPUs (if you have fewer logical CPUs, you can mimic a similar machine for test purposes by using the startup parameter -P8), and it took it 3 seconds to finish with hot cache and results discarded.
A more efficient strategy and utilization of parallelism is to filter each partition’s rows through the index and apply a sort only to one partition’s rows at a time. To achieve this, you can use the APPLY operator, like so:
SELECT A.*
FROM dbo.Employees AS E
CROSS APPLY
(SELECT empid, orderid,
ROW_NUMBER() OVER(ORDER BY orderid) AS rownum_asc,
ROW_NUMBER() OVER(ORDER BY orderid DESC) AS rownum_desc
FROM dbo.Orders AS O
WHERE O.empid = E.empid) AS A; Figure 2 shows the execution plan for this query.

Figure 2: Plan for two window functions with APPLY
Observe that the plan scans the employees clustered index first, and then for each employee, performs a seek in the index on Orders to filter the current employee’s rows. Then the sort is applied only to one employee’s rows. You can also see here the use of an optimization technique called few outer rows, in which the optimizer realizes that a small number of rows are returned from the outer part of the Nested Loops join and uses a Repartition Streams exchange operator to evenly distribute the rows to the different threads that then handle the work in the inner part of the join. This query finished in 1 second on the same machine—one third of the runtime of the previous query.
When you’re done, run the following code for cleanup:
DROP INDEX idx1 ON dbo.Orders;
Reuse of Column Aliases
Suppose that you have a query with lengthy expressions that you need to repeat more than once. For example, you need to write a query against the Sales.Orders table in the InsideTSQL2008 database, and in this query filter only orders that were placed prior to the last day of the month, and return for each order the respective beginning-of-month and end-of-month dates. It would be nice if you could alias the expression that calculates the beginning-of-month date (call it beginning_of_month), then use that alias in the expression that calculates the end-of-month date (call it end_of_month), and then use that alias in the query filter. Making the following attempt
USE InsideTSQL2008;
SELECT orderid, orderdate, custid, empid,
CAST(CONVERT(CHAR(6), orderdate, 112) + '01' AS DATETIME)
AS beginning_of_month,
DATEADD(day, -1, DATEADD(month, 1, beginning_of_month))
AS end_of_month
FROM Sales.Orders
WHERE orderdate < end_of_month; generates the error message that Figure 3 shows.

Figure 3: Error message from trying to reuse column aliases