This article is the last in a three-part series about SQL Server 2012's enhanced window functions. In the first two articles in the series, I focused
on the logical aspects of window functions. I described window aggregate, offset, and distribution functions. In this final article, I focus on
optimization of window functions. I provide indexing guidelines for optimal performance, and I describe cases that fall under the "fast-track"
category, with especially optimal treatment; cases that compute the difference between two cumulative values; cases that expand all frame rows; and
cases in which the optimizer can use an in-memory spool versus an on-disk spool.
As a reminder, you need to use SQL Server 2012 (CTP3 or later) to run the sample code from this series. Click the link to download SQL Server 2012 from .
For the purposes of this article, I use bank account transactions as sample data. To create the sample data, you need a helper table called GetNums
that accepts two integers as inputs and returns a sequence of integers in the requested range. Run the code in Listing 1 to create the GetNums function
in the tempdb database (for test purposes). Run the code in Listing 2 to create the Accounts and Transactions tables and fill them with sample data
(100 accounts ´ 20,000 transactions in each account).
Let's start our discussion of window function optimization with indexing guidelines.
Indexing Guidelines
To compute the enhanced aggregate window functions, the optimizer needs the rows sorted first by the window partitioning elements, then by the window
ordering elements. Also, the index needs to include in its leaf level all the rest of the columns in the query for coverage purposes. If the rows are
ordered as needed and covered by an index, the optimizer can scan the index in order, use a Segment operator to handle one partition at a time, a
Window Spool operator to expand the applicable frame rows per source row, and a Stream Aggregate operator to perform the actual aggregate calculations.
A simple way to remember what kind of index you need to create is to use the acronym POC for Partitioning, Ordering, and Coverage. The P and O columns
need to be defined as the index key elements. The C columns need to be included in the index leaf, using an INCLUDE clause in a nonclustered index. (If
the index is clustered, everything is implicitly covered.) In the absence of a POC index, the optimizer must sort the rows based on the P and O
elements -- and sorting a large set can be expensive.
As an example, supposed that you need to compute bank account balances at each transaction point. This task can be achieved with a running sum of the
transaction values, partitioned by the account ID and ordered by the transaction ID. A POC index already exists; that's the clustered index defined by
the primary key constraint PK_Transactions in Listing 2 as part of the definition of the Transactions table.
Here's the query computing the bank account balances:
SELECT actid, tranid, val,
SUM(val) OVER(PARTITION BY actid
ORDER BY tranid
ROWS UNBOUNDED PRECEDING) AS runval
FROM dbo.Transactions;
Figure 1 shows the plan for this query. Observe in the plan that the clustered index PK_Transactions is scanned in an ordered fashion, and therefore no
sorting is required.

Figure 1: Plan with POC index
To contrast, the following query computes a window function that doesn't have a POC index to support it:
SELECT actid, tranid, val,
SUM(val) OVER(PARTITION BY actid
ORDER BY val
ROWS UNBOUNDED PRECEDING) AS runval
FROM dbo.Transactions;
Figure 2 shows the plan for this query. Observe that in this plan the optimizer added a Sort operator that sorts the data by the P (actid) and O (val)
elements.

Figure 2: Plan without POC index
Fast Track
Conceptually, for each row in the underlying query, a window function defines a frame of rows within the current window partition and applies the
calculation against this frame to produce a scalar outcome. For example, consider a frame such as ROWS BETWEEN 11 PRECEDING AND CURRENT ROW. For each
row, the window frame will have up to 12 applicable rows. SQL Server 2012 introduces a Window Spool operator that's responsible for expanding for each
row the applicable frame of rows; then, a Stream Aggregate operator aggregates those rows to produce the scalar result. In the worst-case scenario, the
Window Spool operator expands all frame rows. I discuss such cases later in the article. However, the optimizer can identify special cases and avoid
such complete expansion.
The best-case scenario is what I refer to as the fast-track case. This case is applicable when the top (i.e., first) delimiter of the frame is
UNBOUNDED PRECEDING. With such a top delimiter, conceptually, the number of rows per frame should keep increasing by one for each underlying row; so
the total number of rows in all frames in the partition increases in a quadratic manner with respect to the number of underlying rows. But that's only
conceptually. In practice, the optimizer keeps just two rows in each frame -- one with the aggregated values so far (apply the calculation between the
previously aggregated values and the current value) and the current row (needed to return the detail elements). That's a huge savings!
Here's an example of a query that falls under the fast-track category because it uses UNBOUNDED PRECEDING as the top delimiter of the frame:
SELECT actid, tranid, val,
SUM(val) OVER(PARTITION BY actid
ORDER BY tranid
ROWS UNBOUNDED PRECEDING) AS runval
FROM dbo.Transactions;
This query ran for 7 seconds on my system. Figure 3 shows the execution plan for the query.

Figure 3: Plan for fast-track case