Sometimes you find yourself using a tool designed to solve
one problem
helpful in solving other problems as well. Recently I found by
accident that
the new RECOMPILE query hint in SQL Server 2005 can be helpful in
solving parameter sniffing problems.
I’ll first describe the original intended use of the
RECOMPILE query hint,
then I’ll describe the parameter sniffing problem and the
traditional solutions
for the problem, and then I’ll explain how you can use the
RECOMPILE
query hint to solve parameter sniffing problems.
RECOMPILE Query Hint
The RECOMPILE query hint was originally designed to give
you more
granular control over what would be treated as a recompilation unit.
Instead
of specifying the RECOMPILE procedure option, you can now specify a
RECOMPILE hint at the query level.
As to why you might want to have a procedure or query
recompile every
time you run it, this has to do with situations where it might
be inefficient to
reuse previously cached execution plans. Suppose you have a
stored
procedure that accepts input parameters that are used in the procedure’s
queries. Suppose also that users invoking the stored procedure provide
inputs
with varying selectivity such that for different inputs different plans are
considered optimal. In such a case, SQL Server’s default behavior of reusing
previously cashed execution plans might render bad performance; this
happens
when the procedure ends up reusing a previously cached plan that is
not optimal
for the current input parameter values. In SQL Server 2000 you
could specify the
RECOMPILE option only at the procedure level:
CREATE PROC proc_name input_parameters WITH RECOMPILE AS …
This means that even if you have multiple queries within
the stored
procedure, all of them get reoptimized every time you run the
procedure. But
what if some of the queries in the stored procedure can benefit
from reuse of
previously cached plans while the other queries in the procedure
need to be
reoptimized every time you run the procedure? In SQL Server 2000 you
didn’t have this level of control; the only way to solve the problem was to
separate the queries that need to be reoptimized every time to a different
procedure created with the RECOMPILE option, and invoke that procedure
from the
original procedure. SQL Server 2005 supports statement-level
recompilations (as
opposed to supporting only batch-level recompilations in
SQL Server 2000). So
now you can specify the RECOMPILE option as a
query hint and have only that
query recompile. So suppose you have five
queries in a stored procedure, and two
of them (say Query2 and Query5)
need to be reoptimized every time the procedure
is run. You can specify the
RECOMPILE query hint only in those two queries, and
allow the other three
queries to reuse previously cached plans:
CREATE PROC proc_name input_parameters AS
Query1
Query2 OPTION(RECOMPILE)
Query3
Query4
Query5 OPTION(RECOMPILE)
GO
Parameter Sniffing Problem
As for the parameter sniffing problem; this problem has to
do with the fact
that when SQL Server optimizes a query that refers to local
variables (as
opposed to stored procedure’s parameters), the optimizer doesn’t
know
what the values in the local variables are. So it has to use a hard coded
guess, which might render an inefficient plan. Take the following procedure
for
example:
USE Northwind;
GO
IF OBJECT_ID('dbo.usp_getorders', 'P') IS NOT NULL
DROP PROC dbo.usp_getorders;
GO
CREATE PROC dbo.usp_getorders
@days_back AS INT
AS
DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;
SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt;
GO
Turn on the STATISTICS PROFILE option (or the graphical
actual
execution plan), and run the procedure with the input value 0, meaning
you
are after all orders placed 0 days back from the maximum date:
SET STATISTICS PROFILE ON;
EXEC dbo.usp_getorders @days_back = 0;
Ignore the execution plan produced for the first query in
the procedure
(obtaining the maximum order date). Observe the execution plan
produced
for the main query in the procedure (the second query). Here’s the
output of
STATISTICS PROFILE (abbreviated to focus on the important parts):
Rows EstimatedRows StmtText
---- ------------- --------
4 249 |--Clustered Index Scan
(OBJECT:([PK_Orders]),
WHERE:([OrderDate]>=[@dt]))
The input parameter value is very selective so naturally you would expect the
optimizer to use the index created on the OrderDate column,
but it doesn’t;
instead the optimizer opts for a table scan (Clustered Index
Scan). The
reason is that at query optimization time the optimizer did know what
the
value in the local variable @dt was, so it used a hard coded guess of 30
percent selectivity. Notice that the estimated number of rows was 249, and
249
divided by 830 (total number of rows in the table) is 30 percent. In
practice,
the query was very selective; only 4 rows were returned, so
obviously you got a
suboptimal plan (table scan) compared to using the index
on OrderDate.
OPTIMIZE FOR Hint
SQL Server 2005 introduces a new query hint called OPTIMIZE
FOR that
allows you to specify a constant that you want the optimizer to use
instead of
the variable when optimizing the query. For example, if you know that
users
typically need to query recent orders, you can specify a selective value
for
optimization like so:
ALTER PROC dbo.usp_getorders
@days_back AS INT
AS
DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;
SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt
OPTION(OPTIMIZE FOR (@dt = '99991231'));
GO
EXEC dbo.usp_getorders @days_back = 0;
Rows EstimatedRows StmtText
---- ------------- --------
4 1 |--Nested Loops...
4 1 |--Index Seek
(OBJECT:([OrderDate]),
SEEK:([OrderDate] >= [@dt])
ORDERED FORWARD)
4 1 |--Clustered Index Seek
(OBJECT:([PK_Orders]),
SEEK:([OrderID]=[Orders].[OrderID])
LOOKUP ORDERED FORWARD)
As you can see, you got a good plan for a selective filter.
The estimated
number of qualifying rows based on the specified input was 1, so
the
optimizer naturally decided to use the index on OrderDate.
But what if different users can specify inputs with varying
selectivity, such that
for invocations of the procedure with selective inputs
the optimal plan would
be to use the index on OrderDate, and for invocations
with non-selective
inputs the optimal plan would be to use a table scan? If you
hadn’t used a
local variable rather simply referred directly to the stored
procedure’s
parameter in the query the solution would have been simple; use the
RECOMPILE query hint. But you need to refer to a local variable in the
query,
and there’s no one constant that is adequate for all scenarios that you
can
specify in the OPTIMIZE FOR hint.
Using the RECOMPILE Query Hint to Solve Parameter
Sniffing Problems
Now to the main point that I want to make…
Recently by accident I tried something that solved the
problem. I specified
the RECOMPILE query hint in a query that referred to a
local variable, and
got an optimal plan. First I was surprised; how does the
optimizer know
what the value in the local variable was? But then it made
perfect sense.
Apparently when you specify the RECOMPILE query hint, the
optimizer IS
aware of the values of local variables in the query. The reason
this makes
perfect sense is that SQL Server 2005 supports statement level
recompilations. With the RECOMPILE query hint, the query gets optimized
AFTER
the rest of the code was already run, including the assignments of
values to
variables.
To see this in action, alter the procedure’s definition to
include the
RECOMPILE query hint in the query:
ALTER PROC dbo.usp_getorders
@days_back AS INT
AS
DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;
SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt
OPTION(RECOMPILE);
GO
Run it with a selective input, and you will see in the plan
that the index on
OrderDate is used:
EXEC dbo.usp_getorders @days_back = 0;
Rows EstimatedRows StmtText
---- ------------- --------
4 4 |--Nested Loops...
4 4 |--Index Seek
(OBJECT:([OrderDate]),
SEEK:([OrderDate] >= [@dt])
ORDERED FORWARD)
4 1 |--Clustered Index Seek
(OBJECT:([PK_Orders]),
SEEK:([OrderID]=[Orders].[OrderID])
LOOKUP ORDERED FORWARD)
Run it with a non-selective input, and you will get a table
scan:
EXEC dbo.usp_getorders @days_back = 365;
Rows EstimatedRows StmtText
---- ------------- --------
550 549.3333 |--Clustered Index Scan
(OBJECT:([PK_Orders]),
WHERE:([OrderDate]>=[@dt]))
You can also observe that the optimizer makes much better
selectivity
estimations; hence you get an optimal plan for each case.
Conclusion
I hope that your conclusion from these findings will not be
to always use the
RECOMPILE query hint to solve parameter sniffing problems.
When a
query in your stored procedure refers to a local variable, and you know
that
the variable will have similar selectivity in different invocations of the
procedure, it’s better to use the OPTIMIZE FOR query hint; this way you
can
benefit from reusing previously cached plans. But if with each invocation
of the
procedure the local variable ends up with different selectivity, and the
query
can benefit from reoptimization upon each invocation, now you have a
solution;
you can use the RECOMPILE query hint and enjoy optimal plans
even though the
query refers to local variables, at the cost of recompilations.
--
BG