Subscribe to Windows IT Pro
November 22, 2006 12:00 AM

Using the RECOMPILE Query Hint to Solve Parameter Sniffing Problems

SQL Server Pro
InstantDoc ID #94369
Rating: (7)

Sometimes you find yourself using a tool designed to solve one problem
helpful in solving other problems as well. Recently I found by accident that
the new RECOMPILE query hint in SQL Server 2005 can be helpful in
solving parameter sniffing problems.

I’ll first describe the original intended use of the RECOMPILE query hint,
then I’ll describe the parameter sniffing problem and the traditional solutions
for the problem, and then I’ll explain how you can use the RECOMPILE
query hint to solve parameter sniffing problems.

RECOMPILE Query Hint

The RECOMPILE query hint was originally designed to give you more
granular control over what would be treated as a recompilation unit. Instead
of specifying the RECOMPILE procedure option, you can now specify a
RECOMPILE hint at the query level.

As to why you might want to have a procedure or query recompile every
time you run it, this has to do with situations where it might be inefficient to
reuse previously cached execution plans. Suppose you have a stored
procedure that accepts input parameters that are used in the procedure’s
queries. Suppose also that users invoking the stored procedure provide
inputs with varying selectivity such that for different inputs different plans are
considered optimal. In such a case, SQL Server’s default behavior of reusing
previously cashed execution plans might render bad performance; this
happens when the procedure ends up reusing a previously cached plan that is
not optimal for the current input parameter values. In SQL Server 2000 you
could specify the RECOMPILE option only at the procedure level:

CREATE PROC proc_name input_parameters WITH RECOMPILE AS …

This means that even if you have multiple queries within the stored
procedure, all of them get reoptimized every time you run the procedure. But
what if some of the queries in the stored procedure can benefit from reuse of
previously cached plans while the other queries in the procedure need to be
reoptimized every time you run the procedure? In SQL Server 2000 you
didn’t have this level of control; the only way to solve the problem was to
separate the queries that need to be reoptimized every time to a different
procedure created with the RECOMPILE option, and invoke that procedure
from the original procedure. SQL Server 2005 supports statement-level
recompilations (as opposed to supporting only batch-level recompilations in
SQL Server 2000). So now you can specify the RECOMPILE option as a
query hint and have only that query recompile. So suppose you have five
queries in a stored procedure, and two of them (say Query2 and Query5)
need to be reoptimized every time the procedure is run. You can specify the
RECOMPILE query hint only in those two queries, and allow the other three
queries to reuse previously cached plans:

CREATE PROC proc_name input_parameters AS 
Query1
Query2 OPTION(RECOMPILE)
Query3
Query4
Query5 OPTION(RECOMPILE)
GO

Parameter Sniffing Problem

As for the parameter sniffing problem; this problem has to do with the fact
that when SQL Server optimizes a query that refers to local variables (as
opposed to stored procedure’s parameters), the optimizer doesn’t know
what the values in the local variables are. So it has to use a hard coded
guess, which might render an inefficient plan. Take the following procedure
for example:

USE Northwind;
GO
IF OBJECT_ID('dbo.usp_getorders', 'P') IS NOT NULL
  DROP PROC dbo.usp_getorders;
GO
CREATE PROC dbo.usp_getorders
  @days_back AS INT
AS

DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;

SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt;
GO

Turn on the STATISTICS PROFILE option (or the graphical actual
execution plan), and run the procedure with the input value 0, meaning you
are after all orders placed 0 days back from the maximum date:

SET STATISTICS PROFILE ON;
EXEC dbo.usp_getorders @days_back = 0;

Ignore the execution plan produced for the first query in the procedure
(obtaining the maximum order date). Observe the execution plan produced
for the main query in the procedure (the second query). Here’s the output of
STATISTICS PROFILE (abbreviated to focus on the important parts):

Rows EstimatedRows StmtText
---- ------------- --------
4    249           |--Clustered Index Scan
                      (OBJECT:([PK_Orders]),
                       WHERE:([OrderDate]>=[@dt]))

The input parameter value is very selective so naturally you would expect the
optimizer to use the index created on the OrderDate column, but it doesn’t;
instead the optimizer opts for a table scan (Clustered Index Scan). The
reason is that at query optimization time the optimizer did know what the
value in the local variable @dt was, so it used a hard coded guess of 30
percent selectivity. Notice that the estimated number of rows was 249, and
249 divided by 830 (total number of rows in the table) is 30 percent. In
practice, the query was very selective; only 4 rows were returned, so
obviously you got a suboptimal plan (table scan) compared to using the index
on OrderDate.

OPTIMIZE FOR Hint

SQL Server 2005 introduces a new query hint called OPTIMIZE FOR that
allows you to specify a constant that you want the optimizer to use instead of
the variable when optimizing the query. For example, if you know that users
typically need to query recent orders, you can specify a selective value for
optimization like so:

ALTER PROC dbo.usp_getorders
  @days_back AS INT
AS

DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;

SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt
OPTION(OPTIMIZE FOR (@dt = '99991231'));
GO

EXEC dbo.usp_getorders @days_back = 0;

Rows EstimatedRows StmtText
---- ------------- --------
4    1             |--Nested Loops...
4    1               |--Index Seek
                        (OBJECT:([OrderDate]),
                         SEEK:([OrderDate] >= [@dt])
                         ORDERED FORWARD)
4    1               |--Clustered Index Seek
                        (OBJECT:([PK_Orders]),
                         SEEK:([OrderID]=[Orders].[OrderID])
                         LOOKUP ORDERED FORWARD)

As you can see, you got a good plan for a selective filter. The estimated
number of qualifying rows based on the specified input was 1, so the
optimizer naturally decided to use the index on OrderDate.

But what if different users can specify inputs with varying selectivity, such that
for invocations of the procedure with selective inputs the optimal plan would
be to use the index on OrderDate, and for invocations with non-selective
inputs the optimal plan would be to use a table scan? If you hadn’t used a
local variable rather simply referred directly to the stored procedure’s
parameter in the query the solution would have been simple; use the
RECOMPILE query hint. But you need to refer to a local variable in the
query, and there’s no one constant that is adequate for all scenarios that you
can specify in the OPTIMIZE FOR hint.

Using the RECOMPILE Query Hint to Solve Parameter Sniffing Problems

Now to the main point that I want to make…

Recently by accident I tried something that solved the problem. I specified
the RECOMPILE query hint in a query that referred to a local variable, and
got an optimal plan. First I was surprised; how does the optimizer know
what the value in the local variable was? But then it made perfect sense.
Apparently when you specify the RECOMPILE query hint, the optimizer IS
aware of the values of local variables in the query. The reason this makes
perfect sense is that SQL Server 2005 supports statement level
recompilations. With the RECOMPILE query hint, the query gets optimized
AFTER the rest of the code was already run, including the assignments of
values to variables.

To see this in action, alter the procedure’s definition to include the
RECOMPILE query hint in the query:

ALTER PROC dbo.usp_getorders
  @days_back AS INT
AS

DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;

SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt
OPTION(RECOMPILE);
GO

Run it with a selective input, and you will see in the plan that the index on
OrderDate is used:

EXEC dbo.usp_getorders @days_back = 0;

Rows EstimatedRows StmtText
---- ------------- --------
4    4             |--Nested Loops...
4    4               |--Index Seek
                        (OBJECT:([OrderDate]),
                         SEEK:([OrderDate] >= [@dt])
                         ORDERED FORWARD)
4    1               |--Clustered Index Seek
                        (OBJECT:([PK_Orders]),
                         SEEK:([OrderID]=[Orders].[OrderID])
                         LOOKUP ORDERED FORWARD)

Run it with a non-selective input, and you will get a table scan:

EXEC dbo.usp_getorders @days_back = 365;

Rows EstimatedRows StmtText
---- ------------- --------
550  549.3333      |--Clustered Index Scan
                      (OBJECT:([PK_Orders]),
                       WHERE:([OrderDate]>=[@dt]))

You can also observe that the optimizer makes much better selectivity
estimations; hence you get an optimal plan for each case.

Conclusion

I hope that your conclusion from these findings will not be to always use the
RECOMPILE query hint to solve parameter sniffing problems. When a
query in your stored procedure refers to a local variable, and you know that
the variable will have similar selectivity in different invocations of the
procedure, it’s better to use the OPTIMIZE FOR query hint; this way you
can benefit from reusing previously cached plans. But if with each invocation
of the procedure the local variable ends up with different selectivity, and the
query can benefit from reoptimization upon each invocation, now you have a
solution; you can use the RECOMPILE query hint and enjoy optimal plans
even though the query refers to local variables, at the cost of recompilations.

--
BG

Related Content:

ARTICLE TOOLS

Comments
  • Joe
    6 years ago
    Dec 07, 2006

    As an aside, other SQL products handle run time optimization of procedures differently.

    DB2 stores multiple execution plans (the number 16 sticks in my mind, but I am not sure) so that it can look at the parameters and pick the best one. As an example, if a parameter is NULL-able and the code includes (foo = COALESCE(my_parm, foo) we can sniff that (foo IS NULL) and drop the predicate from the execution plan.

    Sybase (nee WATCOM SQL) allows you to give an expectation of a predicate with a notation like (sex = 'male' 88) which means that 88% of the data tests TRUE for that predicate. The optimize then adjusts its plan with that in mind, but it is not compelled by the hint.

    Oracle hints compel action, whcih makes them dangerous over time as the stats change.

You must log on before posting a comment.

Are you a new visitor? Register Here

advertisement

advertisement

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.