March 25, 2004 12:00 AM

UDFs Endanger Performance

SQL Server Pro
InstantDoc ID #42139

Three of my last five performance-tuning clients faced problems associated with user-defined functions (UDFs). As SQL Server 2000's customer base matures and becomes more comfortable with the product's advanced features, more people are using UDFs without recognizing the problems they might cause. Many customers operate under a false sense of security regarding UDF I/O efficiency because Query Analyzer's SET STATISTICS IO option doesn't report I/Os associated with a UDF. Last May, I wrote about UDFs' insidious row-by-row nature, calling them cursors in sheep's clothing (see "Beware Row-by-Row Operations in UDF Clothing"). This week, I want to revisit the topic and raise awareness about potential UDF performance problems.

Many of you know that T-SQL cursors are evil and must be avoided at all costs (unless writing very slow and inefficient T-SQL code is your primary goal). You know that cursors are row-by-row operations, as opposed to efficient, set-based operations. However, few people understand the subtle ways that UDFs can cause a set-based operation to act like a row-by-row operation. Let's look at a simple example to illustrate the problem. Imagine you have an Employee table with 100,000 rows, a Department table with 50 departments, and a ranking system that assigns each employee an annual-review grade derived from data stored in other database tables. Your boss wants a query that will return the average annual-review grade, avg(AnnualReviewGrade), for each department. Writing the query would be simple if AnnualReviewGrade were a column in the Employee table, but it isn't. So your lead developer writes a UDF called GetAnnualReviewGrade that accepts an EmployeeId and returns the grade.

Let's think through the UDF's row-by-row implications. Say that SQL Server can process the UDF in a modest 15 logical I/Os. The query will execute the UDF once for each row that needs to be evaluated: in this case, once for each employee (100,000 times total). That means the UDF alone adds 15 x 100,000 = 1.5 million logical I/Os to the query's processing cost. Now, the UDF looks expensive. I've seen conceptually similar cases in which a query's processing time dropped from 15 to 20 seconds to less than 500 ms by replacing complex UDFs with join processing. True, the queries became more complex and the clients had to code business logic in more than one place, but dropping 15 to 20 seconds off a query's execution time might be worth the effort.
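To make the row-by-row behavior concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The Review table, its Score column, the sample data, and the rule that a grade is the average of an employee's review scores are all assumptions made up for illustration; only the Employee table and the GetAnnualReviewGrade name come from the scenario above. SQLite's engine differs from SQL Server's, so treat this as a demonstration of the invocation pattern, not of SQL Server's actual costs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, DepartmentId INTEGER);
    -- Hypothetical source table for the grade; not described in the article.
    CREATE TABLE Review (EmployeeId INTEGER, Score REAL);
    CREATE INDEX IX_Review_EmployeeId ON Review (EmployeeId);
""")

# Illustrative data: 1,000 employees in 4 departments, two reviews each.
employees = [(e, e % 4) for e in range(1, 1001)]
reviews = [(e, float(s)) for e, _ in employees for s in (e % 5, e % 3)]
conn.executemany("INSERT INTO Employee VALUES (?, ?)", employees)
conn.executemany("INSERT INTO Review VALUES (?, ?)", reviews)

udf_calls = {"count": 0}

def get_annual_review_grade(employee_id):
    """Scalar UDF: issues its own lookup query on every invocation."""
    udf_calls["count"] += 1
    row = conn.execute(
        "SELECT AVG(Score) FROM Review WHERE EmployeeId = ?", (employee_id,)
    ).fetchone()
    return row[0]

conn.create_function("GetAnnualReviewGrade", 1, get_annual_review_grade)

# UDF version: looks set-based, but the function runs once per employee row.
udf_result = conn.execute("""
    SELECT DepartmentId, AVG(GetAnnualReviewGrade(EmployeeId))
    FROM Employee
    GROUP BY DepartmentId
    ORDER BY DepartmentId
""").fetchall()

# Join version: the same grade logic expressed as a derived table.
join_result = conn.execute("""
    SELECT e.DepartmentId, AVG(g.Grade)
    FROM Employee AS e
    JOIN (SELECT EmployeeId, AVG(Score) AS Grade
          FROM Review
          GROUP BY EmployeeId) AS g
      ON g.EmployeeId = e.EmployeeId
    GROUP BY e.DepartmentId
    ORDER BY e.DepartmentId
""").fetchall()

print("UDF invocations:", udf_calls["count"])
print("Results match:", udf_result == join_result)
```

The join version repeats the grade logic inside the query, which is the duplication trade-off mentioned above, but it lets the engine compute every grade in one pass instead of issuing a separate lookup for each employee.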

The UDF performance problem is obvious when laid out in an example like this. However, real-world problems are typically much more difficult to spot, and you usually catch them when moving from development to production. The UDF that worked great for a 1,000-row result set in development might become a performance pig on a 1 million-row production result set. Replacing UDF logic with joins (and other set-based techniques) after the fact can be difficult and costly if the development team used UDFs extensively.
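The development-versus-production gap is simple multiplication. The sketch below assumes the earlier figure of 15 logical I/Os per UDF call still applies and that the UDF runs once per evaluated row; the function name is made up for illustration, and real per-call costs will vary with the UDF's logic and the data it reads.

```python
def added_logical_ios(rows_evaluated, ios_per_call=15):
    """Extra logical I/Os when a scalar UDF runs once per evaluated row.

    ios_per_call defaults to the article's illustrative figure of 15.
    """
    return rows_evaluated * ios_per_call

scenarios = [
    ("development", 1_000),
    ("article example", 100_000),
    ("production", 1_000_000),
]

for label, rows in scenarios:
    extra = added_logical_ios(rows)
    print(f"{label:>16}: {rows:>9,} rows -> {extra:>10,} extra logical I/Os")
```

The cost grows linearly with the row count, so a UDF that adds a negligible amount of work to a 1,000-row test query adds a thousand times as much to a 1 million-row production query.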

Compounding the problem, Query Analyzer doesn't report I/O from the UDF as part of the query cost when SET STATISTICS IO is enabled. You can test this assertion by running a query with and without a UDF in the SELECT clause. You'll see that the I/O that SET STATISTICS IO reports doesn't change when you add or remove the UDF. This omission leads developers to underestimate the query cost. For the record, SQL Server Profiler does capture I/Os associated with a UDF.

UDFs aren't always bad. UDFs are powerful T-SQL tools that I use regularly when I understand the performance implications. However, generally you should avoid using UDFs in a SELECT clause that returns a large number of rows. Also, I rarely use a UDF that accesses a table directly within the UDF. Chain saws are powerful tools and perfect for certain jobs, but you can do serious damage if you're not careful. The same goes for UDF usage. UDFs might seem like a convenient and simple way to write set-based T-SQL code, but if you're not careful, you'll open an expensive, row-by-row can of worms.

Comments
  • Brian
    8 years ago
    Mar 25, 2004

    Well, certainly a developer would consider using a UDF for more worldly reasons than just performance. I mean, the whole point of a procedure or method is to encapsulate a piece of functionality once to be reused multiple times. In that respect, the benefit of a UDF can outweigh the performance issues. Duplicating the same piece of logic in query after query is NOT an agile approach to development on any tier. If you have a complex piece of logic in an application, then you shouldn't be sticking it in the data tier in the first place. That will typically go in the domain/business layer. In the scenarios where I've had to use a UDF on larger result sets (100,000 to 300,000 rows), they have had an unnoticeable performance impact.
