Solving the Problems
The trail of clues I found—table and database-file fragmentation and page-density inefficiencies—led me to identify three issues that I needed to address to solve my database-performance problem. First, the largest, most heavily used table needed an index to support a foreign-key constraint. This particular problem wasn't really caused by fragmentation but instead by the table scans of the 2.5GB foreign-key table during updates of the primary-key table. The scans were causing so much disk I/O that it was impossible to tell whether other disk problems existed. Adding the index drastically decreased update times and cut disk I/O by about 80 percent during updates. To address the remaining two issues, I wrote the uspDefragTables stored procedure, which I'll discuss in more detail shortly.
The second issue I discovered was that the table was a heap, which means that its data was stored in the order in which it was inserted. This condition by itself wasn't detrimental; the table's pages were only a little fragmented since it received about 75 to 80 percent of the total inserts made to all tables in the database for the database, which had the effect of keeping the extents close together. Remember, though, that the logical and extent fragmentation numbers in DBCC SHOWCONTIG don't apply to a heap; therefore, a heap with 0 percent fragmentation can still be poorly organized for the way an application requests data.
In fact, it was the data's organization that was causing fragmentation-like effects on data retrieval. The largest table holds detail information about various topics for individuals who use the application, and the table's heap structure forces multiple individuals' detail data to share the same page. The net effect is that retrieving all the data for an individual (which can happen dozens of times in a few minutes) forces SQL Server to skip through the table and read a few rows from each of tens of pages per person. Not only does that result in more I/O, but it nearly guarantees that the extents holding those pages won't be adjacent. Additionally, because the table has thousands of inserts per minute, each extent probably holds only a small number of rows for any given individual.
The third issue, which makes the organization of the data even less efficient, is that DBCC SHOWCONTIG showed that page density was less than 50 percent. This meant that the table was using more than twice the pages necessary to store the data. Too many, half-empty pages scattered throughout the database file is a recipe for really bad performance.
The uspDefragTables stored procedure, which Listing 2 shows, solves the data-organization and page-density problems. The procedure iterates through all the rows in the table that holds the DBCC SHOWCONTIG output saved by uspBuild-Fraglist, and it looks for tables that exceed the level of fragmentation specified by the @MaxFrag parameter, as the code at callout A in Listing 2 shows.
UspDefragTables runs the DBCC DBREINDEX command on any index that has logical or extent fragmentation or an average page density that's less than 100 - @MaxFrag. For clustered indexes, this command reorganizes the entire table. For non-clustered indexes, it reorganizes the index's leaf level.
Notice that uspDefragTables specifically ignores the heap (indexID = 0) because DBCC DBREINDEX doesn't work on heaps. The best that DBCC DBREINDEX can do is create defragmented nonclustered indexes. Improving the organization of a heap requires other techniques that are often best done manually.
The most commonly recommended technique for defragmenting a heap is to create a clustered index, then drop the index immediately afterward. Dropping the index doesn't affect the physical order of the pages; therefore, you get the benefits of higher average page density and lower fragmentation without the overhead of maintaining a clustered index on a heavily inserted table. You should base your choice of columns used to build the clustered index on how your application uses the data, not how the data will be inserted over time. The best option for tables that have detail data is to use foreign-key columns, so that the clustered index will group the data by the related table's primary key. Any new data is inserted at the end of the table, but at least the existing data is stored in contiguous pages and extents.
Keeping Up with Fragmentation
It turned out that the ultimate reason for the early interruption of my sleep was because our maintenance job hadn't been run for a few weeks. (You can learn more about the maintenance procedures I wrote in the Web-exclusive sidebar "Relevant T-SQL Stored Procedures," http://www.sqlmag.com, InstantDoc ID 48913.) I'd been busy working on higher-priority tasks and hadn't had time to diagnose what was causing the maintenance job to fail. At the time, I believed the failure wasn't a critical problem because I thought it took months for the databases to become fragmented. That morning, I realized I was greatly mistaken, at least about some of the heavily inserted and updated tables.
After I added the index to prevent the table scans and built a clustered index to organize the data more efficiently (which had the effect of defragmenting the nonclustered indexes, too), I watched the fragmentation levels over the next 2 weeks. It took only a few days for the largest tables to start showing signs of fragmentation again, and by the end of a week, they were getting noticeably slower. It was clear that fragmentation would always be lurking in the shadows, and only by maintaining constant vigilance would I keep it at bay.
Morris Lewis (morris.lewis@decisionsource.com) is a senior consultant for Decision Source, which specializes in business intelligence consulting, and is the author of SQL Server Security Distilled (2nd ed., Apress).