Diagnosing Memory Leaks
You can diagnose most memory leaks with Performance Monitor and several Microsoft Windows NT Server 4.0 Resource Kit utilities. (For a list of resource kit tools, see the sidebar "Resource Kit Tools for Diagnosing and Monitoring Memory Leaks.") You start by verifying that a memory leak exists; then you identify the process or service responsible for the leak. Memory leaks in the System process are usually the result of an errant device driver; unfortunately, you can’t dynamically stop and start most device drivers, so driver leaks are difficult to find.
With Performance Monitor, you can watch overall statistics for thread, pool, and paging file usage (to verify that the leak exists). You can also monitor these counters on an individual process basis (to identify the problem service or application). Diagnosing memory leaks in services or applications is an iterative process that can take days and can require several Performance Monitor profiles.
Performance Monitor objects. Performance Monitor has several objects that assist in memory leak diagnosis. The four most important objects, with their most useful counters, are Paging File (% Usage, % Usage Peak), Memory (Pool Nonpaged Bytes, Pool Paged Bytes), Objects (Threads), and Process (Page File Bytes, Pool Nonpaged Bytes, Pool Paged Bytes, Private Bytes, Thread Count).
The Paging File object monitors usage for the paging files you select from the instance list. The Memory object tracks systemwide paging rates, pool space usage, and other metrics. The Objects object keeps systemwide counts of six item types, including processes and threads. Because these three objects report systemwide figures, you use them to verify that a memory leak exists. The Process object, in contrast, reports activity for individual processes.
After you select the counters to monitor in the Counter box, you select specific processes from the alphabetical list in the Instance box. For example, to monitor the DNS service and the spooler service, select dns and spoolss. Microsoft and third-party applications and services (e.g., WinWord, DKService) also appear in the instance list while they are running. You use the Process object and its counters to identify the source of a memory leak.
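Performance Monitor names each counter with a path of the form \Object(Instance)\Counter, optionally prefixed with \\Computer for a remote machine. As an illustration only, the Python sketch below assembles those paths for the Process counters listed earlier and a set of instance names such as dns and spoolss. The function name process_counter_paths is hypothetical, and the sketch only builds strings; it doesn't collect any data.

```python
# Process-object counters that the article lists as useful for leak diagnosis.
PROCESS_COUNTERS = [
    "Page File Bytes",
    "Pool Nonpaged Bytes",
    "Pool Paged Bytes",
    "Private Bytes",
    "Thread Count",
]


def process_counter_paths(instances, counters=PROCESS_COUNTERS, computer=None):
    """Hypothetical helper: build \\Process(instance)\\counter paths.

    One path is produced for every instance/counter pair, instances first.
    If computer is given, each path is prefixed with \\\\computer so it
    refers to a remote machine.
    """
    prefix = "\\\\" + computer if computer else ""
    return [
        prefix + "\\Process(" + name + ")\\" + counter
        for name in instances
        for counter in counters
    ]
```

For example, process_counter_paths(["dns", "spoolss"]) yields ten paths, starting with \Process(dns)\Page File Bytes.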
Diagnosing thread leaks. NT memory leaks have two potential sources: threads that are never deleted and memory that is never released. For diagnostic purposes, you can monitor threads, pool space, or both. To watch the total number of threads, chart the Objects:Threads counter. If the overall thread count keeps increasing, monitor Process:Thread Count for individual processes to identify the process that is creating the threads.
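Once you have two readings of Process:Thread Count taken some time apart, ranking the processes by growth narrows the search. The Python sketch below is a minimal illustration that assumes you have already copied the readings into dictionaries keyed by instance name; the function name thread_growth and the sample numbers that follow are hypothetical.

```python
def thread_growth(before, after):
    """Hypothetical helper: rank processes by threads gained between samples.

    before and after map a process instance name to its Process:Thread Count
    reading. Processes present in only one sample are skipped, because a
    process that started or exited in between says nothing about a leak.
    Only processes whose count grew are returned, largest growth first.
    """
    common = before.keys() & after.keys()
    growth = {name: after[name] - before[name] for name in common}
    # Sort by growth (descending), then by name so ties come out in a stable order.
    return sorted(
        ((name, delta) for name, delta in growth.items() if delta > 0),
        key=lambda item: (-item[1], item[0]),
    )
```

With the made-up readings before = {"spoolss": 12, "dns": 8, "WinWord": 5} and after = {"spoolss": 12, "dns": 41, "WinWord": 6}, the function returns [("dns", 33), ("WinWord", 1)], which points at dns.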
If you think an NT or third-party service is causing the problem, open the Services applet in Control Panel. Then, tile the windows so that you can view the Performance Monitor and Services windows at the same time. Stop and restart the services one at a time. Every service releases its threads when it stops, but the leaking service releases far more than the others, so a disproportionately sharp drop in the total number of threads identifies the culprit.
To identify problems in application software, use a similar technique. When you stop an application that is leaking threads, the total number of threads drops dramatically. This sudden dip is obvious when you chart the Objects:Threads and Process:Thread Count counters.
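The stop-and-observe technique amounts to comparing the Objects:Threads total just before and just after each stop. Here is a minimal Python sketch, assuming you note those two readings for every service or application you stop; the function name biggest_drop and the sample readings are hypothetical.

```python
def biggest_drop(observations):
    """Hypothetical helper: find the stop that freed the most threads.

    observations maps a service or application name to a pair
    (threads_before_stop, threads_after_stop), both read from Objects:Threads.
    Returns (name, drop) for the largest fall in the total thread count.
    """
    if not observations:
        raise ValueError("no observations recorded")
    drops = {name: before - after for name, (before, after) in observations.items()}
    culprit = max(drops, key=drops.get)
    return culprit, drops[culprit]
```

Because every component frees its own threads when it stops, the result only points to a culprit when one drop is far larger than the rest. With readings of (410, 398) for spoolss, (398, 310) for dns, and (310, 304) for Browser, the function returns ("dns", 88).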
Diagnosing pool leaks. When an NT component or service has a pool leak, the number of bytes in the paged or nonpaged pool increases steadily and never declines. To document this rise with Performance Monitor, profile the suspect services and watch Page File Bytes, Pool Paged Bytes, and Pool Nonpaged Bytes over an extended period. To create the profile, choose the Process object and these counters, then select the suspect components and services from the instance list. Applications leak in a similar way: when an application has a memory leak, its Private Bytes count increases and never shrinks. To diagnose this problem, create a profile that tracks the application's process with the Process:Private Bytes counter.
You might have to monitor a combination of objects and counters for hours or even days before you can clearly identify a leak. (To find out how to implement performance monitoring, see Marcia Loughry, "Monitoring Windows NT Server’s Performance," October 1998.) Memory leaks typically exhibit the same pattern: pool usage and thread counts grow in a steady stair-step over time. Usage remains flat for a while, then jumps up, repeatedly and indefinitely. If threads and pool space are being released properly, the counters fall as well as rise. (For a quick way to spot some memory leaks, see the sidebar "Shortcut for Spotting Memory Leaks.")
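The stair-step pattern can also be checked mechanically once you export a long run of samples for one counter. The Python sketch below is a rough heuristic, not a Microsoft-defined test. It flags a series that never falls by more than a chosen tolerance and rises at least a chosen number of times. The function name looks_like_leak and both thresholds are assumptions you would tune for your own data.

```python
def looks_like_leak(samples, min_steps=3, tolerance=0):
    """Hypothetical heuristic: flag a counter series with a stair-step rise.

    samples: counter values in time order (e.g., hourly Pool Paged Bytes).
    min_steps and tolerance are arbitrary, user-tuned thresholds.
    Returns False as soon as the series falls by more than tolerance,
    because a real decline means memory is being released.
    """
    steps = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev - tolerance:
            return False  # counter went down: not the never-declining pattern
        if cur > prev + tolerance:
            steps += 1  # one upward step of the staircase
    return steps >= min_steps
```

For example, made-up hourly readings of 100, 100, 120, 120, 150, 150, 180 rise three times without ever falling, so the function flags them; add a single drop and it doesn't.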
Taking Corrective Action
Many documented cases of NT memory leaks exist. Corrective action includes installing updated software; stopping and starting the service, process, or application responsible for the memory leak; and rebooting the machine regularly until a permanent fix is available. When the source of a memory leak is a core component of the OS that you can’t start and stop, the only way to correct the problem is to reboot the system often enough to keep adequate memory free until updated software is available.
When an NT or third-party service is responsible for a memory leak, stopping the service releases its allocated memory. To correct the problem temporarily, stop and restart the service in the Services applet in Control Panel or from the command line with the NET STOP and NET START commands. If memory leaks are causing problems on several servers, you can write a script that stops and restarts the responsible service and run the script daily or weekly.
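A scheduled restart script can be as simple as the two NET commands run in sequence. The article doesn’t give a script, so the Python sketch below is one possible shape, assuming Python is available on the server. The function name restart_service is hypothetical. The sketch stops before it starts, aborts if NET STOP fails, and makes no attempt to handle services that other services depend on.

```python
import subprocess


def restart_service(name, run=subprocess.run):
    """Hypothetical helper: stop, then start, an NT service via NET commands.

    run defaults to subprocess.run and is injectable so the command sequence
    can be tested without touching real services. check=True makes a failed
    command raise CalledProcessError, so a failed stop never triggers a start.
    """
    for verb in ("stop", "start"):
        run(["net", verb, name], check=True)
```

You would schedule a call such as restart_service("spooler"), substituting whichever service is leaking, to run during off hours.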
You can also use the stop-and-start technique as a temporary fix for applications that leak memory. If you can’t stop an application during the day because it is mission-critical, you might be able to schedule it to restart after business hours. Be sure to use the application’s own shutdown feature, if it has one, to avoid database or file corruption and ensure a clean restart. These techniques will help you keep memory usage under control until a permanent solution (i.e., a service pack, hotfix, or upgrade) is available. Now that Windows 2000 (Win2K, formerly NT 5.0) is on the horizon, Microsoft has announced the elimination of more than 400 memory leaks; let’s just hope the developers don’t introduce as many problems as they fixed.