April 18, 2001 12:00 AM

NT Performance Tuning

Windows IT Pro
InstantDoc ID #20366
Fix the weak links in your system's memory, CPU, disk, and network interface

When you think of a computer system's performance, imagine a chain: The slowest component (or weakest link) affects the performance of the overall system. This weak link in the performance chain is also called a bottleneck. The best indicator that a bottleneck exists is the end user's perception of a lag in a system's or application's response time. To tune a system's performance, you need to determine where—CPU, memory, disk, network, applications, clients, or Windows NT resources—a bottleneck exists. If you add resources to an area that isn't choking your system's performance, your efforts are in vain.

You can use NT Server's native tools (or those of third-party vendors) to optimize the performance of your system and identify potential bottlenecks. NT Server's primary performance tools are Task Manager, which Figure 1, page 42, shows, and Performance Monitor, which Figure 2, page 42, shows. Task Manager can give you a quick look at what's happening in your system. Although it doesn't provide a logging mechanism, Task Manager displays specific information about your system's programs and processes. Task Manager also lets you manage the processes that might be adversely affecting your system. You can use Performance Monitor to obtain more detailed performance information (in the form of charts, alerts, and reports that specify both current activity and ongoing logging) based on system events. The Microsoft Windows NT Server 4.0 Resource Kit also contains tools that you can use for troubleshooting. (For a sampling of NT performance-monitoring tools, see Table 1, page 43.)

Before you start performance tuning, you must understand your system. You should know what server hardware you have, how NT operates, what applications you're running, who uses the system, what kind of workload the system handles, and how your system fits into the network infrastructure. You also need to establish a performance baseline that tells you how your system uses its resources during periods of typical activity. (You can use Performance Monitor to establish your baseline.) Until you know how your system performs over time, you won't be able to recognize slowdowns or improvements in your NT server's performance. Include as many objects in your baseline measurements as possible (e.g., memory, processor, system, paging file, logical disk, physical disk, server, cache, network interface). At a minimum, include all four major resources (i.e., memory, processor, disk, and network interface) when taking a server's baseline measurements—regardless of server function (e.g., file server, print server, application server, domain server).

Because all four of a server's major resources are interrelated, locating a bottleneck can be difficult. Resolving one problem can cause another. When possible, make one change at a time, then compare your results with your baseline to determine whether the change was helpful. If you make several changes before performing a comparison, you won't know precisely what works and what doesn't work. Always test your new configuration, then retest it to be sure changes haven't adversely affected your server. Additionally, always document your processes and the effects of your modifications.
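The compare-against-baseline step can be sketched in a few lines of Python. This is a minimal illustration rather than a feature of NT or Performance Monitor: it assumes you've already exported counter samples (for example, from a Performance Monitor log) into plain lists of numbers. The function names and the sample values are hypothetical.

```python
from statistics import mean


def build_baseline(samples):
    """Average each counter's samples taken during typical activity."""
    return {counter: mean(values) for counter, values in samples.items()}


def compare_to_baseline(baseline, current):
    """Return each counter's percent change relative to its baseline value."""
    changes = {}
    for counter, base in baseline.items():
        # Skip counters that weren't remeasured or that have a zero baseline.
        if counter in current and base != 0:
            changes[counter] = (current[counter] - base) / base * 100.0
    return changes


# Hypothetical samples captured before making a single configuration change.
baseline = build_baseline({
    "Memory: Pages/sec": [4, 6, 5],
    "Processor: % Processor Time": [40, 50, 60],
})

# Hypothetical averages measured after the change.
after_change = {"Memory: Pages/sec": 10, "Processor: % Processor Time": 45}

report = compare_to_baseline(baseline, after_change)
for counter, pct in sorted(report.items()):
    print(f"{counter}: {pct:+.1f}% versus baseline")
```

Running the comparison after each individual change, rather than after a batch of changes, keeps the cause of any shift in the numbers unambiguous.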

Memory
Insufficient memory is a common cause of bottlenecks in NT Server. A memory deficiency can disguise itself as other problems, such as an overloaded CPU or slow disk I/O. The best early indicator of a memory bottleneck is a sustained high rate of hard page faults (e.g., more than five per second). Hard page faults occur when a program can't find the data it needs in physical memory and therefore must retrieve the data from disk. You can use Performance Monitor to determine whether your system is suffering from a RAM shortage. The following counters are valuable for viewing the status of a system's memory:

  • Memory: Pages/sec—Shows the number of requested pages that aren't immediately available in RAM and that must be read from the disk or that had to be written to the disk to make room in RAM for other pages. If this number is high while your system is under a usual load, consider increasing your RAM. If Memory: Pages/sec is increasing but the Memory: Available Bytes counter is decreasing toward the minimum NT Server limit of 4MB, and the disks that contain the pagefile.sys files are busy (marked by an increase in %Disk Time, Disk Bytes/sec, and Average Disk Queue Length), you've identified a memory bottleneck. If Memory: Available Bytes isn't decreasing, you might not have a memory bottleneck. In this case, check for an application that's performing a large number of disk reads or writes (and make sure that the data isn't in cache). To do so, use Performance Monitor to monitor the Physical Disk and Cache objects. The Cache object counters can tell you whether a small cache is affecting system performance.
  • Memory: Available Bytes—Shows the amount of physical memory available to programs. This figure is typically low because NT's Disk Cache Manager uses extra memory for caching, then returns the extra memory when requests for memory occur. However, if this value is consistently below 4MB on a server, excessive paging is occurring.
  • Memory: Committed Bytes—Indicates the amount of virtual memory that the system has committed to either physical RAM for storage or to pagefile space. If the number of committed bytes is larger than the amount of physical memory, more RAM is probably necessary.
  • Memory: Pool Nonpaged Bytes—Indicates the amount of RAM in the nonpaged pool system memory area, in which OS components acquire space to accomplish tasks. If the Memory: Pool Nonpaged Bytes value shows a steady increase but you don't see a corresponding increase in server activity, a running process might have a memory leak. A memory leak occurs when a bug prevents a program from freeing up memory that it no longer needs. Over time, memory leaks can cause a system crash because all available memory (i.e., physical memory and pagefile space) has been allocated.
  • Paging File: %Usage—Shows the percentage of the maximum pagefile size that the system has used. If this value hits 80 percent or higher, consider increasing the pagefile size.
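
The rules of thumb in the list above can be collected into a small checker. The Python sketch below is a simplification under stated assumptions: it takes counter values that you've already averaged over a period of usual load, it reuses the five-per-second hard-page-fault example as a Pages/sec threshold (my assumption, not a documented limit), and it checks levels rather than the trends the counters are meant to reveal. The function name and the inputs are hypothetical.

```python
MB = 1024 * 1024

MIN_AVAILABLE_BYTES = 4 * MB   # the 4MB floor cited for Memory: Available Bytes
PAGING_LIMIT = 5               # assumed threshold, borrowed from the hard-page-fault example
PAGEFILE_USAGE_LIMIT = 80.0    # percent, from Paging File: %Usage


def memory_warnings(pages_per_sec, available_bytes, committed_bytes,
                    physical_bytes, pagefile_usage_pct):
    """Return a list of warnings based on averaged memory counter values."""
    warnings = []
    if pages_per_sec > PAGING_LIMIT and available_bytes < MIN_AVAILABLE_BYTES:
        warnings.append("likely memory bottleneck: heavy paging, little available RAM")
    elif pages_per_sec > PAGING_LIMIT:
        warnings.append("heavy paging but RAM is available: check disk-heavy applications and cache")
    if committed_bytes > physical_bytes:
        warnings.append("committed bytes exceed physical memory: more RAM is probably necessary")
    if pagefile_usage_pct >= PAGEFILE_USAGE_LIMIT:
        warnings.append("pagefile usage at 80 percent or higher: consider a larger pagefile")
    return warnings


# Hypothetical readings from a server with 256MB of RAM.
for warning in memory_warnings(pages_per_sec=25,
                               available_bytes=3 * MB,
                               committed_bytes=300 * MB,
                               physical_bytes=256 * MB,
                               pagefile_usage_pct=85.0):
    print(warning)
```

A checker like this is only a first pass. A steady rise in Memory: Pool Nonpaged Bytes, for example, needs a comparison over time and isn't covered here.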

You can instruct NT Server to tune the memory that you have in your system. In the Control Panel Network applet, go to the Services tab and select Server. When you click Properties, a dialog box presents four optimization choices, as Figure 3 shows: Minimize Memory Used, Balance, Maximize Throughput for File Sharing, and Maximize Throughput for Network Applications. Another parameter that you can tune—on the Performance tab of the System Properties dialog box—is the virtual memory subsystem (aka the pagefile).

If you have a multiuser server environment, you'll be particularly interested in two of these memory-optimization strategies: Maximize Throughput for File Sharing and Maximize Throughput for Network Applications. When you select Maximize Throughput for File Sharing, NT Server allocates the maximum amount of memory for the file-system cache. (This process is called dynamic disk buffer allocation.) This option is especially useful if you're using an NT Server machine as a file server. Allocating all memory for file-system buffers generally enhances disk and network I/O performance. By providing more RAM for disk buffers, you increase the likelihood that NT Server will complete I/O requests in the faster RAM cache instead of in the slower file system on the physical disk.

When you select Maximize Throughput for Network Applications, NT Server allocates less memory for the file-system cache so that applications have access to more RAM. This option optimizes server memory for distributed applications that perform memory caching. You can tune applications (e.g., Microsoft SQL Server, Exchange Server) so that they use specific amounts of RAM for buffers for disk I/O and database cache.

However, if you allocate too much memory to each application in a multiapplication environment, excessive paging can turn into thrashing. Thrashing occurs when the combined memory demands of all active processes and the file-system cache become so large that they overwhelm the system's memory resources. When thrashing occurs, requests for RAM create hard page faults at an alarming rate, and the OS devotes most of its time to moving data in and out of virtual memory (i.e., swapping pages) rather than executing programs. Thrashing quickly consumes system resources and typically increases response times. If an application you're working with stops responding but the disk drive LED keeps blinking, your computer is probably thrashing.

To ease a memory bottleneck, you can increase the size of the pagefile or spread the pagefile across multiple disks or controllers. An NT server can contain as many as 16 pagefiles at one time and can read and write to multiple pagefiles simultaneously. If disk space on your boot volume is limited, you can move the pagefile to another volume to achieve better performance. However, for the sake of recoverability, you might want to place a small pagefile on the boot volume and maintain a larger file on a different volume that offers more capacity. Alternatively, you might want to place the pagefile on a hard disk (or on multiple hard disks) that doesn't contain the NT system files or on a dedicated non-RAID FAT partition.

I also recommend that you schedule memory-intensive applications across several machines and that you disable or uninstall unnecessary services, device drivers, and network protocols. In addition, through registry editing, you can enable an NT server to use more than 256KB of Level 2 cache. Start regedit.exe, go to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management subkey, and double-click SecondLevelDataCache. Select Decimal as the base, and enter the amount of Level 2 cache that you have in kilobytes (e.g., 512 if you have 512KB). Then, click OK, close the registry editor, and reboot.
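
If you need to make the SecondLevelDataCache change described above on several servers, one alternative to hand-editing is to generate a registry file and import it with regedit. The Python sketch below only builds the file's text. It assumes the REGEDIT4 file format that NT 4.0's registry editor imports, and the function name is hypothetical. Note that a .reg file stores the DWORD value in hexadecimal, so 512 appears as 00000200. Review the output and try it on a test machine before you import it anywhere important.

```python
# Subkey named in the text; the value SecondLevelDataCache lives under it.
KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
       r"\Session Manager\Memory Management")


def second_level_cache_reg(cache_kb):
    """Build REGEDIT4-style text that sets SecondLevelDataCache to cache_kb."""
    if cache_kb <= 0:
        raise ValueError("cache size must be a positive number of kilobytes")
    return (
        "REGEDIT4\n"
        "\n"
        f"[{KEY}]\n"
        f"\"SecondLevelDataCache\"=dword:{cache_kb:08x}\n"
    )


# Example: a system with 512KB of Level 2 cache.
print(second_level_cache_reg(512))
```

You would still need to save the text to a file, import it, and reboot for the setting to take effect.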

Processor
To determine whether an NT Server machine has a CPU bottleneck, first make sure that the system doesn't have a memory bottleneck. CPU bottlenecks occur only when the processor is so busy that it can't respond to requests. Symptoms of this situation include high rates of processor activity, sustained long queues, and poor application response. CPU-bound applications and drivers and excessive interrupts (which badly designed disk or network-subsystem components create) are common causes of CPU bottlenecks.

You can use the following counters to view the status of your system's CPU utilization:

  • Processor: % Processor Time—Measures the amount of time a processor spends executing nonidle threads. (If your system has multiple processors, you need to monitor the System: % Total Processor Time counter.) If a processor consistently runs at more than 80 percent capacity, the processor might be experiencing a bottleneck. To determine the cause of a processor's activity, you can monitor individual processes through Performance Monitor. However, a high Processor: % Processor Time doesn't always mean the system has a CPU bottleneck. If the CPU is servicing all the NT Server scheduler requests without building up the Server Work Queues or the Processor Queue Length, the CPU is servicing the processes as fast as it can handle them. A processor bottleneck occurs when the System: Processor Queue Length value is growing; Processor: % Processor Time is high; and the system's memory, network interface, and disks don't exhibit any bottlenecks. When a CPU bottleneck occurs, the processor can't handle the workload that NT requires. The CPU is running as fast as it can, but requests are queued and waiting for CPU resources.
  • Processor: % Privileged Time—Measures the amount of time the CPU spends performing OS services.
  • Processor: % User Time—Measures the amount of time the processor spends running application and subsystem code (e.g., word processor, spreadsheet). A healthy percentage for this value is 75 percent or less.
  • Processor: Interrupts/sec—Measures the number of application and hardware device interrupts that the processor is servicing. The interrupt rate depends on the rate of disk I/O, the number of operations per second, and the number of network packets per second. Faster processors can tolerate higher interrupt rates. For most current CPUs, 1500 interrupts per second is typical.
  • Process: % Processor Time—Measures the amount of a processor's time that a process is occupying. Helps you determine which process is using up most of a CPU's time.
  • System: Processor Queue Length—Shows the number of tasks waiting for processor time. If you run numerous tasks, you'll occasionally see this counter go above 0. If this counter regularly shows a value of 2 or higher, your processor is definitely experiencing a bottleneck. Too many processes are waiting for the CPU. To determine what's causing the congestion, you need to use Performance Monitor to monitor the process object and further analyze the individual processes making requests on the processor.
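
The test for a processor bottleneck described in the list above combines several conditions, which the following Python sketch makes explicit. It's an illustration under assumptions, not a Performance Monitor feature: the inputs are samples you've collected yourself, and the sustained_fraction parameter (how many samples must cross a threshold before the condition counts as sustained) is a hypothetical knob that I've introduced. The thresholds of 2 for queue length and 80 percent for processor time come from the text.

```python
def cpu_bottleneck(queue_lengths, cpu_pct_samples, other_bottlenecks=False,
                   queue_limit=2, cpu_limit=80.0, sustained_fraction=0.8):
    """Return True when samples suggest a sustained CPU bottleneck.

    queue_lengths    -- samples of System: Processor Queue Length
    cpu_pct_samples  -- samples of Processor: % Processor Time
    other_bottlenecks -- True if memory, disk, or network already shows a bottleneck
    """
    if not queue_lengths or not cpu_pct_samples:
        return False
    # Fraction of samples in which the queue was at or above the limit.
    long_queue = sum(q >= queue_limit for q in queue_lengths) / len(queue_lengths)
    # Fraction of samples in which the processor was busier than the limit.
    busy = sum(p > cpu_limit for p in cpu_pct_samples) / len(cpu_pct_samples)
    return (not other_bottlenecks
            and long_queue >= sustained_fraction
            and busy >= sustained_fraction)


# Hypothetical samples: a busy CPU with a long queue, then a busy CPU that keeps up.
print(cpu_bottleneck([3, 4, 2, 5, 3], [92, 95, 88, 97, 90]))
print(cpu_bottleneck([0, 1, 0, 0, 1], [92, 95, 88, 97, 90]))
```

The second call shows the case the text warns about: high processor time alone, without a growing queue, doesn't indicate a bottleneck.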

One way to resolve processor bottlenecks is to upgrade to a faster CPU (if your system board supports it). If you have a multiuser system that's running multithreaded applications, you can obtain more processor power by adding CPUs. (If a process is multithreaded, adding a processor improves performance. If a process is single-threaded, a faster processor improves performance.) However, if you're running the single-processor NT kernel, you might need to update the kernel to the multiprocessor version. To do so, reinstall the OS or use the resource kit's uptomp.exe utility.


Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.