October 25, 2005 12:00 AM

Crouching Server, Hidden Memory Leak

How I rescued an SMB's server and restored its missing memory
Windows IT Pro
InstantDoc ID #47770

Monday, May 16, 9:30 a.m.: Customer's server crashes for the umpteenth time.
Accusations hurtled through the air, and angry email messages and phone calls flew furiously between the small-to-midsized business (SMB) customer and the Value Added Reseller (VAR) that supported the customer's financial application. What spawned this IT battle scene? It all started when a Windows 2000 server that hosted the customer's application started crashing intermittently. I work for a Microsoft Business Solutions Gold Partner, and customers who use Microsoft Business Solutions for Financial Management—Great Plains software are an important part of our practice. My boss dispatched me to the client's site to assess the problem.

By the time the client called us, the server was crashing every few days. Before the crash, ODBC connections from Great Plains clients would become sluggish and finally disconnect. The client's accounting managers, IT people, and Great Plains implementers hurled epithets at each other over the fallen server.

The Great Plains implementer on this project is a capable technician, but his training and experience hadn't prepared him to handle the problem at hand: resolving server lockups and crashes. In desperation, he emailed the client/server coordinator and copied me on the message.

Our Microsoft Customer Relationship Management (CRM) system contains our clients' histories for contacts, product purchases, licensing keys, trouble tickets, and other relevant customer information. I located the client's resident IT support person in the CRM database and phoned him.

10:00 a.m.: I begin problem resolution by calling the client's onsite IT person.
I introduced myself to the IT support person and explained why I was calling. Quickly, I reassured him that I—the VAR—was on his side and that I wanted to help him resolve the problem. I won his trust, and he gave me his full cooperation.

He told me that the server was downed like a badly wounded soldier, bleeding memory slowly but continuously. He also told me that his company's security policies prohibited using remote management software, which would have let me examine the injured system. I'd have to find another way to investigate the problem.

10:20 a.m.: I examine the server event logs for clues.
I asked the IT person whether he could send me the server's System and Application event logs, SQL Server event logs, and perhaps a snapshot of Task Manager. He emailed them to me at 10:40 a.m.

I opened the logs and looked at the System log first. The first thing I saw was a bright red streak of Event ID 2019 errors flashing on my laptop screen: The server was unable to allocate from the system nonpaged pool because the pool was empty. Then, in the Application log, I saw Event ID 208. This error fingered the Great Plains application as part of the problem.

In the SQL Server event log, I saw the Event ID 17052 error. And finally, in the Task Manager snapshot, I got a little more information about the Event ID 2019 error, as Figure 1 shows.
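When logs arrive as files rather than through a live Event Viewer session, a quick tally can make a pattern like this stand out. The sketch below is only an illustration: it assumes the logs were exported to CSV with hypothetical "Log" and "EventID" columns, and the sample rows are made up to mirror the three event IDs mentioned above.

```python
import csv
import io
from collections import Counter

# Hypothetical export: one row per event. The column names and the
# rows themselves are assumptions made for this illustration.
SAMPLE_EXPORT = """Log,EventID
System,2019
System,2019
Application,208
SQL Server,17052
System,2019
"""


def tally_event_ids(csv_text):
    """Count how often each (log name, event ID) pair appears."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["Log"], int(row["EventID"])) for row in reader)


def most_frequent(counts, n=3):
    """Return the n most common (log, event ID) pairs with their counts."""
    return counts.most_common(n)


if __name__ == "__main__":
    for (log, event_id), count in most_frequent(tally_event_ids(SAMPLE_EXPORT)):
        print(f"{log:12} Event ID {event_id:<6} x{count}")
```

A repeated ID at the top of the tally is a reasonable first thing to look up in the Knowledge Base.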

I looked in the Microsoft Help and Support Knowledge Base and found an article at http://support.microsoft.com/?kbid=888928 that showed that the Event ID 2019 error might be related to having McAfee VirusScan installed on the server. McAfee VirusScan was, in fact, on the server, and the vendor had a hotfix for the problem. I notified the local IT support person, who downloaded and quickly applied the hotfix and rebooted the server. Alas, the hotfix failed to stop the resource bleeding.

11:30 a.m.: En route to the client's site, I find a fruitful lead.
Finally, I persuaded the client to let me investigate the problem on site. To pass time during my drive to the client's site, I listened to a CD; no, not Pink Floyd or Willie Nelson, but Mark Minasi's Tuning Your Windows 2000 Servers. While perusing the event logs, I'd been mulling over memory leaks and how to find them. On the CD, Mark talks about memory and mentions "leakers": programs that allocate a file handle every few seconds. By itself, a single file handle doesn't use much memory, but the repeated allocations gradually use up a great deal of it.
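The "leaker" pattern is easy to reproduce in miniature. The following Python sketch is not taken from the CD or from the program in this story; it simply contrasts a loop that opens a file on every pass and never closes it with one that releases each handle. The function names and the iteration count are made up for the illustration.

```python
import os
import tempfile


def leaky_worker(path, iterations):
    """Open the file on every pass and never close it: the leak pattern."""
    held = []
    for _ in range(iterations):
        # The reference is kept so the handle stays allocated, which
        # stands in for a program that simply forgets to release it.
        held.append(open(path, "rb"))
    return held


def tidy_worker(path, iterations):
    """Open and close on every pass; return how many handles remain open."""
    still_open = 0
    for _ in range(iterations):
        with open(path, "rb") as handle:
            handle.read(1)
        still_open += 0 if handle.closed else 1
    return still_open


def demo(iterations=100):
    """Return (handles left open by the leaky loop, by the tidy loop)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        leaked = leaky_worker(path, iterations)
        open_after_leaky = sum(1 for h in leaked if not h.closed)
        open_after_tidy = tidy_worker(path, iterations)
        for h in leaked:
            h.close()  # clean up so the demo itself does not leak
    finally:
        os.remove(path)
    return open_after_leaky, open_after_tidy


if __name__ == "__main__":
    print(demo())
```

The leaky loop's open-handle count grows with every pass, while the tidy loop's stays at zero; left running for days, the first pattern is what slowly drains a server.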

1:15 p.m.: I find the source of the problem.
When I arrived at the site, I met the IT support person, who ushered me into the server room. I opened Task Manager on the server and customized the view by adding the User Name, Paged Pool, Non-paged Pool, Handle Count, and Thread Count fields. I clicked OK, then maximized the Task Manager window and sorted the process list by handle count.

On my Windows XP laptop, svchost.exe uses 1424 handles and outlook.exe uses 1333 handles. On the client's server, however, I found an applet associated with sending messages from the onboard SCSI card. That program had used 700,000 file handles since a reboot 10 minutes earlier, and the file-handle count continued to climb.
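The same check can be expressed as a comparison of two handle-count snapshots taken a few minutes apart. The sketch below is a minimal illustration, not the tool used on site: the process names, the counts, and the growth threshold are all made up, and "scsi_applet.exe" is a hypothetical stand-in for the program, which the article doesn't name.

```python
# Illustrative snapshots: {process name: handle count}. All values
# here are invented for the example.
EARLIER = {"svchost.exe": 1400, "sqlservr.exe": 900, "scsi_applet.exe": 650_000}
LATER = {"svchost.exe": 1424, "sqlservr.exe": 910, "scsi_applet.exe": 700_000}


def rank_by_handles(snapshot):
    """Return (process, handles) pairs, highest handle count first."""
    return sorted(snapshot.items(), key=lambda item: item[1], reverse=True)


def find_leakers(earlier, later, min_growth=10_000):
    """Return (process, growth) pairs whose handle count grew by at
    least min_growth between the two snapshots, largest growth first."""
    growth = {
        name: later[name] - earlier[name]
        for name in later
        if name in earlier
    }
    suspects = [(name, g) for name, g in growth.items() if g >= min_growth]
    return sorted(suspects, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank_by_handles(LATER)[0])
    print(find_leakers(EARLIER, LATER))
```

A large absolute count is suspicious, but steady growth between snapshots is the stronger signal. On a live Windows system the snapshots could be collected with the third-party psutil package, whose `Process.num_handles()` reports per-process handle counts on Windows; wiring that in is left out here.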

I did a quick Google search on the filename of the errant program, and my results showed that many people were having problems with this file and certain motherboards. This added further evidence that we'd found the problem. Earlier, I'd told the Great Plains consultant that I suspected a memory leak. As I stared intently at Task Manager, I exclaimed, "Well, I guess we found our 'leaker'!"

1:45 p.m.: I bring the "crouching server" back to life.
The final step was to fix the rogue program so that it no longer created file handles ad infinitum. Although the server hardware was under warranty, its service level agreement (SLA) didn't cover onsite support. The server housed sensitive financial information, so moving it off site for service wasn't an option.

My alternative (and easier) solution was to keep the applet from starting at all. I ran regedit, located the applet's startup entries in the registry (under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run), and changed those subkeys to prevent the applet from running when the server was rebooted.
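For readers who prefer to inspect the Run key from a script before changing anything, here is a read-only sketch using Python's standard winreg module, which is available only on Windows. It lists the startup entries and filters them by a search string; "scsi_applet" is a hypothetical placeholder, because the article doesn't name the file. The sketch does not modify the registry.

```python
import sys

RUN_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run"


def read_run_entries():
    """Return {value name: command line} from HKLM's Run key (Windows only)."""
    import winreg  # standard library, but present only on Windows

    entries = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, RUN_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # number of values in the key
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            entries[name] = data
    return entries


def match_entries(entries, needle):
    """Return the entries whose command line mentions needle (case-insensitive)."""
    needle = needle.lower()
    return {
        name: command
        for name, command in entries.items()
        if needle in str(command).lower()
    }


if __name__ == "__main__" and sys.platform == "win32":
    # "scsi_applet" is a placeholder search string, not the real filename.
    for name, command in match_entries(read_run_entries(), "scsi_applet").items():
        print(name, "->", command)
```

Exporting the key in regedit before editing it gives an easy way back if the change turns out to be wrong.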

Finally, I rebooted the server, and the problem vanished. The administrator signed my time sheet and wished me well. As I drove back to the office, I put my trusty Windows technical CD back in the player. For me, it was just another day of tracking down technical problems, dispelling customer qualms, and relearning something interesting about Windows.


Comments
  • KALYAN
    4 years ago
    May 28, 2008

    Very useful information

  • Ward
    7 years ago
    Nov 10, 2005

    Excellent article, it contributed to my "Learn something new every day" plan. Where do I get the CD you mentioned, "Tuning Your Windows 2000 Servers"? Is this an audio book? I can find nothing on Amazon.

  • ASMB-Support
    7 years ago
    Nov 04, 2005

    Another great article from Curt dealing with “Real World” IT issues… Very informative article, definitely going into our Tips and Tricks collection. Another reminder why we also have the Minasi collection…

    Tim Bolton

  • CURT
    7 years ago
    Oct 27, 2005

    Nice article, Curt. Persistence is the key to success in gaining the trusted advisor approach of a VAR. Nice job....

  • CURT
    7 years ago
    Oct 27, 2005

    Nice bit of troubleshooting, Curt. Although it is easy to suspect a resource leak of some kind, it is not always trivial to find it, especially if it is not just 'memory' but handles or even more esoteric stuff. Good job.

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.