December 01, 1997 12:00 AM

Inside the Blue Screen

Windows IT Pro
InstantDoc ID #301

Mapping the Blue Screen
The blue screen contains five areas of text from top to bottom: the Stop Code, system information, a list of loaded drivers, the stack trace, and an administrative message. In Screen 1, blank lines separate these areas. Some areas might be missing in a blue screen if the system state is too corrupt for NT to fill them in.

The administrative message tells you to contact your systems administrator if you have a chronic blue screen problem on your system. The most useful portion of the display is usually the Stop Code area. This area lists the Stop Code and the four additional parameters passed to KeBugCheck. In Screen 1, the Stop Code is 0x0000000A, and the additional parameters appear inside the parentheses after the Stop Code.
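If you copy the Stop Code line down by hand, a small script can split it into the code and its four parameters. The following is only a sketch: the function name parse_stop_line is made up for illustration, and the parameter values in the sample line are placeholders, not the values from Screen 1.

```python
import re

# Matches a line of the general form:
#   *** STOP: 0x0000000A (0x00000000,0x0000001C,0x00000000,0x80114738)
# The parameter values above are illustrative placeholders.
STOP_LINE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(\s*([^)]*)\)")


def parse_stop_line(line):
    """Return (stop_code, [parameters]) as integers, or None if no match."""
    match = STOP_LINE.search(line)
    if match is None:
        return None
    stop_code = int(match.group(1), 16)
    # Parameters are comma-separated hexadecimal values inside the parentheses.
    params = [int(p.strip(), 16) for p in match.group(2).split(",") if p.strip()]
    return stop_code, params


if __name__ == "__main__":
    sample = "*** STOP: 0x0000000A (0x00000000,0x0000001C,0x00000000,0x80114738)"
    code, params = parse_stop_line(sample)
    print(hex(code), [hex(p) for p in params])
```

Keeping the values as integers rather than strings makes it easy to compare a parameter against module base addresses later.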

The Stop Code is a number that represents the nature of the detected problem. The bugcodes.h file in the Windows NT Device Driver Kit contains a complete list of the 150 or so Stop Codes. However, you will typically encounter only 4 or 5 of them. The text line below the Stop Code provides the text equivalent of the Stop Code numeric identifier. I'll discuss some of the common Stop Codes a little later.

Interpreting the additional Stop Code parameters rarely provides any insight into a problem for anybody other than a device driver writer (or a member of the Microsoft NT development team). Fortunately, NT does some interpretation for us. KeBugCheck scans the parameters for one that looks like it might be an address pointing to the memory image of an Executive subsystem or a device driver. When KeBugCheck finds one, it prints the parameter, the base address of the module the parameter is in, and the name of the module. This last piece of information is crucial, and I'll describe how you can use it a little later.
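The address-to-module matching described above can be sketched as a simple range check. This is not KeBugCheck's actual code; it only illustrates the idea. The Module type, the function names, and every address and size in the usage example are hypothetical, and the sketch assumes each module's size is known, which the blue screen itself does not show.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    base: int  # address at which the image is loaded
    size: int  # image size in bytes (assumed known for this sketch)


def find_module(address, modules):
    """Return the module whose image contains address, or None."""
    for module in modules:
        if module.base <= address < module.base + module.size:
            return module
    return None


def first_module_hit(params, modules):
    """Return (parameter, module) for the first parameter inside a module."""
    for param in params:
        module = find_module(param, modules)
        if module is not None:
            return param, module
    return None


if __name__ == "__main__":
    # Hypothetical load addresses and sizes, for illustration only.
    loaded = [
        Module("ntoskrnl.exe", 0x80100000, 0x000C0000),
        Module("Ntfs.SYS", 0xFC870000, 0x00060000),
    ]
    hit = first_module_hit([0x0, 0x1C, 0x0, 0xFC873D6C], loaded)
    if hit is not None:
        param, module = hit
        print(f"Address {param:#010x} has base at {module.base:#010x} - {module.name}")
```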

The system information area of the screen is below the Stop Code area, and it simply identifies the system's processor type (e.g., Pentium, x486) and NT's base build number (no Service Pack information appears). In Screen 1, the Build Number is 0xf0000565; its low-order digits, 0x565, equal 1381 in decimal, which is the build number you'll see for any NT 4.0 installation. An IRQL number also appears in this area, but a bug in KeBugCheck causes it to record the IRQL incorrectly.
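The arithmetic behind the Build Number is easy to check. The sketch below masks off the low 16 bits to recover the decimal build. Reading the top hexadecimal digit as a free (retail) versus checked (debug) build flag is an assumption that the article does not state, and the function name decode_build_number is made up for illustration.

```python
def decode_build_number(raw):
    """Split a raw Build Number value into (build, flavor)."""
    build = raw & 0xFFFF   # low 16 bits hold the build number itself
    top_nibble = raw >> 28  # highest hexadecimal digit
    # Assumption (not from the article): 0xF marks a free build, 0xC a checked build.
    flavor = {0xF: "free", 0xC: "checked"}.get(top_nibble, "unknown")
    return build, flavor


if __name__ == "__main__":
    build, flavor = decode_build_number(0xF0000565)
    print(build, flavor)
```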

Below the system information on the blue screen is the loaded driver area. Here you'll see a listing of all the registered device drivers at the time of the stop. KeBugCheck prints the name, base memory address, and date-stamp (the time a driver was built). Unless you develop device drivers, this information is useless.

Finally, just below the loaded driver area is a snapshot of the system stack at the time of the call to KeBugCheck. Each module (except the first one) in the list had invoked the module printed on the line above it and was waiting for a result. The system detected a problem while the module on the first line was executing, and often this module matches the module shown in the Stop Code area (Ntfs.SYS in Screen 1).

Interpreting the Blue Screen Information
So, what do you do with the data the blue screen provides? Many times, all you can do is reset the system and hope that the blue screen doesn't happen again. But sometimes an important clue is lurking in the Stop Code area or stack trace that can help you take a more proactive approach to ridding the system of the blue screen.

First, the Stop Code alone can sometimes provide all the information you need to identify the problem. The sidebar, "Common Stop Codes," page 62, lists several Stop Codes, their causes, and some suggestions about what to do if you encounter one. The Microsoft Windows NT Workstation Resource Kit contains more information about Stop Codes.

Often, you begin seeing blue screens after you install a new software product or piece of hardware. If you've just added a driver, rebooted, and got a blue screen early in system initialization, you can reset the machine and press the space bar when instructed, to get the Last Known Good configuration. Enabling Last Known Good causes NT to revert to a copy of the Registry's device driver registration key (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services) from the last successful boot (before you installed the driver).

If you keep getting blue screens, an obvious approach is to uninstall the things you added just before the appearance of the first blue screen. If some time has passed since you added something new or you added several things at about the same time, you need to note the names of the modules you see in both the Stop Code and stack trace areas. Note that ntoskrnl.exe refers to the image that contains all NT's core kernel-mode subsystems as well as the Microkernel.

If you recognize any of the module names as being related to something you just added (such as scsiport.sys if you put on a new drive), you've possibly found your culprit. Many device drivers have cryptic names, so one thing you can do to figure out which application or hardware device is associated with a name is to run the Regedit Registry viewing tool the next time you boot the system or on a similarly equipped machine. Search for the name of the driver under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services key. This branch of the Registry is where NT stores registration information for every device driver in the system. If you find a match, look for a value called DisplayName. Some drivers fill in this value with a name descriptive of the device driver's purpose. For example, you might find Virus Scanner, which can implicate the antivirus software you have running.
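The Registry search can also be scripted. The sketch below uses Python's standard winreg module, which exists only on Windows, to read DisplayName from a driver's Services subkey. It assumes the subkey is named after the driver file without its extension, which is common but not guaranteed. The function names are made up for illustration.

```python
import ntpath

SERVICES_KEY = r"SYSTEM\CurrentControlSet\Services"


def service_key_path(driver_file):
    """Build the Services subkey path for a driver file such as scsiport.sys."""
    # Assumption: the subkey name equals the file name minus its extension.
    name, _ext = ntpath.splitext(ntpath.basename(driver_file))
    return SERVICES_KEY + "\\" + name


def lookup_display_name(driver_file):
    """Return the driver's DisplayName value, or None if it is not set."""
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            service_key_path(driver_file)) as key:
            value, _value_type = winreg.QueryValueEx(key, "DisplayName")
            return value
    except OSError:
        # The key or the value does not exist, or access was denied.
        return None


if __name__ == "__main__":
    print(service_key_path("scsiport.sys"))
```

On a Windows machine, calling lookup_display_name("scsiport.sys") returns the descriptive name if the driver registered one, and None otherwise.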

You can also search Microsoft's online Knowledge Base (http://www.microsoft.com) for the Stop Code and the name of the suspect hardware or application. You might find information about a workaround, an update, or a Service Pack that fixes the problem you're having.

Setting the Blue Screen Options
Instead of just halting the system with a blue screen, you can have NT log an event to the system log, send you an administrative alert, write a dump of the machine's physical memory to disk, or automatically reboot the computer. You can configure these options on the Startup/Shutdown tab of the System applet in Control Panel, as shown in Screen 3, page 64.

If you want to track how often a computer runs into problems, select the option to record the event in the system log, which you can view with the Event Viewer administrative tool. In general, you won't want the machine's memory written to disk unless you have a chronic problem that a particular hardware vendor or Microsoft will help you debug. In this case, be prepared to copy a file as large as the computer's memory (e.g., 128MB for a 128MB machine) to send for debugging. Contact the hardware vendor or Microsoft for instructions about where and on what medium to send the dump.

Finally, automatic rebooting is an option you want to enable if your machine is performing a task for which you want to minimize downtime. If you have a Web server that configures itself automatically when NT starts, automatic rebooting after a stop will keep your site offline for as little time as possible.

At Wit's End
Unfortunately, you can't run a magical program to identify the exact cause of blue screens or make them go away. Even with extensive knowledge of NT internals and device drivers, you'll still find that reading a blue screen and trying to figure out what happened is a little like fumbling around in a dark room. However, the next time you're unpleasantly surprised with a blue display, you might find some solace in knowing what's going on behind the scenes: a subsystem or driver made a call to KeBugCheck, which provided the information in the different areas of the screen.

Comments
  • Anonymous User
    7 years ago
    Apr 01, 2005

    On a Fuji SMT assembly machine, they would crap out and give a long list of code in red letters/numbers. We always referred to it as the red error of death. Windows uses a blue screen, we call it the blue screen of death. Every software manufacturer has their preferred color that they wish to die under, Microsoft prefers to die under blue... :)

  • Anonymous User
    7 years ago
    Mar 22, 2005

    Very Good. Very informative...I learnt that - a BSOD can only be caused by a device driver or operating environment, not by an app which can cause exceptions or Dr.Watson's error

  • Anonymous User
    7 years ago
    Feb 27, 2005

    Oh how I wish I had had this information before this SOB BSOD assassinated my small computer. Is it ever possible to get out of that BSOD without having to reinstall the entire system?

    Superb article!

  • Anonymous User
    7 years ago
    Jan 07, 2005

    Regarding the person asking what "PAGE_FAULT_IN_NON_PAGED_AREA" means: if you're programming savvy, it's normally the result of a driver dereferencing a NULL or invalid pointer. Another error code you might get because of that is IRQL_NOT_LESS_OR_EQUAL.

  • Anonymous User
    8 years ago
    Dec 21, 2004

    gigi, get a new monitor. LoL

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.