December 01, 1997 12:00 AM

Inside the Blue Screen

Windows IT Pro
InstantDoc ID #301
Understand the clues the blue screen provides

The color blue has become synonymous with disaster in the Windows NT world. Although NT is more reliable and stable than its cousins, Windows 3.x and Windows 95, it nevertheless is subject to the frailties of third-party software, add-on peripherals and their device drivers, and Microsoft's bugs. Almost everyone who has used NT for any length of time has seen a blue screen (also known as the blue screen of death). Screen 1, page 58, displays a typical example. NT stops processing and paints one of these displays whenever it has encountered a situation in which it cannot continue, or in which continuing may lead to data corruption.

What most users and many developers don't know is what the screen's information means. If you're lucky, simply resetting the computer will get you on your way. If you're unlucky, you'll get a blue screen every time you start NT or perform a particular operation (e.g., inserting a new floppy). Even if you've successfully moved past a blue screen with a reboot, understanding the clues it provides can help you avoid future blue screens or give you a hint about which driver or piece of hardware is causing problems.

This month, I'll talk about how NT generates blue screens, what leads to their appearance, how to interpret the cryptic data NT lists on them, and how to go about troubleshooting them. I'll tackle the topic from the perspective that NT device drivers are not your forte and that debugging a blue screen with dump analysis tools or a kernel-mode debugger is infeasible. In the process, I'll describe the inner workings of NT's kernel mode. (For a different angle on blue screens, see Mark Edmead, "The Blue Screen of Death," June 1997.)

NT Architecture Basics
To understand what leads to a blue screen, you first need to understand NT's basic architecture. NT executes in two modes, user mode and kernel mode, as shown in Figure 1, page 59. Kernel mode is a highly privileged processor mode, with direct access to all hardware and memory; user mode is a less privileged mode, with no direct access to hardware and restricted access to memory.

User mode is the mode in which applications and operating system environment subsystems execute. The operating system environments that NT supplies include POSIX, OS/2, Win16, DOS, and Win32. Applications are clients of exactly one environment subsystem and use only the APIs that subsystem exports. Thus, Win32 programs are clients of the Win32 subsystem and use only the Win32 API.

The subsystems use basic NT services that the NT Executive and the Microkernel provide. These services run in kernel mode. The Executive includes core operating system components: the Process Manager, Virtual Memory Manager, I/O Manager, Local Procedure Call (LPC) Facility, Object Manager, and Security Reference Monitor. The Executive is generally portable across processor architectures (e.g., Alpha, x86), and it relies on the Microkernel for processor-specific functions such as context-switching (scheduling) and synchronization primitives.

Beneath the Microkernel resides the Hardware Abstraction Layer (HAL), through which the Executive subsystems and the Microkernel interface with the processor. Microsoft ships different HALs for different processors and processor boards.

Device drivers are modules that interface NT and applications to specific hardware devices. A large number of device drivers for disk drives, video cards, modems, network cards, and input devices ship with NT. However, hardware vendors can include custom device drivers with their hardware, and NT dynamically adds the drivers to its kernel-mode environment.

User Mode vs. Kernel Mode
What differentiates user mode from kernel mode is the privilege level. A program executing in user mode runs in a sandbox (not unlike a Java virtual machine's sandbox) that the NT Executive and the program's operating system environment create for the program. The sandbox enforces restrictions as to what the program can do. One type of restriction relates to what parts of the computer's memory the program can reference and in what ways.

Figure 2 shows the virtual memory map that NT creates for applications. Addressable memory totals 4GB, and NT divides that space evenly between the memory assigned to a program and the memory that the kernel-mode portion of NT uses.

The lower 2GB mapping changes, depending on which program is currently running. For example, if Microsoft Word is running, NT places Word's address mapping in the lower 2GB; if Netscape Navigator runs next, its mapping replaces Word's mapping.

The upper 2GB mapping always remains that of the Executive, Microkernel, device drivers, and HAL. Thus, the split between user mode and kernel mode also shows up in NT's address space mapping. (In NT Server 4.0, Enterprise Edition, you can adjust the address split between user mode and kernel mode so that applications have 3GB of address space, with 1GB left for NT's Executive, drivers, and HAL. You will see this split only when NT is running on systems with several gigabytes of physical memory.)
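
As a rough illustration of the address split described above, the following Python sketch classifies a 32-bit virtual address as belonging to the per-program (user) region or the shared kernel-mode region. The function name classify_address and the user_gb parameter are hypothetical and are not NT interfaces; the boundaries simply follow from the 2GB/2GB and 3GB/1GB splits mentioned in the text.

```python
GB = 1 << 30
ADDRESS_SPACE = 4 * GB  # the 4GB addressable range described in the article


def classify_address(addr, user_gb=2):
    """Return 'user' or 'kernel' for a 32-bit virtual address.

    user_gb is a hypothetical parameter: 2 models the default split,
    3 models the 3GB/1GB split mentioned for Enterprise Edition.
    """
    if not 0 <= addr < ADDRESS_SPACE:
        raise ValueError("address outside the 4GB range")
    if user_gb not in (2, 3):
        raise ValueError("only the 2GB and 3GB user splits are modeled")
    boundary = user_gb * GB  # first address of the kernel-mode region
    return "user" if addr < boundary else "kernel"
```

With the default split, an address such as 0x7FFFFFFF falls in the program's half and 0x80000000 is the first kernel-mode address; with user_gb=3 the boundary moves up to 0xC0000000.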

The primary memory restriction placed on user-mode programs is that they cannot access any of the kernel-mode memory. User-mode programs also cannot access invalid portions of their mapping (i.e., portions not filled with data or code from the program). This arrangement contrasts with the kernel-mode portions of NT, which have free rein over the entire address map. For example, NT does not stop a device driver from writing data into Word's address map, but NT prevents Word from writing over the device driver's image.
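
The access rule in this paragraph can be sketched as a small Python model. This is a toy under stated assumptions, not NT's actual checking logic: the function access_allowed, the mode strings, and the list of valid ranges are all hypothetical, and the model covers only the privilege rule (it says nothing about what happens when kernel-mode code touches an unmapped address).

```python
GB = 1 << 30
KERNEL_BASE = 2 * GB  # start of the upper 2GB in the default split


def access_allowed(mode, addr, user_valid_ranges):
    """Toy model of the memory restriction described in the text.

    mode: 'user' or 'kernel'.
    user_valid_ranges: hypothetical list of (start, end) half-open ranges
    in the lower 2GB that hold the program's code or data.
    """
    if mode == "kernel":
        # Kernel-mode code is not restricted by privilege anywhere in the map.
        return True
    if mode != "user":
        raise ValueError("mode must be 'user' or 'kernel'")
    if addr >= KERNEL_BASE:
        # User-mode programs cannot touch kernel-mode memory.
        return False
    # User-mode programs also cannot touch invalid parts of their own mapping.
    return any(start <= addr < end for start, end in user_valid_ranges)
```

For example, with a single valid range of (0x00400000, 0x00500000) standing in for Word's image, a user-mode access to 0x00400010 is allowed, a user-mode access to 0x80001000 is refused, and a kernel-mode access to 0x00400010 is allowed, which mirrors the driver-versus-Word example above.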

The user-mode sandbox enforces another restriction that limits a program's ability to directly access hardware devices such as disks, the video screen, and the printer. Programs must typically go through their operating system environment (e.g., Win32) to read data from or write data to a peripheral. The operating system environment then usually calls on the services of the Executive in kernel mode, effectively forwarding the request. The Executive finally completes the request, sometimes with the aid of a device driver, but almost always with the use of functions in the HAL that interface with the computer's hardware. NT implements the transition between user mode and kernel mode as a system call gateway, through which the passage of data is precisely controlled.
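
The idea of a gateway that precisely controls what crosses from user mode to kernel mode can be illustrated with a minimal Python sketch. Everything here is hypothetical (the Gateway class, the service number, and the kernel_read stand-in); it shows only the general pattern of validating a request before dispatching it, not how NT implements its system call mechanism.

```python
class Gateway:
    """Toy model of a controlled user-to-kernel transition."""

    def __init__(self):
        # service number -> (handler, expected argument count)
        self._services = {}

    def register(self, number, handler, arg_count):
        self._services[number] = (handler, arg_count)

    def system_call(self, number, *args):
        # Only registered service numbers may cross the boundary.
        if number not in self._services:
            raise PermissionError(f"unknown service {number}")
        handler, arg_count = self._services[number]
        # Arguments are checked before any kernel-side code runs.
        if len(args) != arg_count:
            raise ValueError("wrong number of arguments")
        return handler(*args)


def kernel_read(device, length):
    # Hypothetical stand-in for the Executive completing a read request.
    return f"read {length} bytes from {device}"


gateway = Gateway()
gateway.register(1, kernel_read, 2)
```

A caller would issue gateway.system_call(1, "disk0", 512) and has no other way to reach kernel_read; a request with an unregistered service number is rejected before any kernel-side handler runs.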

Comments
  • Anonymous User
    7 years ago
    Apr 01, 2005

    On a Fuji SMT assembly machine, they would crap out and give a long list of code in red letters/numbers. We always referred to it as the red error of death. Windows uses a blue screen, we call it the blue screen of death. Every software manufacturer has their preferred color that they wish to die under, Microsoft prefers to die under blue... :)

  • Anonymous User
    7 years ago
    Mar 22, 2005

    Very Good. Very informative...I learnt that - a BSOD can only be caused by a device driver or operating environment, not by an app which can cause exceptions or Dr.Watson's error

  • Anonymous User
    7 years ago
    Feb 27, 2005

    Oh how I wish I had had this information before this SOB BSOD assassinated my small computer. Is it ever possible to get out of that BSOD without having to reinstall the entire system?

    Superb article!

  • Anonymous User
    7 years ago
    Jan 07, 2005

    Regarding the person asking what "PAGE_FAULT_IN_NON_PAGED_AREA" means: if you're programming savvy, it's normally the result of a driver dereferencing a NULL or invalid pointer. Another error code you might get because of that is IRQL_NOT_LESS_OR_EQUAL.

  • Anonymous User
    8 years ago
    Dec 21, 2004

    gigi, get a new monitor. LoL

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.