August 10, 1999 11:16 AM

Recovering from NT Startup Failures, Part 1

Windows IT Pro
InstantDoc ID #7076
Tricks to prepare for and recover from NT meltdowns

What would you do if one of your core production servers crashed the next time you rebooted it? More important, how much time would you need to fix the problem? For most Windows NT administrators, the thought of a mission-critical production server experiencing STOP errors (aka the blue screen of death) or any form of server outage makes them break out in a cold sweat.

A hosed NT system is never fun, but an unavailable critical server means lost productivity, lost time, lost money, and, of course, an angry boss. In this first installment of a two-part article, I discuss advanced tools and procedures that you can use to improve the availability of your network servers and to increase your chances of recovering from an NT boot failure. In addition, I delve into lesser-known techniques that you can employ right away to help you recover a downed NT system in the future. In this article, I don't address clustering solutions, and I assume that each system is a standalone, nonclustered NT system without system-level failover.

Common Calamities
Although various circumstances can cause an NT system to crash at startup, the result of these circumstances is usually the dreaded blue screen of death, which Screen 1 exemplifies. After NT halts the system, it displays this screen to protect the system against data corruption. In addition to being blue as its name implies, a blue screen displays important information about the system's state at the time of the STOP error. The screen lists the STOP code, the location in memory where the problem occurred, and the drivers loaded in memory when the STOP took place. However, pinning down the source of a STOP error isn't always easy. In my experience, a problem usually develops from one of the following scenarios:

  • You install software that corrupts the HKEY_LOCAL_MACHINE portion of the Registry—particularly, software that installs new services or drivers. This action usually results in a STOP error or blue screen, which indicates that the system Registry or a particular hive file failed.
  • You change a system's network configuration, which causes NT to rewrite network bindings and their related Registry entries (i.e., NT corrupts or overwrites critical OS files with invalid or incompatible versions while the system is in use).
  • You install a new service or driver on the system, which causes a system-level incompatibility problem that results in a STOP error when you reboot (i.e., underlying file corruption has occurred on a key system file that you loaded into memory before the corruption).

Each of these situations has a different set of underlying causes and solutions, so let's look at each scenario individually.

Registry Corruption
The system Registry is the heart of an NT installation. Thus, depending on the nature and extent of the damage, a corrupted Registry often results in a STOP error or blue screen of death at startup. Damage to the Registry can be physical or logical. Physical damage means that something (usually disk-related corruption) has scrambled the Registry hive files (e.g., the SOFTWARE or SYSTEM files in the \%systemroot%\system32\config folder). Logical damage means that a third-party application, a user, or NT has written invalid data to the Registry, which can trigger an NT startup failure if the logically damaged Registry entry is critical.

Unfortunately, you can't always tell whether a damaged Registry is the cause of your system's STOP error. The STOP error might identify a telltale sign such as a hard Registry error or a reference to a particular damaged hive file. However, in some cases, the STOP error doesn't indicate Registry damage.
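As a rough illustration of spotting such a telltale sign, the following Python sketch pulls the STOP code out of the first line of a blue screen and checks it against two codes that Microsoft documents as Registry-related. The function names and the line formats the pattern accepts are assumptions made for this example, and the lookup table is deliberately short; a STOP code that isn't in the table doesn't rule out Registry damage.

```python
import re

# Two STOP codes documented as Registry-related. This table is illustrative,
# not exhaustive: other STOP codes can also result from Registry damage.
REGISTRY_STOP_CODES = {
    0x00000051: "REGISTRY_ERROR",
    0xC0000218: "STATUS_CANNOT_LOAD_REGISTRY_FILE",
}

# Assumed line shapes: "*** STOP: 0x00000051 (0x...,0x...,0x...,0x...)"
# or "STOP: c0000218 {Registry File Failure}". The parameter list is optional.
STOP_LINE = re.compile(r"STOP:\s*(?:0x)?([0-9A-Fa-f]{8})(?:\s*\(([^)]*)\))?")


def parse_stop_line(line):
    """Return (stop_code, parameters) or None if the line has no STOP code."""
    match = STOP_LINE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    raw_params = match.group(2) or ""
    params = [int(p.strip(), 16) for p in raw_params.split(",") if p.strip()]
    return code, params


def registry_hint(line):
    """Return the symbolic name if the STOP code is in the Registry table."""
    parsed = parse_stop_line(line)
    if parsed is None:
        return None
    return REGISTRY_STOP_CODES.get(parsed[0])
```

Feeding the function the top line you copied from the blue screen returns a symbolic name only when the code is in the table; a result of None simply means that you need to keep looking.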

If you suspect a Registry-related problem, the first line of defense is to restore a previous known-good Registry configuration. You can use several methods to do so.

The Last Known Good Configuration option. You access this option by pressing the space bar when the system prompts you during the NT boot process, and selecting the option to restore a previous configuration. This method is the quickest and easiest solution, if it works. Unfortunately, this solution's failures outweigh its successes in real-world applications because its scope is only a previously known-good incarnation of one portion of the Registry (i.e., a ControlSet00X Registry subtree of the HKEY_LOCAL_MACHINE\SYSTEM key). You have a better chance of success using the Last Known Good Configuration option if the problem is localized to this portion of the Registry and an event that immediately precedes the invocation of the Last Known Good Configuration option caused the problem. However, this procedure won't cure most of your Registry-corruption ills.
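To make the narrow scope of this option concrete, here is a minimal Python sketch of how NT tracks control sets. It relies on a detail that the text above doesn't spell out: the HKEY_LOCAL_MACHINE\SYSTEM\Select key holds four numeric values (Current, Default, Failed, and LastKnownGood), and each number names a ControlSet00X subtree. The function names are hypothetical, and the sketch takes the values as a plain dictionary rather than reading a live Registry.

```python
def control_set_name(number):
    """Turn a control set number into its subtree name, e.g. 1 -> ControlSet001."""
    return "ControlSet{:03d}".format(number)


def describe_select(select_values):
    """Map each role in the Select key to the control set it points at.

    select_values is a dict of the numeric values found under
    HKEY_LOCAL_MACHINE\\SYSTEM\\Select. A value of 0 (typical for Failed
    when no boot has failed) or a missing value maps to None.
    """
    roles = ("Current", "Default", "Failed", "LastKnownGood")
    result = {}
    for role in roles:
        number = select_values.get(role)
        result[role] = control_set_name(number) if number else None
    return result
```

For example, passing in Current=1, Default=1, Failed=0, and LastKnownGood=2 shows that the option would boot from ControlSet002 and nothing else; the SOFTWARE, SAM, and SECURITY hives stay exactly as they were.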

NT Setup's Repair process and an Emergency Repair Disk (ERD). You can use NT Setup's Repair process to inspect and replace individual Registry hive files if the Last Known Good Configuration option fails to resolve the problem. After you insert your ERD, Setup lists the options you can select to specify which portions of the NT installation you want Setup to inspect, as Screen 2 shows. If you select Inspect registry files, Setup displays a list of Registry hive files and lets you select which files you want Setup to replace. Setup takes the replacement files from the ERD or, if you didn't provide an ERD, from the \%systemroot%\repair folder. The ERD and the \%systemroot%\repair folder store replacement files in compressed format, and each hive file has an underscore (_) extension (e.g., SYSTEM._, SOFTWARE._).
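Before you start a repair, it helps to know which replacement files you actually have. The following Python sketch applies the naming rule described above (hive name plus an underscore extension) and reports which hives have a compressed copy in a given folder. The function names and the default hive list are assumptions made for this example; the folder you pass in could be a copy of the ERD or of the \%systemroot%\repair folder.

```python
from pathlib import Path


def replacement_name(hive):
    """Return the compressed replacement file name for a hive, e.g. SYSTEM._"""
    return hive.upper() + "._"


def available_replacements(repair_dir, hives=("SYSTEM", "SOFTWARE")):
    """Report which of the requested hives have a compressed copy in repair_dir.

    File names are compared case-insensitively, as they would be on a FAT
    or NTFS volume.
    """
    present = {p.name.upper() for p in Path(repair_dir).iterdir() if p.is_file()}
    return {hive: replacement_name(hive) in present for hive in hives}
```

The sketch only checks that the files exist. It doesn't tell you how old they are, so you still need to check the file dates yourself.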

Using the most recent replacement files is important so that you don't lose application and service configuration information. (For information about how to update your ERD, see Michael Reilly's "The Emergency Repair Disk," January 1997.) In addition, don't restore the SAM and SECURITY hives on an NT server domain controller, unless you used the rdisk /s (or /s-) option when you ran the ERD utility (i.e., rdisk.exe). Otherwise, Setup overwrites your SAM database with the database version Setup created during the original NT installation and creates a new set of problems. In addition, ensure that you created the replacement files under the same service pack level as the files you're replacing because Service Pack 3 (SP3) and later make security-related changes to the SAM and SECURITY hives. Otherwise, you might not be able to log on after the repair is complete. Restoring the SAM and SECURITY files usually won't resolve your Registry corruption problems anyway because the SYSTEM and SOFTWARE hives usually cause Registry boot problems. Thus, start restoring previous Registry files with the SYSTEM and SOFTWARE files, and replace the SYSTEM hive first because it contains references to important system components, including drivers and services.
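The rules in the preceding paragraph can be summarized as a small decision function. This Python sketch is a simplification under stated assumptions, not a substitute for judgment: the function and parameter names are hypothetical, and it treats a service pack mismatch as a reason to leave the SAM and SECURITY hives alone rather than as a reason to abandon the repair.

```python
def plan_hive_restore(is_domain_controller, erd_made_with_s_switch, same_service_pack):
    """Return the hives to restore, in the order to try them.

    SYSTEM comes first because it references drivers and services, then
    SOFTWARE. SAM and SECURITY are added last, and only when restoring
    them shouldn't replace current accounts with the original
    installation's database or leave you unable to log on.
    """
    order = ["SYSTEM", "SOFTWARE"]
    # On a domain controller, the account hives are current only if the
    # ERD data was created with rdisk /s (or /s-).
    accounts_current = (not is_domain_controller) or erd_made_with_s_switch
    if accounts_current and same_service_pack:
        order += ["SAM", "SECURITY"]
    return order
```

For a domain controller whose ERD was created without the /s switch, the function returns only SYSTEM and SOFTWARE, which matches the advice to start with those two hives anyway.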


Comments
  • Sean Daily
    11 years ago
    Mar 06, 2001



    In the sidebar "Think Parallel" in "Recovering from NT Startup Failures, Part 1" (September 1999), I discuss a procedure that you can use to solve your problem. You can find the article online at http://www.win2000mag.com. To view the sidebar, enter 7075 in the InstantDoc ID text box; enter 7076 to view the whole article. You can also find tips about building multi-OS systems in "Mastering Multibooting Madness" (July 1999) and "Multibooting Windows 2000 Systems" (Summer 2000).
    --Sean Daily

  • Laura Clary
    11 years ago
    Mar 06, 2001



    I'm interested in setting up a machine that will boot to DOS, Windows NT running multiple protocols, and NT running only TCP/IP. I've never set up a machine to boot to two different versions of NT. What's the best way to accomplish this task?

  • Theodor Schain
    12 years ago
    Sep 14, 2000

    I read Sean Daily's "Recovering from NT Startup Failures, Part 1" (September 1999), which includes the sidebar "Think Parallel." I've installed Windows NT 4.0 once from the three setup disks and another time from a bootable CD-ROM. I want to install a parallel NT installation on my workstation on the same partition as my original installation of NT 4.0, but I don't know where to start. Can you help?

  • Sean Daily
    12 years ago
    Sep 14, 2000

    Setting up a parallel NT installation is no different from setting up an initial installation. Simply run NT Setup, but choose to install a new installation rather than to upgrade the existing one (a very important step and the only major catch in the process). When you're finished, the NT Boot Loader menu will show two (four if you count the VGA-mode entries) choices for NT. To make the menu options clearer, you can edit the boot.ini file and rename the parallel installation to something like Windows NT Recovery Installation.
    --Sean Daily
