Is your Exchange 2000 Server environment a disaster waiting to happen? Exchange 2000's dependence on Windows 2000 Active Directory (AD) complicates disaster-recovery planning. Your recovery efforts might involve not just your Exchange team but also the people responsible for AD. Knowing how to back up and recover non-Exchange components, and keeping up with recent changes in backup technologies, can help you plan and carry out a course of action that postpones disaster and speeds recovery.
Determining Risk
As an Exchange administrator, you're probably responsible for several aspects of your company's disaster-recovery process. This process can include conducting a risk analysis that identifies the probability and impact of an outage according to specific points of risk, as Table 1 shows; developing a risk-mitigation plan that defines risk-mitigation techniques for each possible type of outage; and implementing the plan on a day-to-day basis.
The first step in a disaster-recovery plan is a risk analysis. With this information, you can begin developing detailed procedures to protect each risk point. You can also calm executives suffering from "Chicken Little Syndrome," who might insist that you focus on high-impact events even when those events are improbable. Your planning should instead concentrate on the most likely disasters, which typically result from hardware or application faults.
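To make the ranking concrete, here is a minimal Python sketch that scores each risk point as probability times impact and sorts the results. The 1-to-5 scales, the RiskPoint class, and the sample entries are hypothetical illustrations, not the values from Table 1. Substitute the figures from your own analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPoint:
    name: str
    probability: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int       # hypothetical scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple probability x impact product used only for ranking.
        return self.probability * self.impact


def rank_risks(risks: list[RiskPoint]) -> list[RiskPoint]:
    """Order risks from highest to lowest score; ties go to the more probable risk."""
    return sorted(risks, key=lambda r: (r.score, r.probability), reverse=True)


if __name__ == "__main__":
    # Illustrative values only; replace them with your own risk analysis.
    risks = [
        RiskPoint("Site destroyed by natural disaster", probability=1, impact=5),
        RiskPoint("Server hardware fault", probability=4, impact=3),
        RiskPoint("Exchange Store corruption", probability=3, impact=4),
        RiskPoint("Domain controller failure", probability=3, impact=3),
    ]
    for risk in rank_risks(risks):
        print(f"{risk.score:>2}  {risk.name}")
```

With these illustrative values, the two common fault types tie at the top, and the low-probability site disaster drops to the bottom of the list.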
From a planning standpoint, you need to distinguish between increasing availability and speeding recovery. The distinction matters because anything you do to increase availability and avoid disaster can save you a lot of time in recovery. For example, suppose that a power supply in one of your servers fails. That's no big deal if you have redundant power supplies. If you don't, though, the server will go down hard, and the OS might flag the drives to run Chkdsk to check for disk errors when the server restarts. Chkdsk can take hours on large disk arrays, lengthening your outage window even if you don't need to recover any data. If the server is part of a cluster, failover can raise availability to the point that you might not even need to measure recovery time. Similarly, Exchange 2000's built-in deleted mailbox retention can avert the need to set up an Exchange recovery server.
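The cost of a hard crash is easy to quantify. The following sketch assumes a simple availability ratio (uptime divided by the measurement period). It uses hypothetical phase durations for the power-supply scenario to compare the outage window with and without a redundant supply.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the server was up."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return (period_hours - downtime_hours) / period_hours


def outage_hours(*phases: float) -> float:
    """Total outage window as the sum of its phases (repair, Chkdsk, restart, ...)."""
    return sum(phases)


if __name__ == "__main__":
    # Hypothetical failure without a redundant supply:
    # 2 h to obtain and fit a part, 4 h of Chkdsk on a large array, 0.5 h to restart.
    no_redundancy = outage_hours(2, 4, 0.5)
    # With a redundant supply the server keeps running; the failed unit is swapped later.
    with_redundancy = outage_hours()
    print(f"without redundancy: {availability(no_redundancy):.4%}")
    print(f"with redundancy:    {availability(with_redundancy):.4%}")
```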
If you can't make components fully fault tolerant, though, you need to build recovery mechanisms that will help get failed components back up quickly. For Exchange 2000 environments, the most likely disaster scenarios involve Exchange or AD server recovery and Exchange Store recovery. Therefore, your disaster-recovery kit needs to include System State backups (the restoration point for both AD and Exchange servers) and Store backups, and you need to know how to recover these components.
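A System State backup can be scripted with the command-line mode of the Windows 2000 Backup utility. The sketch below only builds the ntbackup argument list. It assumes the `ntbackup backup systemstate /J "job name" /F "file name"` syntax, and the dated file-naming scheme and the helper names are hypothetical. Confirm the switches with `ntbackup /?` on your own servers before scheduling anything. This covers only the System State half of the kit, not Store backups.

```python
import subprocess
from datetime import date
from pathlib import PureWindowsPath
from typing import Optional


def system_state_backup_command(backup_dir: str, job_name: str = "System State",
                                when: Optional[date] = None) -> list[str]:
    """Build an ntbackup argument list that writes the System State to a dated .bkf file.

    Assumed Windows 2000 syntax:
        ntbackup backup systemstate /J "job name" /F "file name"
    """
    when = when or date.today()
    bkf = PureWindowsPath(backup_dir) / f"systemstate-{when:%Y%m%d}.bkf"
    return ["ntbackup", "backup", "systemstate", "/J", job_name, "/F", str(bkf)]


def run_backup(cmd: list[str]) -> int:
    """Run the backup (Windows only) and return ntbackup's exit code."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Print the command for review; call run_backup() only on the server being protected.
    print(subprocess.list2cmdline(system_state_backup_command(r"D:\Backups")))
```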
System State Recovery
Ideally, you can maintain a standby recovery server with the same hardware as your production servers. Keep the standby server updated with the same service packs and hotfixes you've installed on your production servers (or at least verify the OS and application versions before performing a recovery). A standby server shouldn't be a member of a domain; a System State restore will establish the server's domain identity. (I've even seen a System State restore work across partitions, meaning that you can create or maintain a second Windows installation on a production server and simply boot to that installation to perform the restore. However, with the many security patches that have appeared since the Code Red worm, managing multiple boot partitions is unwieldy.) Be sure to keep the standby server off the network if you haven't applied the most recent security patches. And be prepared to deal with AD server failures as well as Exchange server failures.
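Before a restore, it's worth checking the standby server against the production server it will replace. This minimal sketch compares an OS service-pack level and a set of hotfix IDs. The ServerBuild record, and however you collect its contents, are assumptions, and the hotfix IDs in the example are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ServerBuild:
    """Hypothetical inventory record; fill it from whatever tooling you use."""
    name: str
    os_service_pack: str
    hotfixes: set[str] = field(default_factory=set)


def parity_gaps(production: ServerBuild, standby: ServerBuild) -> list[str]:
    """Return the differences to resolve before restoring production's System State."""
    gaps = []
    if production.os_service_pack != standby.os_service_pack:
        gaps.append(
            f"OS service pack: {production.os_service_pack} on {production.name}, "
            f"{standby.os_service_pack} on {standby.name}"
        )
    for hotfix in sorted(production.hotfixes - standby.hotfixes):
        gaps.append(f"hotfix missing on standby: {hotfix}")
    for hotfix in sorted(standby.hotfixes - production.hotfixes):
        gaps.append(f"hotfix not on production: {hotfix}")
    return gaps


if __name__ == "__main__":
    prod = ServerBuild("EXCH01", "SP3", {"Q111111", "Q222222"})  # placeholder IDs
    standby = ServerBuild("STANDBY01", "SP3", {"Q111111"})
    for gap in parity_gaps(prod, standby) or ["standby matches production"]:
        print(gap)
```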
Exchange server failure. To recover an Exchange server, first restore the System State to the standby server and reboot; this step reestablishes the server's identity. Then, run Exchange Setup with the Disaster Recovery option, as Figure 1 shows (or use the /disasterrecovery switch from the command line). This option pulls the Exchange configuration directly from AD. Finally, reapply the Exchange service pack that the failed server was running.
Be aware, however, of two potential problems with the Disaster Recovery option. First, the option doesn't work in a cluster (you must evict the node and reinstall it manually). Second, the option might not correctly install the Microsoft Search component, which is necessary for full-text indexing. "Troubleshooter: Restoring a Clustered Exchange Database to a Nonclustered System," September 2002, http://www.exchangeadmin.com, InstantDoc ID 25839, discusses the first problem; the Microsoft article "XADM: Disaster Recovery Does Not Correctly Setup Full-Text Indexing" (Q295921, http://support.microsoft.com) documents the second.
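The recovery sequence and its two caveats can be captured as a checklist generator for your runbook. This is a hypothetical sketch, not a Microsoft tool: RecoveryTarget and exchange_recovery_steps are illustrative names, and the steps simply restate the procedure described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    server_name: str
    clustered: bool
    uses_full_text_indexing: bool
    exchange_service_pack: str  # the level the failed server was running, e.g. "SP3"


def exchange_recovery_steps(target: RecoveryTarget) -> list[str]:
    """Return an ordered checklist for rebuilding a failed Exchange 2000 server."""
    if target.clustered:
        # Setup's Disaster Recovery option doesn't work on cluster nodes.
        return [
            f"Evict {target.server_name} from the cluster",
            f"Reinstall {target.server_name} manually",
        ]
    steps = [
        "Restore the System State to the standby server",
        "Reboot to reestablish the server's identity",
        "Run Exchange Setup with the Disaster Recovery option (/disasterrecovery)",
        f"Reapply Exchange {target.exchange_service_pack}",
    ]
    if target.uses_full_text_indexing:
        # Disaster Recovery mode might not install Microsoft Search correctly (Q295921).
        steps.append("Verify that Microsoft Search is installed correctly (see Q295921)")
    return steps


if __name__ == "__main__":
    target = RecoveryTarget("EXCH01", clustered=False,
                            uses_full_text_indexing=True, exchange_service_pack="SP3")
    for number, step in enumerate(exchange_recovery_steps(target), start=1):
        print(f"{number}. {step}")
```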
AD server failure. When an AD server crashes, you must decide whether to rebuild the server or replace it with a new one. Because AD uses multimaster replication, you might not immediately notice the loss of a domain controller (DC), with two exceptions. First, if the DC is a Global Catalog (GC) server, which Exchange uses for directory access and for referrals to Outlook clients, Outlook clients might hang. If you're running a version earlier than Exchange 2000 Service Pack 2 (SP2), you might even need to reboot the Exchange server so that it can find another GC server (another good reason to apply SP2 or SP3). Second, if the DC owns a Flexible Single Master Operations (FSMO) role, you need to manually seize that role on another DC; otherwise, operations that depend on the role, such as password resets, will fail.
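Role seizure is normally done with Ntdsutil. The sketch below builds an Ntdsutil command sequence that connects to a surviving DC and seizes the specified roles. It assumes that Ntdsutil accepts its Windows 2000 fsmo maintenance commands (`roles`, `connections`, `connect to server`, `seize PDC`, and so on) as command-line arguments. The helper names and role keys are hypothetical. Test it in a lab first, and seize only roles whose original owner won't be returning to the network.

```python
import subprocess

# Ntdsutil fsmo maintenance seize commands, keyed by a short (hypothetical) role name.
SEIZE_COMMANDS = {
    "schema": "seize schema master",
    "domain_naming": "seize domain naming master",
    "rid": "seize RID master",
    "pdc": "seize PDC",
    "infrastructure": "seize infrastructure master",
}


def ntdsutil_seize_args(target_dc: str, roles: list[str]) -> list[str]:
    """Build an Ntdsutil argument list that seizes `roles` onto `target_dc`.

    Each element is one Ntdsutil command; subprocess quotes elements that
    contain spaces when it builds the Windows command line.
    """
    unknown = [role for role in roles if role not in SEIZE_COMMANDS]
    if unknown:
        raise ValueError(f"unknown FSMO role(s): {', '.join(unknown)}")
    args = ["ntdsutil", "roles", "connections", f"connect to server {target_dc}", "quit"]
    args += [SEIZE_COMMANDS[role] for role in roles]
    args += ["quit", "quit"]  # leave fsmo maintenance, then exit Ntdsutil
    return args


def seize_roles(target_dc: str, roles: list[str]) -> int:
    """Run Ntdsutil (Windows only, with sufficient AD rights) and return its exit code."""
    return subprocess.run(ntdsutil_seize_args(target_dc, roles), check=False).returncode


if __name__ == "__main__":
    # Preview the command line instead of running it.
    print(subprocess.list2cmdline(ntdsutil_seize_args("DC02", ["pdc"])))
```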