Email systems depend on many hardware and software components. If any element fails to operate in the required manner, if the hardware suffers a catastrophic failure, or if a physical disaster such as an electricity outage afflicts the hardware, you must have good system backups to get users back online as quickly as possible.
The Exchange 2000 Server installation procedure enhances the standard Windows 2000 Server Backup utility (ntbackup.exe) to support the Exchange Store's transactional nature. These enhancements add support for Exchange 2000's .edb and .stm file formats, let backup agents (i.e., ntbackup.exe or third-party products) copy databases to tape without shutting down Exchange services, and let you select which servers and databases to back up or restore. Understanding the basics of the most important and useful disaster-recovery processes (including full backups, snapshot and clone backups, and the general recovery procedure) can help you prepare for disasters and recover from them quickly.
Full Backups
Incremental and differential backups copy only transaction logs to the backup set. Incremental backups copy the logs created since the most recent backup of any type; differential backups copy the logs created since the most recent full backup. Thus, to restore Exchange databases, you need the most recent full backup, the most recent full backup plus all incremental backups taken since then, or the most recent full backup plus the most recent differential backup. (Some companies believe that taking a mailbox-level, aka brick-level, backup is useful because then you can quickly restore a mailbox or specific items that have been deleted accidentally. However, Exchange 2000's Deleted Mailbox Recovery feature generally can prevent the necessity for this type of backup. Brick-level backups are now an anachronism; avoid them whenever possible because of the related performance penalty.) Obviously, restoring from a full backup is easiest because it involves the fewest tapes and the least chance for mistakes.
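The restore rule above can be sketched in a few lines of Python. This is only an illustration: the BackupSet record and the restore_chain function are hypothetical names, not part of ntbackup.exe or any Exchange API, and the sketch assumes that a schedule uses either incremental or differential backups between full backups, never both.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupSet:
    taken_at: int  # any increasing timestamp or sequence number (assumption)
    kind: str      # "full", "incremental", or "differential"


def restore_chain(history: List[BackupSet]) -> List[BackupSet]:
    """Return the backup sets needed for a restore, oldest first."""
    ordered = sorted(history, key=lambda b: b.taken_at)
    fulls = [b for b in ordered if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available to restore from")

    base = fulls[-1]  # the most recent full backup is always the starting point
    later = [b for b in ordered if b.taken_at > base.taken_at]
    incrementals = [b for b in later if b.kind == "incremental"]
    differentials = [b for b in later if b.kind == "differential"]

    if incrementals and differentials:
        # Outside the stated assumption; refuse rather than guess.
        raise ValueError("mixed incremental and differential sets are not handled")
    if differentials:
        # One differential holds every log created since the full backup.
        return [base, differentials[-1]]
    # Each incremental holds only the logs since the previous backup,
    # so every one of them is required, in order.
    return [base] + incrementals
```

With a history of a full backup followed by three incrementals, the function returns all four sets; with a full backup followed by three differentials, it returns only the full backup and the last differential, which is why differential schedules need fewer tapes at restore time.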
Whenever possible, avoid taking Exchange 2000 offline when you perform a backup. When Exchange is offline, users can't connect to their mailboxes. Also, each time you bring the Information Store service online again, the Store generates public folder replication messages to request updates from replication partners. Online backups are perfectly safe. During online backups, the Store calculates checksums for each page before streaming the pages to the backup media; if a checksum doesn't match, the Store generates the infamous -1018 error and halts the backup operation.
To prepare for a full online Exchange backup, the backup agent establishes the type of backup (i.e., full) and the target media (i.e., tape or disk; see the sidebar "Snapshots and Clones," page 10, for an evaluation of using snapshots or clones to speed backups to disk). You can perform a remote backup across the network, but I recommend against doing so unless you have a capable high-speed link between the source database and the target backup device.
The agent then makes a function call to inform the Extensible Storage Engine (ESE) that the backup is about to begin. ESE logs event ID 210, which indicates the start of a full backup, to the Application log. ESE closes the current transaction log and opens a new transaction log. The Store then directs all transactions that occur during the backup to the new set of logs, which will remain on the server after the backup is complete. (For information about the role of checkpoint files and patch files in backing up transaction logs, see the sidebar "Checkpoint and Patch Files," page 11.)
The backup process begins. The backup agent requests data, and the Store streams the data to the media in 64KB chunks, each made up of sixteen 4KB pages. As it begins the backup of each database, ESE writes event ID 220 to the Application log, noting the size of the file.
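The chunk arithmetic is easy to check: sixteen 4KB pages make one 64KB chunk. The following Python sketch reads any binary stream in chunks of that size and splits each chunk back into pages. It only illustrates the sizes described above; stream_chunks and split_pages are hypothetical helpers, not the Exchange backup API.

```python
import io
from typing import BinaryIO, Iterator, List

PAGE_SIZE = 4 * 1024                      # one database page
PAGES_PER_CHUNK = 16
CHUNK_SIZE = PAGE_SIZE * PAGES_PER_CHUNK  # 65,536 bytes, i.e., 64KB


def stream_chunks(source: BinaryIO) -> Iterator[bytes]:
    """Yield successive chunks of up to 64KB until the source is exhausted."""
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            return
        yield chunk


def split_pages(chunk: bytes) -> List[bytes]:
    """Split one chunk into its 4KB pages."""
    return [chunk[i:i + PAGE_SIZE] for i in range(0, len(chunk), PAGE_SIZE)]


if __name__ == "__main__":
    # A stand-in "database" of 40 zero-filled pages held in memory.
    fake_db = io.BytesIO(bytes(PAGE_SIZE * 40))
    for number, chunk in enumerate(stream_chunks(fake_db)):
        print(number, len(chunk), len(split_pages(chunk)))
```

A 40-page file yields two full chunks of 16 pages and a final chunk of 8 pages.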
As the Store processes each 4KB page, it verifies that the page number and cyclical redundancy check (CRC) checksum, which reside in the first 4 bytes of each page, are correct. This verification ensures that the page contains valid data. If either piece of data is incorrect, the Store records a -1018 error in the Application log and the backup API stops processing data, a step that might seem excessive but that stops administrators from blithely taking backups of databases that might contain internal errors.
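To make the per-page check concrete, here is a minimal Python sketch of the idea: each page carries a stored checksum and its own page number, and a mismatch in either one stops processing with a -1018 error. Everything specific here is an assumption made for illustration: the header layout (a 4-byte checksum followed by a 4-byte page number), the use of zlib.crc32 as the checksum, and the names make_page, verify_page, and PageVerificationError. The sketch does not reproduce ESE's actual on-disk format or checksum algorithm.

```python
import struct
import zlib

PAGE_SIZE = 4096
HEADER = struct.Struct("<II")  # assumed layout: checksum, then page number
READ_VERIFY_ERROR = -1018      # the error code named in the text


class PageVerificationError(Exception):
    """Raised when a page fails its page-number or checksum test."""

    def __init__(self, page_number: int, reason: str):
        super().__init__(f"{READ_VERIFY_ERROR}: page {page_number}: {reason}")
        self.code = READ_VERIFY_ERROR


def make_page(page_number: int, payload: bytes = b"") -> bytes:
    """Build a page in the assumed layout (for demonstration only)."""
    if len(payload) > PAGE_SIZE - HEADER.size:
        raise ValueError("payload does not fit in one page")
    body = struct.pack("<I", page_number) + payload.ljust(PAGE_SIZE - HEADER.size, b"\0")
    return struct.pack("<I", zlib.crc32(body)) + body


def verify_page(page: bytes, expected_number: int) -> None:
    """Raise PageVerificationError unless the page number and checksum match."""
    if len(page) != PAGE_SIZE:
        raise PageVerificationError(expected_number, "wrong page size")
    stored_checksum, stored_number = HEADER.unpack_from(page)
    if stored_number != expected_number:
        raise PageVerificationError(expected_number, "page number mismatch")
    # The checksum covers everything after the checksum field itself.
    if zlib.crc32(page[4:]) != stored_checksum:
        raise PageVerificationError(expected_number, "checksum mismatch")
```

A backup loop built on this sketch would call verify_page for each page before writing it to the media and abandon the run on the first exception, which mirrors the stop-on-error behavior described above.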
ESE logs event ID 221 as the backup of each database is finished. After writing all the pages from the target databases to the backup media, the backup agent requests that ESE write the prebackup transaction logs to the backup media. ESE records event ID 223 to indicate that the transaction logs have been written to the backup media. During a full backup, ESE then deletes those logs (noting the fact in event ID 224) to release disk space back to the system. Doing so is quite safe because the transactions are committed to the database and are available in the backup log set.
ESE closes the backup set, and typical operations resume. ESE records event ID 213, which indicates successful backup completion.
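The event IDs in this walkthrough form a predictable sequence that you can check after each run. The Python sketch below assumes you have already extracted the event IDs from the Application log into a list, in order; it does not read the event log itself. It also assumes that databases are backed up one after another (each 220 is followed by its 221) and that events 223 and 224 each appear once. The function name is hypothetical.

```python
import re
from typing import Iterable

# Event IDs and meanings as described in the text.
EVENT_MEANINGS = {
    210: "full backup starting",
    220: "backup of a database file starting",
    221: "backup of a database file finished",
    223: "transaction logs written to backup media",
    224: "backed-up transaction logs deleted",
    213: "backup completed successfully",
}

# Start, one or more database files, logs copied, logs deleted, completion.
FULL_BACKUP_PATTERN = re.compile(r"210( 220 221)+ 223 224 213")


def is_clean_full_backup(event_ids: Iterable[int]) -> bool:
    """Return True if the backup-related events match the expected order."""
    relevant = [str(e) for e in event_ids if e in EVENT_MEANINGS]
    return FULL_BACKUP_PATTERN.fullmatch(" ".join(relevant)) is not None
```

For example, the list 210, 220, 221, 220, 221, 223, 224, 213 passes for a two-database backup, whereas a run that a -1018 error halted never reaches 213 and therefore fails the check.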