Most of the time, we think of Microsoft Exchange Server databases as being big, monolithic chunks of information. Exchange administrators usually think
of a mailbox database as containing a stew of mailboxes, folders, individual mail items, and mail metadata, all lumped together in an amorphous mass.
This view is understandable given that Microsoft doesn't fully document the internal structure of the .edb files that Exchange Server's Extensible
Storage Engine (ESE) creates. However, there's enough material out there (e.g., Brett Shirley's excellent presentations at TEC 2010) to give
database aficionados an idea of how data items are structured and linked together.
Despite this nebulous view of the database, I've never yet met an Exchange administrator who didn't know that .edb files are divided into pages. I
think that's because the term pages is so pervasive in the Exchange world; it's part of the documentation and online help for Eseutil and
Isinteg (remember those?), to cite just a couple of instances. However, it's a little misleading to think of an .edb file as nothing more than a
collection of pages because the information on those pages is interlinked. Losing a single page can have effects ranging from minimal to
catastrophic; it all depends on which page you lose.
ESE uses a number of mechanisms to protect against page-level corruption. For example, each page contains a checksum that's generated at the time the
page is written. Any time you want to know if a page is valid, you can read the page data, compute a new checksum, and see if the new value matches the
stored checksum. That's exactly what happens during streaming ESE backups in Exchange 2007, Exchange 2003, and Exchange 2000.
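The verify-on-read idea can be sketched in a few lines of Python. This is only an illustration, not ESE's actual checksum algorithm or page layout: the 8KB page size, the 4-byte checksum header, the use of CRC-32, and the function names are all assumptions made for the example.

```python
import struct
import zlib

PAGE_SIZE = 8192               # illustrative page size (assumed)
HEADER = struct.Struct("<I")   # hypothetical header: one 4-byte checksum


def build_page(payload: bytes) -> bytes:
    """Pad the payload to a full page and store a checksum in the header."""
    body_size = PAGE_SIZE - HEADER.size
    if len(payload) > body_size:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_size, b"\x00")
    # CRC-32 stands in for whatever checksum the real engine uses.
    return HEADER.pack(zlib.crc32(body)) + body


def page_is_valid(page: bytes) -> bool:
    """Recompute the checksum over the page body and compare it with the stored one."""
    if len(page) != PAGE_SIZE:
        return False
    (stored,) = HEADER.unpack_from(page)
    return zlib.crc32(page[HEADER.size:]) == stored
```

Flipping any byte in a page returned by build_page makes page_is_valid return False, which is the signal a backup or a maintenance scan would act on.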
In Exchange 2010, a background maintenance task scans each page and performs the checksum check; the process is scheduled such that every page of every
database, on both active and passive copies, should be scanned at least once every seven days. The checksum operation is throttled so that it doesn't
read more than about 5MB/second worth of data, so its I/O impact is light. You can change this behavior so that checksum scans take place during the
regular database maintenance window, but Microsoft recommends that you leave the default behavior in place.
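To get a feel for what the throttle means in practice, here is a rough back-of-the-envelope calculation in Python. It assumes the roughly 5MB/second ceiling applies to a single database copy, that the scan runs at that rate continuously, and that 1GB is 1,024MB; the function names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SCAN_GOAL_DAYS = 7            # every page scanned at least once a week
THROTTLE_MB_PER_SEC = 5.0     # approximate ceiling quoted above


def days_for_full_scan(database_gb: float,
                       rate_mb_per_sec: float = THROTTLE_MB_PER_SEC) -> float:
    """Estimate how many days one complete checksum pass would take."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate must be positive")
    database_mb = database_gb * 1024
    return database_mb / rate_mb_per_sec / SECONDS_PER_DAY


def fits_scan_goal(database_gb: float,
                   rate_mb_per_sec: float = THROTTLE_MB_PER_SEC) -> bool:
    """True if a full pass at the given rate finishes within the seven-day goal."""
    return days_for_full_scan(database_gb, rate_mb_per_sec) <= SCAN_GOAL_DAYS
```

Under these assumptions, a 1TB database (1,024GB) takes roughly 2.4 days per pass, comfortably inside the seven-day target.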
What happens if the checksum scan indicates that a page contains an error? That's where the page patching process comes in. This name is a bit
misleading because the page itself isn't patched. Instead, the damaged page is replaced with a clean copy from a replica of the database. The patching
process is conceptually simple: When a page fails the checksum check, a new copy of the page is retrieved from the transaction logs on another database
availability group (DAG) member that contains the same database. By replaying only that portion of the logs that contains data for the target page, the
page contents can be replaced without affecting any other pages. The actual steps required to do this vary according to whether the damaged page is on
an active or passive copy of the database. Ross Smith's excellent post on the Exchange Team Blog explains the steps in detail.
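The core idea, replaying only the log data that concerns the damaged page, can be modeled with a toy example in Python. This is a heavily simplified sketch and not how Exchange implements it: the database is a dictionary of pages, each log record is assumed to carry a complete page image, CRC-32 stands in for the real checksum, and every name here is hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    data: bytes
    checksum: int

    def is_valid(self) -> bool:
        return zlib.crc32(self.data) == self.checksum


def make_page(data: bytes) -> Page:
    """Create a page whose stored checksum matches its data."""
    return Page(data, zlib.crc32(data))


def patch_page(database: dict, page_no: int, log_records: list) -> bool:
    """Rebuild one page from the log records that target it.

    log_records is a list of (page_number, page_image) tuples in log order.
    Records for other pages are skipped, so no other page is touched.
    """
    latest = None
    for record_page_no, image in log_records:
        if record_page_no == page_no:
            latest = image          # later records supersede earlier ones
    if latest is None:
        return False                # nothing in the logs for this page
    database[page_no] = make_page(latest)
    return True
```

In this model, a scan that finds database[n].is_valid() returning False would call patch_page(database, n, log_records) with records obtained from a healthy copy; a False result corresponds to the case in which the page can't be repaired this way.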
Note that page patching can take place only in highly available Exchange environments such as DAGs. Otherwise, Exchange has a couple of built-in page
repair mechanisms, but if the page can't be repaired, it will be marked as bad. You'll then have to reload from backup.
These maintenance tasks, along with the others Ross describes, run with a goal of keeping your databases healthy and consistent, but they're no
substitute for a proper high-availability design (if your business needs warrant it) and a robust backup system. Yes, yes, I know that it's possible to
run Exchange 2010 using DAGs without taking any backups, but that's a topic for another column!