Microsoft Exchange 2000 Server contains technology advancements that will change the way you store, maintain, and recover mission-critical messaging and knowledge-management data. To get the most from Exchange 2000, you need to understand several new concepts and paradigms. As organizations plan their migration to Exchange 2000, systems administrators want to know how the program stores data and provides disaster recovery. This article discusses how Exchange 2000's data storage and database engine differ from similar functionality in previous versions. In Part 2, I'll discuss the program's backup and restore operations and the anticipated best practices for disaster recovery.
Exchange Server Database 101
Thorough knowledge of Exchange 2000's database engine is fundamental to storage and disaster-recovery planning. The database engine, called Joint Engine Technology (JET) in earlier versions of Exchange Server, evolved into the Extensible Storage Engine (ESE) in later versions. ESE is a solid relational database technology similar to that of Microsoft SQL Server or Oracle, although ESE's implementation is quite different. Exchange 2000's ESE, a transacted storage engine that works primarily with messaging and collaborative data, guarantees that all database operations meet the Atomicity, Consistency, Isolation, and Durability (ACID) properties. These properties ensure that the engine can roll back a transaction that doesn't complete successfully and can replay completed transactions during recovery. Microsoft uses ESE throughout Exchange 2000, in places such as the Key Management Server (KMS) and the Site Replication Service (SRS), as well as in Windows 2000's Active Directory (AD).
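To make rollback and replay concrete, here is a minimal Python sketch of a generic transacted store. It is not ESE's implementation: the class `TinyTransactedStore` and the function `replay` are hypothetical names, and an in-memory list stands in for the on-disk transaction log. It shows the two guarantees the paragraph describes: uncommitted changes can be undone, and committed changes can be rebuilt from the log alone.

```python
_MISSING = object()  # sentinel: the key did not exist before the change


class TinyTransactedStore:
    """Hypothetical key/value store with an undo list for the open
    transaction and a log of committed transactions. Illustrative only."""

    def __init__(self):
        self.data = {}      # current contents
        self.log = []       # committed transactions, oldest first (stands in for the log file)
        self._pending = []  # (key, old_value, new_value) for the open transaction

    def put(self, key, value):
        # Remember the prior value so the change can be undone (atomicity).
        self._pending.append((key, self.data.get(key, _MISSING), value))
        self.data[key] = value

    def commit(self):
        # Record the whole transaction in the log as one unit.
        self.log.append([(key, new) for key, _old, new in self._pending])
        self._pending.clear()

    def rollback(self):
        # Undo uncommitted changes in reverse order.
        for key, old, _new in reversed(self._pending):
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self._pending.clear()


def replay(log):
    """Rebuild contents from committed transactions only (recovery)."""
    data = {}
    for transaction in log:
        for key, value in transaction:
            data[key] = value
    return data


store = TinyTransactedStore()
store.put("inbox/1", "hello")
store.commit()
store.put("inbox/2", "draft")
store.rollback()
print(store.data)         # {'inbox/1': 'hello'}
print(replay(store.log))  # {'inbox/1': 'hello'}
```

The rolled-back draft never reaches the log, so replaying the log reproduces exactly the committed state.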
ESE stores data in a balanced tree (B-tree) structure, which is well suited for storing semistructured data: the kind of content that messaging and collaboration servers deal with. Database engines such as SQL Server and Oracle are better suited to structured data and less dynamic indexes. B-tree technology isn't new to Exchange 2000; it has existed for years and is one of several fundamental database structures. Exchange 2000 arranges database files into 4KB pages in a hierarchical tree structure that stores messages, properties, and attachment data. Exchange 2000 uses the B+tree variant, which keeps the tree balanced and minimizes its depth so that ESE can reach data quickly and efficiently. Because ESE's design favors messaging and collaboration data, SQL Server and Exchange 2000 probably won't share the same storage technology any time soon.
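Why does a B+tree of 4KB pages stay shallow? Each page holds many entries, so every additional level multiplies the number of reachable records. The following Python sketch works through that arithmetic under simplifying assumptions: the 16-byte entry size is hypothetical, and page headers and partially filled pages are ignored, so real ESE fanout will differ.

```python
PAGE_SIZE = 4096  # bytes per database page, as the text describes


def btree_shape(record_count, entry_size, page_size=PAGE_SIZE):
    """Return (fanout, depth) for a simplified B+tree in which every page,
    interior or leaf, holds page_size // entry_size entries."""
    fanout = page_size // entry_size
    depth, capacity = 1, fanout
    while capacity < record_count:
        capacity *= fanout  # each extra level multiplies reachable entries
        depth += 1
    return fanout, depth


# Hypothetical 16-byte entries give 256 entries per 4KB page.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, btree_shape(n, entry_size=16))
```

Under these assumptions, a million records fit within three levels and a billion within four, which is why a lookup touches only a handful of pages.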
Database Files
Exchange 2000's database engine operates on several key files that are important to the way the program stores messaging and knowledge-management data. Figure 1 shows how these files relate to one another. One key file is the properties store (i.e., the .edb file). In Exchange Server 5.5, priv.edb and pub.edb hold the private and public Information Stores (ISs), respectively. In Exchange 2000, the .edb file operates similarly to the way it operates in Exchange Server 5.5, storing Messaging API (MAPI) client data in Rich Text Format (RTF), with properties and attachments. Exchange 2000's IS overlays the logical mapping of messages, folders, and tables on the B-tree structure of ESE.
Exchange 2000 adds a new database file called the streaming store (i.e., the .stm file) to handle streaming and native Internet content. Internet clients that use protocols such as POP3, IMAP, and HTTP work exclusively with the streaming store. In addition, when content arrives through SMTP, Exchange 2000 pipes the content directly into the streaming store, thus bypassing the resource-intensive IMAIL conversion process that previous versions of Exchange Server used to convert native Internet content to RTF.
MAPI clients, however, don't use the .stm file. If a MAPI client needs access to content in the .stm file, Exchange 2000 converts the content on demand. The .edb file still contains properties and headers for content stored in the .stm file. The .edb file uses the B+tree structure, but the .stm file stores data pages in clustered runs, similar to the way a file system such as NTFS allocates space, an arrangement better suited to the sequential access that streaming content requires.
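A run-based layout stores content as a list of contiguous page ranges, and a sequential reader only has to jump to a new location where one run doesn't continue the previous one. The Python sketch below models that idea with a generic extent list; it is not the .stm on-disk format, and the page numbers and the helper names `pages_in_order` and `jumps_needed` are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (first_page, page_count): one contiguous run of pages


def pages_in_order(runs: List[Run]) -> List[int]:
    """Expand a run list into the page numbers a sequential reader visits."""
    pages: List[int] = []
    for first, count in runs:
        pages.extend(range(first, first + count))
    return pages


def jumps_needed(runs: List[Run]) -> int:
    """Count the places where the next run doesn't continue the previous one,
    i.e., where a sequential read must move to a new location."""
    jumps = 0
    for (prev_first, prev_count), (first, _count) in zip(runs, runs[1:]):
        if first != prev_first + prev_count:
            jumps += 1
    return jumps


# A hypothetical streamed attachment stored as three runs.
attachment = [(100, 4), (104, 2), (300, 3)]
print(pages_in_order(attachment))  # [100, 101, 102, 103, 104, 105, 300, 301, 302]
print(jumps_needed(attachment))    # 1
```

Nine pages are read with a single jump, whereas walking a tree page by page could move to a different location for each page.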
The concept of a message database (MDB) also changes in Exchange 2000. An MDB in Exchange 2000 is a set of .edb and .stm files. Each MDB in a storage group (SG) consists of one .edb file and one .stm file. Both storage mechanisms must exist for the MDB to be consistent and complete. The .edb file stores all the message properties and headers, including properties and checksums for the .stm file pages. The combination of these storage mechanisms lets Exchange 2000 store all data formats in a manner best suited to their type. For MAPI clients that need RTF data, the .edb file is the most efficient storage mechanism. Even when MAPI clients such as Microsoft Outlook 2000 and Outlook 98 use HTML, the .edb file stores their data as rich text. For Internet clients that require MIME content, the .stm file provides the fastest and most efficient approach.
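The pairing rule can be sketched as a simple consistency check: both files must exist, and the checksums recorded on the .edb side must match the current .stm pages. In the following Python sketch, CRC-32 (from the standard `zlib` module) stands in for whatever checksum ESE actually uses, and the 4KB .stm page size and all function names are assumptions for illustration.

```python
import zlib
from pathlib import Path

STM_PAGE_SIZE = 4096  # assumed page size for this illustration


def mdb_files_present(edb_path: Path, stm_path: Path) -> bool:
    """Both halves of the MDB must exist for it to be complete."""
    return edb_path.is_file() and stm_path.is_file()


def stm_page_checksums(stm_data: bytes, page_size: int = STM_PAGE_SIZE) -> dict:
    """Map page number -> checksum; CRC-32 stands in for ESE's real checksum."""
    return {
        page_no: zlib.crc32(stm_data[offset:offset + page_size])
        for page_no, offset in enumerate(range(0, len(stm_data), page_size))
    }


def stm_matches_edb(recorded: dict, stm_data: bytes) -> bool:
    """True if every .stm page matches the checksum the .edb side recorded."""
    return recorded == stm_page_checksums(stm_data)


stream = b"MIME content " * 500           # 6,500 bytes -> 2 pages
recorded = stm_page_checksums(stream)     # what the .edb side would hold
print(len(recorded))                      # 2
print(stm_matches_edb(recorded, stream))  # True
print(stm_matches_edb(recorded, stream[:-1] + b"!"))  # False
```

A single changed byte in the streaming data no longer matches the recorded checksum, which is how the .edb half can detect a damaged or mismatched .stm half.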
Transaction Logs and Checkpoint Files
Exchange 2000's flexible, high-performance transacted storage depends on several other key files. The most important is the transaction log file (i.e., the .log file). ESE first writes operations to the transaction logs, then to in-memory buffers, and finally to the database. This procedure ensures that all operations maintain database integrity and are recoverable. Each database engine instance (i.e., SG) in Exchange 2000 maintains its own set of transaction logs, which all the databases within that SG share. Because an MDB consists of .edb and .stm files, Exchange 2000 records transactions involving both types of files in the same log. As in earlier versions of Exchange, each .log file holds 5MB of transactions, and ESE creates log files in sequential order. The most current log file is always edb.log. When it reaches 5MB, ESE closes it, renames it edbxxxx.log (in which xxxx is the sequential serial number, or generation, of the .log file in hexadecimal format), and creates a new edb.log for subsequent transactions.
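The naming and rollover rules can be modeled in a few lines of Python. This is a hypothetical simulation with no file I/O, and it simplifies by never splitting a record across files. The `edb` prefix and the 5MB limit come from the text, while the four-digit padding (matching the `xxxx` placeholder) and uppercase hex digits are assumptions.

```python
LOG_LIMIT = 5 * 1024 * 1024  # 5MB per log file, per the text


def closed_log_name(generation: int, prefix: str = "edb", width: int = 4) -> str:
    """Name for a retired log: prefix + zero-padded hex generation.
    The width and uppercase digits are assumptions, not documented values."""
    return f"{prefix}{generation:0{width}X}.log"


class LogRollover:
    """Hypothetical model of sequential log rollover (no real file I/O).
    New transactions always go to the current file, edb.log."""

    def __init__(self) -> None:
        self.generation = 1    # generation of the current edb.log
        self.current_size = 0  # bytes written to the current edb.log
        self.closed = []       # retired log names, oldest first

    def append(self, record_size: int) -> None:
        # If this record would push edb.log past the limit, retire it under
        # its generation-numbered name and start a fresh edb.log.
        if self.current_size + record_size > LOG_LIMIT:
            self.closed.append(closed_log_name(self.generation))
            self.generation += 1
            self.current_size = 0
        self.current_size += record_size


writer = LogRollover()
for _ in range(12):          # twelve 1MB records
    writer.append(1024 * 1024)
print(writer.closed)         # ['edb0001.log', 'edb0002.log']
print(closed_log_name(26))   # edb001A.log
```

Twelve megabytes of transactions fill two 5MB logs, and the remaining 2MB sit in the third generation, which is still named edb.log.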
Checkpoint (.chk) files play an important role as Exchange 2000 writes transactions to the .edb and .stm files. The .chk file records the location in the .log files of the last complete transaction that ESE wrote to the database. Each SG maintains its own .chk file, a practice that aids database recovery. During recovery, the .chk file tells ESE where to start replaying .log files. Because ESE can skip transactions that are already in the database, recovery is faster and more efficient.
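The role of the checkpoint can be illustrated with a small Python sketch. Each log record carries a position (generation, offset), and recovery replays only records that fall after the checkpoint position because everything up to it is already in the database. The record layout and the `recover` function are hypothetical; ESE's real log records and checkpoint format are far more detailed.

```python
from typing import Dict, List, Tuple

# Hypothetical log record: (generation, offset, key, value).
LogRecord = Tuple[int, int, str, str]


def recover(database: Dict[str, str], logs: List[LogRecord],
            checkpoint: Tuple[int, int]) -> int:
    """Replay only records after the checkpoint position; return the count.
    Records at or before the checkpoint are assumed already in the database."""
    replayed = 0
    for generation, offset, key, value in sorted(logs):
        if (generation, offset) > checkpoint:
            database[key] = value
            replayed += 1
    return replayed


db = {"msg1": "v1"}  # msg1's transaction was written before the failure
logs = [(1, 0, "msg1", "v1"), (1, 1, "msg2", "v2"), (2, 0, "msg3", "v3")]
print(recover(db, logs, checkpoint=(1, 0)))  # 2
print(db)  # {'msg1': 'v1', 'msg2': 'v2', 'msg3': 'v3'}
```

Without a checkpoint, recovery would have to start at the oldest log; with it, only the two transactions that hadn't reached the database are replayed.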