Patch Files
The Exchange 2000 database engine uses patch (.pat) files only during backup and recovery operations; it creates one .pat file for each .edb file during an online backup. As Exchange 2000 performs a full backup, it commits database pages to tape. But user transactions continue to occur during backup because Exchange 2000 supports an online backup. If a transaction occurs to an already-committed page in the database and a page split occurs, the .pat file records the page split. If such a transaction doesn't cause a page split, the .log file (not the .pat file) records the event.
An example of a page-split operation is a situation in which updated data in a 4KB page exceeds the page size. In this case, the page must split into two 4KB pages. Page splits apply only to the .edb files. Patch files don't maintain entries for .stm files because .stm file structure lets Exchange 2000 easily allocate additional data. In addition, Exchange 2000 copies .stm file data differently than .edb data during a typical backup.
The .pat files for each MDB play a key role during recovery. Before Exchange 2000 replays the transaction logs, the program uses the .pat files to apply the page splits to the database file.
Integrity Ensured
Exchange Server's database engine has always ensured data integrity, and this essential feature lives on in Exchange 2000. When ESE writes a page, the program writes a page number and a checksum, a cyclic redundancy check (CRC), to the first 4 bytes of the page. During backup and online maintenance, the database engine recomputes the checksum for each page so that the program can compare the page number and checksum with the original versions recorded in the page. If either is incorrect, Exchange 2000 lets the administrator know by logging an error in the event log. This -1018 error is a valuable early warning. Exchange 2000 also computes a checksum for each log record in transaction log files. This checksum ensures that every transaction log record is valid. Few database engine technologies provide this degree of integrity checking at both page-level and transaction log record-level granularity.
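The page check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ESE's actual on-disk format: the 8-byte header layout, the use of CRC-32 from zlib, and the function names write_page and verify_page are all assumptions made for the example.

```python
import struct
import zlib

PAGE_SIZE = 4096                 # 4KB pages, as described in the text
HEADER = struct.Struct("<II")    # ASSUMED layout: checksum, then page number


def write_page(page_number: int, payload: bytes) -> bytes:
    """Build a page whose header stores a checksum and the page number."""
    body_size = PAGE_SIZE - HEADER.size
    body = payload.ljust(body_size, b"\x00")
    if len(body) != body_size:
        raise ValueError("payload too large for one page")
    # The checksum covers the page number and the body (an assumption).
    checksum = zlib.crc32(struct.pack("<I", page_number) + body)
    return HEADER.pack(checksum, page_number) + body


def verify_page(expected_number: int, page: bytes) -> bool:
    """Recompute the checksum and compare it, and the page number, with the header."""
    stored_checksum, stored_number = HEADER.unpack_from(page)
    body = page[HEADER.size:]
    computed = zlib.crc32(struct.pack("<I", stored_number) + body)
    return stored_number == expected_number and stored_checksum == computed
```

A backup or maintenance pass would call verify_page for every page it reads and report a failure (the analogue of a -1018 error) when either the page number or the checksum doesn't match.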
The best protection against -1018 errors is to deploy solid hardware platforms and to practice good configuration management. Except for the transaction record checksum process, Exchange 2000's warning system doesn't change from earlier versions of Exchange Server, and Microsoft believes that customers want to know as soon as the program finds a database corruption problem. Multiple SGs and databases complicate the issue in Exchange 2000, but they also help with database recovery.
New Paradigms in Storage Management
An SG in Exchange 2000 is essentially another instance of the ESE database engine running within the context of the store.exe process. In Exchange 2000, multiple SGs can run on a server, and each SG can contain as many as six MDBs. Figure 2 shows the relationship between MDBs and SGs. In Exchange Server 5.5, only one SG is available on a server. Only one instance of the JET database engine runs on the IS in Exchange Server 5.5 and earlier. That JET instance supports two MDBs: pub.edb and priv.edb. In Exchange 2000, an administrator has more flexibility with IS design and can partition or segment the server population over a structure of SGs and MDBs, depending on organizational and disaster-recovery needs. For example, an administrator can spread users' mailboxes over several MDBs instead of limiting all users to one database, as in Exchange Server 5.5 and earlier. Suppose 2000 users on an Exchange 2000 server require 100GB of storage. A systems administrator can partition users across ten 10GB databases (200 users per MDB), rather than place them all in a 100GB database.
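The partitioning arithmetic in this example is easy to reproduce. The following minimal sketch assumes every mailbox consumes the same amount of storage, which real servers won't match exactly; the function name partition_mailboxes is hypothetical.

```python
def partition_mailboxes(total_users: int, total_storage_gb: float, database_count: int):
    """Spread users evenly over databases; return (users, size_gb) per database.

    ASSUMPTION: every mailbox uses the same share of the total storage.
    """
    if total_users <= 0 or database_count <= 0:
        raise ValueError("user and database counts must be positive")
    base, extra = divmod(total_users, database_count)
    plan = []
    for index in range(database_count):
        # The first `extra` databases take one additional user each.
        users = base + (1 if index < extra else 0)
        plan.append((users, users * total_storage_gb / total_users))
    return plan
```

Calling partition_mailboxes(2000, 100, 10) yields ten entries of 200 users and 10GB each, matching the figures in the example.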
Multiple SGs in Exchange 2000 provide flexibility, manageability, and security. An ISP can host multiple companies on one Exchange 2000 server by separating them into different SGs. Commercial enterprises, departments, groups, or individuals (e.g., the CEO) can have their own SGs, a feature that provides better security and manageability. SGs are also important for clustering scenarios because the typical failover unit is a virtual server configured with one or more SGs.
In Release Candidate 1 (RC1), Exchange 2000 technically supports 15 SGs (plus one reserved for backup and restore operations), with six MDBs per SG. I expect management and disaster-recovery planning requirements to bring the practical limit to five to seven SGs per server. At Exchange 2000's initial release, Microsoft will support only four SGs per server and five MDBs per SG. With clustered configurations, failover complexities might reduce the practical limit even more. Also, Exchange 2000 supports concurrent backup and restore operations on SGs. In this case, an Exchange 2000 server with multiple SGs will let you recover one or more SGs (or an MDB within an SG) while the other SGs are online servicing users. Whatever the scenario, 16 ESE instances are available on the Exchange 2000 server. When you reach that limit, you can't perform any more parallel operations.
New Methods of Access
Besides the standard methods by which messaging clients can access their data (e.g., POP3, IMAP, MAPI), Exchange 2000 offers new methods for accessing and storing data. You can address every item in the program's database with a unique URL. This gives Web clients using HTTP better performance and functionality than they received in earlier versions of Exchange Server. Microsoft calls this feature the Web Store, and it holds new possibilities for knowledge management and Web portal applications. (For more information about the Web Store, see Tony Redmond, "Web-Enabling Exchange 2000," February 2000.)
Exchange 2000 also makes every item in the IS accessible through Win32 API calls and the Server Message Block (SMB) protocol. Therefore, programmers can write applications that directly store data to the program. Users can map drives to their inboxes or favorite public folders. Exchange 2000 uses this method, based on Installable File System (IFS) technology, to access .stm files. The IFS driver (ExIFS) provides direct access for several Exchange 2000 components. The ability to address Exchange 2000 with URLs or Win32 APIs sets the stage for some killer applications from Microsoft and third parties.
Best Practices Become Complicated
Your possibilities for allocating and accessing data in Exchange 2000 are almost limitless. As a result, the best practices you use to manage your servers need to change. In Exchange Server 5.5, storage-design best practices dictate that you need to separate sequential from random I/O when you configure disk subsystems and allocate Exchange Server databases. The same holds true for Exchange 2000. However, because Exchange 2000 supports multiple SGs (i.e., multiple database engine instances), you need to apply these best practices to each ESE instance. For example, because each SG has a set of transaction logs that one or more databases share, the best practice of separating sequential from random I/O still applies, and each SG's transaction log set needs to be on a separate volume. If you combine all transaction log sets for every SG onto one RAID 1 array, you lose the advantages of sequential access because a set of sequential patterns combines to create a random pattern.
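A toy model shows why several sequential log streams stop looking sequential once they share a volume. The block numbers, the round-robin interleaving, and the average-jump metric below are all simplifying assumptions for illustration; they don't model a real disk or Exchange's actual write pattern.

```python
def sequential_stream(start_block: int, length: int) -> list:
    """One SG's log writes: consecutive blocks starting at start_block."""
    return list(range(start_block, start_block + length))


def interleave(streams: list) -> list:
    """Round-robin merge, approximating several SGs logging to one volume at once."""
    merged = []
    for blocks in zip(*streams):
        merged.extend(blocks)
    return merged


def average_seek(blocks: list) -> float:
    """Mean distance, in blocks, between consecutive writes."""
    jumps = [abs(b - a) for a, b in zip(blocks, blocks[1:])]
    return sum(jumps) / len(jumps)


# Four hypothetical SGs, each with its own log region on the disk.
logs = [sequential_stream(sg * 100_000, 1_000) for sg in range(4)]
dedicated = average_seek(logs[0])        # one log set alone on its volume
shared = average_seek(interleave(logs))  # all four log sets on one volume
```

In this model the dedicated volume always moves one block between writes, whereas the shared volume jumps between log regions on nearly every write, which is the random pattern the text warns about.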
Continuing with the example, you also need to separate each database onto a dedicated array (i.e., RAID 1 or RAID 5) for best performance. Finally, because of the .stm files' highly sequential organization, some environments might require separate arrays for the .stm files. Putting this example together for a server with six SGs, each hosting four databases, would create a fairly complex storage design. Until Exchange 2000 gains acceptance and widespread deployment, systems administrators can only anticipate the design possibilities. But managing storage for Exchange 2000 will be more complicated than for Exchange Server 5.5.
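To get a feel for how quickly such a design grows, the array count can be tallied with a short sketch. It assumes one log volume per SG, one dedicated array per database, and, optionally, one extra array per database for its .stm file; the function name count_arrays is hypothetical, and real designs may consolidate arrays differently.

```python
def count_arrays(storage_groups: int, databases_per_group: int, separate_stm: bool = False) -> dict:
    """Tally the arrays needed when every log set and database gets its own array.

    ASSUMPTION: one .stm file per database, placed on its own array
    only when separate_stm is True.
    """
    log_arrays = storage_groups
    database_arrays = storage_groups * databases_per_group
    stm_arrays = database_arrays if separate_stm else 0
    return {
        "log": log_arrays,
        "database": database_arrays,
        "stm": stm_arrays,
        "total": log_arrays + database_arrays + stm_arrays,
    }
```

For the server in the example, count_arrays(6, 4) gives 6 log volumes plus 24 database arrays, or 30 in total; with separate_stm=True the total rises to 54.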
The powerful new methods of storing data in Exchange 2000 are worth the additional complexities. Fundamentally, the underlying database engine hasn't changed much in Exchange 2000. You can expect the same performance, scalability, and reliability features as in Exchange Server 5.5. Exchange 2000 takes messaging data storage to the next level. The ability to store all the semistructured data within an enterprise is a driving force behind Exchange 2000 storage. Microsoft will build many future products on Exchange 2000, such as the forthcoming Tahoe product. The term Web Store will gain wider use than simply in the context of a messaging system. Also, Exchange 2000 has a new target market: ISPs and application service providers (ASPs). These markets have different storage and scalability requirements than most corporate messaging systems do. Exchange 2000 storage is well positioned for ISPs, ASPs, and corporate messaging systems.