For optimum performance, you need to place the transaction logs on the most responsive volume or the device with optimal write performance. Your aim is for Exchange Server to write transactions to the log files as quickly as possible. The typical (and correct) approach is to locate the log files on a disk separate from the disk that holds the database. To ensure data resilience, the logs and database must be separate. Remember that an operational database is never fully up-to-date. The transaction logs contain transactions that the IS might not have committed yet. Therefore, if the disk holding the database fails, you need to rebuild the database (by restoring the most recent full backup) and let Exchange Server replay the outstanding transactions from the logs that users created since the backup. Clearly, if the transaction logs and the database reside on the same disk and a fault occurs, you're in big trouble.
To ensure resilience, mirror the transaction log disk. Don't use RAID 5 on the volume that hosts transaction logs, because it slows down the write operations to the logs and degrades overall system performance. (For more information about RAID 5, see the sidebar "Why Is RAID 5 Slow on Writes?" page 80.) RAID 0+1 (i.e., striping and mirroring) delivers the best write performance for larger volumes and is highly resilient to failure. However, RAID 0+1 is typically too expensive in terms of allocating disks to transaction logs. RAID 1 (i.e., mirroring), which provides an adequate level of protection balanced with good I/O performance, is the usual choice for volumes that host transaction logs. Never use RAID 0 for a disk that holds transaction logs: if one disk fails, you run the risk of losing data.
Each storage group uses a separate set of transaction logs. You need to separate the log sets as effectively as possible on multiple mirrored volumes. However, one storage array can support only a limited number of LUs, so compromise might be necessary. On small servers, you can combine log sets from different storage groups on one volume. This approach reduces the amount of storage the server requires at the expense of placing all your eggs in one basket. A fault that occurs on the volume affects all log sets; therefore, you need to take every storage group offline.
Exchange Server databases' I/O characteristics exhibit random access across the entire database file. The IS uses parallel threads to update pages within the database, so a multispindle volume helps service multiple concurrent read or write requests. In fact, the system's ability to process multithreaded requests increases as you add more disks to a volume.
Since Exchange Server's earliest days, most system designers have recommended RAID 5 protection for the databases. RAID 5 is a good compromise for protecting storage and delivering reasonable read/write performance without using too many disks. However, given the low cost of disks and the need to drive up I/O performance, many high-end Exchange Server 5.5 implementations now use RAID 0+1 volumes to host the databases. Expect this trend to continue in Exchange 2000. Although you can now partition I/O across multiple databases, the number of mailboxes that an individual server supports will likely increase, thereby driving up the total generated I/O. Large 4-way Exchange 2000 clusters need to be able to support as many as 10,000 mailboxes and manage 200GB to 400GB of databases across multiple storage groups. In terms of write operations, RAID 0+1 can perform at twice the speed of RAID 5, so any large Exchange 2000 server needs to deploy this configuration for database protection.
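One way to see where the twofold write advantage comes from is to count the physical disk operations behind each logical write. The sketch below assumes the classic small-write model with no controller cache: a mirrored write costs two physical I/Os, whereas a RAID 5 write costs four (read data, read parity, write data, write parity). The spindle count and per-spindle I/O rate are hypothetical values chosen only for illustration, not measurements from any Exchange server.

```python
# Physical disk I/Os needed per logical write (small-write model, no cache).
WRITE_PENALTY = {
    "RAID 0+1": 2,  # write the block to both sides of the mirror
    "RAID 5": 4,    # read data, read parity, write data, write parity
}


def logical_writes_per_second(spindles: int, iops_per_spindle: float, raid_level: str) -> float:
    """Estimate sustainable logical writes per second for a volume."""
    return spindles * iops_per_spindle / WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    # Hypothetical volume: 10 spindles, 100 I/Os per second each.
    for level in WRITE_PENALTY:
        print(level, logical_writes_per_second(10, 100, level))
```

With the same spindles, the RAID 0+1 estimate is twice the RAID 5 estimate, which matches the ratio quoted above. Real controllers with write cache narrow or shift this gap, so treat the model as a rough guide.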
To yield the best performance for both transaction log and database writes, use the write cache on the storage controller. However, don't use the write cache unless you're sure that you've adequately protected the data in the cache against failure and loss. You need to mirror the cache and use battery backup to protect it from power failure. You also need to be able to transfer the cache between controllers in case you want to replace the controller. Read operations to access messages and attachments from the database typically retrieve information across the entire file, so controller read cache doesn't help performance. The ESE performs application-level caching.
Don't attempt to combine too many spindles in a RAID 5 volume. Each time a failure occurs, the entire volume rebuilds. The duration of the rebuild is directly proportional to the number and size of disks in the volume, so each disk you add increases the rebuild time. Most volume rebuilds occur in the background, and the volume remains online. However, if another failure occurs during the rebuild, you might experience data loss. Therefore, reducing rebuild time by reducing the number of disks in the volume set is good practice. Deciding the precise number of disks to place in a volume can be a balancing act between the size of the volume you want to create, the expected rebuild time, the data that you want to store on the volume, and the expected mean time between failures. If you want to store nonessential data on a large volume for the sake of convenience, you can combine many disks into the volume. However, an Exchange Server database tends to be sensitive to failure. I recommend erring on the side of caution and not placing more than 12 disks into the volume.
Examination of Exchange 2000 servers' I/O pattern reveals some interesting points, some of which differ significantly from Exchange Server 5.5 patterns. The streaming database delivers sparkling performance to IMAP4, POP3, and HTTP clients because they can store or retrieve data much faster from the streaming database than they can from the traditional Exchange Database (EDB). Clients access the streaming database through a kernel-mode filter driver called the Exchange Server Installable File System (ExIFS). Like the EDB, the ExIFS processes data in 4KB pages. However, the ExIFS can allocate and access the pages contiguously, whereas the EDB merely requests pages from ESE and might end up receiving pages that are scattered across the file. You won't see a performance advantage for small messages, but consider the amount of work necessary to access a large attachment from a series of 4KB pages that the IS needs to fetch from multiple locations. Because its access is contiguous, the streaming database delivers much better performance for large files. Interestingly, contiguous disk access transfers far more data (as much as 64KB per I/O); therefore, to achieve the desired performance, the storage subsystem must be able to handle such demands.
Advances in storage technology often focus on the amount of data that can reside on a physical device. As we move toward the consolidation of small servers into larger clusters, I/O performance becomes key. System designers need to focus on how to incorporate new technologies that enable I/O to get to CPUs faster. Exchange 2000 is the first general-purpose application to take full advantage of the fibre channel protocol, which delivers transfer rates as high as 100MBps. Systems that support thousands of users must manage large quantities of data. The ability to store and manage data isn't new, but advances such as fibre channel now let system configurations attain a better balance between CPU, memory, and storage.
Storage Configuration
Most Exchange Server 5.5 servers use SCSI connections. As a hardware layer, SCSI demonstrates expandability limitations, especially in the number of disks that you can connect to one SCSI bus and the distance over which you can connect the disks. As Exchange servers get larger and handle more data, SCSI becomes less acceptable.
As I noted, fibre channel delivers high I/O bandwidth and great flexibility. You can increase storage without making major changes to the underlying system, and fibre channel storage solutions that extend over several hundred meters are common. Win2K's disk-management tools simplify the addition or expansion of volumes, so you can add capacity for new storage groups or databases without affecting users. Better yet, fibre channel implementations let servers share powerful and highly protected storage enclosures called Storage Area Networks (SANs). For most individual servers, a SAN is an expensive data storage solution. However, a SAN makes sense when you need to host a large corporate user community by colocating several Exchange 2000 servers or clusters in a data center. You need to weigh the advantages of a SAN, as well as its additional cost, against the advantages and disadvantages of server-attached storage. A SAN can grow as storage requirements change. Its adaptability and ability to change without affecting server uptime might be a crucial factor in installations that need to support large user communities and deliver 99.99 percent or greater system availability.
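To put the 99.99 percent figure in perspective, you can convert an availability target into the downtime it permits each year. The following is a minimal sketch; the function name is my own, and it assumes a 365-day year with no allowance for planned maintenance windows.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes per year")
```

At 99.99 percent, the budget is less than an hour per year, which is why the ability to add or change storage without taking servers offline matters so much.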
Example Configuration
Let's put some of the theory I've discussed into the context of an example Exchange 2000 system configuration. Assume that your server must support 3000 mailboxes and you want to allocate a 100MB mailbox quota. This size might seem large, but given the ever-increasing size of messages and lower cost of storage, installations are raising mailbox quotas from the 10MB-to-20MB limits imposed in the early days of Exchange Server to 50MB-to-70MB limits. A simple calculation (i.e., mailboxes × quota) gives you a storage requirement of 300GB. This calculation doesn't consider the single-instance ratio or the effect of the Deleted Items cache, but it serves as a general sizing figure.
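The sizing arithmetic is simple enough to script if you want to try other mailbox counts or quotas. This sketch assumes decimal units (1000MB per GB), as the 300GB figure implies, and, like the calculation above, ignores single-instance storage and the Deleted Items cache. The function name is illustrative only.

```python
def raw_mailbox_storage_gb(mailboxes: int, quota_mb: float) -> float:
    """Upper-bound storage if every mailbox fills its quota (decimal GB)."""
    return mailboxes * quota_mb / 1000


if __name__ == "__main__":
    # The example server: 3000 mailboxes with a 100MB quota each.
    print(raw_mailbox_storage_gb(3000, 100), "GB")
```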
A casual look at system configuration options suggests that you can solve your storage problem by combining seven 50GB disks into a RAID 5 volume. Although this volume would deliver the right capacity, the seven spindles probably couldn't handle the I/O load that 3000 users generate. Observation of production Exchange Server 5.5 servers reveals that each spindle in a RAID 5 volume can handle the I/O load of approximately 200 mailboxes. Spindles in a RAID 0+1 volume push the supported I/O load up to 300 mailboxes. If you apply these guidelines to our Exchange 2000 example, you'll need 15 spindles (i.e., physical disks) in a RAID 5 volume, or 10 spindles in a RAID 0+1 volume, to support the expected load.
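The gap between sizing for capacity and sizing for I/O load is easy to check with a short script. The per-spindle mailbox figures below are the rules of thumb quoted above (200 for RAID 5, 300 for RAID 0+1). The function names are my own, and the capacity formula assumes that RAID 5 gives up one disk's worth of space to parity.

```python
import math

# Rule-of-thumb mailboxes that one spindle can serve, per RAID level.
MAILBOXES_PER_SPINDLE = {"RAID 5": 200, "RAID 0+1": 300}


def spindles_for_load(mailboxes: int, raid_level: str) -> int:
    """Spindles needed to carry the I/O load of the given mailboxes."""
    return math.ceil(mailboxes / MAILBOXES_PER_SPINDLE[raid_level])


def raid5_disks_for_capacity(capacity_gb: float, disk_gb: float) -> int:
    """Disks needed in a RAID 5 set to reach a usable capacity (one disk lost to parity)."""
    return math.ceil(capacity_gb / disk_gb) + 1


if __name__ == "__main__":
    print("Capacity only:", raid5_disks_for_capacity(300, 50), "disks")
    for level in MAILBOXES_PER_SPINDLE:
        print(f"I/O load on {level}:", spindles_for_load(3000, level), "spindles")
```

Capacity alone asks for seven disks, but the I/O load asks for 15 spindles in RAID 5 or 10 in RAID 0+1, so the load, not the capacity, drives the configuration.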
Exchange Server 5.5 has one storage group, so splitting the I/O load across multiple volumes is difficult. Exchange 2000 lets you split the 3000 mailboxes across four storage groups. If you use one message database in each storage group, each database is 75GB, which is unwieldy for the purpose of maintenance. To achieve a situation in which each database supports 150 users and is about 15GB, you can split the users further across five message databases in each storage group.
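You can reproduce the partitioning figures with a few lines of code and then experiment with other splits. The sketch assumes that users are spread evenly across databases and that every mailbox fills its 100MB quota. The function name and return shape are illustrative.

```python
def partition(mailboxes: int, quota_mb: float, storage_groups: int, dbs_per_group: int):
    """Return (database count, users per database, GB per database) for an even split."""
    databases = storage_groups * dbs_per_group
    users_per_db = mailboxes / databases
    db_size_gb = users_per_db * quota_mb / 1000  # decimal GB
    return databases, users_per_db, db_size_gb


if __name__ == "__main__":
    # One database per storage group, then five databases per storage group.
    for dbs in (1, 5):
        count, users, size = partition(3000, 100, storage_groups=4, dbs_per_group=dbs)
        print(f"{count} databases: {users:.0f} users and {size:.0f}GB each")
```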
Splitting users this way affects the single-instance storage model that Exchange Server uses. Single-instance storage means that users who receive the same message share one copy of the message's content. But single-instance storage extends across only one database. After you split users into separate databases, multiple copies of messages are necessary, one for each database that holds a recipient's mailbox. However, experience shows that most Exchange servers have low sharing ratios (e.g., between 1.5 and 2.5), and dividing users across multiple databases produces manageable databases that you can back up in less than 1 hour using a DLT. Also, a disk failure that affects a database will concern only 150 users, and you can restore the database in an hour or two. Although four storage groups, each containing five databases, might seem excessive, this example realistically represents the types of configurations that system designers are now considering for early Exchange 2000 deployments.
Each storage group contains a set of transaction logs. Recalling the basics of disk configuration, you might think that you need five mirror sets for the logs and five RAID 5 or RAID 0+1 sets for each set of databases. Managing such a large amount of storage from a backplane adapter (you'd probably double the physical storage to 600GB because you don't want to fill disks and you want room to grow) is impractical because you'd probably encounter a limit to the number of disks you can connect. Also, a system this large is a candidate for clustering, so you need a solution that can deliver the I/O performance, handle the number of spindles required to deliver the capacity, and support Win2K clustering. For all Exchange 2000 clusters, consider using a SAN either to share load between servers that use the Win2K active/active clustering model or to benefit from advanced data-protection mechanisms such as online snapshots and distant mirroring. If you need to add users, you simply create a new storage group, create a new volume in the SAN, and mount the database without interrupting service. The Win2K Disk Administrator can bring new disks online without requiring a system reboot. Generally speaking, Win2K greatly improves disk administration, a welcome advance given the size of volumes in large configurations. Screen 3 shows the Disk Management MMC snap-in dealing with some very large volumes, including one whose size is 406.9GB! This volume should be large enough to keep many Exchange Server databases happy.
Each database or storage group doesn't require its own volume. You can divide the databases across available volumes as long as you keep an eye on overall resilience against failure and don't put too many databases on the same volume. Exchange 2000 clusters use storage groups as cluster resources, so you need to place all the databases for a storage group on the same volume. This placement ensures that the complete storage group and the disks holding the databases will fail over as one unit.
Transaction logs that handle the traffic of 600 users will be busy. In such a configuration, you could create four separate RAID 1 sets for the logs. If you use 9.2GB disks, you'll need eight disks in four volumes. A 9GB volume has more than enough space to hold the transaction logs of even the most loaded server. For best performance, don't put files that other applications use on the transaction log volumes.
Systems that run with more than a couple of storage groups can group transaction logs from different storage groups on the same volumes. You don't want to create too many volumes only for the purpose of holding transaction logs. Figure 2 illustrates how you might lay out databases and transaction logs across a set of available volumes.
Disks that you use in Win2K clusters must be independently addressable, so if you want to consider a clustered system, you need to use hardware-based partitions, which let the controller present multiple LUs to the cluster or server, as well as use fewer disks. Clusters require a disk called the quorum disk to hold quorum data. I recommend using a hardware partition for this data; the actual data hardly ever exceeds 100MB, and dedicating an entire physical disk is a waste.
If you use RAID 5 to protect the four storage groups, you'll need five 18.2GB disks for each volume. You can gain better I/O performance by using 9 × 9.2GB disks. The volumes have 72GB capacity, which is more than the predicted database size (3 × 15GB = 45GB). You need the extra space for maintenance purposes (e.g., rebuilding a database with the Eseutil utility) and to ensure that the disk never becomes full. Stuffing a disk to its capacity is unwise because you'll probably reach capacity at the most inconvenient time. Exchange Server administrators typically find that databases grow past predicted sizes over time. After all, databases never shrink; they only get bigger as users store more mail.
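To check the capacity and headroom of the two RAID 5 options, you can use a sketch like the following. It assumes the usual RAID 5 rule that one disk's worth of space goes to parity, and it takes the predicted database size quoted above (three 15GB databases per volume). The function names are hypothetical.

```python
def raid5_usable_gb(disks: int, disk_gb: float) -> float:
    """Usable capacity of a RAID 5 set: one disk's worth of space holds parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb


def headroom_gb(usable_gb: float, databases: int, db_gb: float) -> float:
    """Space left on the volume after the predicted databases are stored."""
    return usable_gb - databases * db_gb


if __name__ == "__main__":
    for disks, size in [(5, 18.2), (9, 9.2)]:
        usable = raid5_usable_gb(disks, size)
        spare = headroom_gb(usable, databases=3, db_gb=15)
        print(f"{disks} x {size}GB: {usable:.1f}GB usable, {spare:.1f}GB spare")
```

Both layouts provide roughly the same usable space (a little over 72GB), so the choice between them comes down to spindle count and I/O performance rather than capacity.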
Expanding Boundaries
The changes that Microsoft has introduced in the Exchange 2000 IS offer system designers extra flexibility in hardware configurations. Partitioning the IS means that you can exert more control over I/O patterns. I'm still investigating the opportunities that SANs, Exchange 2000 clusters, and different storage group configurations offer, but clearly the number of mailboxes that one production server can support will climb well past Exchange Server 5.5's practical limit of 3000.