August 01, 1997 12:00 AM

RAID: Enhanced Disk Storage for Windows NT

Windows IT Pro
InstantDoc ID #217

Optimizing for Fault Tolerance: RAID 1 and 5
Optimizing your server's disk storage is a balancing act: You want the best possible performance, but you need to protect your data, too. RAID 1 and RAID 5 are two widely used methods for protecting data.

RAID 1, disk mirroring, is most often used for smaller critical data volumes. It gives you complete fault tolerance (either drive in the mirror set can fail without affecting system integrity or performance) and slightly better performance than no RAID. The tradeoff? Because both drives are exact copies of each other, you get only 50 percent of the disk capacity you purchased.

RAID 5 is the most commonly used option for fault-tolerant disk volumes in NT because most manufacturers implement and support this method, it is part of NT Server, and it offers a reasonable compromise between performance and disk capacity. RAID 5 offers enhanced read performance and data protection with far less capacity loss than RAID 1. Because you can build a RAID 5 volume out of as few as three drives, the maximum capacity you lose is 33 percent; the more drives you add, the less total space you lose. RAID 5 offers better I/O read performance than no RAID at all and, in some cases, even better than RAID 0 (because of the striping algorithm used). The drawback of RAID 5 is that write performance suffers significantly because every write operation requires a parity calculation. This performance hit is especially high in software RAID 5; you'll probably want to use a fast RAID controller to compensate for the overhead.
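The capacity figures quoted for RAID 1 and RAID 5 follow from simple arithmetic. The sketch below is illustrative only: it assumes equal-size drives, a two-drive mirror, and one drive's worth of parity per RAID 5 set, and the function name `usable_fraction` is made up for this example.

```python
def usable_fraction(level: int, drives: int) -> float:
    """Return the fraction of raw capacity available for data.

    Assumes all drives are the same size (an assumption of this sketch).
    """
    if level == 0:
        if drives < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return 1.0                      # striping only, no redundancy
    if level == 1:
        if drives != 2:
            raise ValueError("this sketch models a two-drive mirror")
        return 0.5                      # one drive is a full copy of the other
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) / drives    # one drive's worth of space holds parity
    raise ValueError("level not modeled in this sketch")


for n in (3, 4, 5, 10):
    lost = 1 - usable_fraction(5, n)
    print(f"RAID 5 with {n} drives: {lost:.0%} of raw capacity holds parity")
```

With three drives, a third of the raw space goes to parity, which is the 33 percent worst case mentioned above; each added drive shrinks that share.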

The advantages to RAID 5 are that you can build very large fault-tolerant disk volumes, and any drive in the stripe set can fail without damaging data. However, fault tolerance doesn't mean you won't suffer a little if a drive fails. When one drive disappears from the stripe set, either your system CPU or the RAID controller must compensate on the fly by using the remaining data and parity information to reconstruct the data for every I/O request. Depending on your system and controller, this reconstruction could mean as much as a 50 percent performance hit on that volume--but at least you're still running!
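To see how the remaining data and parity can stand in for a missing drive, here is a minimal sketch of XOR parity, the scheme RAID 5 implementations generally use. The four-byte blocks and the helper name `xor_blocks` are assumptions for illustration; a real array works on much larger stripe units and rotates parity across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (one per drive) plus a parity block.
data = [b"NTFS", b"RAID", b"DISK"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1]: XOR the surviving
# data blocks with the parity block to get the missing block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Every read that touches the failed drive needs this extra work across all surviving drives, which is where the degraded-mode performance hit comes from.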

In NT, this recovery process is automatic (as it is on hardware controllers). NT also automatically rebuilds the volume when you replace the faulty drive. As soon as the system gets a new drive, it begins the background process of reconstructing the data on the new drive in the same way it handles I/O requests on the fly (this process can take several hours, depending on the volume/disk size). The process slows performance (more with software RAID than on an accelerated controller), but as soon as reconstruction is finished, system operations return to normal.

Also note that in software RAID 5, you often cannot break the set to add a new drive. This limitation makes software RAID 5 on NT a less attractive option, and some experts never recommend it. In contrast, this issue does not arise with hardware RAID.

Other Fault-Tolerance Options
Two additional RAID fault-tolerance hardware options are RAID 3 and 4. Although they're less common on NT systems than other options (and NT does not support them), they offer fault tolerance through striping with parity data.

In addition to providing fault tolerance through RAID, some disk controllers have special features that ensure availability in the event of a disk crash. Some RAID arrays feature hot-swap drives: You can remove and insert disks without powering off the disk cage or even the specific slot.

A hot-swap-capable array should never go down due to a drive failure (barring component death of the backplane, faulty power supplies, or similar problems). Systems without hot-swap drives require you to power down the system to replace a bad drive. In systems with hot-swap bays, the controller/software detects the new drive coming online and begins repairing the volume.

Another option is a hot-spare--a drive in the array that waits in standby mode. If any other drive in the array fails, the system automatically switches over to the hot-spare and begins rebuilding, without administrator intervention. When you replace the faulty drive, it becomes the new hot-spare. You can enable hot-spares through the controller's BIOS or management software.
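The hot-spare behavior described above can be pictured as a small state change. The following sketch is a toy model, not any controller's actual logic; the class `HotSpareArray` and its method names are hypothetical.

```python
class HotSpareArray:
    """Toy model of an array with one standby drive (hypothetical names)."""

    def __init__(self, members, spare):
        self.members = list(members)   # drives actively holding data
        self.spare = spare             # drive waiting in standby, or None
        self.log = []

    def fail(self, drive):
        """A member drive fails; promote the spare if one is available."""
        slot = self.members.index(drive)
        if self.spare is None:
            self.members[slot] = None  # array runs degraded until repaired
            self.log.append(f"{drive} failed, no spare: running degraded")
            return
        self.members[slot] = self.spare
        self.spare = None
        self.log.append(f"{drive} failed, rebuilding onto {self.members[slot]}")

    def insert_replacement(self, new_drive):
        """The administrator swaps in a fresh drive for the failed one."""
        if None in self.members:
            slot = self.members.index(None)
            self.members[slot] = new_drive   # fill the empty slot and rebuild
            self.log.append(f"rebuilding onto {new_drive}")
        else:
            self.spare = new_drive           # array is whole: becomes new spare
            self.log.append(f"{new_drive} is the new hot-spare")


array = HotSpareArray(["d0", "d1", "d2"], spare="s0")
array.fail("d1")
array.insert_replacement("d3")
print(array.members, array.spare)
```

The point of the model is that the rebuild starts at failure time, before anyone touches the hardware; the replacement drive only restores the standby.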

The Best of Both Worlds
A few combined RAID levels (e.g., RAID 10, 30, or 50) offer both performance and fault tolerance by using two forms of RAID on the same logical volume at the same time. As you might expect, you pay more to have both capabilities. The extra cost arises because NT's Disk Administrator tool alone won't let you combine RAID levels; to do so, you must combine a hardware RAID controller with NT's software RAID functions.

One combined RAID level is RAID 10, also called mirrored stripe sets (i.e., a RAID 0 stripe set is mirrored to another stripe set). RAID 10 offers excellent gains in read and write performance in sequential and random transaction environments. In fact, it's the best overall performer of all RAID levels. The cost, as with mirroring, is that you lose 50 percent of your planned disk capacity. But, where simple mirroring (RAID 1) costs you only one drive per mirrored set, RAID 10 costs you as many drives as are in the RAID 0 stripe set (which can get expensive). Like RAID 1, RAID 10 makes a fault-tolerant volume with the performance advantages of striping and no performance hit in the event of a drive failure.
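As a rough illustration of the layout described above (a stripe set mirrored to a second stripe set), the sketch below maps a logical block to its two physical copies and counts the drives required. It assumes one block per stripe unit and round-robin placement; `locate` and `drives_needed` are hypothetical names, and real controllers use larger stripe units.

```python
def locate(block: int, stripe_width: int):
    """Return both physical copies of a logical block.

    Each copy is (stripe set, drive index within the set, block offset on
    that drive). Round-robin placement is an assumption of this sketch.
    """
    drive = block % stripe_width      # which drive in the stripe set
    offset = block // stripe_width    # how far down that drive
    return [("set A", drive, offset), ("set B", drive, offset)]


def drives_needed(stripe_width: int) -> int:
    """Mirroring a whole stripe set doubles the drive count."""
    return 2 * stripe_width


print(locate(5, stripe_width=3))
print(drives_needed(3))
```

A three-drive stripe set therefore needs six drives in total, which is the cost the paragraph above warns about: the mirror costs as many drives as the stripe set contains.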

Another combination of RAID 0 and 1 is RAID 01, or striped mirror sets, which has similar characteristics to RAID 10. The main difference between RAID 10 and 01 is which RAID level the hardware controller handles and which the software handles. In RAID 10, for example, if the software handles the striping, the controller performs the mirroring; in RAID 01, vice versa.

Not all RAID controllers support level 10 or 01. You'll need to check which RAID levels a controller supports before you buy it. However, you can make combined RAID by using hardware for the first part (RAID 0 striping or RAID 1 mirroring) and software for the second (the alternative mirror or stripe, respectively). This solution does not perform as well as using a RAID hardware controller that can handle both at the same time. But you can still build high-performance, fault-tolerant disk volumes without replacing an existing RAID controller.

Other RAID levels, such as 30 and 50, can also enhance performance and fault tolerance, depending on your applications. With them, you can build very large disk volumes out of commodity drives. However, these RAID levels are of limited use in most low- to midrange NT server situations, unless your goal is to experiment or achieve new and interesting disk configurations. RAID 50 is a good option on an enterprise-scale server where you are trying to build a 500GB or even 1000GB disk volume.

The Right RAID
With the variety of available RAID options, you can choose the right balance of performance and fault tolerance for your site. Mixing hardware and software RAID lets you build disk subsystems specifically tailored to your needs, such as extremely large disk volumes or multiple-fault-tolerant arrays. Whatever RAID you consider, it's a disk technology you can't afford to be without.

Comments
  • Dave Navarro, Jr.
    Aug 13, 1999

    I read Joel Sloss’ August Lab Reports about RAID with much interest. We’re running several NT servers with software-supported RAID 5, and moving to hardware RAID is definitely in our future.
    I find it odd that many hardware developers are concentrating on server performance and forgetting about desktop performance, which can be just as crucial in many situations. Our company publishes programming languages for DOS and Windows, and our compiler is written in several million lines of assembler. On a noncaching SCSI controller and a 200MHz Pentium system, a full compile takes 68 minutes. When we switched to a caching VLB controller with 16MB of RAM, the compile time on the same machine was 39 minutes; 32MB of RAM was 22 minutes. When we switched to a PCI card with 32MB of RAM on the same motherboard, the compile time was less than 13 minutes. All other system hardware stayed the same. This compile was performed in a DOS box in Windows 95. When we upgraded to NT, the compile time was 9 minutes, again with the same hardware.
    After a year of constant service, we had to replace our controller. I was shocked to discover that nearly all SCSI controller manufacturers have abandoned making caching controllers. The only caching controllers we could find were for file servers and included features we didn’t need on a desktop environment. We paid for features we didn’t need, but the performance of a caching controller was worth it in this case. But as we purchase new systems, we’ll have to forgo caching controllers for many of them because the cost is too high on the desktop.
    I hope that manufacturers will realize that people in many industries rely on desktop system performance as much as they do server performance. Please bring back desktop caching controllers.

    --Dave Navarro, Jr.

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.