August 01, 1996 12:00 AM

Digital Clusters for Windows NT

Windows IT Pro
InstantDoc ID #2638

What are 99.9% PC/LAN server up-time and availability worth to you? More to the point, can you afford to bet your business on Windows NT?

Many companies have their LAN, databases, and all other business functions on NT systems. But companies such as financial institutions question whether NT is ready for prime-time, mission-critical applications. When you rely on computers for your accounting, product development, human resources management, data management, and now sales through the Internet, your systems must be operational 24 hours a day, seven days a week. Failure is not an option.

Clustering, which has been around in Unix and VMS for more than 10 years, is one technology for achieving near 99.9% server up-time. By letting you duplicate a mission-critical system, this technology guarantees availability, so you can bet your business on your OS.

Now clustering is coming to NT. Although this technology is not on the grand scale of its Unix or VMS predecessors, clustering offers functionality heretofore unknown to PC operating systems and represents a big step for NT toward availability worthy of those major-league, mission-critical enterprise applications. By having two computers instead of just one to support a task, you greatly improve your chances of meeting the goal of 99.9% server up-time, because a service goes down only when both machines are unavailable at once.
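To put numbers on the up-time goal, here is a minimal Python sketch of the arithmetic. It assumes the two servers fail independently and that failover is instantaneous; both are simplifications of my own and not claims about Digital's product. The function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def pair_availability(a: float, b: float) -> float:
    """Availability of a pair that is up whenever either server is up.

    Assumption: the servers fail independently and failover takes no time.
    Real clusters share storage and cabling, so treat this as an upper bound.
    """
    return 1.0 - (1.0 - a) * (1.0 - b)


if __name__ == "__main__":
    print(f"{downtime_hours_per_year(0.999):.2f} hours/year of downtime at 99.9%")
    print(f"{pair_availability(0.99, 0.99):.4f} availability for two 99% servers")
```

Under these assumptions, 99.9% up-time still allows almost nine hours of downtime a year, and pairing two 99% servers lifts the combined figure well above what either achieves alone.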

Clustering 101
Before I get into the specifics of Digital's cluster solution, let me explain some clustering terminology: load balancing, primary server, failover (or secondary) server, failover, and failback. You can set up each server so that all five terms apply to it.

For example, suppose you use SQL Server for your accounting and order fulfillment departments and you have two databases that you want to protect by implementing a cluster. In a single-cluster environment (two servers), you can manually load balance--divide the work between the two servers--by installing SQL Server on both machines. Make one the accounting database's primary server--the system with principal ownership and management responsibility for a resource--and the other system the ordering database's primary server. Then, set up each system to have a primary disk (or disks) on the shared storage array (a chassis housing shared disk drives where cluster software stores and shunts data between systems). This disk will serve as the database device. So far, this configuration is no different from setting up two independent servers, except that the shared disks are on a subsystem physically connected to both servers.

Now, you set up the cluster by configuring each machine to be the other machine's failover (secondary) server--the system that will inherit ownership and responsibility for a resource. So when one system (the primary server) goes down, it will fail over--relocate cluster services or resources from the faulty system to the operational one. Its resources move to the failover server, and the service (such as a database) keeps running. When the primary server comes back online, the service will fail back--automatically migrate cluster resources from the failover server back to the primary server.
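The failover and failback behavior described above can be sketched as a small ownership model in Python. This is an illustration only, not Digital's implementation: the class names, the server labels "A" and "B", and the rule "prefer the primary, otherwise use the failover server" are my own assumptions drawn from the definitions in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Resource:
    name: str
    primary: str                      # server with principal ownership
    failover: str                     # server that inherits the resource
    owner: Optional[str] = field(init=False, default=None)


class Cluster:
    """Toy two-server cluster that tracks which server owns each resource."""

    def __init__(self, servers: Iterable[str], resources: Iterable[Resource]):
        self.up: Dict[str, bool] = {s: True for s in servers}
        self.resources = list(resources)
        self._place()

    def _place(self) -> None:
        # Prefer the primary; fall back to the failover server; else offline.
        for r in self.resources:
            if self.up[r.primary]:
                r.owner = r.primary
            elif self.up[r.failover]:
                r.owner = r.failover
            else:
                r.owner = None

    def server_down(self, server: str) -> None:
        self.up[server] = False
        self._place()                 # fail over anything the server owned

    def server_up(self, server: str) -> None:
        self.up[server] = True
        self._place()                 # fail back to the restored primary

    def owners(self) -> Dict[str, Optional[str]]:
        return {r.name: r.owner for r in self.resources}


if __name__ == "__main__":
    cluster = Cluster(
        ["A", "B"],
        [Resource("accounting", "A", "B"), Resource("ordering", "B", "A")],
    )
    cluster.server_down("A")
    print(cluster.owners())           # both databases now run on B
    cluster.server_up("A")
    print(cluster.owners())           # accounting fails back to A
```

Because each server is primary for one database and failover for the other, both machines do useful work in normal operation, and either can carry the full load while its partner is down.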

The failover server is not just a cold standby server (as with Novell): The server performs meaningful work and provides more than disk-mirroring or single-system availability through hot-swappable disks. The open architecture of both the software and the off-the-shelf hardware means that you have scalability built in. You can add disk storage almost ad infinitum, and you can add functionality with more CPUs and peripherals such as printers and tape drives.

Digital's Configuration
Digital Clusters for Windows NT consists of two servers, a network connection, cluster software, and an external disk array with SCSI adapters. (Although the 1.0 product release supports hardware-based RAID, the Digital BA356 storage subsystem doesn't. A future product release will have a built-in RAID 5 controller. Also, version 1.0 does not support software-based RAID through NT.) A key feature of this clustering solution is that it can use off-the-shelf hardware for disks, network cards, and SCSI controllers. (Digital will contribute this feature to Microsoft's Wolfpack standard; Mark Smith explains Wolfpack in "Closing In on Clusters," page 51.)

Digital will officially support only its listed hardware (AlphaServer 1000, 1000A, 400, 2000, 2100, 2100A, 4100, Prioris ZX Pentium, Prioris ZX Pentium Pro, Prioris HX, Prioris XL), but the software works on other systems, too. You can use any two servers running NT Server 3.51 with Service Pack 4, but the clustered CPUs must be of the same architecture. You can't mix Intel with Alpha because of differences in how the NT File System (NTFS) handles file tags (information on permissions, groups, etc.) and page logs on Intel and RISC platforms. The two clustered systems don't have to be similarly configured (one can be a dual Pentium and the other a quad Pentium Pro), but on each machine, you have to install the same software (SQL Server, Oracle7 Workgroup Server, or any other application) you intend to fail over from one system to the other.
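The pairing rules in this paragraph lend themselves to a simple pre-installation check. The Python sketch below is hypothetical: the Server record, its field names, and the pairing_problems function are my own invention for illustration, and the exact-match test on Service Pack 4 is my reading of the requirement rather than something Digital documents.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Server:
    name: str
    arch: str                    # e.g. "intel" or "alpha"
    os_version: str              # e.g. "3.51"
    service_pack: int
    installed: FrozenSet[str]    # applications installed on this machine


def pairing_problems(a: Server, b: Server, failover_apps: Iterable[str]) -> List[str]:
    """Return reasons the two servers cannot be clustered (empty if none)."""
    problems = []
    if a.arch != b.arch:
        problems.append(f"architecture mismatch: {a.arch} vs {b.arch}")
    for s in (a, b):
        # Assumption: the requirement is exactly NT Server 3.51 with SP4.
        if s.os_version != "3.51" or s.service_pack != 4:
            problems.append(f"{s.name} is not NT Server 3.51 with Service Pack 4")
    for app in failover_apps:
        for s in (a, b):
            if app not in s.installed:
                problems.append(f"{s.name} is missing {app}")
    return problems


if __name__ == "__main__":
    dual_pentium = Server("srv1", "intel", "3.51", 4, frozenset({"SQL Server"}))
    alpha = Server("srv2", "alpha", "3.51", 4, frozenset({"SQL Server"}))
    for problem in pairing_problems(dual_pentium, alpha, ["SQL Server"]):
        print(problem)
```

Note that the check compares architecture and installed software only; CPU count and model are deliberately ignored, since the two systems need not be similarly configured.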

The disk array is a BA356, which is part of the Prioris kit you buy from Digital, without disk drives. This standard external storage chassis has a multichannel-capable, Fast and Wide, differential SCSI-2 backplane--you can have as many SCSI channels on it as you have drives and controllers in your two servers. You can set up the disk array to be either in the middle of the SCSI chain between the two servers or at the end. Where you put the array depends on whether you leave the terminators installed and whether you use what Digital calls a trilink SCSI adapter. This adapter is a Y connector from the disk array to the two servers. You can order a standard cluster kit from Digital that comes with cables, terminators, and an Adaptec 2944W Fast and Wide differential SCSI-2 controller for each server.

The network connection is just a medium for a heartbeat between the two machines. The heartbeat lets each machine know the other is alive. If one disappears, the failover begins, and the remaining system takes over all assigned functions.

This connection can be either a dedicated direct link with a basic 10Mbit Ethernet card or your usual high-speed LAN connection. Beware of using your usual LAN, because your domain controller and competition for your Ethernet media can introduce extra delays that can add to the 20- to 30-second failover time. Also, a failure in the part of your network between the clustered machines will initiate a failover: Each cluster machine will think the other is dead, so the clustering software on each server will drop ownership of the disks and leave them offline to prevent data corruption. Digital recommends a direct, standalone connection between the two servers for best performance.
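The heartbeat idea comes down to a timeout: if no heartbeat arrives within some window, the peer is presumed dead. Here is a minimal Python sketch of that detection step. The HeartbeatMonitor class and its timeout value are hypothetical and say nothing about how Digital's software is built; what to do once the peer is presumed dead (take over resources, or release the disks) is left to the caller.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Tracks the last heartbeat from a peer and reports whether it is alive."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so the logic can be tested
        self.last_seen = clock()

    def beat(self) -> None:
        """Record that a heartbeat just arrived from the peer."""
        self.last_seen = self.clock()

    def peer_alive(self) -> bool:
        """True while the most recent heartbeat is within the timeout."""
        return (self.clock() - self.last_seen) <= self.timeout_s


if __name__ == "__main__":
    now = [0.0]                       # fake clock, in seconds
    monitor = HeartbeatMonitor(timeout_s=5.0, clock=lambda: now[0])
    now[0] = 3.0
    print(monitor.peer_alive())       # heartbeat is 3 s old, within timeout
    now[0] = 6.0
    print(monitor.peer_alive())       # heartbeat is 6 s old, past timeout
```

The sketch also shows why a broken link looks the same as a dead server: the monitor sees only missing heartbeats, so a cable fault between the two machines trips the timeout on both sides at once.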

The cluster software is where all the magic occurs. This software acts as a shim--new code inserted without disrupting existing OS code. It gives the SCSI drivers and the network layers in the OS the means to carry out the clustering capabilities. The software also has an administration tool for setting up drives, failover scripts, and other characteristics of the cluster (such as its network alias and administrator login and password). The logic behind the cluster's operation is complex, but the user and administrator aspects are simple.

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.