
February 01, 1998 12:00 AM

Inside Microsoft Cluster Server

Windows IT Pro
InstantDoc ID #2943
Understand how this clustering solution works

1997 was a landmark year for high-availability solutions for Windows NT: A slew of independent vendors released products that make applications more robust in the face of system failures, and Microsoft released its long-awaited clustering solution. Microsoft Cluster Server (MSCS), formerly code-named Wolfpack, is likely to become the dominant player on an already crowded clustering scene. Two things differentiate MSCS from the crowd. First, it was codeveloped with input from a number of big-league players, including Tandem, Digital, and IBM. Second, it is both a standalone clustering solution and an open platform for developing applications that can take advantage of the fault tolerance the MSCS infrastructure provides. Third-party clustering solutions usually attempt to work with off-the-shelf applications. These applications don't realize they're running on top of failover capabilities, which can limit the solution's effectiveness.

Because MSCS has an extensible architecture for building cluster-aware applications and enjoys wide industry support, it is the de facto clustering standard for NT. More and more enterprise-level applications will be designed to be MSCS-aware, and as a systems administrator or developer, you are likely to cross paths with MSCS in the near future—if you haven't already. In this column, I'll define clustering, take a look inside MSCS, and describe how it works.

Introduction to Clustering
In clustering, two or more servers work together to service a group of tasks, provide fault tolerance, or offer scalability (see Mark Smith, "Clusters for Everyone," June 1997). To provide continuous availability, a cluster must include at least two systems, so that if one system crashes, the applications it was running can move to another machine, or node, in the cluster. (One feature of MSCS is its ability to try restarting a failed application on its current node before moving it to another node.) From the outside, a cluster appears to be one computer because MSCS supports the concept of virtual servers: MSCS creates a virtual server to represent a particular application. Clients talk to the virtual server, not to the node the virtual server is currently mapped to, and the virtual server moves with the application. So when MSCS moves applications from a failed system to working nodes in the cluster, clients aren't aware of the change; at most, a client might notice a pause in service. The virtual server model ensures that applications running on a cluster are highly available: only multiple failures would cause a real disruption.
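To make the virtual server idea concrete, here is a minimal Python sketch of the failover behavior described above. It is a toy model, not the MSCS API. The classes `Node`, `VirtualServer`, and `Cluster`, the `max_local_restarts` policy knob, and the node and server names are all hypothetical. Clients only ever resolve a virtual server name. When an application fails, the model first tries a restart on the current node, then moves the virtual server to another healthy node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class VirtualServer:
    name: str            # the name clients connect to
    node: Node           # the physical node currently hosting it
    restart_attempts: int = 0


class Cluster:
    """Hypothetical model of MSCS-style failover, not the MSCS API."""

    def __init__(self, nodes, max_local_restarts=1):
        self.nodes = {n.name: n for n in nodes}
        self.max_local_restarts = max_local_restarts  # assumed policy knob
        self.virtual_servers = {}

    def add_virtual_server(self, vs_name, node_name):
        self.virtual_servers[vs_name] = VirtualServer(vs_name, self.nodes[node_name])

    def resolve(self, vs_name):
        # Clients only ever know vs_name; the mapping to a node is internal.
        return self.virtual_servers[vs_name].node.name

    def on_application_failure(self, vs_name):
        vs = self.virtual_servers[vs_name]
        # 1. If the hosting node is still up, try restarting in place first.
        if vs.node.healthy and vs.restart_attempts < self.max_local_restarts:
            vs.restart_attempts += 1
            return f"restarted {vs_name} on {vs.node.name}"
        # 2. Otherwise move the virtual server to another healthy node.
        for node in self.nodes.values():
            if node.healthy and node is not vs.node:
                vs.node = node
                vs.restart_attempts = 0
                return f"moved {vs_name} to {node.name}"
        raise RuntimeError(f"no healthy node can host {vs_name}")


cluster = Cluster([Node("NODE1"), Node("NODE2")])
cluster.add_virtual_server("WEBSRV", "NODE1")
print(cluster.resolve("WEBSRV"))                 # NODE1
print(cluster.on_application_failure("WEBSRV"))  # restarted WEBSRV on NODE1
cluster.nodes["NODE1"].healthy = False           # NODE1 crashes
print(cluster.on_application_failure("WEBSRV"))  # moved WEBSRV to NODE2
print(cluster.resolve("WEBSRV"))                 # NODE2
```

The client-facing name `WEBSRV` never changes, even though the node behind it does. That is the whole point of the virtual server abstraction.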

Two software models used in clustering today affect the way nodes share hardware: the shared-disk model and the shared-nothing model. In the shared-disk approach, software running on any of a cluster's nodes can access any disk connected to any node, as Figure 1, page 58, shows. Locking protocols keep the data consistent when several nodes access the same disk. The shared-disk model can reduce the need for peripherals and makes data sharing easy. Its primary disadvantage is that as the number of nodes in a cluster grows, so does the locking-protocol overhead for accessing shared disks, which can take a toll on performance.
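A back-of-the-envelope cost model shows why locking overhead grows with cluster size. The sketch below assumes a deliberately pessimistic, hypothetical protocol. Before touching a shared disk region, a node sends a lock request to every other node and waits for each grant. Real lock managers are more efficient than this, so the numbers only illustrate the trend. The function names are made up for this illustration.

```python
def lock_messages_per_access(num_nodes: int) -> int:
    """Messages needed to lock one shared-disk region, under a deliberately
    simple (hypothetical) protocol: ask every other node, wait for every reply."""
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    peers = num_nodes - 1
    return 2 * peers  # one request and one grant per peer


def cluster_messages(num_nodes: int, accesses_per_node: int) -> int:
    """Total lock traffic if every node performs the same number of accesses."""
    return num_nodes * accesses_per_node * lock_messages_per_access(num_nodes)


for n in (2, 4, 8, 16):
    print(n, lock_messages_per_access(n), cluster_messages(n, 100))
```

Under this model, the cost of a single access grows linearly with the number of nodes. The total lock traffic across the cluster grows roughly quadratically.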

The shared-nothing model, as Figure 2, page 58, illustrates, assumes that each node in a cluster owns certain disks and that no direct sharing of disks between nodes occurs. Even when disks are connected to multiple nodes, as MSCS requires, only one node can own each disk. A node cannot access a disk it does not own unless the node that owns the disk fails or gives up control of the disk. The shared-nothing model can increase costs because it requires more resources, but it improves scalability because there is no sharing to create bottlenecks.
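The ownership rule in the shared-nothing model can be sketched in a few lines of Python. The `DiskOwnership` class and its methods are hypothetical bookkeeping, not how MSCS actually arbitrates disks on the SCSI bus. The sketch captures the three rules from the text:

- Each disk has at most one owner.
- A non-owner is refused access even if it is physically connected to the disk.
- Ownership changes only when the owner releases the disk or fails.

```python
class DiskOwnership:
    """Hypothetical shared-nothing bookkeeping: each disk has at most one
    owner, and only that owner may perform I/O on it."""

    def __init__(self):
        self._owner = {}  # disk name -> name of the owning node

    def owner(self, disk):
        return self._owner.get(disk)

    def claim(self, disk, node):
        current = self._owner.get(disk)
        if current is not None and current != node:
            raise PermissionError(f"{disk} is owned by {current}")
        self._owner[disk] = node

    def release(self, disk, node):
        # A node can voluntarily give up control of a disk it owns.
        if self._owner.get(disk) != node:
            raise PermissionError(f"{node} does not own {disk}")
        del self._owner[disk]

    def node_failed(self, failed, survivor):
        # Ownership of every disk held by the failed node passes to a survivor.
        for disk, owner in list(self._owner.items()):
            if owner == failed:
                self._owner[disk] = survivor

    def read(self, disk, node):
        # No direct sharing: a non-owner is refused even if cabled to the disk.
        if self._owner.get(disk) != node:
            raise PermissionError(f"{node} cannot access {disk}")
        return f"{node} read {disk}"


disks = DiskOwnership()
disks.claim("Disk Q:", "NODE1")
print(disks.read("Disk Q:", "NODE1"))
disks.node_failed("NODE1", "NODE2")
print(disks.owner("Disk Q:"))  # NODE2
```

Because access checks consult only a single owner, no cross-node lock negotiation is needed. This is where the model's scalability advantage comes from.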

Clustering contrasts with other methods of making applications highly available. In clustering, there can be a significant pause when a functioning node takes over running applications from a failed node. The length of the pause depends on the application and can be as short as 30 seconds or as long as 10 minutes. Thus, clustering is a good choice for environments such as Web serving or corporate databases, in which a pause in service is an inconvenience rather than a danger. However, for systems in which an outage of even a few seconds can cause serious problems (e.g., avionics or airline reservations computers), a different approach is appropriate: applications run simultaneously on two or more computers. The system detects failures by comparing the output from the different application instances. If the instances produce different output, the system votes or arbitrarily chooses one instance as correct. Then, even when one computer fails, there is no pause in service. This multiple-computer approach is more costly than clustering because it requires two or more computers to do the work of one, and it might also require proprietary hardware.
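The output-comparison step of the redundant-execution approach can be sketched as a simple majority vote. The `vote` function below is a hypothetical illustration and assumes the outputs are hashable values. It picks the output that a strict majority of instances agree on. If there is no majority, it falls back to an arbitrary choice, as the text describes. Any instance that disagrees with the chosen output is reported as a suspected failure.

```python
import random
from collections import Counter


def vote(outputs):
    """Return (chosen_output, indices_of_disagreeing_instances).

    Uses a strict majority when one exists; otherwise falls back to an
    arbitrary pick, mirroring the "votes or arbitrarily chooses" rule."""
    if not outputs:
        raise ValueError("no outputs to compare")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 > len(outputs):
        chosen = value
    else:
        chosen = random.choice(outputs)
    # Instances whose output differs from the chosen one are presumed faulty.
    suspects = [i for i, out in enumerate(outputs) if out != chosen]
    return chosen, suspects


print(vote([42, 42, 41]))  # (42, [2])
```

With three or more instances, a single faulty computer is outvoted. With only two instances, a disagreement can be detected but not resolved, which is why the fallback to an arbitrary choice exists.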

MSCS Concepts
MSCS clusters consist of nodes, individual computers complete with their own processor, memory, and system disks. Nodes in an MSCS cluster must have access to at least one shared disk. In the initial release of MSCS, this shared disk must be a SCSI disk that is either dual-ported or accessible through two different SCSI adapters attached to the same SCSI bus. The nodes must be connected on a private network separate from LAN traffic. The data files, IP addresses, network shares, and other parts of the installed server applications on the nodes are the cluster's resources. A resource can be active on only one node at a time, and when the cluster detects that a resource has failed, it relocates the failed resource to a different node.

MSCS organizes related resources into groups. One type of relationship is a dependency. For example, if an application requires a network name and a TCP/IP address to be active before its network share comes online, you can place the TCP/IP address resource, the network name resource, and the file share resource in the same group (the file share group). When MSCS detects a resource failure, it moves all the resources in the failed resource's group to a different node and restarts the failed resource there. In addition, you can establish dependencies between resources in the same group so that they come online in a specific order.
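Ordering resources by their dependencies is a topological sort. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive an online order and a reverse offline order for the file share group. It also models failover of the whole group. The group contents, the node names, and the `fail_over` helper are hypothetical illustrations. The sketch assumes the network name depends on the IP address and the file share depends on the network name.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical file share group: each resource maps to the resources
# that must be online before it can come online.
file_share_group = {
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Network Name"},
}


def online_order(group):
    """Dependencies first, so each resource starts after what it needs."""
    return list(TopologicalSorter(group).static_order())


def offline_order(group):
    """Reverse order, so nothing loses a dependency while it is still running."""
    return online_order(group)[::-1]


def fail_over(group, current_node, nodes):
    """Move the whole group, not just the failed resource, to another node."""
    candidates = [n for n in nodes if n != current_node]
    if not candidates:
        raise RuntimeError("no other node to fail over to")
    target = candidates[0]
    # If current_node has crashed outright, the offline steps are moot.
    steps = [f"offline {r} on {current_node}" for r in offline_order(group)]
    steps += [f"online {r} on {target}" for r in online_order(group)]
    return target, steps


target, steps = fail_over(file_share_group, "NODE1", ["NODE1", "NODE2"])
for step in steps:
    print(step)
```

A circular dependency makes `TopologicalSorter` raise `graphlib.CycleError`. That is a reasonable way to reject a group whose dependencies could never all come online.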
