Clustering and mirroring your Web servers for maximum uptime

As Webmaster for Windows NT Magazine, I know that downtime is the worst thing
that can happen to a Web site. Several vendors offer solutions to help prevent
it. One such vendor, Valence Research, is developing a Web clustering solution,
Convoy Cluster Software, which lets you balance the load across your Web servers
and make them fault tolerant. The product looked interesting and simple to
implement, so I gave it a try.
In addition to setting up the Web cluster, I needed a way to make sure that
both Web servers in our cluster were serving the same Web pages. Applications
such as Octopus SASO can help you synchronize the information on both servers
(for a review of SASO, see Carlos Bernal, "Octopus SASO 2.0," June
1997). However, I felt this product was overkill for data replication. After
making a few inquiries, I decided to use Windows NT's directory replication.
Convoy Cluster
Convoy is simple to install and operate. If you follow the detailed
directions, you can have a working cluster up and running in about 30 minutes.
However, if you skip one vital step, as I accidentally did, your Web servers
will start playing ping-pong with blue screens of death: one server covers for
the other while it is down, but when the failed server comes back up, it causes
the covering server to crash. This cycle repeats indefinitely. Valence
Research's technical staff helped me pinpoint the problem in my configuration,
and when I reinstalled and reconfigured the machines, everything worked.
You can set up Convoy on machines with only one NIC. However, if you want
the machines to be able to talk to each other so that you can replicate
content between them, you need to install two NICs in each Web server. I
configured each server with two NICs so that I could use NT's directory
replication. Convoy refers to the two NICs as the dedicated adapter card and
the cluster adapter card.
Installing Convoy
Although I can give you a general sense of how to install Convoy, make sure
you follow the installation directions to the letter. You install Convoy as a
new network adapter. The installer adds the Convoy Virtual adapter and a Convoy
Driver protocol to your system. After the installation is complete, the Convoy
Setup screen, which you see in Screen 1, opens automatically so you can enter
your Convoy clustering variables: the cluster IP address, each server's
dedicated IP address, the priority of each server in the cluster (the lower the
number, the higher the priority), and how you want the cluster to distribute
the load.
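To make the Setup screen's parameters concrete, here is a minimal Python sketch of a two-server cluster configuration with a few sanity checks. The class and field names, the sample private addresses, and the rule that every host needs a unique priority are assumptions for illustration; Convoy keeps its settings through its own Setup dialog, not through anything like this code, and the load-distribution setting is left out.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address


@dataclass(frozen=True)
class HostConfig:
    name: str
    dedicated_ip: IPv4Address
    priority: int  # lower number = higher priority, as on the Setup screen


@dataclass
class ClusterConfig:
    cluster_ip: IPv4Address
    hosts: list[HostConfig] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the settings are inconsistent (assumed rules)."""
        priorities = [h.priority for h in self.hosts]
        if len(set(priorities)) != len(priorities):
            raise ValueError("each host needs a unique priority")
        ips = [h.dedicated_ip for h in self.hosts]
        if len(set(ips)) != len(ips):
            raise ValueError("dedicated IP addresses must be unique")
        if self.cluster_ip in ips:
            raise ValueError("the cluster IP must differ from every dedicated IP")

    def highest_priority_host(self) -> HostConfig:
        """Return the host with the lowest priority number."""
        return min(self.hosts, key=lambda h: h.priority)


if __name__ == "__main__":
    # Hypothetical private addresses for a two-server cluster.
    cfg = ClusterConfig(
        cluster_ip=IPv4Address("192.168.1.100"),
        hosts=[
            HostConfig("web1", IPv4Address("192.168.1.11"), priority=1),
            HostConfig("web2", IPv4Address("192.168.1.12"), priority=2),
        ],
    )
    cfg.validate()
    print(cfg.highest_priority_host().name)  # web1
```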
The next step is to open the Network applet in the NT Control Panel and
view the bindings for all protocols. There, you need to configure the bindings
so that the Convoy Driver protocol is bound to the Convoy Virtual adapter and
the cluster adapter, but not to the dedicated adapter. You also need to
configure the bindings so that TCP/IP is bound to the Convoy Virtual adapter
and the dedicated adapter, but not to the cluster adapter. For information on
how to configure these bindings, refer to the Convoy documentation. In essence,
these bindings act as a firewall: only Convoy can talk directly to your machine
through the Convoy Virtual adapter and the cluster adapter, and the outside
world can't see or use the IP address of your dedicated adapter.
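The binding rules above amount to a small table of which protocol may talk to which adapter. As a hypothetical sketch (the adapter labels are shorthand for this example, not names that NT or Convoy uses internally), the following Python compares a set of bindings against those rules and lists what to change:

```python
# Which adapters each protocol must be bound to, per the rules above.
# The adapter labels are shorthand for this sketch, not NT registry names.
REQUIRED_BINDINGS: dict[str, set[str]] = {
    "Convoy Driver": {"Convoy Virtual", "cluster"},
    "TCP/IP": {"Convoy Virtual", "dedicated"},
}
ADAPTERS = {"Convoy Virtual", "cluster", "dedicated"}


def binding_problems(actual: dict[str, set[str]]) -> list[str]:
    """Compare actual protocol-to-adapter bindings with the required ones."""
    problems = []
    for protocol, wanted in REQUIRED_BINDINGS.items():
        bound = actual.get(protocol, set()) & ADAPTERS
        for adapter in sorted(wanted - bound):
            problems.append(f"bind {protocol} to the {adapter} adapter")
        for adapter in sorted(bound - wanted):
            problems.append(f"unbind {protocol} from the {adapter} adapter")
    return problems


if __name__ == "__main__":
    # Example: TCP/IP is still bound to the cluster adapter.
    current = {
        "Convoy Driver": {"Convoy Virtual", "cluster"},
        "TCP/IP": {"Convoy Virtual", "dedicated", "cluster"},
    }
    for problem in binding_problems(current):
        print(problem)  # unbind TCP/IP from the cluster adapter
```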
How Convoy Performs
To test Convoy, I simulated 50 simultaneous users requesting HTML pages from
the cluster IP address. Right off the bat, I could see the two machines sharing
the load: when I requested a page from the Web cluster, some of the page's
elements came from one server and the rest came from the other. I could verify
this load sharing because my two development machines held different versions
of the Web pages when I started the test. I then increased the number of
simultaneous users to 75, and the machines just kept purring. For reference,
the first server is an Intergraph Web-300 with a 200MHz Pentium Pro and 128MB
of RAM; the second is an Intergraph Web-300 with a 150MHz Pentium Pro and 64MB
of RAM. In my environment, I couldn't generate enough client requests to slow
down these machines.
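A test like this one can be approximated with a short Python script that runs many simulated users in parallel threads and tallies the results. This is an illustrative sketch rather than a record of how the test above was run; the function names and the per-user request count are hypothetical, and the `fetch` parameter lets you substitute a stub when no cluster is available.

```python
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_page(url: str, timeout: float = 10.0) -> int:
    """Request one page, read the whole body, and return the HTTP status."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def load_test(url: str, users: int, requests_per_user: int = 10,
              fetch: Callable[[str], int] = fetch_page) -> Counter:
    """Run `users` simulated clients at once; tally status codes and errors."""

    def one_user(user_id: int) -> Counter:
        tally: Counter = Counter()
        for _ in range(requests_per_user):
            try:
                tally[fetch(url)] += 1
            except OSError as exc:  # URLError, timeouts, refused connections
                tally[type(exc).__name__] += 1
        return tally

    total: Counter = Counter()
    # One thread per simulated user, so all users issue requests concurrently.
    with ThreadPoolExecutor(max_workers=users) as pool:
        for tally in pool.map(one_user, range(users)):
            total.update(tally)
    return total
```

Calling `load_test("http://192.168.1.100/", users=50)`, with that hypothetical cluster address replaced by your own, returns a `Counter` of status codes and error names; rerunning it with `users=75` repeats the second step of the test.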
To provide fault tolerance, Convoy redirects incoming traffic to another
server in the cluster when it detects that a server is not responding. To
determine which servers are active, the clustered machines periodically
exchange broadcast messages, which tell each machine the status of the other
members of the cluster. When that status changes, such as when a server fails
or leaves the cluster, Convoy invokes a convergence. In Convoy terms, a
convergence is the process by which the cluster reestablishes its membership
so that it can redistribute the load. Convoy invokes a convergence every time
you add a server to or remove one from the cluster.
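The heartbeat scheme can be modeled in a few lines of Python. In this sketch, which is a simplification and not Convoy's actual protocol, each host's broadcast updates a last-seen time, a host that stays silent for five intervals is dropped, and every join or departure counts as one convergence. The class and method names are hypothetical, and time is passed in explicitly so the behavior can be simulated.

```python
class Membership:
    """Simplified model of heartbeat-based cluster membership (not Convoy's protocol)."""

    def __init__(self, interval_s: float = 1.0, missed_limit: int = 5) -> None:
        self.interval_s = interval_s        # time between broadcasts from each host
        self.missed_limit = missed_limit    # missed broadcasts before a host is dropped
        self.last_seen: dict[str, float] = {}
        self.active: set[str] = set()
        self.convergences = 0

    def heartbeat(self, host: str, now: float) -> None:
        """Record a broadcast from `host`; a host not yet active joins the cluster."""
        self.last_seen[host] = now
        if host not in self.active:
            self.active.add(host)
            self.convergences += 1  # membership changed: a host was added

    def check(self, now: float) -> set[str]:
        """Drop hosts that have been silent too long; return the hosts dropped."""
        timeout = self.interval_s * self.missed_limit
        dropped = {h for h in self.active if now - self.last_seen[h] >= timeout}
        if dropped:
            self.active -= dropped
            self.convergences += 1  # membership changed: redistribute the load
        return dropped


if __name__ == "__main__":
    cluster = Membership()
    for t in range(9):
        cluster.heartbeat("web1", t)
        if t <= 3:  # simulate web2 failing after its broadcast at t=3
            cluster.heartbeat("web2", t)
        if gone := cluster.check(t):
            print(f"t={t}s: dropped {sorted(gone)}, active {sorted(cluster.active)}")
            # t=8s: dropped ['web2'], active ['web1']
```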
By default, each server broadcasts a message every second to monitor the
status of the cluster. The cluster waits five seconds (five missed messages)
before it initiates a convergence, and the software takes another five seconds
to redistribute the load, so the expected failover time is about 10 seconds.
You can adjust these parameters, but the defaults strike a good balance: they
keep failover reasonably fast without overburdening the network with
broadcasts. When I tested the fault tolerance, it worked every time. Whether I
stopped the Web service or shut down one of the servers in the cluster, the
remaining machine took over the entire load. With the default settings, though,
my observed failover times were closer to 15 seconds. During that window, a few
client connections will fail, but these losses beat having to reboot or fire up
another machine. Overall, I was pleased with the way the cluster performed.
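The failover arithmetic described earlier, and the trade-off behind the defaults, can be written out directly. This Python sketch assumes that detection time is simply the heartbeat interval times the number of missed messages and that the five-second redistribution phase stays constant when the interval changes; both are simplifications, and the function names are hypothetical.

```python
def failover_estimate(interval_s: float = 1.0, missed_limit: int = 5,
                      redistribute_s: float = 5.0) -> float:
    """Detection time (missed heartbeats) plus load-redistribution time."""
    return interval_s * missed_limit + redistribute_s


def heartbeats_per_second(hosts: int, interval_s: float = 1.0) -> float:
    """Broadcast rate on the wire: each host sends one message per interval."""
    return hosts / interval_s


if __name__ == "__main__":
    print(failover_estimate())  # 10.0 with the defaults described above
    for interval in (0.5, 1.0, 2.0):
        print(f"interval {interval}s: failover ~{failover_estimate(interval)}s, "
              f"{heartbeats_per_second(2, interval)} broadcasts/s from 2 hosts")
```

Under these assumptions, halving the interval cuts the estimate to 7.5 seconds but doubles the broadcast traffic, while doubling it halves the traffic at the cost of a 15-second estimate.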