August 11, 2009 12:00 AM

New Hyper-V Features in Windows Server 2008 R2

Live Migration and Cluster Shared Volumes add high availability
Windows IT Pro
InstantDoc ID #102485

The CSV filter actually gives us another great feature. In the event a non-coordinator node loses direct access to the LUN—for example, its iSCSI network connection fails—all of its I/O can be performed over the network via the coordinator node using the cluster network (more on this in a second).

This is known as redirected I/O, and it works great. During testing, I accidentally shut off the iSCSI network from one of my boxes, and I didn’t know until I happened to see the CSV was in Online (Redirected I/O) mode. All of the VMs on it were still running great with no performance degradation. Everything continued to work because all the I/O was now being sent over the network between the node running the VMs and the coordinator node for the LUN on which the VMs resided.

Figure 5 shows such a scenario, in which a node has lost access to the storage directly and the CSV filter redirects all I/O via the network.
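The path decision described above can be sketched as a small model. This is only an illustration, not the CSV filter's actual logic: the `Node` class, the `io_path` function, and the string labels are hypothetical names, and the coordinator is assumed to keep its own access to the LUN.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A cluster node (hypothetical model, not a real cluster API)."""
    name: str
    has_storage_access: bool  # True while the node can reach the LUN directly


def io_path(node: Node, coordinator: Node) -> str:
    """Describe how I/O from `node` reaches the CSV LUN in this model."""
    if node.name == coordinator.name:
        # Assumption: the coordinator still has its storage connection.
        return "direct"
    if node.has_storage_access:
        # Normal case: the node reads and writes blocks on the LUN itself.
        return "direct"
    # Storage link lost: I/O travels over the cluster network to the coordinator.
    return f"redirected via {coordinator.name}"


coordinator = Node("node1", has_storage_access=True)
healthy = Node("node2", has_storage_access=True)
cut_off = Node("node3", has_storage_access=False)  # e.g., its iSCSI link failed

print(io_path(healthy, coordinator))  # direct
print(io_path(cut_off, coordinator))  # redirected via node1
```

The point of the model is that the VMs on the cut-off node never see a failure; only the route their I/O takes changes.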

One question that often comes up when talking about CSV redirected I/O is, which network is used? Suddenly potentially huge amounts of traffic are being sent over the network between nodes in the cluster instead of over the dedicated storage networks (if iSCSI is used) or cabling (for Fibre Channel/SAS).

The answer is the NetFT network, a virtual network that binds to one of the physical cluster networks that has been enabled for cluster use. It’s the equivalent of the old private network we had in Windows Server 2003 and is used for internal cluster communications such as heartbeat.

The route that NetFT creates for internal communication traffic is based on an automatic metric, which is given to each cluster network: The network with the lowest metric is used for internal communication. The metric assignment is such that a cluster-enabled network that’s not enabled for client communications, and therefore is only for cluster communications, will have the lowest metric and thus will more likely be used by NetFT.
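The lowest-metric rule can be sketched in a few lines. This is a simplified model under stated assumptions: the `ClusterNetwork` class and `netft_network` function are hypothetical, the metric values are made up for illustration, and the sketch only captures "pick the cluster-enabled network with the lowest metric", not how the cluster computes the metrics.

```python
from dataclasses import dataclass


@dataclass
class ClusterNetwork:
    """Hypothetical model of a cluster network; values are illustrative."""
    name: str
    metric: int          # lower means more preferred
    cluster_use: bool    # enabled for cluster communications
    client_access: bool  # also enabled for client communications


def netft_network(networks):
    """Return the cluster-enabled network with the lowest metric."""
    candidates = [n for n in networks if n.cluster_use]
    if not candidates:
        raise ValueError("no network is enabled for cluster use")
    return min(candidates, key=lambda n: n.metric)


networks = [
    ClusterNetwork("Public", 10000, cluster_use=True, client_access=True),
    ClusterNetwork("Cluster Internal", 1000, cluster_use=True, client_access=False),
    ClusterNetwork("iSCSI", 1100, cluster_use=False, client_access=False),
]

print(netft_network(networks).name)  # Cluster Internal
```

In this made-up example the cluster-only network wins because it carries the lowest metric, which matches the behavior described above.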

Given the critical role of the network with CSV and Live Migration, you need to make sure your cluster has a dedicated network just for cluster communications that’s connected to a gigabit or higher dedicated switch. Actually, Live Migration doesn’t really use the NetFT virtual network; instead, every VM has its own properties including those that determine which networks can be used for the Live Migration traffic.

In the beta builds of Windows Server 2008 R2, the default order for Live Migration was based on the same metrics used by NetFT, so whatever network NetFT bound to was also the top network used by Live Migration. This changed in the Release Candidate and the final code, because Microsoft decided it didn't want NetFT traffic and Live Migration traffic competing on the same network.

So, by default, the Live Migration traffic is enabled on the network with the second lowest metric. You can change the Live Migration network order and available networks for Live Migration traffic at your discretion.
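A rough sketch of that default, and of overriding it, might look like the following. The function names, network names, and metric values are all hypothetical; the model only encodes "second lowest metric by default" and "first enabled network in the preference order", and it does not cover a cluster with a single network.

```python
def default_live_migration_network(metrics):
    """Return the network with the second lowest metric.

    `metrics` maps a network name to its metric (illustrative values).
    """
    if len(metrics) < 2:
        raise ValueError("this sketch needs at least two networks")
    ordered = sorted(metrics, key=metrics.get)
    return ordered[1]  # ordered[0] is the network NetFT would prefer


def pick_network(preference_order, enabled):
    """Return the first network in the order that is still enabled."""
    for name in preference_order:
        if name in enabled:
            return name
    raise ValueError("no network is enabled for Live Migration")


metrics = {"Cluster Internal": 1000, "iSCSI": 1100, "Public": 10000}

# With these made-up metrics the default lands on the storage network.
print(default_live_migration_network(metrics))  # iSCSI

# Deselecting every network except one forces Live Migration onto it.
order = ["iSCSI", "Cluster Internal", "Public"]
print(pick_network(order, enabled={"Cluster Internal"}))  # Cluster Internal
```

The first result shows why the default is worth checking: nothing in the metric ordering knows that a network was meant for storage traffic only.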

In Figure 6, you can see that I manually deselected the other networks so Live Migration traffic can only be sent over the Cluster Internal network as I didn't want to use a separate network for Live Migration. You should make sure you check the networks you are using for Live Migration in your environment as it's quite possible Live Migration may choose a network you did not want used for cluster traffic, such as the iSCSI network!

The actual coordinator node can be changed with minimal impact. There’s a slight pause in I/O if you move the coordinator to another node, as the I/O is queued at each node. However, the pause is unlikely to be noticed, which is crucial given how important the coordinator node is to CSV.

Having multiple nodes directly writing to blocks on the disk can cause some complications, mainly because most utilities don’t expect it. When you want to perform a backup or other disk action such as a defragmentation or chkdsk, you need to put the disk in maintenance mode, which disables direct I/O from the other nodes in the cluster and makes them use redirected I/O. This ensures only the coordinator node is accessing the disk, which stops interference with backups and disk operations.
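The effect of maintenance mode on each node's I/O can be modeled as a simple toggle. This is a conceptual sketch only: `SharedVolume` and `maintenance_mode` are hypothetical names, not part of any cluster API, and the model ignores everything except which nodes use direct versus redirected I/O.

```python
from contextlib import contextmanager


class SharedVolume:
    """Hypothetical model of a CSV disk and the nodes that use it."""

    def __init__(self, coordinator, nodes):
        self.coordinator = coordinator
        self.nodes = list(nodes)
        self.maintenance = False

    def io_mode(self, node):
        """Return 'direct' or 'redirected' for the given node."""
        if node == self.coordinator:
            return "direct"  # the coordinator always accesses the disk itself
        return "redirected" if self.maintenance else "direct"


@contextmanager
def maintenance_mode(volume):
    """Force non-coordinator nodes onto redirected I/O for the duration."""
    volume.maintenance = True
    try:
        yield volume
    finally:
        volume.maintenance = False  # direct I/O resumes afterward


vol = SharedVolume("node1", ["node1", "node2", "node3"])

with maintenance_mode(vol):
    # A backup, defragmentation, or chkdsk would run here.
    during = {n: vol.io_mode(n) for n in vol.nodes}

after = {n: vol.io_mode(n) for n in vol.nodes}

print(during)  # {'node1': 'direct', 'node2': 'redirected', 'node3': 'redirected'}
print(after)   # {'node1': 'direct', 'node2': 'direct', 'node3': 'direct'}
```

While the block is active, only the coordinator touches the disk directly, which is the condition the backup or disk utility relies on.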

The good news is that in the final Server 2008 R2 release, a PowerShell cmdlet, Repair-ClusterSharedVolume, exposes the defrag and chkdsk actions and performs all the other preparation tasks for you.

The Current CSV Scenario
It’s important to note that currently CSV supports only Hyper-V. Although CSV is visible from all nodes (and I’m sure we can all think of many other uses for this method to share a LUN concurrently on multiple nodes in the cluster for Server 2008 R2), when you enable CSV in the Failover Cluster Management MMC snap-in, you are reminded of the Hyper-V exclusive use, so don’t stray.

In the future, other scenarios for CSV might be added. By using CSV, we’re no longer required to move LUNs between nodes in the cluster during the migration of a VM because the LUN is available to all nodes all the time, solving the mount/dismount problem.

CSV combined with Live Migration offers a migration with no user impact. To perform a migration, you simply complete the action shown in Figure 7. You can also still perform the old-style quick migration using the Move virtual machine action, which Figure 7 also shows.

It’s important to look at CSV as more than part of a zero-downtime VM migration story. Previously we had to maintain multiple LUNs to be able to make the information on them available to different nodes in the cluster. For example, at a minimum, a four-node cluster required four LUNs to be able to move VMs independently of one another. Now, with CSV, the LUNs that are part of cluster storage are available to all nodes, so you don’t need separate LUNs. This lets you share your free space among all VMs on a LUN and makes the configuration validation wizard faster, since it has to test fewer LUNs.

Likewise, you don’t have to use CSV with Live Migration. You can use it on its own and accept the small suspension of availability while a LUN is failed over to the new target node. (But why would you want to?)

Or you can use another cluster file system such as Melio FS, which allows multiple concurrent connections from nodes in a cluster. However, it costs more to use a proprietary file system, whereas CSV only requires standard NTFS.

A Great High Availability Story
Live Migration and Cluster Shared Volumes together offer a great high availability story with Hyper-V—and after trying for a long time to break Hyper-V, I can honestly say it works well. For those of us using the standalone Hyper-V Server, the great news is that Hyper-V Server 2008 R2 is built on the Enterprise Edition of 2008 R2 Server Core, which means the free virtualization platform has clustering support—we get Live Migration and CSV for nothing!

 


Comments
  • Dimitrios
    Oct 12, 2009

    The phrase

    "For example, at a minimum,
    a four-node cluster
    required four LUNs
    to be able to move VMs
    independently of each other"

    should be changed to read:

    "For example, at a minimum,
    a cluster serving four VMs
    required four LUNs
    to be able to move VMs
    independently of each other"


Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.