August 11, 2009 12:00 AM

New Hyper-V Features in Windows Server 2008 R2

Live Migration and Cluster Shared Volumes add high availability
Windows IT Pro
InstantDoc ID #102485

With virtualization, you can drastically reduce the number of physical boxes in your environment, carving up fewer but more powerful servers into multiple virtual environments and allocating resources based on the needs of the particular guest instances. This sounds great—until you realize you’re taking all of your eggs and putting them into a much smaller number of baskets.

To manage a virtual environment well, you need to be able to move virtual machines (VMs) between the virtual servers with no downtime and to provide high availability for services that don't natively support it. You also need a way to make the virtual environment itself highly available. For that, you need Failover Clustering (see http://windowsitpro.com/article/articleid/101489/windows-server-2008-failover-clustering.html).

2 Challenges With Windows Server 2008’s Failover Clustering
Windows Server 2008 introduced a Failover Clustering Virtual Machine application/service type, which allows Hyper-V VM configuration and virtual disk resources to be part of a resource group that can be moved between the nodes in the failover cluster. The VM configuration and virtual disk resources must be stored on shared storage.

With the VM as part of a resource group, you can perform a quick migration in planned situations. A quick migration suspends the VM on the active node and writes the content of the memory, processor, and device registers related to the VM to a file on the shared storage. The LUN containing the configuration and virtual hard disks (VHDs) is then moved to the target node. (A LUN is essentially a portion of space carved from a SAN; think of it as a disk.) Finally, the memory is read from the file into a new VM created on the target node. After all this is done, the VM becomes available again.

It sounds like a lot of work, but in reality it takes around eight seconds per 1GB of memory configured for the VM. Still, it's a period of unavailability, and clients with connections to the VM will time out. You could perform these failovers after hours, so the downtime wouldn't be a big deal; however, many people want to be able to move VMs between nodes without downtime.
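To put that figure in perspective, here is a minimal sketch that turns the rough eight-seconds-per-gigabyte rule of thumb into a downtime estimate. The function name and the sample VM sizes are hypothetical, and the model assumes downtime scales linearly with configured memory; real numbers depend on your storage and workload.

```python
# Rule of thumb quoted in the article: about 8 seconds per 1GB of VM memory.
SECONDS_PER_GB = 8.0


def quick_migration_downtime(memory_gb: float,
                             seconds_per_gb: float = SECONDS_PER_GB) -> float:
    """Estimate the seconds a VM is unavailable during a quick migration.

    Assumes a simple linear model: downtime = memory * seconds-per-GB.
    """
    if memory_gb < 0:
        raise ValueError("memory_gb must not be negative")
    return memory_gb * seconds_per_gb


# Hypothetical VM sizes, for illustration only.
for gb in (1, 4, 16):
    print(f"{gb:>2}GB VM: roughly {quick_migration_downtime(gb):.0f} seconds of downtime")
```

Even a modest 4GB VM is offline for about half a minute under this estimate, which is long enough for many client connections to time out.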

I’d like to point out two potential challenges, however. First, quick migration works only in planned situations where you manually move the VM to a new node. In the event of a node crash, where the memory can’t be written to file first, there’s no way to perform a quick migration. Although the VM is started on an alternate node, it will start in a crash-consistent state, which basically means it performs a full boot from the current VHD content, and anything in memory at the time that had not been written to disk is lost.

The second challenge is that because you’re moving the LUN between nodes when you perform a quick migration, if you want the granularity of failover to be at the VM level, then you can have only one VM on each LUN. This is because the LUN is the smallest disk unit that can be moved between nodes in a cluster. If you placed two VMs on a single LUN and wanted to move only one VM to another node, you couldn’t; the move would force the second VM to also move.
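The LUN constraint is easy to see in a small model. The sketch below is illustrative only: the VM and LUN names are made up, and the dictionary is a stand-in for wherever you track VM placement, not a Failover Clustering API.

```python
def vms_moved_together(placement: dict, vm: str) -> list:
    """Return every VM that moves when `vm` is quick-migrated.

    `placement` maps a VM name to the LUN that holds its configuration
    and VHDs. Because the LUN is the smallest unit that can change
    nodes, every VM on the same LUN has to move with it.
    """
    lun = placement[vm]
    return sorted(name for name, vm_lun in placement.items() if vm_lun == lun)


# Hypothetical placement: two VMs share LUN1, one VM has LUN2 to itself.
placement = {"vm-web": "LUN1", "vm-sql": "LUN1", "vm-mail": "LUN2"}

print(vms_moved_together(placement, "vm-web"))   # ['vm-sql', 'vm-web']
print(vms_moved_together(placement, "vm-mail"))  # ['vm-mail']
```

Moving vm-web drags vm-sql along with it, whereas vm-mail can move on its own because nothing else shares its LUN.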

The Solution: R2's Live Migration and Cluster Shared Volumes
In Windows Server 2008 R2, both Hyper-V and Failover Clustering have undergone changes that help to support improved high availability in a virtual environment. The goal with Server 2008 R2 is to provide a zero-downtime planned failover. However, in the event of a node crash, the VM will still start in a crash-consistent state on the new owning node with a period of downtime.

Still, Server 2008 R2’s changes address the two challenges with Server 2008 and planned failover:
1. The need to pause the VM to copy its memory to the target node
2. The need to move LUN ownership from one node to another, which requires a time-consuming dismount and mount operation of the physical disk resource.
Let’s take a look at the changes in Server 2008 R2. They can help you get to a zero-downtime planned failover.

Live Migration and Challenge #1: Pausing the VM
To address the first challenge of having to suspend the VM to copy the memory, the Hyper-V team came up with Live Migration, which copies the VM’s memory to the target node while it’s still running. This sounds very easy, but it’s a little more complicated.

We can’t just copy the memory of a VM to another node, because as we are copying the memory, the VM is still running and parts of the memory are changing. Although we are copying from memory to memory over very fast networks, it still takes a finite amount of time. We can’t just pause the VM while we copy the memory, as that would be an outage.

The solution is to take an iterative approach. The first stage in Live Migration is to copy the VM’s configuration and device information from the existing node to the target node. This creates a shell VM on the target node that acts as a container and receives the VM memory and state.

The next stage is the transfer of the VM memory, which is the bulk of the information and which takes up the bulk of the time during a Live Migration. Remember that the VM is still running, so we need a way to track pages of memory that change while we are copying. To this end, the worker process on the current node creates a “dirty bitmap” of memory pages used by the VM and registers for modify-notifications on the pages of memory used by the VM.

When a memory page is modified, the dirty bitmap is updated to show that the page has changed. After the first pass of the memory copy is complete, all the pages that have been marked “dirty” in the bitmap are re-copied to the target. This time only the changed pages are copied, which means there are fewer pages to copy and the operation should be much faster. However, while we are copying these pages, other memory pages change, and so the memory copy process repeats itself.
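The iterative copy is easier to follow as a small simulation. The sketch below is not Hyper-V's implementation: pages are modeled as version counters, the running VM is modeled as random writes that happen while pages are being copied, and the stopping rule (a pass limit and a small remaining-dirty threshold) is an assumption made so the loop terminates. All names and parameters are hypothetical.

```python
import random


def precopy(num_pages, write_probability, max_passes=10, stop_at=16, seed=0):
    """Simulate repeated copy passes driven by a dirty-page set.

    Returns (pages_copied_per_pass, still_dirty, source, target).
    """
    rng = random.Random(seed)
    source = [0] * num_pages        # page contents on the current node
    target = [None] * num_pages     # page contents on the target node
    dirty = set(range(num_pages))   # first pass: every page must be copied
    copied_per_pass = []

    for _ in range(max_passes):
        to_copy = sorted(dirty)
        dirty = set()               # start tracking changes for this pass
        for page in to_copy:
            target[page] = source[page]
            # The VM keeps running, so a write may land on any page
            # while this copy is in flight; record it as dirty.
            if rng.random() < write_probability:
                victim = rng.randrange(num_pages)
                source[victim] += 1
                dirty.add(victim)
        copied_per_pass.append(len(to_copy))
        if len(dirty) <= stop_at:   # assumed stopping rule for this sketch
            break

    return copied_per_pass, dirty, source, target


passes, still_dirty, source, target = precopy(num_pages=1024, write_probability=0.1)
print("pages copied in each pass:", passes)
print("pages still dirty when the loop stopped:", len(still_dirty))
```

Each pass copies only the pages dirtied during the previous pass, so the work shrinks from one pass to the next as long as the VM changes memory more slowly than the network can copy it. Every page that is not in the dirty set when the loop stops is already identical on both nodes.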

Comments
  • Dimitrios, Oct 12, 2009

    The phrase "For example, at a minimum, a four-node cluster required four LUNs to be able to move VMs independently of each other" should be changed to read: "For example, at a minimum, a cluster serving four VMs required four LUNs to be able to move VMs independently of each other"


Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.