Virtual Machines
Live Storage Migration in Virtual Machines and a VMware Example
Posted by Mohammed Q. Hussain
In the world of virtual machine monitors there is a term called “Live Migration”. In this post I’m going to explain what “Live Migration” and “Live Storage Migration” mean, then show how VMware implements its Live Storage Migration, and finally share a simple implementation of VMware’s Live Storage Migration.
The Meaning of Live Migration
Imagine a running virtual machine, let’s call it VM1, that hosts a web server serving some important websites that must be up 24/7. At some point we need to upgrade the computer that runs this virtual machine: say we want to add more RAM, or we want to update the operating system’s kernel, which requires a reboot. The obvious solution is to shut down the virtual machine and then perform the upgrade. But remember, downtime for the websites we serve is not an option; a few seconds may be acceptable, but minutes are not.
In this situation we can use “Live Migration”. The concept is simple: we start a new virtual machine, let’s call it VM2, on another computer, then run the “Live Migration” process, which moves the state of VM1 (the source) to VM2 (the destination). After the process finishes, the web server is running on VM2 instead of VM1, so it is safe to shut down VM1 and do the upgrade while VM2, running on the other computer, serves those important websites.
And What Is Live Storage Migration?
According to the paper “The Design and Evolution of Live Storage Migration in VMware ESX”, early “Live Migration” solutions did not migrate the virtual disks of the source virtual machine, only the state of the CPU and main memory. “Live Storage Migration” means that the virtual disks of the source virtual machine are also moved during the process.
VMware ESX’s Implementation of Live Storage Migration
(Note: this part of the post uses the paper “The Design and Evolution of Live Storage Migration in VMware ESX” as a reference)
In VMware ESX there are three techniques that implement Live Storage Migration: (1) Snapshotting, (2) Dirty Block Tracking (DBT) and (3) I/O Mirroring.
Snapshotting
In this technique the hypervisor takes a snapshot of the virtual disk. From that point on, any new changes are written to the newly created snapshot, while the base virtual disk, which no longer changes, is copied to the destination. When the copy completes, a new snapshot is created to receive further writes and the previous snapshot is merged into the copied base disk. This process is repeated until the amount of changes in the current snapshot falls below some threshold. At that point the source virtual machine is suspended, to ensure that no more modifications reach the source’s disk, and the last snapshot is merged to complete the migration.
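The loop described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not VMware's code: a disk is a list of blocks, a snapshot is a dictionary mapping block index to new data, and the hypothetical `write_batches` argument stands in for the writes the guest issues during each copy or merge round. The names `migrate_with_snapshots` and `threshold` are invented for this illustration.

```python
def migrate_with_snapshots(base, write_batches, threshold=1):
    """Toy model of snapshot-based storage migration.

    base          -- list of blocks (the frozen base disk)
    write_batches -- iterable of {block_index: data} dicts; batch i is
                     assumed to arrive while round i is running
    threshold     -- suspend the VM once a snapshot has this many
                     changed blocks or fewer
    Returns (destination_disk, number_of_rounds).
    """
    batches = iter(write_batches)

    # Round 1: copy the base disk while new writes land in a snapshot.
    dest = list(base)
    snapshot = dict(next(batches, {}))
    rounds = 1

    # Keep merging snapshots until the newest one is small enough.
    while len(snapshot) > threshold:
        # A fresh snapshot receives writes while the previous one is merged.
        previous, snapshot = snapshot, dict(next(batches, {}))
        for index, data in previous.items():
            dest[index] = data
        rounds += 1

    # The VM is suspended here, so no more writes can arrive.
    for index, data in snapshot.items():
        dest[index] = data
    return dest, rounds
```

For example, with a four-block disk `["a", "b", "c", "d"]` and write batches `[{0: "A", 1: "B", 2: "C"}, {1: "B2", 3: "D"}, {2: "C2"}]`, the snapshots shrink from three changed blocks to two to one, and only the last one is merged while the VM is suspended.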
Dirty Block Tracking (DBT)
Hard disks are divided into blocks. In this technique the migration starts by copying the source’s virtual disk, and while that copy is running the hypervisor tracks every change to the disk’s data and marks the affected block as dirty. After the initial copy finishes, the hypervisor goes through the dirty blocks and copies them to the destination, while tracking any blocks that become dirty in the meantime. This process is repeated until the number of dirty blocks falls below some threshold, at which point the source virtual machine is suspended and the remaining dirty blocks are copied to complete the migration.
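As a rough illustration of the passes described above, here is a minimal Python sketch. It is a simulation under assumptions, not the ESX implementation: the disk is a list, the dirty-block record is a plain `set` of block indices, and the hypothetical `write_batches` argument supplies the guest writes that arrive during each copy pass (applied after the pass, so the copy always misses them).

```python
def migrate_with_dbt(source, write_batches, threshold=1):
    """Toy model of Dirty Block Tracking.

    source        -- list of blocks; guest writes modify it in place
    write_batches -- iterable of {block_index: data} dicts; batch i is
                     assumed to arrive during copy pass i
    threshold     -- suspend the VM once this many dirty blocks or
                     fewer remain
    Returns (destination_disk, number_of_passes).
    """
    batches = iter(write_batches)
    dirty = set()

    def apply_guest_writes(batch):
        # Every write changes the source and marks its block dirty.
        for index, data in batch.items():
            source[index] = data
            dirty.add(index)

    # Pass 1: copy the whole disk while tracking concurrent writes.
    dest = list(source)
    apply_guest_writes(next(batches, {}))
    passes = 1

    # Later passes: re-copy only the blocks dirtied by the previous pass.
    while len(dirty) > threshold:
        to_copy = set(dirty)
        dirty.clear()
        for index in to_copy:
            dest[index] = source[index]
        apply_guest_writes(next(batches, {}))
        passes += 1

    # The VM is suspended here; copy whatever is still dirty.
    for index in dirty:
        dest[index] = source[index]
    return dest, passes
```

Compared with snapshotting, each pass only moves the blocks that actually changed, and writes go straight to the source disk rather than to a separate snapshot.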
I/O Mirroring
In terms of migration time and downtime, I/O Mirroring is the most efficient of the three techniques. As before, the process starts by copying the source’s virtual disk, and the offset that the copy has reached so far is recorded; let’s call it cp_offset. When a write happens while the copy is in progress there are three cases: (1) the write targets an offset before cp_offset, (2) the write targets the same offset as cp_offset, or (3) the write targets an offset after cp_offset.
In case (1), the offset being written has already been copied to the destination, so the change is written to both the source’s and the destination’s disks. In case (2), the offset being written is currently being copied, so the write waits until the copy of that offset is done; then it proceeds and the change is written to both the source’s and the destination’s disks. Finally, in case (3), the data at that offset has not been copied yet, so the change is written only to the source’s disk and will reach the destination later, when the copy gets to that offset.
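The three cases can be made concrete with a small Python model. This is a hedged sketch, not the FUSE-based C implementation mentioned in this post and not VMware's code: the class name `MirroredDisk` and its methods are invented for the illustration, disks are lists of blocks, and because the model is single-threaded, "waiting" in case (2) is simulated by finishing the copy of that block before applying the write.

```python
class MirroredDisk:
    """Toy model of I/O Mirroring over a list of blocks."""

    def __init__(self, source):
        self.source = source
        self.dest = [None] * len(source)
        self.cp_offset = 0  # index of the next block to be copied

    def copy_next(self):
        """Copy one block; return True while blocks remain to copy."""
        if self.cp_offset < len(self.source):
            self.dest[self.cp_offset] = self.source[self.cp_offset]
            self.cp_offset += 1
        return self.cp_offset < len(self.source)

    def write(self, offset, data):
        """Apply a guest write and report how it was handled."""
        if offset == self.cp_offset:
            # Case (2): the block is being copied. Finishing that copy
            # first stands in for the write waiting on the copier.
            self.copy_next()
        self.source[offset] = data
        if offset < self.cp_offset:
            # Cases (1) and (2): already copied, so mirror the write.
            self.dest[offset] = data
            return "mirrored"
        # Case (3): not copied yet; the copier will pick it up later.
        return "source-only"
```

A short session shows each case: after one `copy_next()` on `MirroredDisk(["a", "b", "c", "d"])`, `write(0, "A")` is mirrored (case 1), `write(1, "B")` is mirrored after the block is copied (case 2), and `write(3, "D")` goes to the source only (case 3). Running `copy_next()` until it returns False then leaves both disks identical, with no final pass over dirty blocks needed.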
The Paper “The Design and Evolution of Live Storage Migration in VMware ESX”
If you are interested in more details about these three techniques and their evaluation, I recommend reading the paper “The Design and Evolution of Live Storage Migration in VMware ESX”, on which this post is based. Some really interesting parts have been omitted from this post for the sake of simplicity, especially the part about hot blocks.
Example Implementation of I/O Mirroring
Under the supervision of Prof. Hussain Almohri, I have written a simple implementation of I/O Mirroring in C, using FUSE to track the offsets. I published the source code under the GNU GPL on my GitHub account here