VMware released a variety of great features in vSphere 6.5, including HA support for NVIDIA GRID technology. This is welcome news for some of our VDI customers. Previously, HA was unsupported for GRID-backed VMs, and any outage required considerable manual intervention.
GRID works by passing hardware through to the virtual desktops. This presents a problem for HA and backups: once the GRID card is presented to a VM, the VM has a one-to-one relationship with that specific piece of hardware. In other words, the machine could no longer be snapshotted or vMotioned to other hosts in the cluster, since it was effectively married to that physical GRID card.
In our typical highly available VMware cluster, this caused problems.
Along with the other features and improvements in vSphere 6.5, VMware introduced a solution to this HA issue. In vSphere 6.0, the automatic recovery would simply fail, and the admin would then have to manually power those VMs back on on a different host, provided that host had the exact same GRID hardware. This is tedious and a real challenge in larger VDI environments, especially when uptime is a major concern.
This has been a longstanding, fundamental problem with VDI: you're putting more eggs in one proverbial basket (centralized hardware). Before VDI, if a user's machine had trouble, the downtime was limited to that one person. With VDI, a server outage can impact a large portion of your users in one fell swoop.
Now, instead of just failing, vSphere HA knows that it needs to attempt to restart those GRID-associated VMs on another host. VMware first has to check whether the new host has enough free slots to fit the GRID profiles coming over, and you still need to have the exact same GRID hardware in your new host.
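To make that placement check concrete, here is a minimal sketch in Python. It is not VMware's actual algorithm or API. The `Host` and `VM` classes, the profile names, and the per-host slot counts are all hypothetical simplifications (real slot accounting is tracked per physical GPU). It only illustrates the two conditions described above: matching GRID hardware and a free slot for the VM's profile.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    """Hypothetical model of a surviving host in the cluster."""
    name: str
    gpu_model: str  # e.g. "GRID M60" (illustrative value)
    # Free slots per GRID profile name (simplified: tracked per host, not per GPU)
    free_slots: Dict[str, int] = field(default_factory=dict)


@dataclass
class VM:
    """Hypothetical model of a GRID-backed virtual desktop."""
    name: str
    gpu_model: str  # GRID hardware the VM was running on
    profile: str    # GRID profile assigned to the VM


def pick_restart_host(vm: VM, hosts: List[Host]) -> Optional[Host]:
    """Return the first host with the same GRID hardware and a free slot, else None."""
    for host in hosts:
        if host.gpu_model != vm.gpu_model:
            continue  # the exact same GRID hardware is required
        if host.free_slots.get(vm.profile, 0) > 0:
            return host
    return None


def plan_restarts(vms: List[VM], hosts: List[Host]) -> Dict[str, Optional[str]]:
    """Map each VM name to a target host name, or None if nothing fits."""
    plan: Dict[str, Optional[str]] = {}
    for vm in vms:
        host = pick_restart_host(vm, hosts)
        if host is None:
            plan[vm.name] = None  # would still need manual intervention
        else:
            host.free_slots[vm.profile] -= 1  # reserve the slot for this VM
            plan[vm.name] = host.name
    return plan


if __name__ == "__main__":
    survivors = [
        Host("esx02", "GRID K2", {"k260q": 4}),
        Host("esx03", "GRID M60", {"m60-1q": 1}),
    ]
    failed_vms = [
        VM("vdi-01", "GRID M60", "m60-1q"),
        VM("vdi-02", "GRID M60", "m60-1q"),
    ]
    print(plan_restarts(failed_vms, survivors))
```

In this sketch, a VM that finds no matching host maps to `None`, which is the case where spare GRID capacity in the cluster was not planned for and the VM stays down until someone steps in.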
Fault Tolerance, by contrast, is how you keep a machine up without having to move it in the event of a failure. This is achieved through real-time mirroring, so you have two complete copies of your VM in sync with each other at any given moment. When one side goes down, the other picks up where it left off without any user disruption. vSphere has had this for a while, but it is still not supported for VMs using GRID or any other type of hardware passthrough.
Bringing hardware up to snuff to support the workloads we've started asking of our VDI deployments has been a difficult and incremental road. Hopefully we'll see continued improvements in integrating these powerful graphics cards into virtual desktop environments, creating a more seamless experience for both users and management.