An ESXi host has access to multiple networks, but most of them exist only for guest VM traffic to traverse. There are, however, some networks that the kernel itself needs to access, such as iSCSI, NFS, vMotion, vSAN, and management. This can create a problem, because the kernel has one default TCP/IP stack, and that stack has a single default gateway shared by all of those networks.
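You can see that single routing table for yourself from the ESXi shell. A quick sketch:

```
# Show the routing table for the default TCP/IP stack.
# Every VMkernel interface on this stack shares the one
# default route listed here.
esxcli network ip route ipv4 list
```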
When building a new environment, it’s common to start with the management network first, which is automatically assigned the default gateway. Whatever you initially set as your default gateway cannot be changed in the UI, so anything in your TCP/IP stack that needs routing will have to go through that default gateway one way or another. ESXi only stores one routing table, no matter how many VMkernel interfaces you’ve made. It’s important to decide up front, from an architecture perspective, how your advanced services will interact with the network. Our preference is that none of those other networks (iSCSI, NFS, vMotion, vSAN) need to route at all. They should be flat Layer 2 networks unless some very fancy stuff is going on.
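If you do have to change that initial default gateway later, it can be done from the command line. A hedged sketch, with hypothetical addresses; note that the old default route has to be removed before the new one is added, and doing this over the management network can momentarily cut your own connectivity:

```
# List the current default route so you know what to remove
esxcli network ip route ipv4 list

# Replace the default gateway on the default TCP/IP stack
# (192.168.100.254 and 192.168.100.1 are hypothetical addresses)
esxcli network ip route ipv4 remove --network default --gateway 192.168.100.254
esxcli network ip route ipv4 add --network default --gateway 192.168.100.1
```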
Even across multiple buildings, we prefer that these networks stay on a single Layer 2 VLAN and don’t have to route. vMotion usually lives between hosts and doesn’t get stretched to another location, so you can get away with giving it its own VLAN and letting it communicate purely at Layer 2, as in the sketch below. For more information about ways to achieve that, check out our blog comparing stretched Layer 2 and VXLAN/NVGRE.
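Here is what a dedicated Layer 2 vMotion VMkernel looks like from the ESXi shell. The vSwitch name, port group name, VLAN ID, and addresses are all hypothetical; the point is that no gateway is configured because the traffic never leaves the VLAN:

```
# Create a dedicated port group for vMotion on a standard vSwitch
# ("vSwitch0", "vMotion", and VLAN 50 are hypothetical values)
esxcli network vswitch standard portgroup add --portgroup-name vMotion --vswitch-name vSwitch0
esxcli network vswitch standard portgroup set --portgroup-name vMotion --vlan-id 50

# Create a VMkernel interface on that port group with an address in
# the vMotion subnet; no gateway needed since traffic stays at Layer 2
esxcli network ip interface add --interface-name vmk1 --portgroup-name vMotion
esxcli network ip interface ipv4 set --interface-name vmk1 --ipv4 172.16.50.11 --netmask 255.255.255.0 --type static

# Tag the interface so the host uses it for vMotion traffic
esxcli network ip interface tag add --interface-name vmk1 --tagname VMotion
```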
You may be asking yourself how you know whether something is going to need routing. For example, if your NFS storage has to come from a separate, secured network, you’ve got a routing situation. If you’re using vSAN, you may also run into this when replicating between clusters or sites that sit on different networks.
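A quick way to check whether a given VMkernel interface can actually reach its target without crossing a router is vmkping, which lets you source the ping from a specific interface. A sketch with hypothetical values:

```
# Ping a hypothetical NFS server (10.20.30.5), sourcing the packet
# from the storage VMkernel interface (vmk2). If this fails but the
# address is reachable via the default gateway, you have a routing
# situation on your hands.
vmkping -I vmk2 10.20.30.5
```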
Long story short, you have to make sure that the default gateway, from a networking perspective, can reach all of the different networks it will need to talk to. Again, if you can avoid routing and keep everything at Layer 2, that’s much preferred. In vSphere 6.0 and up, VMware started on a solution for this: you can configure multiple TCP/IP stacks, which allows you to have multiple default gateways and routes. However, it’s important to note that custom TCP/IP stacks do not support fault tolerance, management, vSAN, vSphere Replication, or vSphere Replication NFC. In practice, that leaves NFS and iSCSI as the traffic types this feature is useful for.
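As a sketch of what that looks like in practice (the stack name, port group, and addresses below are hypothetical), you create a custom stack, put the NFS or iSCSI VMkernel interface on it, and then give that stack its own default gateway, completely separate from the default stack’s:

```
# Create a custom TCP/IP stack for storage traffic
esxcli network ip netstack add -N storageStack

# Create a VMkernel interface on the new stack
# ("NFS-PG" is a hypothetical port group)
esxcli network ip interface add --interface-name vmk3 --portgroup-name NFS-PG -N storageStack
esxcli network ip interface ipv4 set --interface-name vmk3 --ipv4 10.10.10.10 --netmask 255.255.255.0 --type static

# Give the custom stack its own default gateway, independent of the
# default stack's gateway
esxcli network ip route ipv4 add --network default --gateway 10.10.10.1 -N storageStack
```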
If you have vSAN, you’ll have to take a different approach to get it talking across networks. Work through it with your networking team, because designing this properly is highly contingent on your architecture and hardware.
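The approach VMware documents for routed vSAN topologies is static routes on the default stack, so the vSAN traffic bypasses the default gateway for specific destinations. A sketch with hypothetical subnets:

```
# Add a static route so traffic to the remote site's vSAN subnet
# (192.168.20.0/24, hypothetical) leaves through the local vSAN
# gateway (192.168.10.1, hypothetical) instead of the default gateway
esxcli network ip route ipv4 add --network 192.168.20.0/24 --gateway 192.168.10.1

# Confirm the route took effect
esxcli network ip route ipv4 list
```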
When you start looking at routing any of this high-bandwidth traffic, make sure you consider the implications for the network as a whole. If you suddenly start routing vSAN, iSCSI, or NFS, you could impose a very heavy load on the routing devices in the network and potentially DoS them. You also risk a scenario where a host plugged into the same switch as its destination has to send traffic upstream to a routing device and back down to that same switch, hairpinning the uplinks and potentially DoSing the switch.