Storage is kind of expensive, so oversubscribing where possible is a good way to get the most bang for your buck, but it comes with some additional challenges.
Traditionally, when you carved out storage for servers in a SAN environment, you would provision it to the server or application. You would set the full size of the volume to whatever value you determined that application or server needed, and all of that space would be dedicated to it. So, if you allocated 100 GB and only 20 GB were in use, the remaining 80 GB would be held hostage by that application or server and unusable to the rest of your environment.
Enter thin provisioning. Thin provisioning allows you to allocate your storage on demand, and you can set it at either the hypervisor or the storage level (or both, though we don’t recommend that). In other words, to borrow the numbers from the previous example, thin provisioning a LUN or datastore tells the server it has up to 100 GB while the back end (the SAN or the hypervisor, depending on which one you’re thin provisioning) shows only the 20 GB actually in use as allocated. In contrast, thick-provisioned LUNs or datastores allocate all of the space up front, and the SAN sees that space as allocated.
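To make the hypervisor side of this concrete, here is a minimal sketch of creating the two disk types from the ESXi shell with vmkfstools; the datastore and VM names are just placeholders:

    # Thin: space on the VMFS volume is consumed only as the guest writes data
    vmkfstools -c 100G -d thin /vmfs/volumes/datastore1/vm01/vm01.vmdk

    # Eager-zeroed thick: the full 100 GB is allocated and zeroed up front
    vmkfstools -c 100G -d eagerzeroedthick /vmfs/volumes/datastore1/vm01/vm01.vmdk

In day-to-day practice you would normally pick the provisioning type in the vSphere client when creating the VM, but the effect on the datastore is the same.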
Think of thin provisioning as credit. You have a 100-dollar limit, but you’ve only spent 20 bucks, so your bill at the end of the month is only 20. Thick provisioning, conversely, is buying outright: you pay the full amount up front. Thick provisioning preallocates the space on your storage, so the capacity shows as allocated from the SAN’s perspective.
If you use thin provisioning on your SAN LUNs as well as thin-provisioned virtual disks, you will struggle to fully understand your available capacity, and you will need to carefully monitor it from both the hypervisor and the back-end storage array.
You can heavily oversubscribe your storage this way without any easy way to track it. And, if and when you run out of physical space on your SAN, your LUN can go offline, which then takes all of its VMs offline, and you could have a significant outage on your hands.
For example, let’s start with a datastore backed by 500 GB of physical storage that is thin provisioned from the SAN. We place 10 VMs on it that are thin provisioned at 100 GB each. That takes us to approximately one terabyte of allocated capacity when we only have 500 GB actually available. If you’re not closely monitoring the back-end storage capacity from the SAN, you run a serious risk of a storage outage when the SAN runs out of capacity. At the same time, however, you have to watch the individual datastore to make sure it does not run out of space either.
That’s why, if you thin provision in only one place, you have only one location where you might unexpectedly run out of space.
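Wherever you end up thin provisioning, it helps to know how big the gap between allocated and consumed space really is. For a single thin virtual disk, a quick, informal check from the ESXi shell is to compare the disk’s provisioned size with the space it actually consumes on the VMFS volume (the path below is a placeholder, and the exact busybox options can vary slightly between ESXi builds):

    # Provisioned size of the disk, i.e. what the guest OS sees
    ls -lh /vmfs/volumes/datastore1/vm01/vm01-flat.vmdk

    # Space the disk actually consumes on the datastore
    du -h /vmfs/volumes/datastore1/vm01/vm01-flat.vmdk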
It’s also important to note that if you thin provision your virtual disks, you may see first-write penalties on your workloads: the hypervisor must allocate and zero out each new block before writing data to it, so those initial writes take a performance hit. For a typical virtual machine, the first writes happen during the OS installation, which is not a big deal, but for virtual machines with high-IO workloads like databases and log files (e.g., SQL or Exchange), it is possible you could see a noticeable performance impact from first writes. That’s another reason we usually recommend thick provisioning virtual disks, especially ones that host databases or log files.
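If you have already deployed a thin disk under a workload like that, one way to take the first-write penalty off the table going forward is to inflate the disk to eager-zeroed thick, which allocates and zeroes the entire disk ahead of time. A minimal sketch, assuming the VM is powered off and using a placeholder path:

    vmkfstools --inflatedisk /vmfs/volumes/datastore1/sql01/sql01.vmdk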
Closely monitoring your capacity is essential when it comes to thin provisioning, so we also recommend using a monitoring tool like PRTG or SolarWinds. If you have a one-terabyte datastore and you overcommit your VMDKs on it, you could run out of space on your VMFS volume even though ample space still appears available inside the guest OS. A monitoring tool can be configured to notify you when you pass a certain capacity threshold and prevent issues like this. Bear in mind that you will need to understand how oversubscribed your storage is in order to determine what thresholds to set.
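Even with a monitoring tool in place, it is handy to be able to spot-check a VMFS volume from the host itself. On ESXi, the following shows the size and free space of each mounted datastore (values are reported in bytes):

    esxcli storage filesystem list

Keep in mind that this tells you how full the VMFS volume actually is, not how far your VMDKs are overcommitted; the overcommitment you still have to work out from the provisioned sizes of the virtual disks.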
We usually say to thick provision your virtual disks as much as you can, bearing in mind any special requirements like performance. We also recommend thin provisioning your LUNs that are going to be used for VMFS volumes because it eliminates a significant amount of maintenance and attention required from the virtual and storage administrators.
Additionally, no matter where you’re thin provisioning, it makes sense to leave a certain amount of emergency space: at the hypervisor level for things like snapshots, and at the SAN level for any emergency provisioning needs. You don’t really want either level to be more than 80 percent utilized.
With thin-provisioned LUNs, you also need to do some regular cleanup in order to get the most space efficiency out of your configuration. For example, say you have a datastore with three VMs on it and the VMFS volume is thin provisioned at the LUN level. If you storage vMotion two of those VMs to another datastore, or delete them from that VMFS volume, the capacity of those two VMs will still show as allocated from the SAN’s perspective.
To reclaim that space on the SAN, you have two options. If you’re on an older version of ESXi, you can run the esxcli storage vmfs unmap command manually. It tells the array, via the SCSI UNMAP primitive, which blocks on that VMFS volume are no longer in use, so the SAN can release that space back to its pool.
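As a sketch, assuming a datastore label of datastore1, the manual reclaim looks like this; the reclaim unit is optional and controls how many VMFS blocks are unmapped per pass:

    esxcli storage vmfs unmap --volume-label=datastore1

    # Optionally specify the reclaim unit (in VMFS blocks per iteration)
    esxcli storage vmfs unmap --volume-label=datastore1 --reclaim-unit=200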
This is always a good idea after decommissioning servers or if you frequently do storage vMotions. Depending on your environment, this unused but still-allocated storage could quietly take up a lot of your capacity, so consider checking on it before opting to buy more storage. It’s important to note that this is not an automated command, and it can have a negative impact on your SAN’s overall write performance while it is running, so I usually run it during maintenance windows or slow times.
If you’re on a newer version of ESXi and your SAN supports the VAAI UNMAP primitive, this reclamation can also happen automatically, on the fly.
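To check whether your array actually supports the primitive for a given device, and, on ESXi 6.5 and later with VMFS6, whether automatic space reclamation is enabled on a datastore, something like the following should work; the device identifier and datastore label are placeholders:

    # Delete Status: supported means the array accepts UNMAP for this device
    esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx

    # VMFS6 automatic space reclamation settings (granularity and priority)
    esxcli storage vmfs reclaim config get --volume-label=datastore1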
To sum it all up, thin provisioning requires a lot of careful consideration. It’s a great way to get more efficiency out of your storage, but if it is not properly configured and closely monitored, it can get you into a lot of trouble. Thin provisioning can also make capacity planning for your future storage more challenging.