Generally, critical servers in a rack have two power supplies. The theory is that, since all your critical data is in your servers, if one power source fails, the second power supply takes over without downtime.
It’s a common misconception that all redundant power supplies automatically fail over without any downtime. In reality, redundant power supplies require a carefully planned deployment, because the redundant power sources may have fluctuations in voltage. If the fluctuation is large enough, the power supplies may, by design, not switch or fail over automatically. This prevents the internal components of the server from getting fried by a power spike.
If you have ever accidentally unplugged or shut down one power supply and wondered why your server powered off instead of staying online, that is likely why.
Another common misconception is that servers pull all the power they need from both power supplies at the same time. Generally, in an effort to be “green,” modern servers pull only the total power they need at any one time, either split equally between the two power supplies (for example, 0.4 A from each) or all from one, with the other as a hot standby (0.8 A from one and zero from the other). Most modern servers run one active supply with a hot standby to be as energy efficient as possible.
Another common mistake we see is both power supplies being plugged into one uninterruptible power supply (UPS). Doing so defeats the redundancy by reducing everything to a single point of failure. Keep redundancy in mind at every level of your connectivity: two power supplies, two separate power distribution units (power strips), two separate UPSs, two separate generators, two separate transformers, two separate feeds, two separate carriers. Always try to supply as much redundancy as is reasonable for your organization.
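One way to catch this mistake is to write down the upstream path for each power supply and compare them level by level. The following is only a minimal sketch: the function name, the level names and the device labels (PDU-A, UPS-1 and so on) are made up for illustration and do not come from any particular inventory tool.

```python
def shared_failure_points(path_a, path_b):
    """Return the levels where both power paths depend on the same device."""
    return [level for level in path_a if path_a[level] == path_b.get(level)]


# Hypothetical power paths for the two supplies of one server.
psu1 = {"pdu": "PDU-A", "ups": "UPS-1", "generator": "GEN-1"}
psu2 = {"pdu": "PDU-B", "ups": "UPS-1", "generator": "GEN-2"}

print(shared_failure_points(psu1, psu2))  # ['ups']
```

Any level that shows up in the result is a single point of failure for that server, no matter how many power supplies it has.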
Also take into consideration the total pull for the environment. The most common mistake is that people don’t account for a power distribution unit (PDU) failure when they look at their amperage used. For example, say I have two 20A PDUs in my rack and plug one power supply from each server into each PDU. If I’m at 11A on both, I’m technically below each PDU’s limit. However, if either PDU, circuit or UPS goes down at this point, the full load fails over to the surviving PDU, which now has to carry 22A. That exceeds its 20A limit, so everything still goes down. You must account for failures at every level and plan for them.
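The failover arithmetic above can be checked in a few lines. This is a rough sketch that assumes every server is dual-corded, so the entire load of the failed PDU lands on the survivor; the function name is hypothetical.

```python
def survives_pdu_failure(load_a_amps, load_b_amps, pdu_rating_amps):
    """True if one PDU can carry the combined load of both PDUs."""
    return load_a_amps + load_b_amps <= pdu_rating_amps


# The example from the text: 11A on each of two 20A PDUs.
print(survives_pdu_failure(11, 11, 20))  # False, 22A exceeds the 20A limit
# A rack that leaves room for a failure: 8A on each PDU.
print(survives_pdu_failure(8, 8, 20))    # True, 16A fits on one PDU
```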
This is one of the main reasons most datacenter racks are moving toward 2x 30A 220V instead of 2x 20A 110V. It is not that the 110V circuits can’t supply enough power, but that they don’t leave enough overhead to accommodate a failure once everything is turned on. Additionally, every datacenter has its own high-water mark for how much amperage it wants you to pull on each circuit. Normally that’s around 75-80 percent usage. In that case, when I buy 2x 20A circuits, rather than having 40A of usable power, once I account for redundancy and the datacenter’s maximum amperage allowance I may only be able to use a TOTAL of 16A, either all 16A on one leg and none on the other or split between the two.
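To make the 16A figure concrete, here is a small sketch of the calculation. It assumes an 80 percent high-water mark (the top of the range above) and that a single circuit must be able to carry the whole load after a failure. The volt-amp figure is simply amps times volts and ignores power factor; the function name is made up for illustration.

```python
def usable_amps(breaker_amps, max_percent=80):
    """Total load allowed across a redundant pair of circuits.

    Because one circuit must carry everything after a failure, the
    usable total is one circuit's rating times the high-water mark,
    not the sum of both circuits.
    """
    return breaker_amps * max_percent / 100


for breaker, volts in [(20, 110), (30, 220)]:
    amps = usable_amps(breaker)
    print(f"2x {breaker}A {volts}V: {amps}A usable, about {amps * volts} VA")
# 2x 20A 110V: 16.0A usable, about 1760.0 VA
# 2x 30A 220V: 24.0A usable, about 5280.0 VA
```

Under these assumptions the 30A 220V pair gives roughly three times the usable capacity of the 20A 110V pair, which is where the extra overhead comes from.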
Another common mistake is to only have the servers on UPS and not the rest of the equipment. If I’m running iSCSI to my storage, keeping the iSCSI switches up (with redundant power supplies connected to multiple circuits) is every bit as important as keeping the servers up, because otherwise all my servers will lose access to their storage and go down hard.
Also keep in mind that a small blip in the power lasting only two or three seconds may be enough to take down user switches. Although the power was out for only two or three seconds, my switches now have to boot up cold, which may take a few minutes, and then go through spanning tree convergence, which may take another minute. As a result, a simple two- or three-second blip may leave my users without access to servers for five minutes.
It is also all too common to not have a solution in place to cleanly shut down the servers when a power outage exhausts the batteries on the UPSs.
Cleanly shutting down a server during a power failure requires applicable software to be installed on the server. This software is triggered on power failure to initiate a graceful shutdown. In other words, it keeps an operating system from shutting down in the middle of processes and minimizes the risk of data corruption. We can help you set up your redundant power supplies and help you configure a UPS/power solution that not only keeps your servers online longer but also shuts them off cleanly when necessary.
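The decision that such software makes can be sketched roughly as follows. This is not any vendor's actual agent: the UpsStatus fields, the function name and the timing values are all assumptions for illustration. In a real deployment the UPS vendor's software or a monitoring daemon would supply the readings and issue the operating system's shutdown command.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of what a UPS reports."""
    on_battery: bool
    runtime_seconds: int  # estimated battery runtime remaining


def should_shut_down(status, shutdown_seconds=120, safety_margin_seconds=60):
    """Decide whether to start a graceful shutdown now.

    Shut down only while on battery, and only once the remaining
    runtime is no longer comfortably above the time the server
    needs to stop cleanly.
    """
    if not status.on_battery:
        return False
    return status.runtime_seconds <= shutdown_seconds + safety_margin_seconds


print(should_shut_down(UpsStatus(on_battery=False, runtime_seconds=900)))  # False
print(should_shut_down(UpsStatus(on_battery=True, runtime_seconds=600)))   # False
print(should_shut_down(UpsStatus(on_battery=True, runtime_seconds=150)))   # True
```

The safety margin matters because the runtime estimate from a UPS is only an estimate, and a server that starts shutting down too late can still lose power mid-process.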