Ran into something odd the other day. I was seeing a roughly 30-second power-on delay for VMs on an ESXi host. I messed with it for a few weeks, dug through logs, and opened a VMware SR. Got nothing from any of the normal avenues. After the ticket had been open for a good week with no solution, I was a little miffed, so I browsed reddit for a while. The answer occurred to me while reading some of the old r/sysadmin posts (don’t get me started on how the normal post there now is someone complaining about their job), when the DNS haiku showed up. You know:
“It’s Not DNS
There’s no way it’s DNS
It was DNS”
Well, long story short, it was DNS. The host gets a DHCP address for vmk0, and the DHCP lease hands it a DNS server that doesn’t respond. Oddly enough, when you try to power on a VM, the host tries to resolve its own name before it completes the action:
[root@localhost:~] date && vim-cmd vmsvc/power.on 2 && date
Wed Sep 28 12:30:07 UTC 2022
Powering on VM:
Wed Sep 28 12:30:33 UTC 2022
The associated packet capture output:
12:30:08.257207 10.1.2.3.50966 > 168.63.129.16.53: 39689+ A? localhost.internal.dns (81)
12:30:13.260444 10.1.2.3.50966 > 168.63.129.16.53: 39689+ A? localhost.internal.dns (81)
12:30:13.261432 10.1.2.3.54383 > 168.63.129.16.53: 27060+ A? localhost.internal.dns (81)
12:30:18.263761 10.1.2.3.61923 > 168.63.129.16.53: 22372+ A? localhost.internal.dns (102)
12:30:18.264643 10.1.2.3.54383 > 168.63.129.16.53: 27060+ A? localhost.internal.dns (81)
12:30:23.266920 10.1.2.3.61923 > 168.63.129.16.53: 22372+ A? localhost.internal.dns (102)
12:30:23.267782 10.1.2.3.56398 > 168.63.129.16.53: 21279+ A? localhost.internal.dns (102)
12:30:28.270808 10.1.2.3.56398 > 168.63.129.16.53: 21279+ A? localhost.internal.dns (102)
Once I set the DNS server to one that responds, the VM powers on fine. With that in mind, I did some more testing with some help from a friend. This issue only occurs on hosts that get their address from DHCP. If you set a static IP and an unresponsive DNS server, there is no issue.
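If you want a quick way to tell whether a DNS server answers at all, here is a rough Python sketch that sends a single A query over UDP and waits for any reply. The function names (build_query, dns_responds), the 2-second timeout, and the fixed query ID are my own choices for illustration. It only checks that something comes back with a matching ID, not that the answer is useful.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    labels = name.strip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    qname += b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def dns_responds(server, name="localhost.internal.dns", port=53, timeout=2.0):
    """Return True if the server sends back a reply with a matching ID in time."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _addr = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable or refused destinations.
            return False
    # The first two bytes of a DNS message are the query ID.
    return reply[:2] == query[:2]
```

Calling dns_responds("168.63.129.16") uses the server address from the capture above. A False result means nothing came back within the timeout, which is the situation the host was stuck in.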