ESXi VM Power On Delay

Ran into something odd the other day. I was seeing a 30-second power-on delay for VMs on an ESXi host. I messed with it for a few weeks, dug through logs, and opened a VMware SR. Got nothing from any of the normal avenues. After the ticket had been open for a good week with no solution, I was a little miffed, so I browsed reddit for a while. The answer occurred to me after reading some of the old r/sysadmin posts (don’t get me started on how the typical post there now is someone complaining about their job), when the DNS haiku showed up. You know:

“It’s Not DNS
There’s no way it’s DNS
It was DNS”

Well, long story short, it was DNS. The host gets a DHCP address for vmk0, and that DHCP lease hands out a DNS server that doesn’t respond. Oddly enough, when you try to power on a VM, the host does a name resolution of itself before it completes the action:

[root@localhost:~] date && vim-cmd vmsvc/power.on 2  && date 
Wed Sep 28 12:30:07 UTC 2022 
Powering on VM: 
Wed Sep 28 12:30:33 UTC 2022

The associated packet capture output:

12:30:08.257207 > 39689+ A? localhost.internal.dns (81)
12:30:13.260444 > 39689+ A? localhost.internal.dns (81)
12:30:13.261432 > 27060+ A? localhost.internal.dns (81)
12:30:18.263761 > 22372+ A? localhost.internal.dns (102)
12:30:18.264643 > 27060+ A? localhost.internal.dns (81)
12:30:23.266920 > 22372+ A? localhost.internal.dns (102)
12:30:23.267782 > 21279+ A? localhost.internal.dns (102)
12:30:28.270808 > 21279+ A? localhost.internal.dns (102)
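
The retry pattern in that capture is easier to see when the lines are grouped by query ID. Below is a minimal Python sketch that does this. The function names `retry_gaps` and `capture_span` and the regular expression are my own, and the parser only assumes the line layout shown above (timestamp, query ID, record type, name, length); it is not a general packet-capture parser.

```python
import re
from collections import defaultdict
from datetime import datetime

# The capture lines from this post, copied verbatim.
CAPTURE = """\
12:30:08.257207 > 39689+ A? localhost.internal.dns (81)
12:30:13.260444 > 39689+ A? localhost.internal.dns (81)
12:30:13.261432 > 27060+ A? localhost.internal.dns (81)
12:30:18.263761 > 22372+ A? localhost.internal.dns (102)
12:30:18.264643 > 27060+ A? localhost.internal.dns (81)
12:30:23.266920 > 22372+ A? localhost.internal.dns (102)
12:30:23.267782 > 21279+ A? localhost.internal.dns (102)
12:30:28.270808 > 21279+ A? localhost.internal.dns (102)
"""

# Assumed layout: "<HH:MM:SS.ffffff> > <query id>+ A? <name> (<length>)"
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{6}) > (\d+)\+ A\? (\S+) \((\d+)\)$"
)


def _parse(capture):
    """Yield (timestamp, query_id) for every line that matches LINE_RE."""
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            yield stamp, int(match.group(2))


def retry_gaps(capture):
    """Map each query ID to the seconds between its successive sends."""
    sent = defaultdict(list)
    for stamp, query_id in _parse(capture):
        sent[query_id].append(stamp)
    return {
        query_id: [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        for query_id, times in sent.items()
    }


def capture_span(capture):
    """Seconds between the first and last query in the capture."""
    stamps = [stamp for stamp, _ in _parse(capture)]
    if not stamps:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()


for query_id, gaps in retry_gaps(CAPTURE).items():
    print(query_id, gaps)
print("span:", capture_span(CAPTURE))
```

Each query ID appears twice, about five seconds apart, and the queries span roughly 20 seconds in total. That looks like a resolver timing out and retrying against a server that never answers, which accounts for most of the delay between the two `date` outputs.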

Once I set the DNS server to one that responds, the VM powers on fine. With that in mind, I did some more testing with help from a friend. The issue only occurs on hosts that get their address via DHCP. If you set a static IP and an unresponsive DNS server, there is no issue.
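
If you want a quick way to tell whether a machine's resolver is stalling on its own hostname, timing a single lookup is usually enough. Here is a rough Python sketch. `time_lookup` is a name I made up, and I am assuming that Python's `socket.getaddrinfo` goes through the same configured DNS servers as whatever you are troubleshooting, which may not hold for the ESXi management services themselves.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Time one lookup of `name`.

    Returns (elapsed_seconds, succeeded). `resolver` defaults to the
    system resolver; it is a parameter only so the function can be
    exercised without touching the network.
    """
    start = time.monotonic()
    try:
        resolver(name, None)
        succeeded = True
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        succeeded = False
    return time.monotonic() - start, succeeded
```

Calling `time_lookup(socket.gethostname())` should come back in well under a second on a healthy setup. A result measured in many seconds, failed or not, points at an unresponsive DNS server like the one in this post.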

