There is a lot of information out there about generating VMware zdump files. These files are how ESXi stores information about a Purple Screen of Death (PSOD). Normally an ESXi host attempts to write them to disk during a crash. It can also send the dump over the network to the vmware-netdumper service, which normally runs on the vCenter Server Appliance.

This is great for normal installations, but what if you had thousands of hosts spread over hundreds of vCenters? Wouldn’t it be nice to be able to collect these dumps and do a quick analysis of the dump file? What if you are afraid of vCenter Server running out of disk space and not being able to collect the dump? Well, if you have all of these issues, you’d be me. And that’s a scary thought.

So what do you do? Easy: go find the vmware-esx-netdumper-*.x86_64.rpm file on the ISO embedded in the VCSA OVA as disk 2. You can read my previous post (https://vskeeball.com/2024/01/27/installing-vcenter-as-an-azure-vm/) to find out how to get that ISO. Take that rpm file and extract all the contents:
rpm2cpio vmware-esx-netdumper-8.0.2.00100-11979815.x86_64.rpm | cpio -i --make-directories
In this archive there is a vmware-netdumper executable. There are other files to make it a systemd service and such, but that’s up to you. You could of course install the RPM instead. Run the file:
./usr/sbin/vmware-netdumper -i <Listen IP address> -d <output path> -o 6500
Then set your ESXi host to use the IP you are listening on as its coredump server (https://kb.vmware.com/s/article/2002954):
esxcli system coredump network set --interface-name vmk0 --server-ipv4 xx.xx.xx.xx --server-port 6500
esxcli system coredump network set --enable true
esxcli system coredump network get
From here you can now run crashMe on the host:
vsish -e set /reliability/crashMe/Panic 1
The host will now send a crash dump to the running netdumper server, and you will end up with a zdump file in a subdirectory of your output path. If your ESXi vmk0 interface was 10.0.0.1, the directory will be 10/0/0/1 and the dump file will be zdump_10.0.0.1-YYY-MM-DD-hh-mm_ss-0
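If you are collecting dumps from a lot of hosts, it helps to work out where a given host's dump should land. Here is a minimal sketch that builds the directory from the host IP, based only on the 10/0/0/1 layout described above. The function name and the /var/core/netdumps output path are made up for illustration, so swap in whatever you passed to -d.

```python
from pathlib import PurePosixPath


def dump_directory(output_path, host_ip):
    """Build the per-host dump directory: one path component per IP octet."""
    octets = host_ip.split(".")
    return PurePosixPath(output_path).joinpath(*octets)


# Hypothetical output path; use the value you gave to the -d option.
example_dir = dump_directory("/var/core/netdumps", "10.0.0.1")
print(example_dir)
```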
“So what,” you may ask, “now I have a binary zdump file that I can’t do anything with unless I send it to VMware or copy it back to another ESXi host and dump the vmkernel.log file.” Seems like more work than is necessary. After peeking at the file for a few minutes I found that the dump is not encrypted, and the vmkernel.log file starts at offset 4096 (0x1000), seems to end at 8458239 (0x810FFF), and is padded with nulls after the end of the log. I may be a bit long on the ending offset, as my log file wasn’t that long. Ten lines of Python later and I can dump the vmkernel log file.
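import sys

# Usage: python3 scriptname.py <zdump file name> <output log file name>
with open(sys.argv[1], "rb") as fo, open(sys.argv[2], "wb") as foutput:
    fo.seek(4096)  # the vmkernel log starts at offset 0x1000
    for line in fo:
        # A line starting with a null byte means we hit the padding after the log
        if line.startswith(b'\x00'):
            break
        foutput.write(line)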
Run the script like python3 scriptname.py <zdump file name> <output log file name>
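If you would rather call this from a bigger collection script, the same idea can be wrapped in a function. This is a sketch under the same assumptions as above: the 4096 offset is what I saw in my dump, not a documented constant, and extract_log is just a name I picked. It differs slightly from the script in that it cuts at the first null byte anywhere, not only at the start of a line. The fake dump is synthetic data so you can try it without a real zdump.

```python
import io

LOG_OFFSET = 4096  # observed start of the log (0x1000); an assumption, not an official value


def extract_log(stream, offset=LOG_OFFSET, chunk_size=65536):
    """Return the bytes from offset up to (not including) the first null byte."""
    stream.seek(offset)
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # reached end of file without finding padding
        end = chunk.find(b"\x00")
        if end != -1:
            parts.append(chunk[:end])  # keep everything before the padding
            break
        parts.append(chunk)
    return b"".join(parts)


# Synthetic stand-in for a zdump: 4096 header bytes, a short log, then null padding.
fake_dump = io.BytesIO(b"\x01" * 4096 + b"line one\nline two\n" + b"\x00" * 32)
log_bytes = extract_log(fake_dump)
```

With a real file you would pass open(path, "rb") instead of the BytesIO object and write the returned bytes wherever you want the log.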
And tada! You have extracted the vmkernel.log from the zdump file. Now you can attempt to determine why it crashed. Thanks for reading my rambling.