So you want to restore a Proxmox VM from an array snapshot?

So you’ve been using array-based snapshots as your “backups” on your KVM-based cloud? Don’t lie, I know you’ve been doing it. Your DR plan involves some shoestring and bubble gum. Maybe some duct tape? I know, I’ve done it too. Everything untested, just the data sitting somewhere and some hope that you can pull it off.

So why all this melodrama? Well, someone asked me a question. As per usual, that’s not where it ended. A few people know that if you put something in my head that doesn’t have an answer and wait, you normally get an answer a few days later. Some other folks know you end up with a scripted solution that makes your life easier. Well, that happened last week. Ask my wife; she may tell you I’ve been talking to myself more than normal the past couple of days.

If you saw the xcopy post from the other day, that was a precursor to this. No, I don’t use it here. Yes, you can use it instead of dd. Yes, it takes more work. Just give it some time and I’ll help you out.

Restore of a Proxmox VM from a Pure Storage FlashArray snapshot

Ok, so the idea here is you’ve been taking array-based snaps on a schedule of your thick-LVM-backed VMs that live on a Pure Storage FlashArray (look, you can do this with any other storage, but this is what I have so this is what you get). The basic workflow is: clone the snap, attach the clone to the host, vgimportclone the volume group, copy the LV from the snap to the prod volume, then build a VM and attach the volume as its disk. So how do you do all of that? Well, the manual procedure is somewhat involved. We will go through that first. If you want to skip it and just use a tool, don’t worry, I’ve got you covered there too.

Manual Procedure for iSCSI/FC

First gather all the data you will need:

1) VMID and Proxmox Host (in this example 120 and proxmox-01)
2) Proxmox Storage ID (in this example pure-iscsi) and the VG name (run pvesm status or go look in /etc/pve/storage.cfg)
3) The source volume name on the FA
4) The snapshot name on the FA
5) The hostgroup or host on the FA
6) A temporary name for the restore-from volume and its VG

Make sure that the VM and disk you are attempting to restore were created BEFORE the snapshot was taken. Then clone the snapshot to a new volume on the FA (FlashArray) and attach it to the host/hostgroup; you can use the name you made up in #6. You will also need the volume serial number from the FA. It’s in the details panel of the volume and should be a 24-character hex value. You can convert this to the WWID of the volume by concatenating 3624a9370 and the lowercase serial number.
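
That concatenation is easy to fat-finger by hand, so here’s a quick sketch of the conversion (the serial below is a made-up example; use the one from your GUI):

```shell
# Derive the multipath WWID from the FlashArray volume serial.
# 3624a9370 = SCSI NAA type 6 prefix plus Pure's vendor bytes; the GUI shows
# the serial in upper case but Linux device-mapper names are lower case.
SERIAL="F4C2E12B9148438A000112F5"   # example only - replace with your serial
WWID="3624a9370$(printf '%s' "$SERIAL" | tr 'A-Z' 'a-z')"
echo "$WWID"
```

The result should be 33 characters long: a leading 3, then 32 hex digits.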

Now you need to get Proxmox to be happy with the new volume. Run this on the Proxmox node.

rescan-scsi-bus.sh -r
iscsiadm -m session --rescan
multipath -r
pvscan --cache
vgchange -ay
# Confirm the new device is present. Use the serial from above.
SERIAL=<lowercase-serial-from-gui>
WWID="3624a9370${SERIAL}"
ls -l /dev/disk/by-id/ | grep -i "$SERIAL"
ls -l /dev/mapper/"$WWID"
lsblk -o NAME,SIZE,WWN,SERIAL | grep -i "$SERIAL"

This should get the device multipathed and almost connected. Now to get the VG up and imported. Since this is a clone of an existing VG, it will have the same LVM UUID, so we have to use vgimportclone:

TAG=<tempname>
TEMP_VG="restore_${TAG}"
vgimportclone --basevgname "$TEMP_VG" /dev/mapper/"$WWID"
vgchange -ay "$TEMP_VG"
lvs -o lv_name,vg_name,lv_size,lv_path "$TEMP_VG"

Now make a decision: overwrite the existing VM disks or create new ones. If you want to overwrite, turn off the VM first. If you want to make a new VM, create one with the same disk sizes as the existing one. Use the VMID of whichever VM you decide on. I’m going to use a new VM since it’s “safer.”

# Get a new VMID
NEW_VMID=$(pvesh get /cluster/nextid)
echo "$NEW_VMID"
# Create new LVs for this VM
N=0
SRC_LV="vm-<SRC_VMID>-disk-${N}"
NEW_LV="vm-${NEW_VMID}-disk-${N}"
SRC_VG="<live-vg-from-storage.cfg>"
SRC="/dev/${TEMP_VG}/${SRC_LV}"
DST="/dev/${SRC_VG}/${NEW_LV}"
# Get the size of the source and create a new volume
bytes=$(blockdev --getsize64 "$SRC")
lvcreate -Wy --yes -L "${bytes}B" -n "$NEW_LV" "$SRC_VG"
# Copy the data
dd if="$SRC" of="$DST" bs=8M iflag=direct oflag=direct conv=fsync status=progress
# Repeat the block above for each additional disk, bumping N (disk-0, disk-1, ...)
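
If the VM has several disks, the per-disk steps above can be wrapped in a loop. A sketch, assuming TEMP_VG, SRC_VG, SRC_VMID, and NEW_VMID are already set as above (the function name is mine):

```shell
# Copy every vm-<SRC_VMID>-disk-N LV found in the temp VG to a same-size LV
# in the live VG, renaming each one for the new VMID.
copy_vm_disks() {
  local src_lv new_lv src dst bytes
  for src_lv in $(lvs --noheadings -o lv_name "$TEMP_VG" | tr -d ' ' \
                  | grep "^vm-${SRC_VMID}-disk-"); do
    new_lv="vm-${NEW_VMID}-${src_lv#vm-${SRC_VMID}-}"
    src="/dev/${TEMP_VG}/${src_lv}"
    dst="/dev/${SRC_VG}/${new_lv}"
    bytes=$(blockdev --getsize64 "$src")
    lvcreate -Wy --yes -L "${bytes}B" -n "$new_lv" "$SRC_VG"
    dd if="$src" of="$dst" bs=8M iflag=direct oflag=direct conv=fsync status=progress
  done
}
```

Run `copy_vm_disks` once and it handles disk-0, disk-1, and so on without you editing N by hand.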

Ok, now you’ve got new volume(s) with the data from the snapshot; that was the hardest part. Now create a new VM that uses them. This will work for simple VMs, but you may want to review the config to make sure it’s correct. Things like EFI disks, CD-ROMs, TPM states, SMBIOS UUIDs… may not copy correctly. Just check it to validate.

SRC_CONF=/etc/pve/qemu-server/<SRC_VMID>.conf
NEW_CONF=/etc/pve/qemu-server/${NEW_VMID}.conf
# Copy, rewriting every disk reference from old VMID to new VMID
sed "s/vm-<SRC_VMID>-disk-/vm-${NEW_VMID}-disk-/g" "$SRC_CONF" > "$NEW_CONF"
# Give the new VM a distinct name and a fresh vmgenid so the guest OS
# treats it as a brand-new instance.
qm set "$NEW_VMID" --name "<SRC_NAME>-restore" --vmgenid 1

Now you have a new VM with the disk contents from the snapshot. Don’t power it on yet. You need to change at least the MAC address of the NIC. Re-set the NIC without specifying a MAC and Proxmox will generate a fresh one.

qm set "$NEW_VMID" --net0 virtio,bridge=vmbr0,firewall=1
# Before:
# net0: virtio=BC:24:11:AA:BB:CC,bridge=vmbr0,firewall=1
# After:
# net0: virtio,bridge=vmbr0,firewall=1

Now you can either start the VM with the NIC(s) connected or disconnect the NIC(s). I would disconnect them just in case. You don’t want an IP conflict.
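
If you go the disconnect route, the qm net options take a link_down flag; a minimal sketch for net0 (the bridge name and the helper function name are my assumptions, match your own config):

```shell
# Boot the restored VM with net0's virtual cable "unplugged" so it can't
# cause an IP conflict with the original; set link_down=0 (or drop the
# flag) once you've verified the guest.
disconnect_net0() {
  local vmid="$1"
  qm set "$vmid" --net0 "virtio,bridge=vmbr0,firewall=1,link_down=1"
}
```

Usage would be `disconnect_net0 "$NEW_VMID"` followed by `qm start "$NEW_VMID"`.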

Now this is important. Don’t skip this. Clean up the temp volume.

vgchange -an "$TEMP_VG" || true
vgremove -f "$TEMP_VG" || true
# Drop the stale multipath map before deleting the Pure volume, otherwise
# future rescans can hang on queued I/O to paths that will never return.
dmsetup message "$WWID" 0 fail_if_no_path 2>/dev/null || true
for p in $(multipath -l "$WWID" 2>/dev/null \
| awk '{for(i=1;i<=NF;i++) if ($i ~ /^sd[a-z]+$/) print $i}' \
| sort -u); do
echo 1 > "/sys/block/${p}/device/delete" 2>/dev/null || true
done
multipath -f "$WWID" 2>/dev/null || true

Now go and disconnect and destroy the temporary volume on the FA. Finally, make sure the Proxmox node cleans up its SCSI devices and multipath maps:

rescan-scsi-bus.sh -r
multipath -r

Now, if you’ve gotten this far: yes, I did write a tool to do all of this with a GUI, xcopy support, and easy workflows. I’ll have it out hopefully next week and you can forget you read any of this.
