Linux NVMe-TCP shared targets with LVM

I’ve been working with Proxmox for the last few months and ran into an issue when using NVMe-TCP volumes. I connected multiple hosts to a number of shared volumes, something I’ve done before using several other methods such as iSCSI or shared managed disks on Azure, and all of those just worked. With the NVMe-TCP volumes I was able to connect and see the disks, but after creating a PV (LVM physical volume) on one of them, it appeared only on the host where it was created. After enabling the shared option and selecting the nodes I wanted to connect the volume to, the storage showed up on the other nodes but was unusable, with a “?” on top of the storage icon. To clear this, a command was necessary on the other hosts:

pvscan --cache

This rescanned all the disks on the node and made the volume usable.
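For what it’s worth, a quick way to confirm the volume is visible everywhere is to rescan and then list the PVs and VGs on each node; the VG name below (shared_vg) is just a placeholder for whatever you created:

# run on each node that should see the shared volume
pvscan --cache          # refresh the LVM device cache
pvs                     # the shared PV should now be listed
vgs shared_vg           # placeholder VG name; substitute your own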

Another challenge was getting automatic discovery to work correctly after a host reboot. This assumes the NVMe tooling is already installed (apt update && apt -y install nvme-cli) and the module is set to load on boot (modprobe nvme_tcp && echo "nvme_tcp" > /etc/modules-load.d/nvme_tcp.conf). There are a number of guides on connecting manually, and the docs are pretty straightforward (Ubuntu Manpage: nvme-discover – Send Get Log Page request to Discovery Controller and Ubuntu Manpage: nvme-connect – Connect to a Fabrics controller). Automatic discovery and connection on boot is managed in the /etc/nvme/discovery.conf file. The file looks like this by default:

# Used for extracting default parameters for discovery
#
# Example:
# --transport=<trtype> --traddr=<traddr> --trsvcid=<trsvcid> --host-traddr=<host-traddr> --host-iface=<host-iface>
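As an illustration, a populated entry might look like the line below; the target address, port, source address, and interface name are placeholders for your own environment (4420 is the usual NVMe-TCP port):

# /etc/nvme/discovery.conf – example entry (addresses and interface are placeholders)
--transport=tcp --traddr=10.10.10.100 --trsvcid=4420 --host-traddr=10.10.10.10 --host-iface=ens192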

This seems pretty straightforward, but there are a couple of things to remember. Much like ESXi iSCSI port binding, you need to make sure you set the appropriate host-traddr and host-iface. Setting only the host-traddr property may lead to unexpected behavior. Depending on the configuration of the host, it may use a different interface than the one you expected: if the target is a routed address rather than an L2-adjacent one, or if there are multiple interfaces on the same L2 segment, packets can egress the host from an unexpected interface. In the routed case, the interface used will be the one with a route to the target (in many cases the management interface holding the default route). If the host has IP forwarding enabled, the source IP address of the traffic will still be the host-traddr, but the physical interface and source MAC will be those of the interface with the default route. Depending on the uRPF settings on the host or the switches in between, that traffic can be dropped. And because the host “spoofed” the packet with the intended source IP, the return traffic will come back to the expected interface, resulting in asymmetric traffic.

The last step is to enable the auto-discovery service: systemctl enable nvmf-autoconnect. This runs nvme connect-all as a oneshot on boot if the discovery file exists.
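A rough sequence for testing this by hand before relying on the boot-time service might look like the following; the target address, port, source address, and interface are placeholders, and the ip route get line is just a sanity check of which interface the kernel would pick on its own:

# check which interface the kernel would route to the target without binding
ip route get 10.10.10.100

# discover and connect manually, pinning both the source address and the interface
nvme discover -t tcp -a 10.10.10.100 -s 4420 -w 10.10.10.10 -f ens192
nvme connect-all -t tcp -a 10.10.10.100 -s 4420 -w 10.10.10.10 -f ens192

# enable the oneshot service that runs "nvme connect-all" at boot
systemctl enable nvmf-autoconnect.service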

TLDR: set both host-traddr and host-iface. These are the IP address (e.g. 10.10.10.10) and interface name (e.g. ens192), respectively.
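Once connected, you can sanity-check that each controller is bound where you expect; nvme list-subsys shows the subsystems and their paths, and I believe the verbose form includes the transport address and host interface per controller, though the exact output varies by nvme-cli version:

# show subsystems and their controllers/paths
nvme list-subsys

# verbose output should include traddr and host interface details
nvme list-subsys -v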

Multipathing is native in the NVMe stack, with a limited set of I/O policies. But that’s for another day.
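If you want to peek ahead, native multipath can be confirmed and the per-subsystem I/O policy inspected through sysfs; the subsystem name below (nvme-subsys0) is whatever your system happens to enumerate:

# confirm native NVMe multipath is enabled in the kernel
cat /sys/module/nvme_core/parameters/multipath

# current I/O policy for a subsystem (e.g. numa, round-robin)
cat /sys/class/nvme-subsystem/nvme-subsys0/iopolicy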
