I recently had a question from a colleague about how to move the gateway for a layer 2 extended network in HCX. The move gateway operation in HCX forces you to unextend the network. That is fine once all the workloads on the segment have been migrated, but the story I've always been told, and have told others in turn, is that you can move the gateway whenever you are ready. For example, you could migrate half of your workloads and then move the gateway for that network, which reduces how much you depend on the extension when latency and bandwidth are a concern. The same should apply if you are using the NSX-T L2 VPN feature. Well, if the source and destination networks are both NSX-T logical segments routed with distributed routing, you'll end up with a problem.
The L2 VPN use case is somewhat annoying, as the NSX-T documentation says that VNI-to-VNI L2 VPN is supported. From https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.2/administration/GUID-86C8D6BB-F185-46DC-828C-1E1876B854E8.html: “Beginning with NSX-T Data Center 2.4 release, L2 VPN service support is available between an NSX-T Data Center L2 VPN server and NSX-T Data Center L2 VPN clients. In this scenario, you can extend the logical L2 segments between two on-premises software-defined data centers (SDDCs).” Well, OK, that seems fine and dandy. In fact, it does work: VMs on both sides can talk to each other with no problem. But when you try to use the gateway address from the client side, you can't reach it. No ping and no L3 traffic. The server side is fine, and VMs there can get out of the network as you would expect.
The issue lies with the VDR (virtual distributed router) MAC address. It is the same across all NSX-T and NSX-V installations: 02:50:56:56:44:52. This is a problem for the client-side L2 segment, because the traffic never makes it to the VDR on the server site, which owns the gateway IP. Luckily, VMware has addressed this for NSX-V to NSX-T migrations: https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.2/migration/GUID-538774C2-DE66-4F24-B9B7-537CA2FA87E9.html#:~:text=The%20virtual%20distributed%20routers%20%28VDR%29%20in%20all%20the,with%20the%20following%20PUT%20API%3A%20PUT%20https%3A%2F%2F%20%7Bpolicy-manager%7D%2Fpolicy%2Fapi%. The same change can be applied to one side of the L2 VPN to solve this issue. Change the MAC and all of a sudden everything works. Why this isn't mentioned in the L2 VPN docs is beyond me.
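For anyone who prefers scripting this, the MAC change from the migration guide can be sketched in Python. This is a minimal sketch, not the documented procedure: it assumes the endpoint is `PUT /policy/api/v1/infra/global-config`, that the body carries `vdr_mac` and `allow_changing_vdr_mac_in_use` along with the `_revision` returned by a prior GET, and that `02:50:56:56:44:53` is unused in your environment. The manager hostname and credentials are placeholders. Check the linked guide for your NSX-T version before sending anything like this.

```python
import json
import re
import urllib.request

# Default VDR MAC shared by NSX-V and NSX-T installations.
DEFAULT_VDR_MAC = "02:50:56:56:44:52"
MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


def build_vdr_mac_update(current_config: dict, new_mac: str) -> dict:
    """Return a copy of the global config with the VDR MAC replaced.

    current_config is assumed to be the JSON from a GET on the same endpoint,
    so that _revision and the other fields are sent back unchanged.
    """
    mac = new_mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a colon-separated MAC address: {new_mac!r}")
    if mac == DEFAULT_VDR_MAC:
        raise ValueError("the new VDR MAC must differ from the default one")
    if int(mac[:2], 16) & 0x01:
        # The low bit of the first octet marks a multicast address.
        raise ValueError("the VDR MAC must be a unicast address")
    body = dict(current_config)
    body["vdr_mac"] = mac
    # Assumed flag: permits the change while routers already use the old MAC.
    body["allow_changing_vdr_mac_in_use"] = True
    return body


def global_config_put(manager: str, body: dict, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request; the endpoint path is an assumption."""
    url = f"https://{manager}/policy/api/v1/infra/global-config"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": auth_header},
    )


if __name__ == "__main__":
    # Hypothetical, trimmed GET response from the site whose MAC will change.
    current = {"_revision": 3, "vdr_mac": DEFAULT_VDR_MAC}
    body = build_vdr_mac_update(current, "02:50:56:56:44:53")
    req = global_config_put("nsx-mgr.example.com", body, "Basic <credentials>")
    print(req.get_method(), req.full_url)
    print(json.dumps(body, indent=2))
    # urllib.request.urlopen(req) would send it; TLS trust setup is left out.
```

Only one side needs the new MAC. The other side keeps the default, so the two VDRs no longer share an address.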
HCX ends up having the same issue when you move the gateway manually. To move the gateway for an HCX L2 extended network with NSX-T on both sides, first change the MAC address of the VDR on one side. Then, on the source side, edit the segment and disable Gateway Connectivity.
On the destination side, configure the appropriate gateway IP on the extended segment and enable Gateway Connectivity.
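The two segment steps can also be expressed against the NSX-T Policy API. The Python sketch below only builds the requests, in the same order as the text: disconnect the source segment first, then give the destination segment the gateway IP and connect it. It assumes that the Gateway Connectivity toggle corresponds to the segment's `advanced_config.connectivity` field, that segments live under `/policy/api/v1/infra/segments/<id>`, that the HCX-extended segment on the destination is already attached to a Tier-1 gateway, and that the VDR MAC change has already been made. The manager names, segment IDs, and the `10.10.10.1/24` gateway are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def connectivity_patch(connected: bool, gateway_cidr: Optional[str] = None) -> dict:
    """Segment PATCH body for the Gateway Connectivity toggle.

    Assumes the UI toggle maps to advanced_config.connectivity ("ON"/"OFF").
    """
    body = {"advanced_config": {"connectivity": "ON" if connected else "OFF"}}
    if gateway_cidr is not None:
        # Gateway IP in CIDR form, e.g. "10.10.10.1/24" (hypothetical value).
        body["subnets"] = [{"gateway_address": gateway_cidr}]
    return body


def segment_patch(manager: str, segment_id: str, body: dict, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH against an assumed Policy API segment path."""
    url = f"https://{manager}/policy/api/v1/infra/segments/{segment_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/json", "Authorization": auth_header},
    )


def move_gateway(src_mgr, src_segment, dst_mgr, dst_segment, gateway_cidr, src_auth, dst_auth) -> list:
    """Return the two requests in order: disconnect source, then connect destination.

    Assumes the VDR MAC has already been changed on one side.
    """
    return [
        segment_patch(src_mgr, src_segment, connectivity_patch(False), src_auth),
        segment_patch(dst_mgr, dst_segment, connectivity_patch(True, gateway_cidr), dst_auth),
    ]


if __name__ == "__main__":
    # All hostnames, segment IDs, and credentials here are hypothetical.
    requests_in_order = move_gateway(
        "nsx-src.example.com", "web-segment",
        "nsx-dst.example.com", "web-segment-hcx-ext",
        "10.10.10.1/24", "Basic <src>", "Basic <dst>",
    )
    for req in requests_in_order:
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Sending the source request before the destination one mirrors the order in the text and avoids a window in which both sites answer for the same gateway IP. Depending on the NSX-T version, a PATCH may replace the whole `advanced_config` object, so a GET, modify, then PUT of the segment is the more conservative choice.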
The gateway is now moved and the VMs still have connectivity. This wasn't tested with MON (Mobility Optimized Networking), since the supported scale for MON is so low and the NSX-T 3.1 version in use has a bug in static route handling for MON-enabled networks. The NSX-T interfaces look slightly different between the two sites, as the source runs NSX-T 3.2 while the destination is still on NSX-T 3.1.