Message-ID: <Z6IVsbKw0CF+jSq3@t-dallas>
Date: Tue, 4 Feb 2025 21:27:13 +0800
From: Ted Chen <znscnchen@...il.com>
To: Ido Schimmel <idosch@...sch.org>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
	pabeni@...hat.com, andrew+netdev@...n.ch, netdev@...r.kernel.org
Subject: Re: [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to
 use the same VNI

On Sun, Feb 02, 2025 at 03:40:35PM +0200, Ido Schimmel wrote:
> On Sat, Feb 01, 2025 at 07:32:07PM +0800, Ted Chen wrote:
> > This RFC series proposes an implementation to enable the configuration of vxlan
> > devices in a Hub-Spoke Network, allowing multiple vxlan devices to share the
> > same VNI while being associated with different remote IPs under the same UDP
> > port.
> > 
> > == Use case ==
> > In a Hub-Spoke Network, there is a central VTEP acting as the gateway, along
> > with multiple outer VTEPs. Each outer VTEP communicates exclusively with the
> > central VTEP and has no direct connection to other outer VTEPs. As a result,
> > data exchanged between outer VTEPs must traverse the central VTEP. This design
> > enhances security and enables centralized auditing and monitoring at the
> > central VTEP.
> > 
> > == Existing methods ==
> > Currently, there are three methods to implement the use case.
> > 
> > Method 1:
> >          The central VTEP establishes a separate vxlan tunnel with each outer
> >          VTEP, creating a vxlan device with a different VNI for each tunnel.
> >          All vxlan devices are then added to the same Linux bridge to enable
> >          forwarding.
> > 
> >          Drawbacks: complex configuration; each tenant requires
> >          multiple VNIs.
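> > 
> >          A minimal sketch of Method 1 on the central VTEP, assuming two
> >          outer VTEPs (10.0.0.1 and 10.0.0.2), local IP 10.0.0.3, and
> >          hypothetical per-spoke VNIs 421 and 422:
> > 
> >          ip link add vxlan421 type vxlan id 421 \
> >                  local 10.0.0.3 remote 10.0.0.1 dstport 4789
> >          ip link add vxlan422 type vxlan id 422 \
> >                  local 10.0.0.3 remote 10.0.0.2 dstport 4789
> >          ip link add name br0 type bridge
> >          ip link set vxlan421 master br0
> >          ip link set vxlan422 master br0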
> 
> This looks like the most straightforward option to me.
> 
> Why do you view it as complex? Why are multiple VNIs per tenant a
> problem when we have 16M of them?
Yes, the issue is not due to a lack of VNIs.
IMO, using a single VNI within a single Layer 2 network is clearer and more
intuitive.

> > 
> > Method 2:
> >         The central VTEP creates a single vxlan device using the same VNI,
> >         without configuring a remote IP. The IP addresses of all outer VTEPs
> >         are stored in the fdb. To enable forwarding, the vxlan device is added
> >         to a Linux bridge with hairpin mode enabled.
> > 
> >         Drawbacks: unnecessary overhead or network anomalies.
> >         Hairpin mode may broadcast packets to all outer VTEPs, causing
> >         the source outer VTEP to receive packets it originally sent to
> >         the central VTEP. If the packet from the source outer VTEP was
> >         itself a broadcast packet, broadcasting it back can cause
> >         network anomalies.
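> > 
> >         A minimal sketch of Method 2 (addresses assumed as above; the
> >         all-zeros fdb entries form the flood list for BUM traffic):
> > 
> >         ip link add vxlan42 type vxlan id 42 local 10.0.0.3 dstport 4789
> >         ip link add name br0 type bridge
> >         ip link set vxlan42 master br0
> >         bridge link set dev vxlan42 hairpin on
> >         bridge fdb append 00:00:00:00:00:00 dev vxlan42 dst 10.0.0.1
> >         bridge fdb append 00:00:00:00:00:00 dev vxlan42 dst 10.0.0.2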
> > 
> > Method 3:
> >         The central VTEP uses the same VNI but different UDP ports to create a
> >         vxlan device for each outer VTEP, each tunneling to its corresponding
> >         outer VTEP. All the vxlan devices in the central VTEP are then added to
> >         the same Linux bridge to enable forwarding.
> > 
> >         Drawbacks: complex configuration and potential security issues.
> >         Multiple UDP ports are required.
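> > 
> >         A minimal sketch of Method 3, assuming hypothetical UDP ports
> >         4789 and 4790 (each outer VTEP must be configured with the port
> >         its tunnel was created with):
> > 
> >         ip link add vxlan42.1 type vxlan id 42 \
> >                 local 10.0.0.3 remote 10.0.0.1 dstport 4789
> >         ip link add vxlan42.2 type vxlan id 42 \
> >                 local 10.0.0.3 remote 10.0.0.2 dstport 4790
> >         ip link add name br0 type bridge
> >         ip link set vxlan42.1 master br0
> >         ip link set vxlan42.2 master br0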
> > 
> > == Proposed implementation ==
> > In the central VTEP, each tenant only requires a single VNI, and all tenants
> > share the same UDP port. This can avoid the drawbacks of the above three
> > methods.
> 
> This method also has drawbacks. It breaks existing behavior (see my
> comment on patch #1) and it also bloats the VXLAN receive path.
> 
> I want to suggest an alternative which allows you to keep the existing
> topology (same VNI), but without kernel changes. The configuration of
> the outer VTEPs remains the same. The steps below are for the central
> VTEP.
> 
> First, create a VXLAN device in "external" mode. It will consume all the
> VNIs in a namespace, but you can limit it with the "vnifilter" keyword,
> if needed:
> 
> # ip -n ns_c link add name vx0 type vxlan dstport 4789 nolearning external
> # tc -n ns_c qdisc add dev vx0 clsact
> 
> Then, for each outer VTEP, create a dummy device and enslave it to the
> bridge. Taking outer VTEP1 as an example:
> 
> # ip -n ns_c link add name dummy_vtep1 up master br0
> # tc -n ns_c qdisc add dev dummy_vtep1 clsact
> 
> In order to demultiplex incoming VXLAN packets to the appropriate bridge
> member, use an ingress tc filter on the VXLAN device that matches on the
> encapsulating source IP (you can't do it w/o the "external" keyword) and
> redirects the traffic to the corresponding bridge member:
> 
> # tc -n ns_c filter add dev vx0 ingress pref 1 proto all \
> 	flower enc_key_id 42 enc_src_ip 10.0.0.1 \
> 	action mirred ingress redirect dev dummy_vtep1
> 
> (add filters for other VTEPs with "pref 1" to avoid unnecessary
> lookups).
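> 
> For example, a corresponding filter for a hypothetical outer VTEP2 at
> 10.0.0.2 with dummy device dummy_vtep2 would be:
> 
> # tc -n ns_c filter add dev vx0 ingress pref 1 proto all \
> 	flower enc_key_id 42 enc_src_ip 10.0.0.2 \
> 	action mirred ingress redirect dev dummy_vtep2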
> 
> For Tx, on each bridge member, configure an egress tc filter that
> attaches tunnel metadata for the matching outer VTEP and redirects to
> the VXLAN device:
> 
> # tc -n ns_c filter add dev dummy_vtep1 egress pref 1 proto all \
> 	matchall \
> 	action tunnel_key set src_ip 10.0.0.3 dst_ip 10.0.0.1 id 42 dst_port 4789 \
> 	action mirred egress redirect dev vx0
> 
> The end result should be that the bridge forwards known unicast traffic
> to the appropriate outer VTEP and floods BUM traffic to all the outer
> VTEPs but the one from which the traffic was received.
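> 
> As a quick sanity check, the filter hit counters and the learned fdb
> entries can be inspected with, e.g.:
> 
> # tc -n ns_c -s filter show dev vx0 ingress
> # bridge -n ns_c fdb show br br0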
Cool!
I wasn't aware that tc could be used in this way. Will give it a try.

Thanks a lot!

> > 
> > As in below example,
> > - a tunnel is established between vxlan42.1 in the central VTEP and vxlan42 in
> >   the outer VTEP1:
> >   ip link add vxlan42.1 type vxlan id 42 \
> >           local 10.0.0.3 remote 10.0.0.1 dstport 4789
> > 
> > - a tunnel is established between vxlan42.2 in the central VTEP and vxlan42 in
> >   the outer VTEP2:
> >   ip link add vxlan42.2 type vxlan id 42 \
> >   		  local 10.0.0.3 remote 10.0.0.2 dstport 4789
> > 
> > 
> >     ┌────────────────────────────────────────────┐
> >     │       ┌─────────────────────────┐  central │
> >     │       │          br0            │    VTEP  │
> >     │       └─┬────────────────────┬──┘          │
> >     │   ┌─────┴───────┐      ┌─────┴───────┐     │          
> >     │   │ vxlan42.1   │      │  vxlan42.2  │     │
> >     │   └─────────────┘      └─────────────┘     │  
> >     └───────────────────┬─┬──────────────────────┘
> >                         │ │ eth0 10.0.0.3:4789
> >                         │ │            
> >                         │ │            
> >        ┌────────────────┘ └───────────────┐
> >        │eth0 10.0.0.1:4789                │eth0 10.0.0.2:4789
> >  ┌─────┴───────┐                    ┌─────┴───────┐
> >  │outer VTEP1  │                    │outer VTEP2  │
> >  │     vxlan42 │                    │     vxlan42 │
> >  └─────────────┘                    └─────────────┘
> > 
> > 
> > == Test scenario ==
> > ip netns add ns_1
> > ip link add veth1 type veth peer name veth1-peer
> > ip link set veth1 netns ns_1
> > ip netns exec ns_1 ip addr add 10.0.1.1/24 dev veth1
> > ip netns exec ns_1 ip link set veth1 up
> > ip netns exec ns_1 ip link add vxlan42 type vxlan id 42 \
> >                    remote 10.0.1.3 dstport 4789
> > ip netns exec ns_1 ip addr add 192.168.0.1/24 dev vxlan42
> > ip netns exec ns_1 ip link set up dev vxlan42
> > 
> > ip netns add ns_2
> > ip link add veth2 type veth peer name veth2-peer
> > ip link set veth2 netns ns_2
> > ip netns exec ns_2 ip addr add 10.0.1.2/24 dev veth2
> > ip netns exec ns_2 ip link set veth2 up
> > ip netns exec ns_2 ip link add vxlan42 type vxlan id 42 \
> >                    remote 10.0.1.3 dstport 4789
> > ip netns exec ns_2 ip addr add 192.168.0.2/24 dev vxlan42
> > ip netns exec ns_2 ip link set up dev vxlan42
> > 
> > ip netns add ns_c
> > ip link add veth3 type veth peer name veth3-peer
> > ip link set veth3 netns ns_c
> > ip netns exec ns_c ip addr add 10.0.1.3/24 dev veth3
> > ip netns exec ns_c ip link set veth3 up
> > ip netns exec ns_c ip link add vxlan42.1 type vxlan id 42 \
> >                    local 10.0.1.3 remote 10.0.1.1 dstport 4789
> > ip netns exec ns_c ip link add vxlan42.2 type vxlan id 42 \
> >                    local 10.0.1.3 remote 10.0.1.2 dstport 4789
> > ip netns exec ns_c ip link set up dev vxlan42.1
> > ip netns exec ns_c ip link set up dev vxlan42.2
> > ip netns exec ns_c ip link add name br0 type bridge
> > ip netns exec ns_c ip link set br0 up
> > ip netns exec ns_c ip link set vxlan42.1 master br0
> > ip netns exec ns_c ip link set vxlan42.2 master br0
> > 
> > ip link add name br1 type bridge
> > ip link set br1 up
> > ip link set veth1-peer up
> > ip link set veth2-peer up
> > ip link set veth3-peer up
> > ip link set veth1-peer master br1
> > ip link set veth2-peer master br1
> > ip link set veth3-peer master br1
> > 
> > ip netns exec ns_1 ping 192.168.0.2 -I 192.168.0.1
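> > 
> > With the series applied, the ping should succeed; a hypothetical check
> > that the traffic indeed traverses the central VTEP is to capture the
> > VXLAN packets on its uplink from the root namespace:
> > 
> > tcpdump -i veth3-peer -nn udp port 4789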
> > 
> > Ted Chen (3):
> >   vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and
> >     remote_ip
> >   vxlan: Do not treat vxlan dev as used when unicast remote_ip
> >     mismatches
> >   vxlan: vxlan_rcv(): Update comment to include ipv6
> > 
> >  drivers/net/vxlan/vxlan_core.c | 38 +++++++++++++++++++++++++++-------
> >  1 file changed, 31 insertions(+), 7 deletions(-)
> > 
> > -- 
> > 2.39.2
> > 
> > 
