Message-ID: <Z6IVsbKw0CF+jSq3@t-dallas>
Date: Tue, 4 Feb 2025 21:27:13 +0800
From: Ted Chen <znscnchen@...il.com>
To: Ido Schimmel <idosch@...sch.org>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, andrew+netdev@...n.ch, netdev@...r.kernel.org
Subject: Re: [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to
use the same VNI
On Sun, Feb 02, 2025 at 03:40:35PM +0200, Ido Schimmel wrote:
> On Sat, Feb 01, 2025 at 07:32:07PM +0800, Ted Chen wrote:
> > This RFC series proposes an implementation to enable the configuration of vxlan
> > devices in a Hub-Spoke Network, allowing multiple vxlan devices to share the
> > same VNI while being associated with different remote IPs under the same UDP
> > port.
> >
> > == Use case ==
> > In a Hub-Spoke Network, there is a central VTEP acting as the gateway, along
> > with multiple outer VTEPs. Each outer VTEP communicates exclusively with the
> > central VTEP and has no direct connection to other outer VTEPs. As a result,
> > data exchanged between outer VTEPs must traverse the central VTEP. This design
> > enhances security and enables centralized auditing and monitoring at the
> > central VTEP.
> >
> > == Existing methods ==
> > Currently, there are three methods to implement the use case.
> >
> > Method 1:
> > The central VTEP establishes a separate vxlan tunnel with each outer
> > VTEP, creating a vxlan device with a different VNI for each tunnel.
> > All vxlan devices are then added to the same Linux bridge to enable
> > forwarding.
> >
> > Drawbacks: Complex configuration.
> > Each tenant requires multiple VNIs.
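(For context, a minimal sketch of what Method 1 looks like on the central VTEP. The device names, VNIs 42/43, and the 10.0.0.x addresses are illustrative, borrowed from the diagram further down; in this method each outer VTEP must be configured with its own VNI to match.)

```shell
# Central VTEP, Method 1: one VNI and one vxlan device per outer VTEP.
# VNI 42 tunnels to outer VTEP1, VNI 43 to outer VTEP2 (illustrative values).
ip link add vxlan42 type vxlan id 42 \
	local 10.0.0.3 remote 10.0.0.1 dstport 4789
ip link add vxlan43 type vxlan id 43 \
	local 10.0.0.3 remote 10.0.0.2 dstport 4789

# Enslave both devices to one bridge so traffic is forwarded between them.
ip link add name br0 type bridge
ip link set br0 up
ip link set vxlan42 up master br0
ip link set vxlan43 up master br0
```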
>
> This looks like the most straightforward option to me.
>
> Why do you view it as complex? Why are multiple VNIs per tenant a
> problem when we have 16M of them?
Yes, the issue is not due to a lack of VNIs.
IMO, using a single VNI within a single Layer 2 network is clearer and more
intuitive.
> >
> > Method 2:
> > The central VTEP creates a single vxlan device using the same VNI,
> > without configuring a remote IP. The IP addresses of all outer VTEPs
> > are stored in the fdb. To enable forwarding, the vxlan device is added
> > to a Linux bridge with hairpin mode enabled.
> >
> > Drawbacks: unnecessary overhead or network anomalies.
> > Hairpin mode may flood packets to all outer VTEPs, causing the source
> > outer VTEP to receive packets it originally sent to the central VTEP.
> > If the packet from the source outer VTEP is a broadcast packet,
> > reflecting it back can cause network anomalies.
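(A rough sketch of Method 2 on the central VTEP, again using the 10.0.0.x addresses from the diagram further down. The all-zeros fdb entries are what direct flooded traffic to each outer VTEP.)

```shell
# Central VTEP, Method 2: a single vxlan device, same VNI, no remote IP.
ip link add vxlan42 type vxlan id 42 local 10.0.0.3 dstport 4789

# Default ("all-zeros") fdb entries: flood BUM traffic to every outer VTEP.
bridge fdb append 00:00:00:00:00:00 dev vxlan42 dst 10.0.0.1
bridge fdb append 00:00:00:00:00:00 dev vxlan42 dst 10.0.0.2

# Hairpin mode lets the bridge send frames back out the port they arrived
# on, so one outer VTEP can reach another through the central VTEP.
ip link add name br0 type bridge
ip link set br0 up
ip link set vxlan42 up master br0
bridge link set dev vxlan42 hairpin on
```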
> >
> > Method 3:
> > The central VTEP uses the same VNI but different UDP ports to create a
> > vxlan device for each outer VTEP, each tunneling to its corresponding
> > outer VTEP. All the vxlan devices in the central VTEP are then added to
> > the same Linux bridge to enable forwarding.
> >
> > Drawbacks: complex configuration and potential security issues.
> > Multiple UDP ports are required.
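(Method 3 looks much like the proposed-implementation example, except each device gets a distinct UDP port; ports 4789/4790 here are illustrative, and each outer VTEP's dstport must match its tunnel.)

```shell
# Central VTEP, Method 3: same VNI, but a separate UDP port per outer VTEP.
ip link add vxlan42.1 type vxlan id 42 \
	local 10.0.0.3 remote 10.0.0.1 dstport 4789
ip link add vxlan42.2 type vxlan id 42 \
	local 10.0.0.3 remote 10.0.0.2 dstport 4790

# As in Method 1, enslave both devices to one bridge to enable forwarding.
ip link add name br0 type bridge
ip link set br0 up
ip link set vxlan42.1 up master br0
ip link set vxlan42.2 up master br0
```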
> >
> > == Proposed implementation ==
> > In the central VTEP, each tenant only requires a single VNI, and all tenants
> > share the same UDP port. This can avoid the drawbacks of the above three
> > methods.
>
> This method also has drawbacks. It breaks existing behavior (see my
> comment on patch #1) and it also bloats the VXLAN receive path.
>
> I want to suggest an alternative which allows you to keep the existing
> topology (same VNI), but without kernel changes. The configuration of
> the outer VTEPs remains the same. The steps below are for the central
> VTEP.
>
> First, create a VXLAN device in "external" mode. It will consume all the
> VNIs in a namespace, but you can limit it with the "vnifilter" keyword,
> if needed:
>
> # ip -n ns_c link add name vx0 type vxlan dstport 4789 nolearning external
> # tc -n ns_c qdisc add dev vx0 clsact
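(If I understand the "vnifilter" hint correctly, limiting the device to a single VNI would look something like the following; untested on my side, and it needs a reasonably recent iproute2 with per-VNI filtering support.)

```shell
# Create the device in external mode with per-VNI filtering enabled,
# then allow only VNI 42 instead of consuming all VNIs in the namespace.
ip -n ns_c link add name vx0 up type vxlan dstport 4789 \
	nolearning external vnifilter
bridge -n ns_c vni add dev vx0 vni 42
```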
>
> Then, for each outer VTEP, create a dummy device and enslave it to the
> bridge. Taking outer VTEP1 as an example:
>
> # ip -n ns_c link add name dummy_vtep1 up master br0
> # tc -n ns_c qdisc add dev dummy_vtep1 clsact
>
> In order to demultiplex incoming VXLAN packets to the appropriate bridge
> member, use an ingress tc filter on the VXLAN device that matches on the
> encapsulating source IP (you can't do it w/o the "external" keyword) and
> redirects the traffic to the corresponding bridge member:
>
> # tc -n ns_c filter add dev vx0 ingress pref 1 proto all \
> flower enc_key_id 42 enc_src_ip 10.0.0.1 \
> action mirred ingress redirect dev dummy_vtep1
>
> (add filters for other VTEPs with "pref 1" to avoid unnecessary
> lookups).
>
> For Tx, on each bridge member, configure an egress tc filter that
> attaches tunnel metadata for the matching outer VTEP and redirects to
> the VXLAN device:
>
> # tc -n ns_c filter add dev dummy_vtep1 egress pref 1 proto all \
> matchall \
> action tunnel_key set src_ip 10.0.0.3 dst_ip 10.0.0.1 id 42 dst_port 4789 \
> action mirred egress redirect dev vx0
>
> The end result should be that the bridge forwards known unicast traffic
> to the appropriate outer VTEP and floods BUM traffic to all the outer
> VTEPs but the one from which the traffic was received.
Cool!
I wasn’t aware that TC could be used in this way. Will give it a try.
Thanks a lot!
> >
> > As in below example,
> > - a tunnel is established between vxlan42.1 in the central VTEP and vxlan42 in
> > the outer VTEP1:
> > ip link add vxlan42.1 type vxlan id 42 \
> > local 10.0.0.3 remote 10.0.0.1 dstport 4789
> >
> > - a tunnel is established between vxlan42.2 in the central VTEP and vxlan42 in
> > the outer VTEP2:
> > ip link add vxlan42.2 type vxlan id 42 \
> > local 10.0.0.3 remote 10.0.0.2 dstport 4789
> >
> >
> > ┌────────────────────────────────────────────┐
> > │ ┌─────────────────────────┐ central │
> > │ │ br0 │ VTEP │
> > │ └─┬────────────────────┬──┘ │
> > │ ┌─────┴───────┐ ┌─────┴───────┐ │
> > │ │ vxlan42.1 │ │ vxlan42.2 │ │
> > │ └─────────────┘ └─────────────┘ │
> > └───────────────────┬─┬──────────────────────┘
> > │ │ eth0 10.0.0.3:4789
> > │ │
> > │ │
> > ┌────────────────┘ └───────────────┐
> > │eth0 10.0.0.1:4789 │eth0 10.0.0.2:4789
> > ┌─────┴───────┐ ┌─────┴───────┐
> > │outer VTEP1 │ │outer VTEP2 │
> > │ vxlan42 │ │ vxlan42 │
> > └─────────────┘ └─────────────┘
> >
> >
> > == Test scenario ==
> > ip netns add ns_1
> > ip link add veth1 type veth peer name veth1-peer
> > ip link set veth1 netns ns_1
> > ip netns exec ns_1 ip addr add 10.0.1.1/24 dev veth1
> > ip netns exec ns_1 ip link set veth1 up
> > ip netns exec ns_1 ip link add vxlan42 type vxlan id 42 \
> > remote 10.0.1.3 dstport 4789
> > ip netns exec ns_1 ip addr add 192.168.0.1/24 dev vxlan42
> > ip netns exec ns_1 ip link set up dev vxlan42
> >
> > ip netns add ns_2
> > ip link add veth2 type veth peer name veth2-peer
> > ip link set veth2 netns ns_2
> > ip netns exec ns_2 ip addr add 10.0.1.2/24 dev veth2
> > ip netns exec ns_2 ip link set veth2 up
> > ip netns exec ns_2 ip link add vxlan42 type vxlan id 42 \
> > remote 10.0.1.3 dstport 4789
> > ip netns exec ns_2 ip addr add 192.168.0.2/24 dev vxlan42
> > ip netns exec ns_2 ip link set up dev vxlan42
> >
> > ip netns add ns_c
> > ip link add veth3 type veth peer name veth3-peer
> > ip link set veth3 netns ns_c
> > ip netns exec ns_c ip addr add 10.0.1.3/24 dev veth3
> > ip netns exec ns_c ip link set veth3 up
> > ip netns exec ns_c ip link add vxlan42.1 type vxlan id 42 \
> > local 10.0.1.3 remote 10.0.1.1 dstport 4789
> > ip netns exec ns_c ip link add vxlan42.2 type vxlan id 42 \
> > local 10.0.1.3 remote 10.0.1.2 dstport 4789
> > ip netns exec ns_c ip link set up dev vxlan42.1
> > ip netns exec ns_c ip link set up dev vxlan42.2
> > ip netns exec ns_c ip link add name br0 type bridge
> > ip netns exec ns_c ip link set br0 up
> > ip netns exec ns_c ip link set vxlan42.1 master br0
> > ip netns exec ns_c ip link set vxlan42.2 master br0
> >
> > ip link add name br1 type bridge
> > ip link set br1 up
> > ip link set veth1-peer up
> > ip link set veth2-peer up
> > ip link set veth3-peer up
> > ip link set veth1-peer master br1
> > ip link set veth2-peer master br1
> > ip link set veth3-peer master br1
> >
> > ip netns exec ns_1 ping 192.168.0.2 -I 192.168.0.1
> >
> > Ted Chen (3):
> > vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and
> > remote_ip
> > vxlan: Do not treat vxlan dev as used when unicast remote_ip
> > mismatches
> > vxlan: vxlan_rcv(): Update comment to include ipv6
> >
> > drivers/net/vxlan/vxlan_core.c | 38 +++++++++++++++++++++++++++-------
> > 1 file changed, 31 insertions(+), 7 deletions(-)
> >
> > --
> > 2.39.2
> >
> >