[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b26b64d8-9569-8d1d-aef5-3d01c819267c@redhat.com>
Date: Mon, 23 May 2022 09:35:38 -0400
From: Jonathan Toppins <jtoppins@...hat.com>
To: Nikolay Aleksandrov <razor@...ckwall.org>, netdev@...r.kernel.org
Cc: toke@...hat.com, Long Xin <lxin@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Jonathan Corbet <corbet@....net>,
Jay Vosburgh <j.vosburgh@...il.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v3] bond: add mac filter option for balance-xor
On 5/15/22 02:32, Nikolay Aleksandrov wrote:
> On 15/05/2022 00:41, Nikolay Aleksandrov wrote:
>> On 13/05/2022 20:43, Jonathan Toppins wrote:
>>> Implement a MAC filter that prevents duplicate frame delivery when
>>> handling BUM traffic. This attempts to partially replicate OvS SLB
>>> Bonding[1] like functionality without requiring significant change
>>> in the Linux bridging code.
>>>
>>> A typical network setup for this feature would be:
>>>
>>> .--------------------------------------------.
>>> | .--------------------. |
>>> | | | |
>>> .-------------------. | |
>>> | | Bond 0 | | | |
>>> | .--'---. .---'--. | | |
>>> .----|-| eth0 |-| eth1 |-|----. .-----+----. .----+------.
>>> | | '------' '------' | | | Switch 1 | | Switch 2 |
>>> | '---,---------------' | | +---+ |
>>> | / | '----+-----' '----+------'
>>> | .---'---. .------. | | |
>>> | | br0 |----| VM 1 | | ~~~~~~~~~~~~~~~~~~~~~
>>> | '-------' '------' | ( )
>>> | | .------. | ( Rest of Network )
>>> | '--------| VM # | | (_____________________)
>>> | '------' |
>>> | Host 1 |
>>> '-----------------------------'
>>>
>>> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with
>>> bond0 and its associated links, eth0 & eth1, provide ingress/egress. One
>>> can assume bond0, br1, and hosts VM1 to VM# are all contained in a
>>> single box, as depicted. Interfaces eth0 and eth1 provide redundant
>>> connections to the data center with the requirement to use all bandwidth
>>> when the system is functioning normally. Switch 1 and Switch 2 are
>>> physical switches that do not implement any advanced L2 management
>>> features such as MLAG, Cisco's VPC, or LACP.
>>>
>>> Combining this feature with vlan+srcmac hash policy allows a user to
>>> create an access network without the need to use expensive switches that
>>> support features like Cisco's VCP.
>>>
>>> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
>>>
>>> Co-developed-by: Long Xin <lxin@...hat.com>
>>> Signed-off-by: Long Xin <lxin@...hat.com>
>>> Signed-off-by: Jonathan Toppins <jtoppins@...hat.com>
>>> ---
>>>
>>> Notes:
>>> v2:
>>> * dropped needless abstraction functions and put code in module init
>>> * renamed variable "rc" to "ret" to stay consistent with most of the
>>> code
>>> * fixed parameter setting management, when arp-monitor is turned on
>>> this feature will be turned off similar to how miimon and arp-monitor
>>> interact
>>> * renamed bond_xor_recv to bond_mac_filter_recv for a little more
>>> clarity
>>> * it appears the implied default return code for any bonding recv probe
>>> must be `RX_HANDLER_ANOTHER`. Changed the default return code of
>>> bond_mac_filter_recv to use this return value to not break skb
>>> processing when the skb dev is switched to the bond dev:
>>> `skb->dev = bond->dev`
>>>
>>> v3: Nik's comments
>>> * clarified documentation
>>> * fixed inline and basic reverse Christmas tree formatting
>>> * zero'ed entry in mac_create
>>> * removed read_lock taking in bond_mac_filter_recv
>>> * made has_expired() atomic and removed critical sections
>>> surrounding calls to has_expired(), this also removed the
>>> use-after-free that would have occurred:
>>> spin_lock_irqsave(&entry->lock, flags);
>>> if (has_expired(bond, entry))
>>> mac_delete(bond, entry);
>>> spin_unlock_irqrestore(&entry->lock, flags); <---
>>> * moved init/destroy of mac_filter_tbl to bond_open/bond_close
>>> this removed the complex option dependencies, the only behavioural
>>> change the user will see is if the bond is up and mac_filter is
>>> enabled if they try and set arp_interval they will receive -EBUSY
>>> * in bond_changelink moved processing of mac_filter option just below
>>> mode processing
>>>
>>> Documentation/networking/bonding.rst | 20 +++
>>> drivers/net/bonding/Makefile | 2 +-
>>> drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++
>>> drivers/net/bonding/bond_mac_filter.h | 37 +++++
>>> drivers/net/bonding/bond_main.c | 30 ++++
>>> drivers/net/bonding/bond_netlink.c | 13 ++
>>> drivers/net/bonding/bond_options.c | 81 +++++++++--
>>> drivers/net/bonding/bonding_priv.h | 1 +
>>> include/net/bond_options.h | 1 +
>>> include/net/bonding.h | 3 +
>>> include/uapi/linux/if_link.h | 1 +
>>> 11 files changed, 373 insertions(+), 17 deletions(-)
>>> create mode 100644 drivers/net/bonding/bond_mac_filter.c
>>> create mode 100644 drivers/net/bonding/bond_mac_filter.h
>>>
>>
> [snip]
>
> The same problem solved using a few nftables rules (in case you don't want to load eBPF):
> $ nft 'add table netdev nt'
> $ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
> $ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }'
> $ nft 'add set netdev nt macset { type ether_addr; flags timeout; }'
> $ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset'
> $ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop'
So I did some testing on this nft solution and it performs largely the
same as the bonding solution except for handling rule1 of the OVS SLB[1]
for multicast[2] and broadcast[3] traffic. If this can be corrected then
I think it would be a possible solution otherwise the bonding solution
fits the requirements better.
-Jon
[1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
[2]
https://gitlab.com/liali666/virtual-networking/-/blob/master/tests/test-0012-ovs-rule1-multicast
[3]
https://gitlab.com/liali666/virtual-networking/-/blob/master/tests/test-0011-ovs-rule1-broadcast
Powered by blists - more mailing lists