Message-ID: <a97e6e1e-81bc-4a79-8352-9e4794b0d2ca@kernel.org>
Date: Tue, 14 Oct 2025 11:12:16 +0200
From: Jiri Slaby <jirislaby@...nel.org>
To: Tonghao Zhang <tonghao@...aicloud.com>, netdev@...r.kernel.org
Cc: Jay Vosburgh <jv@...sburgh.net>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Jonathan Corbet <corbet@....net>, Andrew Lunn <andrew+netdev@...n.ch>,
Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu
<mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Nikolay Aleksandrov <razor@...ckwall.org>,
Zengbing Tu <tuzengbing@...iglobal.com>
Subject: Re: [net-next v8 1/3] net: bonding: add broadcast_neighbor option for 802.3ad

On 27. 06. 25, 15:49, Tonghao Zhang wrote:
> Stacking technology is a type of technology used to expand ports on
> Ethernet switches. It is widely used as a common access method in
> large-scale Internet data center architectures. Years of practice
> have proved that stacking technology has advantages and disadvantages
> in high-reliability network architecture scenarios. For instance,
> in stacking networking arch, conventional switch system upgrades
> require multiple stacked devices to restart at the same time.
> Therefore, it is inevitable that the business will be interrupted
> for a while. It is for this reason that "no-stacking" in data centers
> has become a trend. Additionally, when the stacking link connecting
> the switches fails or is abnormal, the stack will split. Although it is
> not common, it still happens in actual operation. The problem is that
> after the split, it is equivalent to two switches with the same
> configuration appearing in the network, causing network configuration
> conflicts and ultimately interrupting the services carried by the
> stacking system.
>
> To improve network stability, "non-stacking" solutions have been
> increasingly adopted, particularly by public cloud providers and
> tech companies like Alibaba, Tencent, and Didi. "Non-stacking" is
> a method of mimicking switch stacking that convinces a LACP peer,
> bonding in this case, connected to a set of "non-stacked" switches
> that all of its ports are connected to a single switch
> (i.e., LACP aggregator), as if those switches were stacked. This
> enables the LACP peer's ports to aggregate together, and requires
> (a) special switch configuration, described in the linked article,
> and (b) modifications to the bonding 802.3ad (LACP) mode to send
> all ARP/ND packets across all ports of the active aggregator.
>
> Note that, with multiple aggregators, the current broadcast mode
> logic will send only packets to the selected aggregator(s).
>
> +-----------+   +-----------+
> |  switch1  |   |  switch2  |
> +-----------+   +-----------+
>       ^               ^
>       |               |
>     +-----------------+
>     |   bond4 lacp    |
>     +-----------------+
>       |               |
>       | NIC1          | NIC2
>     +-----------------+
>     |     server      |
>     +-----------------+

Hi,

this breaks broadcast bonding in 6.17. Reverting these three commits (the
other two depend on this one) makes 6.17 work again:
  2f9afffc399d net: bonding: send peer notify when failure recovery
  3d98ee52659c net: bonding: add broadcast_neighbor netlink option
  ce7a381697cb net: bonding: add broadcast_neighbor option for 802.3ad

This was reported downstream as an error in our openQA:
https://bugzilla.suse.com/show_bug.cgi?id=1250894

I bisected using this script in qemu:
  systemctl stop network
  ip link del bond0 || true
  ip link set dev eth0 down
  ip addr flush eth0
  ip link add bond0 type bond mode broadcast
  ip link set dev eth0 master bond0
  ip addr add 10.0.2.15/24 dev bond0
  ip link set bond0 up
  sleep 1
  exec nmap -sS 10.0.2.2/32

Any ideas?

> - https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/
>
> Cc: Jay Vosburgh <jv@...sburgh.net>
> Cc: "David S. Miller" <davem@...emloft.net>
> Cc: Eric Dumazet <edumazet@...gle.com>
> Cc: Jakub Kicinski <kuba@...nel.org>
> Cc: Paolo Abeni <pabeni@...hat.com>
> Cc: Simon Horman <horms@...nel.org>
> Cc: Jonathan Corbet <corbet@....net>
> Cc: Andrew Lunn <andrew+netdev@...n.ch>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Masami Hiramatsu <mhiramat@...nel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Cc: Nikolay Aleksandrov <razor@...ckwall.org>
> Signed-off-by: Tonghao Zhang <tonghao@...aicloud.com>
> Signed-off-by: Zengbing Tu <tuzengbing@...iglobal.com>
> ---
> v8: add comments in bond_option_mode_set to explain why we only
>     clear broadcast_neighbor to 0.
> Note that the selftest will be posted after I post the iproute2 patch
> for this option.
> ---
> Documentation/networking/bonding.rst | 6 +++
> drivers/net/bonding/bond_main.c | 66 +++++++++++++++++++++++++---
> drivers/net/bonding/bond_options.c | 42 ++++++++++++++++++
> include/net/bond_options.h | 1 +
> include/net/bonding.h | 3 ++
> 5 files changed, 112 insertions(+), 6 deletions(-)
>
...
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
...
> @@ -5329,17 +5369,27 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
>   	return bond_tx_drop(dev, skb);
>   }
>   
> -/* in broadcast mode, we send everything to all usable interfaces. */
> +/* in broadcast mode, we send everything to all or usable slave interfaces.
> + * under rcu_read_lock when this function is called.
> + */
>   static netdev_tx_t bond_xmit_broadcast(struct sk_buff *skb,
> -				       struct net_device *bond_dev)
> +				       struct net_device *bond_dev,
> +				       bool all_slaves)
>   {
>   	struct bonding *bond = netdev_priv(bond_dev);
> -	struct slave *slave = NULL;
> -	struct list_head *iter;
> +	struct bond_up_slave *slaves;
>   	bool xmit_suc = false;
>   	bool skb_used = false;
> +	int slaves_count, i;
>   
> -	bond_for_each_slave_rcu(bond, slave, iter) {
> +	if (all_slaves)
> +		slaves = rcu_dereference(bond->all_slaves);
> +	else
> +		slaves = rcu_dereference(bond->usable_slaves);
> +
> +	slaves_count = slaves ? READ_ONCE(slaves->count) : 0;

OK, slaves_count is now 0 here (slaves, i.e. bond->all_slaves, is NULL),
whereas bond_for_each_slave_rcu() used to yield one interface.

Well, bond_update_slave_arr() is not called for broadcast mode AFAICS, so
the array is never populated.

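
To make the difference concrete, here is a minimal userspace sketch (not
kernel code). The structure and function names are simplified, hypothetical
stand-ins for struct bonding, struct bond_up_slave and the two iteration
styles. It assumes, per the observation above, that nothing populates
all_slaves in broadcast mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified, hypothetical stand-ins; not the real kernel definitions. */
struct slave {
	const char *name;
	struct slave *next;		/* models the per-bond slave list */
};

struct up_slave_arr {			/* models struct bond_up_slave */
	unsigned int count;
	struct slave *arr[8];
};

struct bond {
	struct slave *slave_list;	 /* filled in when a slave is enslaved */
	struct up_slave_arr *all_slaves; /* only set when the array is rebuilt */
};

/* Old behaviour: walk the list, as bond_for_each_slave_rcu() did. */
static unsigned int count_by_list(const struct bond *bond)
{
	unsigned int n = 0;
	const struct slave *s;

	for (s = bond->slave_list; s; s = s->next)
		n++;
	return n;
}

/* New behaviour: walk the array and treat a NULL array as empty. */
static unsigned int count_by_array(const struct bond *bond)
{
	const struct up_slave_arr *slaves = bond->all_slaves;

	return slaves ? slaves->count : 0;
}

/* One enslaved interface, array never built (assumed broadcast-mode state). */
static void demo(void)
{
	struct slave eth0 = { "eth0", NULL };
	struct bond bond0 = { &eth0, NULL };

	printf("list walk:  %u slave(s)\n", count_by_list(&bond0));
	printf("array walk: %u slave(s)\n", count_by_array(&bond0));
	assert(count_by_list(&bond0) != count_by_array(&bond0));
}
```

Calling demo() shows the list walk finding the enslaved interface while the
array walk finds none, which would match every frame being dropped in the
reproducer above.
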
> +	for (i = 0; i < slaves_count; i++) {
> +		struct slave *slave = slaves->arr[i];
>   		struct sk_buff *skb2;
>   
>   		if (!(bond_slave_is_up(slave) && slave->link == BOND_LINK_UP))
> @@ -5577,10 +5627,13 @@ static netdev_tx_t __bond_start_xmit(struct sk_buff *skb, struct net_device *dev
>   	case BOND_MODE_ACTIVEBACKUP:
>   		return bond_xmit_activebackup(skb, dev);
>   	case BOND_MODE_8023AD:
> +		if (bond_should_broadcast_neighbor(skb, dev))
> +			return bond_xmit_broadcast(skb, dev, false);
> +		fallthrough;
>   	case BOND_MODE_XOR:
>   		return bond_3ad_xor_xmit(skb, dev);
>   	case BOND_MODE_BROADCAST:
> -		return bond_xmit_broadcast(skb, dev);
> +		return bond_xmit_broadcast(skb, dev, true);
>   	case BOND_MODE_ALB:
>   		return bond_alb_xmit(skb, dev);
>   	case BOND_MODE_TLB:
> @@ -6456,6 +6509,7 @@ static int __init bond_check_params(struct bond_params *params)
>   	eth_zero_addr(params->ad_actor_system);
>   	params->ad_user_port_key = ad_user_port_key;
>   	params->coupled_control = 1;
> +	params->broadcast_neighbor = 0;
>   	if (packets_per_slave > 0) {
>   		params->reciprocal_packets_per_slave =
>   			reciprocal_value(packets_per_slave);
--
js
suse labs