[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8f272ce1-3e13-4f93-9676-33b5435197f0@kernel.org>
Date: Wed, 15 Oct 2025 07:31:21 +0200
From: Jiri Slaby <jirislaby@...nel.org>
To: Tonghao Zhang <tonghao@...aicloud.com>
Cc: netdev@...r.kernel.org, Jay Vosburgh <jv@...sburgh.net>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Jonathan Corbet <corbet@....net>,
Andrew Lunn <andrew+netdev@...n.ch>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Nikolay Aleksandrov <razor@...ckwall.org>,
Zengbing Tu <tuzengbing@...iglobal.com>
Subject: Re: [net-next v8 1/3] net: bonding: add broadcast_neighbor option for
802.3ad
On 14. 10. 25, 14:52, Tonghao Zhang wrote:
>> On Oct 14, 2025, at 17:12, Jiri Slaby <jirislaby@...nel.org> wrote:
>> On 27. 06. 25, 15:49, Tonghao Zhang wrote:
>> this breaks broadcast bonding in 6.17. Reverting these three (the two depend on this one) makes 6.17 work again:
>> 2f9afffc399d net: bonding: send peer notify when failure recovery
>> 3d98ee52659c net: bonding: add broadcast_neighbor netlink option
>> ce7a381697cb net: bonding: add broadcast_neighbor option for 802.3ad
>>
>> This was reported downstream as an error in our openQA:
>> https://bugzilla.suse.com/show_bug.cgi?id=1250894
>>
>> I bisected using this in qemu:
>> systemctl stop network
>> ip link del bond0 || true
>> ip link set dev eth0 down
>> ip addr flush eth0
>> ip link add bond0 type bond mode broadcast
>> ip link set dev eth0 master bond0
>> ip addr add 10.0.2.15/24 dev bond0
>> ip link set bond0 up
>> sleep 1
>> exec nmap -sS 10.0.2.2/32
>>
>> Any ideas?
>> ...
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>> ...
>>> @@ -5329,17 +5369,27 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
>>> return bond_tx_drop(dev, skb);
>>> }
>>> -/* in broadcast mode, we send everything to all usable interfaces. */
>>> +/* in broadcast mode, we send everything to all or usable slave interfaces.
>>> + * under rcu_read_lock when this function is called.
>>> + */
>>> static netdev_tx_t bond_xmit_broadcast(struct sk_buff *skb,
>>> - struct net_device *bond_dev)
>>> + struct net_device *bond_dev,
>>> + bool all_slaves)
>>> {
>>> struct bonding *bond = netdev_priv(bond_dev);
>>> - struct slave *slave = NULL;
>>> - struct list_head *iter;
>>> + struct bond_up_slave *slaves;
>>> bool xmit_suc = false;
>>> bool skb_used = false;
>>> + int slaves_count, i;
>>> - bond_for_each_slave_rcu(bond, slave, iter) {
>>> + if (all_slaves)
>>> + slaves = rcu_dereference(bond->all_slaves);
>>> + else
>>> + slaves = rcu_dereference(bond->usable_slaves);
>>> +
>>> + slaves_count = slaves ? READ_ONCE(slaves->count) : 0;
>>
>> OK, slaves_count is now 0 (slaves and bond->all_slaves are NULL), but bond_for_each_slave_rcu() used to yield 1 iface.
>>
>> Well, bond_update_slave_arr() is not called for broadcast AFAICS.
> Thank you for pointing out this issue. We don't need to revert the patch. can you test if the following patch is useful to you. I will add test cases later.
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index a8034a561011..c950e1e7f284 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2384,7 +2384,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> unblock_netpoll_tx();
> }
>
> - if (bond_mode_can_use_xmit_hash(bond))
> + if (bond_mode_can_use_xmit_hash(bond) ||
> + BOND_MODE(bond) == BOND_MODE_BROADCAST)
> bond_update_slave_arr(bond, NULL);
>
> if (!slave_dev->netdev_ops->ndo_bpf ||
> @@ -2560,7 +2561,8 @@ static int __bond_release_one(struct net_device *bond_dev,
>
> bond_upper_dev_unlink(bond, slave);
>
> - if (bond_mode_can_use_xmit_hash(bond))
> + if (bond_mode_can_use_xmit_hash(bond) ||
> + BOND_MODE(bond) == BOND_MODE_BROADCAST)
> bond_update_slave_arr(bond, slave);
This indeed works. What about the other two call locations of
bond_update_slave_arr()?
thanks,
--
js
suse labs
Powered by blists - more mailing lists