lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aFoCos-vPLfbGoM1@fedora>
Date: Tue, 24 Jun 2025 01:42:58 +0000
From: Hangbin Liu <liuhangbin@...il.com>
To: Jay Vosburgh <jv@...sburgh.net>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Nikolay Aleksandrov <razor@...ckwall.org>,
	Simon Horman <horms@...nel.org>, linux-kernel@...r.kernel.org,
	Liang Li <liali@...hat.com>
Subject: Re: [PATCH net] bonding: fix multicast MAC address synchronization

Hi Jay,

Any comments?

Hangbin
On Mon, May 26, 2025 at 08:51:52AM +0000, Hangbin Liu wrote:
> On Fri, May 23, 2025 at 02:03:47PM -0700, Jay Vosburgh wrote:
> > Hangbin Liu <liuhangbin@...il.com> wrote:
> > 
> > >There is a corner case where the NS (Neighbor Solicitation) target is set to
> > >an invalid or unreachable address. In such cases, all the slave links are
> > >marked as down and set to backup. This causes the bond to add multicast MAC
> > >addresses to all slaves.
> > >
> > >However, bond_ab_arp_probe() later tries to activate a carrier on slave and
> > >sets it as active. If we subsequently change or clear the NS targets, the
> > >call to bond_slave_ns_maddrs_del() on this interface will fail because it
> > >is still marked active, and the multicast MAC address will remain.
> > 
> > 	This seems complicated, so, just to make sure I'm clear, the bug
> > being fixed here happens when:
> > 
> > (a) ARP monitor is running with NS target(s), all of which do not
> > solicit a reply (invalid address or unreachable), resulting in all
> > interfaces in the bond being marked down
> > 
> > (b) while in the above state, the ARP monitor will cycle through each
> > interface, making them "active" (active-ish, really, just enough for the
> > ARP mon stuff to work) in turn to check for a response to a probe
> 
> Yes
> 
> > 
> > (c) while the cycling from (b) is occurring, attempts to change a NS
> > target will fail on the interface that happens to be quasi-"active" at
> > the moment.
> 
> Yes, this is because bond_slave_ns_maddrs_del() must ensure the deletion
> happens on a backup slave only. However, during ARP monitor, it set one of
> the slaves to active, this causes the deletion of multicast MAC addresses to
> be skipped on that interface.
> 
> > 	Is my summary correct?
> > 
> > 	Doesn't the failure scenario also require that arp_validate be
> > enabled?  Looking at bond_slave_ns_maddrs_{add,del}, they do nothing if
> > arp_validate is off.
> 
> Yes, it need.
> 
> > 
> > >To fix this issue, move the NS multicast address add/remove logic into
> > >bond_set_slave_state() to ensure multicast MAC addresses are updated
> > >synchronously whenever the slave state changes.
> > 
> > 	Ok, but state change calls happen in a lot more places than the
> > existing bond_hw_addr_swap(), which is only called during change of
> > active for active-backup, balance-alb, and balance-tlb.  Are you sure
> > that something goofy like setting arp_validate and an NS target with the
> > ARP monitor disabled (or in a mode that disallows it) will behave
> > rationally?
> 
> The slave_can_set_ns_maddr() in slave_set_ns_maddrs could check the bond mode
> and if the slave is active. But no arp_interval checking. I can add it in the
> checking to avoid the miss-config. e.g.
> 
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index 91893c29b899..21116362cc24 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -1241,6 +1241,7 @@ static int bond_option_arp_ip_targets_set(struct bonding *bond,
>  static bool slave_can_set_ns_maddr(const struct bonding *bond, struct slave *slave)
>  {
>         return BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP &&
> +              bond->params.arp_interval &&
>                !bond_is_active_slave(slave) &&
>                slave->dev->flags & IFF_MULTICAST;
>  }
> 
> > 
> > >Note: The call to bond_slave_ns_maddrs_del() in __bond_release_one() is
> > >kept, as it is still required to clean up multicast MAC addresses when
> > >a slave is removed.
> > >
> > >Fixes: 8eb36164d1a6 ("bonding: add ns target multicast address to slave device")
> > >Reported-by: Liang Li <liali@...hat.com>
> > >Signed-off-by: Hangbin Liu <liuhangbin@...il.com>
> > >---
> > > drivers/net/bonding/bond_main.c | 9 ---------
> > > include/net/bonding.h           | 7 +++++++
> > > 2 files changed, 7 insertions(+), 9 deletions(-)
> > >
> > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > >index 8ea183da8d53..6dde6f870ee2 100644
> > >--- a/drivers/net/bonding/bond_main.c
> > >+++ b/drivers/net/bonding/bond_main.c
> > >@@ -1004,8 +1004,6 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
> > > 
> > > 		if (bond->dev->flags & IFF_UP)
> > > 			bond_hw_addr_flush(bond->dev, old_active->dev);
> > >-
> > >-		bond_slave_ns_maddrs_add(bond, old_active);
> > > 	}
> > > 
> > > 	if (new_active) {
> > >@@ -1022,8 +1020,6 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
> > > 			dev_mc_sync(new_active->dev, bond->dev);
> > > 			netif_addr_unlock_bh(bond->dev);
> > > 		}
> > >-
> > >-		bond_slave_ns_maddrs_del(bond, new_active);
> > > 	}
> > > }
> > > 
> > >@@ -2350,11 +2346,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> > > 	bond_compute_features(bond);
> > > 	bond_set_carrier(bond);
> > > 
> > >-	/* Needs to be called before bond_select_active_slave(), which will
> > >-	 * remove the maddrs if the slave is selected as active slave.
> > >-	 */
> > >-	bond_slave_ns_maddrs_add(bond, new_slave);
> > >-
> > > 	if (bond_uses_primary(bond)) {
> > > 		block_netpoll_tx();
> > > 		bond_select_active_slave(bond);
> > >diff --git a/include/net/bonding.h b/include/net/bonding.h
> > >index 95f67b308c19..0041f7a2bd18 100644
> > >--- a/include/net/bonding.h
> > >+++ b/include/net/bonding.h
> > >@@ -385,7 +385,14 @@ static inline void bond_set_slave_state(struct slave *slave,
> > > 	if (slave->backup == slave_state)
> > > 		return;
> > > 
> > >+	if (slave_state == BOND_STATE_ACTIVE)
> > >+		bond_slave_ns_maddrs_del(slave->bond, slave);
> > >+
> > > 	slave->backup = slave_state;
> > >+
> > >+	if (slave_state == BOND_STATE_BACKUP)
> > >+		bond_slave_ns_maddrs_add(slave->bond, slave);
> > 
> > 	This code pattern kind of makes it look like the slave->backup
> > assignment must happen between the two new if blocks.  I don't think
> > that's true, and things would work correctly if the slave->backup
> > assignment happened first (or last).
> 
> The slave->backup assignment must happen between the two if blocks, because
> 
> bond_slave_ns_maddrs_add/del only do the operation on backup slave.
> So if a interface become active, we need to call maddrs_del before it set
> backup state to active. If a interface become backup. We need to call
> maddrs_add after the backup state set to backup.
> 
> I will add a comment in the code.
> 
> Thanks
> Hangbin
> > 
> > 	Assuming I'm correct, could you move the assignment so it's not
> > in the middle?  If, however, it does need to be in the middle, that
> > deserves a comment explaining why.
> > 
> > 	-J
> > 
> > >+
> > > 	if (notify) {
> > > 		bond_lower_state_changed(slave);
> > > 		bond_queue_slave_event(slave);
> > >-- 
> > >2.46.0
> > >
> > 
> > ---
> > 	-Jay Vosburgh, jv@...sburgh.net

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ