lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aDQrn8EslaWx_jEA@fedora>
Date: Mon, 26 May 2025 08:51:43 +0000
From: Hangbin Liu <liuhangbin@...il.com>
To: Jay Vosburgh <jv@...sburgh.net>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Nikolay Aleksandrov <razor@...ckwall.org>,
	Simon Horman <horms@...nel.org>, linux-kernel@...r.kernel.org,
	Liang Li <liali@...hat.com>
Subject: Re: [PATCH net] bonding: fix multicast MAC address synchronization

On Fri, May 23, 2025 at 02:03:47PM -0700, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@...il.com> wrote:
> 
> >There is a corner case where the NS (Neighbor Solicitation) target is set to
> >an invalid or unreachable address. In such cases, all the slave links are
> >marked as down and set to backup. This causes the bond to add multicast MAC
> >addresses to all slaves.
> >
> >However, bond_ab_arp_probe() later tries to activate a carrier on slave and
> >sets it as active. If we subsequently change or clear the NS targets, the
> >call to bond_slave_ns_maddrs_del() on this interface will fail because it
> >is still marked active, and the multicast MAC address will remain.
> 
> 	This seems complicated, so, just to make sure I'm clear, the bug
> being fixed here happens when:
> 
> (a) ARP monitor is running with NS target(s), all of which do not
> solicit a reply (invalid address or unreachable), resulting in all
> interfaces in the bond being marked down
> 
> (b) while in the above state, the ARP monitor will cycle through each
> interface, making them "active" (active-ish, really, just enough for the
> ARP mon stuff to work) in turn to check for a response to a probe

Yes

> 
> (c) while the cycling from (b) is occurring, attempts to change a NS
> target will fail on the interface that happens to be quasi-"active" at
> the moment.

Yes, this is because bond_slave_ns_maddrs_del() must ensure the deletion
happens on a backup slave only. However, during ARP monitor, it set one of
the slaves to active, this causes the deletion of multicast MAC addresses to
be skipped on that interface.

> 	Is my summary correct?
> 
> 	Doesn't the failure scenario also require that arp_validate be
> enabled?  Looking at bond_slave_ns_maddrs_{add,del}, they do nothing if
> arp_validate is off.

Yes, it need.

> 
> >To fix this issue, move the NS multicast address add/remove logic into
> >bond_set_slave_state() to ensure multicast MAC addresses are updated
> >synchronously whenever the slave state changes.
> 
> 	Ok, but state change calls happen in a lot more places than the
> existing bond_hw_addr_swap(), which is only called during change of
> active for active-backup, balance-alb, and balance-tlb.  Are you sure
> that something goofy like setting arp_validate and an NS target with the
> ARP monitor disabled (or in a mode that disallows it) will behave
> rationally?

The slave_can_set_ns_maddr() in slave_set_ns_maddrs could check the bond mode
and if the slave is active. But no arp_interval checking. I can add it in the
checking to avoid the miss-config. e.g.

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 91893c29b899..21116362cc24 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1241,6 +1241,7 @@ static int bond_option_arp_ip_targets_set(struct bonding *bond,
 static bool slave_can_set_ns_maddr(const struct bonding *bond, struct slave *slave)
 {
        return BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP &&
+              bond->params.arp_interval &&
               !bond_is_active_slave(slave) &&
               slave->dev->flags & IFF_MULTICAST;
 }

> 
> >Note: The call to bond_slave_ns_maddrs_del() in __bond_release_one() is
> >kept, as it is still required to clean up multicast MAC addresses when
> >a slave is removed.
> >
> >Fixes: 8eb36164d1a6 ("bonding: add ns target multicast address to slave device")
> >Reported-by: Liang Li <liali@...hat.com>
> >Signed-off-by: Hangbin Liu <liuhangbin@...il.com>
> >---
> > drivers/net/bonding/bond_main.c | 9 ---------
> > include/net/bonding.h           | 7 +++++++
> > 2 files changed, 7 insertions(+), 9 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index 8ea183da8d53..6dde6f870ee2 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -1004,8 +1004,6 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
> > 
> > 		if (bond->dev->flags & IFF_UP)
> > 			bond_hw_addr_flush(bond->dev, old_active->dev);
> >-
> >-		bond_slave_ns_maddrs_add(bond, old_active);
> > 	}
> > 
> > 	if (new_active) {
> >@@ -1022,8 +1020,6 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
> > 			dev_mc_sync(new_active->dev, bond->dev);
> > 			netif_addr_unlock_bh(bond->dev);
> > 		}
> >-
> >-		bond_slave_ns_maddrs_del(bond, new_active);
> > 	}
> > }
> > 
> >@@ -2350,11 +2346,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> > 	bond_compute_features(bond);
> > 	bond_set_carrier(bond);
> > 
> >-	/* Needs to be called before bond_select_active_slave(), which will
> >-	 * remove the maddrs if the slave is selected as active slave.
> >-	 */
> >-	bond_slave_ns_maddrs_add(bond, new_slave);
> >-
> > 	if (bond_uses_primary(bond)) {
> > 		block_netpoll_tx();
> > 		bond_select_active_slave(bond);
> >diff --git a/include/net/bonding.h b/include/net/bonding.h
> >index 95f67b308c19..0041f7a2bd18 100644
> >--- a/include/net/bonding.h
> >+++ b/include/net/bonding.h
> >@@ -385,7 +385,14 @@ static inline void bond_set_slave_state(struct slave *slave,
> > 	if (slave->backup == slave_state)
> > 		return;
> > 
> >+	if (slave_state == BOND_STATE_ACTIVE)
> >+		bond_slave_ns_maddrs_del(slave->bond, slave);
> >+
> > 	slave->backup = slave_state;
> >+
> >+	if (slave_state == BOND_STATE_BACKUP)
> >+		bond_slave_ns_maddrs_add(slave->bond, slave);
> 
> 	This code pattern kind of makes it look like the slave->backup
> assignment must happen between the two new if blocks.  I don't think
> that's true, and things would work correctly if the slave->backup
> assignment happened first (or last).

The slave->backup assignment must happen between the two if blocks, because

bond_slave_ns_maddrs_add/del only do the operation on backup slave.
So if a interface become active, we need to call maddrs_del before it set
backup state to active. If a interface become backup. We need to call
maddrs_add after the backup state set to backup.

I will add a comment in the code.

Thanks
Hangbin
> 
> 	Assuming I'm correct, could you move the assignment so it's not
> in the middle?  If, however, it does need to be in the middle, that
> deserves a comment explaining why.
> 
> 	-J
> 
> >+
> > 	if (notify) {
> > 		bond_lower_state_changed(slave);
> > 		bond_queue_slave_event(slave);
> >-- 
> >2.46.0
> >
> 
> ---
> 	-Jay Vosburgh, jv@...sburgh.net

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ