lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <302767.1748034227@famine>
Date: Fri, 23 May 2025 14:03:47 -0700
From: Jay Vosburgh <jv@...sburgh.net>
To: Hangbin Liu <liuhangbin@...il.com>
cc: netdev@...r.kernel.org, Andrew Lunn <andrew+netdev@...n.ch>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
    Paolo Abeni <pabeni@...hat.com>,
    Nikolay Aleksandrov <razor@...ckwall.org>,
    Simon Horman <horms@...nel.org>, linux-kernel@...r.kernel.org,
    Liang Li <liali@...hat.com>
Subject: Re: [PATCH net] bonding: fix multicast MAC address synchronization

Hangbin Liu <liuhangbin@...il.com> wrote:

>There is a corner case where the NS (Neighbor Solicitation) target is set to
>an invalid or unreachable address. In such cases, all the slave links are
>marked as down and set to backup. This causes the bond to add multicast MAC
>addresses to all slaves.
>
>However, bond_ab_arp_probe() later tries to activate a carrier on slave and
>sets it as active. If we subsequently change or clear the NS targets, the
>call to bond_slave_ns_maddrs_del() on this interface will fail because it
>is still marked active, and the multicast MAC address will remain.

	This seems complicated, so, just to make sure I'm clear, the bug
being fixed here happens when:

(a) ARP monitor is running with NS target(s), all of which do not
solicit a reply (invalid address or unreachable), resulting in all
interfaces in the bond being marked down

(b) while in the above state, the ARP monitor will cycle through each
interface, making them "active" (active-ish, really, just enough for the
ARP mon stuff to work) in turn to check for a response to a probe

(c) while the cycling from (b) is occurring, attempts to change a NS
target will fail on the interface that happens to be quasi-"active" at
the moment.

	Is my summary correct?

	Doesn't the failure scenario also require that arp_validate be
enabled?  Looking at bond_slave_ns_maddrs_{add,del}, they do nothing if
arp_validate is off.

>To fix this issue, move the NS multicast address add/remove logic into
>bond_set_slave_state() to ensure multicast MAC addresses are updated
>synchronously whenever the slave state changes.

	Ok, but state change calls happen in a lot more places than the
existing bond_hw_addr_swap(), which is only called during change of
active for active-backup, balance-alb, and balance-tlb.  Are you sure
that something goofy like setting arp_validate and an NS target with the
ARP monitor disabled (or in a mode that disallows it) will behave
rationally?

>Note: The call to bond_slave_ns_maddrs_del() in __bond_release_one() is
>kept, as it is still required to clean up multicast MAC addresses when
>a slave is removed.
>
>Fixes: 8eb36164d1a6 ("bonding: add ns target multicast address to slave device")
>Reported-by: Liang Li <liali@...hat.com>
>Signed-off-by: Hangbin Liu <liuhangbin@...il.com>
>---
> drivers/net/bonding/bond_main.c | 9 ---------
> include/net/bonding.h           | 7 +++++++
> 2 files changed, 7 insertions(+), 9 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 8ea183da8d53..6dde6f870ee2 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1004,8 +1004,6 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
> 
> 		if (bond->dev->flags & IFF_UP)
> 			bond_hw_addr_flush(bond->dev, old_active->dev);
>-
>-		bond_slave_ns_maddrs_add(bond, old_active);
> 	}
> 
> 	if (new_active) {
>@@ -1022,8 +1020,6 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
> 			dev_mc_sync(new_active->dev, bond->dev);
> 			netif_addr_unlock_bh(bond->dev);
> 		}
>-
>-		bond_slave_ns_maddrs_del(bond, new_active);
> 	}
> }
> 
>@@ -2350,11 +2346,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> 	bond_compute_features(bond);
> 	bond_set_carrier(bond);
> 
>-	/* Needs to be called before bond_select_active_slave(), which will
>-	 * remove the maddrs if the slave is selected as active slave.
>-	 */
>-	bond_slave_ns_maddrs_add(bond, new_slave);
>-
> 	if (bond_uses_primary(bond)) {
> 		block_netpoll_tx();
> 		bond_select_active_slave(bond);
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index 95f67b308c19..0041f7a2bd18 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -385,7 +385,14 @@ static inline void bond_set_slave_state(struct slave *slave,
> 	if (slave->backup == slave_state)
> 		return;
> 
>+	if (slave_state == BOND_STATE_ACTIVE)
>+		bond_slave_ns_maddrs_del(slave->bond, slave);
>+
> 	slave->backup = slave_state;
>+
>+	if (slave_state == BOND_STATE_BACKUP)
>+		bond_slave_ns_maddrs_add(slave->bond, slave);

	This code pattern kind of makes it look like the slave->backup
assignment must happen between the two new if blocks.  I don't think
that's true, and things would work correctly if the slave->backup
assignment happened first (or last).

	Assuming I'm correct, could you move the assignment so it's not
in the middle?  If, however, it does need to be in the middle, that
deserves a comment explaining why.

	-J

>+
> 	if (notify) {
> 		bond_lower_state_changed(slave);
> 		bond_queue_slave_event(slave);
>-- 
>2.46.0
>

---
	-Jay Vosburgh, jv@...sburgh.net

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ