lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <765825.1752639589@famine>
Date: Tue, 15 Jul 2025 21:19:49 -0700
From: Jay Vosburgh <jv@...sburgh.net>
To: Hangbin Liu <liuhangbin@...il.com>
cc: netdev@...r.kernel.org, Andrew Lunn <andrew+netdev@...n.ch>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
    Paolo Abeni <pabeni@...hat.com>,
    Nikolay Aleksandrov <razor@...ckwall.org>,
    Simon Horman <horms@...nel.org>, Shuah Khan <shuah@...nel.org>,
    linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net 1/2] bonding: update ntt to true in passive mode

Hangbin Liu <liuhangbin@...il.com> wrote:

>When lacp_active is set to off, the bond operates in passive mode, meaning it
>will only "speak when spoken to." However, the current kernel implementation
>only sends an LACPDU in response when the partner's state changes.
>
>In this situation, once LACP negotiation succeeds, the actor stops sending
>LACPDUs until the partner times out and sends an "expired" LACPDU.
>This leads to endless LACP state flapping.

	From the above, I suspect our implementation isn't compliant to
the standard.  Per IEEE 802.1AX-2014 6.4.1 LACP design elements:

c)	Active or passive participation in LACP is controlled by
	LACP_Activity, an administrative control associated with each
	Aggregation Port, that can take the value Active LACP or Passive
	LACP. Passive LACP indicates the Aggregation Port’s preference
	for not transmitting LACPDUs unless its Partner’s control value
	is Active LACP (i.e., a preference not to speak unless spoken
	to). Active LACP indicates the Aggregation Port’s preference to
	participate in the protocol regardless of the Partner’s control
	value (i.e., a preference to speak regardless).

d)	Periodic transmission of LACPDUs occurs if the LACP_Activity
	control of either the Actor or the Partner is Active LACP. These
	periodic transmissions will occur at either a slow or fast
	transmission rate depending upon the expressed LACP_Timeout
	preference (Long Timeout or Short Timeout) of the Partner
	System.

	Which, in summary, means that if either end (actor or partner)
has LACP_Activity set, both ends must send periodic LACPDUs at the rate
specified by their respective partner's LACP_Timeout rate.

>To avoid this, we need update ntt to true once received an LACPDU from the
>partner, ensuring an immediate reply. With this fix, the link becomes stable
>in most cases, except for one specific scenario:
>
>Actor: lacp_active=off, lacp_rate=slow
>Partner: lacp_active=on, lacp_rate=fast
>
>In this case, the partner expects frequent LACPDUs (every 1 second), but the
>actor only responds after receiving an LACPDU, which, in this setup, the
>partner sends every 30 seconds due to the actor's lacp_rate=slow. By the time
>the actor replies, the partner has already timed out and sent an "expired"
>LACPDU.

	Presuming that I'm correct that we're not implementing 6.4.1 d),
above, correctly, then I don't think this is a proper fix, as it kind of
band-aids over the problem a bit.

	Looking at the code, I suspect the problem revolves around the
"lacp_active" check in ad_periodic_machine():

static void ad_periodic_machine(struct port *port, struct bond_params *bond_params)
{
	periodic_states_t last_state;

	/* keep current state machine state to compare later if it was changed */
	last_state = port->sm_periodic_state;

	/* check if port was reinitialized */
	if (((port->sm_vars & AD_PORT_BEGIN) || !(port->sm_vars & AD_PORT_LACP_ENABLED) || !port->is_enabled) ||
	    (!(port->actor_oper_port_state & LACP_STATE_LACP_ACTIVITY) && !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)) ||
	    !bond_params->lacp_active) {
		port->sm_periodic_state = AD_NO_PERIODIC;
	}

	In the above, because all the tests are chained with ||, the
lacp_active test overrides the two correct-looking
LACP_STATE_LACP_ACTIVITY tests.

	It looks like ad_initialize_port() always sets
LACP_STATE_LACP_ACTIVITY in the port->actor_oper_port_state, and nothing
ever clears it.

	Thinking out loud, perhaps this could be fixed by

	a) remove the test of bond_params->lacp_active here, and,

	b) The lacp_active option setting controls whether LACP_ACTIVITY
is set in port->actor_oper_port_state.

	Thoughts?

	-J

>Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>Signed-off-by: Hangbin Liu <liuhangbin@...il.com>
>---
> drivers/net/bonding/bond_3ad.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c6807e473ab7..e001d1c8a49b 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -666,6 +666,8 @@ static void __update_default_selected(struct port *port)
>  */
> static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
> {
>+	struct bonding *bond;
>+
> 	/* validate lacpdu and port */
> 	if (lacpdu && port) {
> 		/* check if any parameter is different then
>@@ -683,6 +685,10 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
> 		   ) {
> 			port->ntt = true;
> 		}
>+
>+		bond = __get_bond_by_port(port);
>+		if (bond && !bond->params.lacp_active)
>+			port->ntt = true;
> 	}
> }
> 
>-- 
>2.46.0
>

---
	-Jay Vosburgh, jv@...sburgh.net


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ