lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <309184.1753351073@vermin>
Date: Thu, 24 Jul 2025 11:57:53 +0200
From: Jay Vosburgh <jv@...sburgh.net>
To: Hangbin Liu <liuhangbin@...il.com>
cc: netdev@...r.kernel.org, Andrew Lunn <andrew+netdev@...n.ch>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
    Paolo Abeni <pabeni@...hat.com>,
    Nikolay Aleksandrov <razor@...ckwall.org>,
    Simon Horman <horms@...nel.org>, Shuah Khan <shuah@...nel.org>,
    linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net 1/2] bonding: update ntt to true in passive mode

Hangbin Liu <liuhangbin@...il.com> wrote:

>On Tue, Jul 15, 2025 at 09:19:49PM -0700, Jay Vosburgh wrote:
>> 	Presuming that I'm correct that we're not implementing 6.4.1 d),
>> above, correctly, then I don't think this is a proper fix, as it kind of
>> band-aids over the problem a bit.
>> 
>> 	Looking at the code, I suspect the problem revolves around the
>> "lacp_active" check in ad_periodic_machine():
>> 
>> static void ad_periodic_machine(struct port *port, struct bond_params *bond_params)
>> {
>> 	periodic_states_t last_state;
>> 
>> 	/* keep current state machine state to compare later if it was changed */
>> 	last_state = port->sm_periodic_state;
>> 
>> 	/* check if port was reinitialized */
>> 	if (((port->sm_vars & AD_PORT_BEGIN) || !(port->sm_vars & AD_PORT_LACP_ENABLED) || !port->is_enabled) ||
>> 	    (!(port->actor_oper_port_state & LACP_STATE_LACP_ACTIVITY) && !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)) ||
>> 	    !bond_params->lacp_active) {
>> 		port->sm_periodic_state = AD_NO_PERIODIC;
>> 	}
>> 
>> 	In the above, because all the tests are chained with ||, the
>> lacp_active test overrides the two correct-looking
>> LACP_STATE_LACP_ACTIVITY tests.
>> 
>> 	It looks like ad_initialize_port() always sets
>> LACP_STATE_LACP_ACTIVITY in the port->actor_oper_port_state, and nothing
>> ever clears it.
>> 
>> 	Thinking out loud, perhaps this could be fixed by
>> 
>> 	a) remove the test of bond_params->lacp_active here, and,
>> 
>> 	b) The lacp_active option setting controls whether LACP_ACTIVITY
>> is set in port->actor_oper_port_state.
>> 
>> 	Thoughts?
>
>Hi Jay,
>
>I did some investigation and testing. In addition to your previous change,
>we also need to initialize the partner's state to 0 in ad_initialize_port_tmpl().
>Otherwise, the check:
>```
>!(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)
>```
>in ad_periodic_machine() will fail even when the actor is in passive mode.

	Agreed; the .port_state in the port_params tmpl should just be
zero; the magic number 1 there now, which is LACP_STATE_LACP_ACTIVITY,
is just wrong.  For the actor side, the lacp_active option will set it
appropriately, and the partner's will be updated by any LACPDUs that
arrive.

>Also, the line:
>```
>port->partner_oper.port_state |= LACP_STATE_LACP_ACTIVITY;
>```
>in ad_rx_machine() should be removed, since we can't assume the partner is in
>active mode. [1]

	Also agreed.

>With these two changes, we can ensure:
>1. In passive mode, the actor will not send LACPDU before receiving any LACPDU from the partner.
>2. Once it receives the partner’s LACPDU, it will start sending periodic LACPDUs as expected.
>
>Do you agree with making these changes? If so, I can post a patch for your review.

	Yes, please post a patch.

>[1] IEEE 8021AX-2020, Figure 6-14—LACP Receive state diagram, the AD_RX_EXPIRED
>statue should be
>```
>Partner_Oper_Port_State.Synchronization = FALSE;
>Partner_Oper_Port_State.Short_Timeout = TRUE;
>Actor_Oper_Port_State.Expired = TRUE;
>LACP_currentWhile = Short_Timeout_Time;
>```

	FWIW, I usually reference the older standards 2008 or 2014, as
the 2020 edition changes a lot of things and bonding isn't necessarily
conformant to those changes (e.g., many of the state machines are
different in large or small ways).  Technically, the bonding
implementation was written to the pre-802.1AX standard when it was still
part of 802.3 (hence the name 802.3ad), clause 43.

	This particular bit (the EXPIRED state actions) is the same,
but, for example, the transition test from EXPIRED to DEFAULTED is
different in the 2014 vs 2020 editions, and we need to be careful not to
implement the state machines piecemeal from different editions of the
standard.

	-J

---
	-Jay Vosburgh, jv@...sburgh.net

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ