linux-kernel - Re: [PATCH] bonding: Improve the accuracy of LACPDU transmissions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <32313811-2215-41c3-852f-8e257487dfb6@gmail.com>
Date: Tue, 24 Jun 2025 17:27:45 -0500
From: Carlos Bilbao <carlos.bilbao.osdev@...il.com>
To: Jay Vosburgh <jv@...sburgh.net>, Tonghao Zhang <tonghao@...aicloud.com>
Cc: carlos.bilbao@...nel.org, andrew+netdev@...n.ch, davem@...emloft.net,
 edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org, sforshee@...nel.org,
 bilbao@...edu
Subject: Re: [PATCH] bonding: Improve the accuracy of LACPDU transmissions

Hello both,

On 6/24/25 16:15, Jay Vosburgh wrote:
> Tonghao Zhang <tonghao@...aicloud.com> wrote:
>
>>
>>
>>> 2025年6月19日 03:53，carlos.bilbao@...nel.org 写道：
>>>
>>> From: Carlos Bilbao <carlos.bilbao@...nel.org>
>>>
>>> Improve the timing accuracy of LACPDU transmissions in the bonding 802.3ad
>>> (LACP) driver. The current approach relies on a decrementing counter to
>>> limit the transmission rate. In our experience, this method is susceptible
>>> to delays (such as those caused by CPU contention or soft lockups) which
>>> can lead to accumulated drift in the LACPDU send interval. Over time, this
>>> drift can cause synchronization issues with the top-of-rack (ToR) switch
>>> managing the LAG, manifesting as lag map flapping. This in turn can trigger
>>> temporary interface removal and potential packet loss.
> 	So, you're saying that contention or soft lockups are causing
> the queue_delayed_work() of bond_3ad_state_machine_handler() to be
> scheduled late, and, then, because the LACPDU TX limiter is based on the
> number of state machine executions (which should be every 100ms), it is
> then late sending LACPDUs?
>
> 	If the core problem is that the state machine as a whole isn't
> running regularly, how is doing a clock-based time check reliable?  Or
> should I take the word "improve" from the Subject literally, and assume
> it's making things "better" but not "totally perfect"?
>
> 	Is the sync issue with the TOR due to missing / delayed LACPDUs,
> or is there more to it?  At the fast LACP rate, the periodic timeout is
> 3 seconds for a nominal 1 second LACPDU interval, which is fairly
> generous.
>
>>> This patch improves stability with a jiffies-based mechanism to track and
>>> enforce the minimum transmission interval; keeping track of when the next
>>> LACPDU should be sent.
>>>
>>> Suggested-by: Seth Forshee (DigitalOcean) <sforshee@...nel.org>
>>> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@...nel.org>
>>> ---
>>> drivers/net/bonding/bond_3ad.c | 18 ++++++++----------
>>> include/net/bond_3ad.h         |  5 +----
>>> 2 files changed, 9 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>> index c6807e473ab7..47610697e4e5 100644
>>> --- a/drivers/net/bonding/bond_3ad.c
>>> +++ b/drivers/net/bonding/bond_3ad.c
>>> @@ -1375,10 +1375,12 @@ static void ad_churn_machine(struct port *port)
>>>   */
>>> static void ad_tx_machine(struct port *port)
>>> {
>>> - /* check if tx timer expired, to verify that we do not send more than
>>> - * 3 packets per second
>>> - */
>>> - if (port->sm_tx_timer_counter && !(--port->sm_tx_timer_counter)) {
>>> + unsigned long now = jiffies;
>>> +
>>> + /* Check if enough time has passed since the last LACPDU sent */
>>> + if (time_after_eq(now, port->sm_tx_next_jiffies)) {
>>> + port->sm_tx_next_jiffies += ad_ticks_per_sec / AD_MAX_TX_IN_SECOND;
>>> +
>>> /* check if there is something to send */
>>> if (port->ntt && (port->sm_vars & AD_PORT_LACP_ENABLED)) {
>>> __update_lacpdu_from_port(port);
>>> @@ -1395,10 +1397,6 @@ static void ad_tx_machine(struct port *port)
>>> port->ntt = false;
>>> }
>>> }
>>> - /* restart tx timer(to verify that we will not exceed
>>> - * AD_MAX_TX_IN_SECOND
>>> - */
>>> - port->sm_tx_timer_counter = ad_ticks_per_sec/AD_MAX_TX_IN_SECOND;
>>> }
>>> }
>>>
>>> @@ -2199,9 +2197,9 @@ void bond_3ad_bind_slave(struct slave *slave)
>>> /* actor system is the bond's system */
>>> __ad_actor_update_port(port);
>>> /* tx timer(to verify that no more than MAX_TX_IN_SECOND
>>> - * lacpdu's are sent in one second)
>>> + * lacpdu's are sent in the configured interval (1 or 30 secs))
>>> */
>>> - port->sm_tx_timer_counter = ad_ticks_per_sec/AD_MAX_TX_IN_SECOND;
>>> + port->sm_tx_next_jiffies = jiffies + ad_ticks_per_sec / AD_MAX_TX_IN_SECOND;
>> If CONFIG_HZ is 1000, there is 1000 tick per second, but "ad_ticks_per_sec / AD_MAX_TX_IN_SECOND” == 10/3 == 3, so that means send lacp packets every 3 ticks ?
> 	Agreed, I think the math is off here.
>
> 	ad_ticks_per_sec is 10, it's the number of times the entire LACP
> state machine runs per second.  It is unrelated to jiffies, and can't be
> used directly with jiffy units (the duration of which varies depending
> on what CONFIG_HZ is).  I agree that it's confusingly similar to
> ad_delta_in_ticks, which is measured in jiffy units.
>
> 	You'll probably want to use msecs_to_jiffies() somewhere.


Thank you for taking the time to review my patch!

I agree, the math is off here and I plan to fix that in v2. It shouldn't
be:

port->sm_tx_next_jiffies = jiffies + ad_ticks_per_sec / AD_MAX_TX_IN_SECOND;
port->sm_tx_next_jiffies = jiffies + 3;

... instead, it should be:

port->sm_tx_next_jiffies = jiffies + HZ / AD_MAX_TX_IN_SECOND;

... which, assuming CONFIG_HZ is 1000, it would be:

port->sm_tx_next_jiffies = jiffies + 333;

This math also works in terms of the units too:

HZ = ticks / sec
AD_MAX_X_IN_SECOND = packets / sec

so:

(ticks / sec) / (packets /sec) = ticks / packet


>
> 	How did you test this to insure the TX machine doesn't overrun
> (i.e., exceed AD_MAX_TX_IN_SECOND LACPDU transmissions in one second)?
>
> 	-J
>
>>> __disable_port(port);
>>>
>>> diff --git a/include/net/bond_3ad.h b/include/net/bond_3ad.h
>>> index 2053cd8e788a..956d4cb45db1 100644
>>> --- a/include/net/bond_3ad.h
>>> +++ b/include/net/bond_3ad.h
>>> @@ -231,10 +231,7 @@ typedef struct port {
>>> mux_states_t sm_mux_state; /* state machine mux state */
>>> u16 sm_mux_timer_counter; /* state machine mux timer counter */
>>> tx_states_t sm_tx_state; /* state machine tx state */
>>> - u16 sm_tx_timer_counter; /* state machine tx timer counter
>>> - * (always on - enter to transmit
>>> - *  state 3 time per second)
>>> - */
>>> + unsigned long sm_tx_next_jiffies;/* expected jiffies for next LACPDU sent */
>>> u16 sm_churn_actor_timer_counter;
>>> u16 sm_churn_partner_timer_counter;
>>> u32 churn_actor_count;
>>> -- 
>>> 2.43.0
>>>
>>>
>>>
> ---
> 	-Jay Vosburgh, jv@...sburgh.net
>

Thanks,

Carlos