netdev - Re: [PATCH] bonding: Improve the accuracy of LACPDU transmissions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f34cec6a-67f4-444c-8807-5fbf3c4335b7@gmail.com>
Date: Tue, 24 Jun 2025 17:33:38 -0500
From: Carlos Bilbao <carlos.bilbao.osdev@...il.com>
To: Jay Vosburgh <jv@...sburgh.net>, Tonghao Zhang <tonghao@...aicloud.com>
Cc: carlos.bilbao@...nel.org, andrew+netdev@...n.ch, davem@...emloft.net,
 edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org, sforshee@...nel.org,
 bilbao@...edu
Subject: Re: [PATCH] bonding: Improve the accuracy of LACPDU transmissions

On 6/24/25 17:27, Carlos Bilbao wrote:

> Hello both,
>
> On 6/24/25 16:15, Jay Vosburgh wrote:
>> Tonghao Zhang <tonghao@...aicloud.com> wrote:
>>
>>>
>>>
>>>> 2025年6月19日 03:53，carlos.bilbao@...nel.org 写道：
>>>>
>>>> From: Carlos Bilbao <carlos.bilbao@...nel.org>
>>>>
>>>> Improve the timing accuracy of LACPDU transmissions in the bonding 
>>>> 802.3ad
>>>> (LACP) driver. The current approach relies on a decrementing 
>>>> counter to
>>>> limit the transmission rate. In our experience, this method is 
>>>> susceptible
>>>> to delays (such as those caused by CPU contention or soft lockups) 
>>>> which
>>>> can lead to accumulated drift in the LACPDU send interval. Over 
>>>> time, this
>>>> drift can cause synchronization issues with the top-of-rack (ToR) 
>>>> switch
>>>> managing the LAG, manifesting as lag map flapping. This in turn can 
>>>> trigger
>>>> temporary interface removal and potential packet loss.
>>     So, you're saying that contention or soft lockups are causing
>> the queue_delayed_work() of bond_3ad_state_machine_handler() to be
>> scheduled late, and, then, because the LACPDU TX limiter is based on the
>> number of state machine executions (which should be every 100ms), it is
>> then late sending LACPDUs?
>>
>>     If the core problem is that the state machine as a whole isn't
>> running regularly, how is doing a clock-based time check reliable?  Or
>> should I take the word "improve" from the Subject literally, and assume
>> it's making things "better" but not "totally perfect"?
>>
>>     Is the sync issue with the TOR due to missing / delayed LACPDUs,
>> or is there more to it?  At the fast LACP rate, the periodic timeout is
>> 3 seconds for a nominal 1 second LACPDU interval, which is fairly
>> generous.


Also, I didn’t ignore your comments -- I agree there’s a deeper issue at
the state machine level. I’m looking into it and will get back to y'all.


>>
>>>> This patch improves stability with a jiffies-based mechanism to 
>>>> track and
>>>> enforce the minimum transmission interval; keeping track of when 
>>>> the next
>>>> LACPDU should be sent.
>>>>
>>>> Suggested-by: Seth Forshee (DigitalOcean) <sforshee@...nel.org>
>>>> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@...nel.org>
>>>> ---
>>>> drivers/net/bonding/bond_3ad.c | 18 ++++++++----------
>>>> include/net/bond_3ad.h         |  5 +----
>>>> 2 files changed, 9 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/bond_3ad.c 
>>>> b/drivers/net/bonding/bond_3ad.c
>>>> index c6807e473ab7..47610697e4e5 100644
>>>> --- a/drivers/net/bonding/bond_3ad.c
>>>> +++ b/drivers/net/bonding/bond_3ad.c
>>>> @@ -1375,10 +1375,12 @@ static void ad_churn_machine(struct port 
>>>> *port)
>>>>   */
>>>> static void ad_tx_machine(struct port *port)
>>>> {
>>>> - /* check if tx timer expired, to verify that we do not send more 
>>>> than
>>>> - * 3 packets per second
>>>> - */
>>>> - if (port->sm_tx_timer_counter && !(--port->sm_tx_timer_counter)) {
>>>> + unsigned long now = jiffies;
>>>> +
>>>> + /* Check if enough time has passed since the last LACPDU sent */
>>>> + if (time_after_eq(now, port->sm_tx_next_jiffies)) {
>>>> + port->sm_tx_next_jiffies += ad_ticks_per_sec / AD_MAX_TX_IN_SECOND;
>>>> +
>>>> /* check if there is something to send */
>>>> if (port->ntt && (port->sm_vars & AD_PORT_LACP_ENABLED)) {
>>>> __update_lacpdu_from_port(port);
>>>> @@ -1395,10 +1397,6 @@ static void ad_tx_machine(struct port *port)
>>>> port->ntt = false;
>>>> }
>>>> }
>>>> - /* restart tx timer(to verify that we will not exceed
>>>> - * AD_MAX_TX_IN_SECOND
>>>> - */
>>>> - port->sm_tx_timer_counter = ad_ticks_per_sec/AD_MAX_TX_IN_SECOND;
>>>> }
>>>> }
>>>>
>>>> @@ -2199,9 +2197,9 @@ void bond_3ad_bind_slave(struct slave *slave)
>>>> /* actor system is the bond's system */
>>>> __ad_actor_update_port(port);
>>>> /* tx timer(to verify that no more than MAX_TX_IN_SECOND
>>>> - * lacpdu's are sent in one second)
>>>> + * lacpdu's are sent in the configured interval (1 or 30 secs))
>>>> */
>>>> - port->sm_tx_timer_counter = ad_ticks_per_sec/AD_MAX_TX_IN_SECOND;
>>>> + port->sm_tx_next_jiffies = jiffies + ad_ticks_per_sec / 
>>>> AD_MAX_TX_IN_SECOND;
>>> If CONFIG_HZ is 1000, there is 1000 tick per second, but 
>>> "ad_ticks_per_sec / AD_MAX_TX_IN_SECOND” == 10/3 == 3, so that means 
>>> send lacp packets every 3 ticks ?
>>     Agreed, I think the math is off here.
>>
>>     ad_ticks_per_sec is 10, it's the number of times the entire LACP
>> state machine runs per second.  It is unrelated to jiffies, and can't be
>> used directly with jiffy units (the duration of which varies depending
>> on what CONFIG_HZ is).  I agree that it's confusingly similar to
>> ad_delta_in_ticks, which is measured in jiffy units.
>>
>>     You'll probably want to use msecs_to_jiffies() somewhere.
>
>
> Thank you for taking the time to review my patch!
>
> I agree, the math is off here and I plan to fix that in v2. It shouldn't
> be:
>
> port->sm_tx_next_jiffies = jiffies + ad_ticks_per_sec / 
> AD_MAX_TX_IN_SECOND;
> port->sm_tx_next_jiffies = jiffies + 3;
>
> ... instead, it should be:
>
> port->sm_tx_next_jiffies = jiffies + HZ / AD_MAX_TX_IN_SECOND;
>
> ... which, assuming CONFIG_HZ is 1000, it would be:
>
> port->sm_tx_next_jiffies = jiffies + 333;
>
> This math also works in terms of the units too:
>
> HZ = ticks / sec
> AD_MAX_X_IN_SECOND = packets / sec
>
> so:
>
> (ticks / sec) / (packets /sec) = ticks / packet
>
>
>>
>>     How did you test this to insure the TX machine doesn't overrun
>> (i.e., exceed AD_MAX_TX_IN_SECOND LACPDU transmissions in one second)?
>>
>>     -J
>>
>>>> __disable_port(port);
>>>>
>>>> diff --git a/include/net/bond_3ad.h b/include/net/bond_3ad.h
>>>> index 2053cd8e788a..956d4cb45db1 100644
>>>> --- a/include/net/bond_3ad.h
>>>> +++ b/include/net/bond_3ad.h
>>>> @@ -231,10 +231,7 @@ typedef struct port {
>>>> mux_states_t sm_mux_state; /* state machine mux state */
>>>> u16 sm_mux_timer_counter; /* state machine mux timer counter */
>>>> tx_states_t sm_tx_state; /* state machine tx state */
>>>> - u16 sm_tx_timer_counter; /* state machine tx timer counter
>>>> - * (always on - enter to transmit
>>>> - *  state 3 time per second)
>>>> - */
>>>> + unsigned long sm_tx_next_jiffies;/* expected jiffies for next 
>>>> LACPDU sent */
>>>> u16 sm_churn_actor_timer_counter;
>>>> u16 sm_churn_partner_timer_counter;
>>>> u32 churn_actor_count;
>>>> -- 
>>>> 2.43.0
>>>>
>>>>
>>>>
>> ---
>>     -Jay Vosburgh, jv@...sburgh.net
>>
>
> Thanks,
>
> Carlos
>

Thanks,

Carlos