Message-ID: <20250618195309.368645-1-carlos.bilbao@kernel.org>
Date: Wed, 18 Jun 2025 14:53:09 -0500
From: carlos.bilbao@...nel.org
To: jv@...sburgh.net,
	andrew+netdev@...n.ch,
	davem@...emloft.net,
	edumazet@...gle.com,
	kuba@...nel.org,
	pabeni@...hat.com,
	horms@...nel.org,
	netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Cc: sforshee@...nel.org,
	bilbao@...edu,
	Carlos Bilbao <carlos.bilbao@...nel.org>
Subject: [PATCH] bonding: Improve the accuracy of LACPDU transmissions

From: Carlos Bilbao <carlos.bilbao@...nel.org>

Improve the timing accuracy of LACPDU transmissions in the bonding 802.3ad
(LACP) driver. The current approach relies on a decrementing counter to
limit the transmission rate. In our experience, this method is susceptible
to delays (such as those caused by CPU contention or soft lockups) which
can lead to accumulated drift in the LACPDU send interval. Over time, this
drift can cause synchronization issues with the top-of-rack (ToR) switch
managing the LAG, manifesting as LAG map flapping. This in turn can trigger
temporary interface removal and potential packet loss.

This patch improves stability by replacing the counter with a jiffies-based
mechanism that tracks when the next LACPDU is due and enforces the minimum
transmission interval against that deadline.

Suggested-by: Seth Forshee (DigitalOcean) <sforshee@...nel.org>
Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@...nel.org>
---
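To illustrate the difference described above, here is a minimal userspace
sketch, not the driver code. It compares a decrementing counter with an
absolute deadline when the periodic handler misses some of its runs. All
names (counter_limiter, deadline_limiter, ticks_after_eq, simulate_stall)
are hypothetical, and ticks_after_eq() is only modelled on the kernel's
time_after_eq(). The sketch assumes the clock and the interval use the same
unit.

```c
#include <stdbool.h>

/* Wraparound-safe "a >= b" for a free-running unsigned tick counter,
 * modelled on the kernel's time_after_eq(). Relies on the usual
 * two's-complement conversion to long, as gcc and clang implement it.
 */
static bool ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Hypothetical stand-in for the old sm_tx_timer_counter scheme */
struct counter_limiter {
	unsigned int remaining;
	unsigned int interval;
};

/* Hypothetical stand-in for the new sm_tx_next_jiffies scheme */
struct deadline_limiter {
	unsigned long next;
	unsigned long interval;
};

struct slot_counts {
	unsigned int counter_slots;
	unsigned int deadline_slots;
};

/* Counts handler invocations, not elapsed time: a missed run is lost */
static bool counter_poll(struct counter_limiter *l)
{
	if (l->remaining && !(--l->remaining)) {
		l->remaining = l->interval;
		return true;
	}
	return false;
}

/* Compares the clock with an absolute deadline, then moves the deadline
 * forward by one interval from its previous value (not from "now").
 */
static bool deadline_poll(struct deadline_limiter *l, unsigned long now)
{
	if (ticks_after_eq(now, l->next)) {
		l->next += l->interval;
		return true;
	}
	return false;
}

/* Run both limiters over ticks 1..total_ticks. The handler does not run
 * for ticks in [stall_start, stall_end), which stands in for CPU
 * contention or a soft lockup.
 */
static struct slot_counts simulate_stall(unsigned long total_ticks,
					 unsigned long stall_start,
					 unsigned long stall_end,
					 unsigned int interval)
{
	struct counter_limiter c = { .remaining = interval, .interval = interval };
	struct deadline_limiter d = { .next = interval, .interval = interval };
	struct slot_counts out = { 0, 0 };
	unsigned long now;

	for (now = 1; now <= total_ticks; now++) {
		if (now >= stall_start && now < stall_end)
			continue;	/* handler missed this tick */
		if (counter_poll(&c))
			out.counter_slots++;
		if (deadline_poll(&d, now))
			out.deadline_slots++;
	}
	return out;
}
```

With simulate_stall(300, 100, 130, 3), the counter opens 90 transmit slots
and the deadline opens 100, the same as a run with no stall. The counter
never recovers the 30 missed ticks. The deadline variant recovers them by
opening slots on consecutive polls right after the stall, so in this sketch
it preserves the long-run schedule at the cost of a short burst. In kernel
code, an interval expressed in state-machine ticks would presumably need
converting to jiffies first (for example with msecs_to_jiffies()); how the
units line up is an assumption here, not something the patch states.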
 drivers/net/bonding/bond_3ad.c | 18 ++++++++----------
 include/net/bond_3ad.h         |  5 +----
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c6807e473ab7..47610697e4e5 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1375,10 +1375,12 @@ static void ad_churn_machine(struct port *port)
  */
 static void ad_tx_machine(struct port *port)
 {
-	/* check if tx timer expired, to verify that we do not send more than
-	 * 3 packets per second
-	 */
-	if (port->sm_tx_timer_counter && !(--port->sm_tx_timer_counter)) {
+	unsigned long now = jiffies;
+
+	/* Check if enough time has passed since the last LACPDU sent */
+	if (time_after_eq(now, port->sm_tx_next_jiffies)) {
+		port->sm_tx_next_jiffies += ad_ticks_per_sec / AD_MAX_TX_IN_SECOND;
+
 		/* check if there is something to send */
 		if (port->ntt && (port->sm_vars & AD_PORT_LACP_ENABLED)) {
 			__update_lacpdu_from_port(port);
@@ -1395,10 +1397,6 @@ static void ad_tx_machine(struct port *port)
 				port->ntt = false;
 			}
 		}
-		/* restart tx timer(to verify that we will not exceed
-		 * AD_MAX_TX_IN_SECOND
-		 */
-		port->sm_tx_timer_counter = ad_ticks_per_sec/AD_MAX_TX_IN_SECOND;
 	}
 }
 
@@ -2199,9 +2197,9 @@ void bond_3ad_bind_slave(struct slave *slave)
 		/* actor system is the bond's system */
 		__ad_actor_update_port(port);
 		/* tx timer(to verify that no more than MAX_TX_IN_SECOND
-		 * lacpdu's are sent in one second)
+		 * lacpdu's are sent in the configured interval (1 or 30 secs))
 		 */
-		port->sm_tx_timer_counter = ad_ticks_per_sec/AD_MAX_TX_IN_SECOND;
+		port->sm_tx_next_jiffies = jiffies + ad_ticks_per_sec / AD_MAX_TX_IN_SECOND;
 
 		__disable_port(port);
 
diff --git a/include/net/bond_3ad.h b/include/net/bond_3ad.h
index 2053cd8e788a..956d4cb45db1 100644
--- a/include/net/bond_3ad.h
+++ b/include/net/bond_3ad.h
@@ -231,10 +231,7 @@ typedef struct port {
 	mux_states_t sm_mux_state;	/* state machine mux state */
 	u16 sm_mux_timer_counter;	/* state machine mux timer counter */
 	tx_states_t sm_tx_state;	/* state machine tx state */
-	u16 sm_tx_timer_counter;	/* state machine tx timer counter
-					 * (always on - enter to transmit
-					 *  state 3 time per second)
-					 */
+	unsigned long sm_tx_next_jiffies;/* expected jiffies for next LACPDU sent */
 	u16 sm_churn_actor_timer_counter;
 	u16 sm_churn_partner_timer_counter;
 	u32 churn_actor_count;
-- 
2.43.0

