linux-kernel - [PATCH net-next 5/5] net/mlx5e: Set default grace period delay for TX and RX reporters


Message-ID: <1752768442-264413-6-git-send-email-tariqt@nvidia.com>
Date: Thu, 17 Jul 2025 19:07:22 +0300
From: Tariq Toukan <tariqt@...dia.com>
To: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David
 S. Miller" <davem@...emloft.net>, Jiri Pirko <jiri@...nulli.us>, Jiri Pirko
	<jiri@...dia.com>
CC: Saeed Mahameed <saeed@...nel.org>, Gal Pressman <gal@...dia.com>, "Leon
 Romanovsky" <leon@...nel.org>, Shahar Shitrit <shshitrit@...dia.com>, "Donald
 Hunter" <donald.hunter@...il.com>, Jonathan Corbet <corbet@....net>, "Brett
 Creeley" <brett.creeley@....com>, Michael Chan <michael.chan@...adcom.com>,
	Pavan Chebbi <pavan.chebbi@...adcom.com>, Cai Huoqing
	<cai.huoqing@...ux.dev>, Tony Nguyen <anthony.l.nguyen@...el.com>, "Przemek
 Kitszel" <przemyslaw.kitszel@...el.com>, Sunil Goutham
	<sgoutham@...vell.com>, Linu Cherian <lcherian@...vell.com>, Geetha sowjanya
	<gakula@...vell.com>, Jerin Jacob <jerinj@...vell.com>, hariprasad
	<hkelam@...vell.com>, "Subbaraya Sundeep" <sbhatta@...vell.com>, Saeed
 Mahameed <saeedm@...dia.com>, "Tariq Toukan" <tariqt@...dia.com>, Mark Bloch
	<mbloch@...dia.com>, Ido Schimmel <idosch@...dia.com>, Petr Machata
	<petrm@...dia.com>, Manish Chopra <manishc@...vell.com>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<linux-doc@...r.kernel.org>, <intel-wired-lan@...ts.osuosl.org>,
	<linux-rdma@...r.kernel.org>
Subject: [PATCH net-next 5/5] net/mlx5e: Set default grace period delay for TX and RX reporters

From: Shahar Shitrit <shshitrit@...dia.com>

System errors can sometimes cause multiple errors to be reported
to the TX reporter at the same time. For instance, lost interrupts
may cause several SQs to time out simultaneously. When dev_watchdog
notifies the driver of the timeout, the driver iterates over all SQs
and triggers recovery for the timed-out ones via the TX health
reporter. However, the grace period allows only one recovery at a
time, so only the first SQ recovers while the others remain blocked.
Since no further recoveries are allowed during the grace period,
subsequent errors cause the reporter to enter an ERROR state,
requiring manual intervention.

To address this, set the TX reporter's default grace period
delay to 500 ms (0.5 seconds). This allows the reporter to detect
and handle all timed-out SQs within this delay window before the
grace period begins.

To account for the possibility of a similar issue in the RX reporter,
set its default grace period delay to the same value.

Additionally, while here, align the TX definition prefix with the RX
one (MLX5_ -> MLX5E_), as these definitions are used only in the EN
driver.

Signed-off-by: Shahar Shitrit <shshitrit@...dia.com>
Reviewed-by: Jiri Pirko <jiri@...dia.com>
Reviewed-by: Carolina Jubran <cjubran@...dia.com>
Signed-off-by: Tariq Toukan <tariqt@...dia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c | 3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 7 +++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index e106f0696486..feb3f2bce830 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -645,6 +645,7 @@ void mlx5e_reporter_icosq_resume_recovery(struct mlx5e_channel *c)
 }
 
 #define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD_DELAY 500
 
 static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
 	.name = "rx",
@@ -652,6 +653,8 @@ static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
 	.diagnose = mlx5e_rx_reporter_diagnose,
 	.dump = mlx5e_rx_reporter_dump,
 	.default_graceful_period = MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
+	.default_graceful_period_delay =
+		MLX5E_REPORTER_RX_GRACEFUL_PERIOD_DELAY,
 };
 
 void mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 6fb0d143ad1b..515b77585926 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -514,14 +514,17 @@ void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq)
 	mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
 }
 
-#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_TX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_TX_GRACEFUL_PERIOD_DELAY 500
 
 static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
 		.name = "tx",
 		.recover = mlx5e_tx_reporter_recover,
 		.diagnose = mlx5e_tx_reporter_diagnose,
 		.dump = mlx5e_tx_reporter_dump,
-		.default_graceful_period = MLX5_REPORTER_TX_GRACEFUL_PERIOD,
+		.default_graceful_period = MLX5E_REPORTER_TX_GRACEFUL_PERIOD,
+		.default_graceful_period_delay =
+			MLX5E_REPORTER_TX_GRACEFUL_PERIOD_DELAY,
 };
 
 void mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
-- 
2.31.1