Message-ID: <1753390134-345154-5-git-send-email-tariqt@nvidia.com>
Date: Thu, 24 Jul 2025 23:48:53 +0300
From: Tariq Toukan <tariqt@...dia.com>
To: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David
 S. Miller" <davem@...emloft.net>, Jiri Pirko <jiri@...dia.com>, Jiri Pirko
	<jiri@...nulli.us>
CC: Donald Hunter <donald.hunter@...il.com>, Jonathan Corbet <corbet@....net>,
	Brett Creeley <brett.creeley@....com>, Michael Chan
	<michael.chan@...adcom.com>, Pavan Chebbi <pavan.chebbi@...adcom.com>, "Cai
 Huoqing" <cai.huoqing@...ux.dev>, Tony Nguyen <anthony.l.nguyen@...el.com>,
	Przemek Kitszel <przemyslaw.kitszel@...el.com>, Sunil Goutham
	<sgoutham@...vell.com>, Linu Cherian <lcherian@...vell.com>, Geetha sowjanya
	<gakula@...vell.com>, Jerin Jacob <jerinj@...vell.com>, hariprasad
	<hkelam@...vell.com>, Subbaraya Sundeep <sbhatta@...vell.com>, Saeed Mahameed
	<saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>, Tariq Toukan
	<tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>, Ido Schimmel
	<idosch@...dia.com>, Petr Machata <petrm@...dia.com>, Manish Chopra
	<manishc@...vell.com>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <linux-doc@...r.kernel.org>,
	<intel-wired-lan@...ts.osuosl.org>, <linux-rdma@...r.kernel.org>, "Shahar
 Shitrit" <shshitrit@...dia.com>, Gal Pressman <gal@...dia.com>
Subject: [PATCH net-next V2 4/5] devlink: Make health reporter grace period delay configurable

From: Shahar Shitrit <shshitrit@...dia.com>

Enable configuration of the grace period delay: a time window that
starts at the first error recovery, during which the reporter still
allows a recovery attempt for each reported error.

This feature is helpful when a single underlying issue causes
multiple errors, as it delays the start of the grace period
to allow sufficient time for recovering all related errors.
For example, if multiple TX queues time out simultaneously,
a sufficient grace period delay could allow all affected TX
queues to be recovered within that window. Without this delay,
only the first TX queue that reports a timeout will undergo
recovery, while the remaining TX queues will be blocked once
the grace period begins.
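
The timing described above can be pictured with a small user-space
model. This is only a sketch under stated assumptions, not the kernel
implementation: struct reporter_model, recovery_allowed() and
tx_timeout_demo() are hypothetical names, times are treated as
milliseconds purely for illustration, and the model assumes the grace
period begins exactly when the delay window ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; not the kernel's struct devlink_health_reporter. */
struct reporter_model {
	uint64_t grace_period_ms;	/* 0 means no grace period */
	uint64_t grace_period_delay_ms;	/* window before the grace period */
	uint64_t first_recovery_ms;	/* recovery that opened the window */
	bool recovered_once;		/* has any recovery happened yet? */
};

/* Return true if a recovery attempt at now_ms would be allowed. */
static bool recovery_allowed(struct reporter_model *r, uint64_t now_ms)
{
	if (r->recovered_once && r->grace_period_ms) {
		uint64_t delay_end = r->first_recovery_ms +
				     r->grace_period_delay_ms;
		uint64_t grace_end = delay_end + r->grace_period_ms;

		if (now_ms < delay_end)
			return true;	/* still inside the delay window */
		if (now_ms < grace_end)
			return false;	/* grace period is active */
	}

	/* No window is open: this recovery starts a new one. */
	r->recovered_once = true;
	r->first_recovery_ms = now_ms;
	return true;
}

/* Two TX queues time out 100 ms apart, with and without a delay. */
static void tx_timeout_demo(void)
{
	struct reporter_model with_delay = {
		.grace_period_ms = 1000, .grace_period_delay_ms = 500,
	};
	struct reporter_model no_delay = { .grace_period_ms = 1000 };

	assert(recovery_allowed(&with_delay, 0));	/* first queue */
	assert(recovery_allowed(&with_delay, 100));	/* second queue */
	assert(!recovery_allowed(&with_delay, 600));	/* grace period */

	assert(recovery_allowed(&no_delay, 0));		/* first queue */
	assert(!recovery_allowed(&no_delay, 100));	/* blocked at once */
}
```

In this model, a delay of 0 reproduces the behavior described for the
no-delay case: the second queue is blocked as soon as the first
recovery has started the grace period.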

Configuration example:
$ devlink health set pci/0000:00:09.0 reporter tx grace_period_delay 500

Configuration example with ynl:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/devlink.yaml \
 --do health-reporter-set --json '{
  "bus-name": "auxiliary",
  "dev-name": "mlx5_core.eth.0",
  "port-index": 65535,
  "health-reporter-name": "tx",
  "health-reporter-gp-delay": 500
}'

Signed-off-by: Shahar Shitrit <shshitrit@...dia.com>
Reviewed-by: Carolina Jubran <cjubran@...dia.com>
Reviewed-by: Jiri Pirko <jiri@...dia.com>
Signed-off-by: Tariq Toukan <tariqt@...dia.com>
---
 Documentation/netlink/specs/devlink.yaml      |  6 ++++
 .../networking/devlink/devlink-health.rst     |  2 +-
 include/uapi/linux/devlink.h                  |  2 ++
 net/devlink/health.c                          | 29 +++++++++++++++++--
 net/devlink/netlink_gen.c                     |  5 ++--
 5 files changed, 38 insertions(+), 6 deletions(-)
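
For readers skimming the health.c hunk further down, the rules applied
on the set path can be restated as a standalone sketch. The names
struct reporter_cfg and reporter_cfg_set() are hypothetical, and a
null pointer stands in for an attribute that is absent from the
request; the real code reads netlink attributes instead.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two reporter fields the set path touches. */
struct reporter_cfg {
	uint64_t graceful_period;
	uint64_t graceful_period_delay;
};

/*
 * Apply an optional grace period, then an optional delay.
 * A null pointer means the attribute was not supplied.
 * Returns 0 on success, or -EINVAL when a non-zero delay is requested
 * while no grace period is configured.
 */
static int reporter_cfg_set(struct reporter_cfg *cfg,
			    const uint64_t *period, const uint64_t *delay)
{
	if (period) {
		cfg->graceful_period = *period;
		/* Disabling the grace period also clears the delay. */
		if (!cfg->graceful_period)
			cfg->graceful_period_delay = 0;
	}

	if (delay) {
		if (!cfg->graceful_period && *delay)
			return -EINVAL;
		cfg->graceful_period_delay = *delay;
	}

	return 0;
}
```

As in the hunk, the grace period is applied before the delay is
validated, so in this sketch a request that sets the period to 0 and
the delay to a non-zero value fails with -EINVAL after the period has
already been updated.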

diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index bb87111d5e16..9b996d0abfd3 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -853,6 +853,9 @@ attribute-sets:
         type: nest
         multi-attr: true
         nested-attributes: dl-rate-tc-bws
+      -
+        name: health-reporter-gp-delay
+        type: u64
   -
     name: dl-dev-stats
     subset-of: devlink
@@ -1216,6 +1219,8 @@ attribute-sets:
         name: health-reporter-dump-ts-ns
       -
         name: health-reporter-auto-dump
+      -
+        name: health-reporter-gp-delay
 
   -
     name: dl-attr-stats
@@ -1961,6 +1966,7 @@ operations:
             - health-reporter-graceful-period
             - health-reporter-auto-recover
             - health-reporter-auto-dump
+            - health-reporter-gp-delay
 
     -
       name: health-reporter-recover
diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst
index e0b8cfed610a..07602f678282 100644
--- a/Documentation/networking/devlink/devlink-health.rst
+++ b/Documentation/networking/devlink/devlink-health.rst
@@ -50,7 +50,7 @@ Once an error is reported, devlink health will perform the following actions:
   * Auto recovery attempt is being done. Depends on:
 
     - Auto-recovery configuration
-    - Grace period vs. time passed since last recover
+    - Grace period (and grace period delay)  vs. time passed since last recover
 
 Devlink formatted message
 =========================
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9fcb25a0f447..a47e7f413511 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -636,6 +636,8 @@ enum devlink_attr {
 
 	DEVLINK_ATTR_RATE_TC_BWS,		/* nested */
 
+	DEVLINK_ATTR_HEALTH_REPORTER_GP_DELAY,	/* u64 */
+
 	/* Add new attributes above here, update the spec in
 	 * Documentation/netlink/specs/devlink.yaml and re-generate
 	 * net/devlink/netlink_gen.c.
diff --git a/net/devlink/health.c b/net/devlink/health.c
index a0269975f592..1e9a2d0d0631 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -113,7 +113,9 @@ __devlink_health_reporter_create(struct devlink *devlink,
 {
 	struct devlink_health_reporter *reporter;
 
-	if (WARN_ON(ops->default_graceful_period && !ops->recover))
+	if (WARN_ON(ops->default_graceful_period_delay &&
+		    !ops->default_graceful_period) ||
+	    WARN_ON(ops->default_graceful_period && !ops->recover))
 		return ERR_PTR(-EINVAL);
 
 	reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
@@ -293,6 +295,11 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
 	    devlink_nl_put_u64(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
 			       reporter->graceful_period))
 		goto reporter_nest_cancel;
+	if (reporter->ops->recover &&
+	    devlink_nl_put_u64(msg,
+			       DEVLINK_ATTR_HEALTH_REPORTER_GP_DELAY,
+			       reporter->graceful_period_delay))
+		goto reporter_nest_cancel;
 	if (reporter->ops->recover &&
 	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
 		       reporter->auto_recover))
@@ -458,16 +465,32 @@ int devlink_nl_health_reporter_set_doit(struct sk_buff *skb,
 
 	if (!reporter->ops->recover &&
 	    (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] ||
-	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]))
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] ||
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GP_DELAY]))
 		return -EOPNOTSUPP;
 
 	if (!reporter->ops->dump &&
 	    info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP])
 		return -EOPNOTSUPP;
 
-	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) {
 		reporter->graceful_period =
 			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]);
+		if (!reporter->graceful_period)
+			reporter->graceful_period_delay = 0;
+	}
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GP_DELAY]) {
+		u64 configured_delay =
+			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GP_DELAY]);
+
+		if (!reporter->graceful_period && configured_delay) {
+			NL_SET_ERR_MSG_MOD(info->extack, "Cannot set grace period delay without a grace period.");
+			return -EINVAL;
+		}
+
+		reporter->graceful_period_delay = configured_delay;
+	}
 
 	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])
 		reporter->auto_recover =
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index d97c326a9045..de3aabba37b5 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -389,7 +389,7 @@ static const struct nla_policy devlink_health_reporter_get_dump_nl_policy[DEVLIN
 };
 
 /* DEVLINK_CMD_HEALTH_REPORTER_SET - do */
-static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP + 1] = {
+static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATTR_HEALTH_REPORTER_GP_DELAY + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32, },
@@ -397,6 +397,7 @@ static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATT
 	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64, },
 	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8, },
 	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8, },
+	[DEVLINK_ATTR_HEALTH_REPORTER_GP_DELAY] = { .type = NLA_U64, },
 };
 
 /* DEVLINK_CMD_HEALTH_REPORTER_RECOVER - do */
@@ -1032,7 +1033,7 @@ const struct genl_split_ops devlink_nl_ops[74] = {
 		.doit		= devlink_nl_health_reporter_set_doit,
 		.post_doit	= devlink_nl_post_doit,
 		.policy		= devlink_health_reporter_set_nl_policy,
-		.maxattr	= DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
+		.maxattr	= DEVLINK_ATTR_HEALTH_REPORTER_GP_DELAY,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 	{
-- 
2.31.1

