[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
Date: Wed, 13 Aug 2025 21:55:44 +0300
From: Tariq Toukan <tariqt@...dia.com>
To: Jiri Pirko <jiri@...dia.com>, Jiri Pirko <jiri@...nulli.us>, Eric Dumazet
<edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>
CC: Donald Hunter <donald.hunter@...il.com>, Jonathan Corbet <corbet@....net>,
Brett Creeley <brett.creeley@....com>, Michael Chan
<michael.chan@...adcom.com>, Pavan Chebbi <pavan.chebbi@...adcom.com>, "Cai
Huoqing" <cai.huoqing@...ux.dev>, Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>, Sunil Goutham
<sgoutham@...vell.com>, Linu Cherian <lcherian@...vell.com>, Geetha sowjanya
<gakula@...vell.com>, Jerin Jacob <jerinj@...vell.com>, hariprasad
<hkelam@...vell.com>, Subbaraya Sundeep <sbhatta@...vell.com>, Saeed Mahameed
<saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>, Tariq Toukan
<tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>, Ido Schimmel
<idosch@...dia.com>, Petr Machata <petrm@...dia.com>, Manish Chopra
<manishc@...vell.com>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<intel-wired-lan@...ts.osuosl.org>, <linux-rdma@...r.kernel.org>, "Gal
Pressman" <gal@...dia.com>, Dragos Tatulea <dtatulea@...dia.com>, "Shahar
Shitrit" <shshitrit@...dia.com>
Subject: [PATCH net-next V3 0/5] Expose error burst period for devlink health reporter
Hi,
This series by Shahar implements error burst period in devlink health
reporter, and use it in mlx5e driver.
This is V3. Find previous versions here:
https://lore.kernel.org/all/1753390134-345154-1-git-send-email-tariqt@nvidia.com/
https://lore.kernel.org/all/1752768442-264413-1-git-send-email-tariqt@nvidia.com/
See detailed feature description by Shahar below [1].
Regards,
Tariq
V3:
- Rebase.
- Rename feature: graceful period delay -> error burst period.
[1]
Currently, the devlink health reporter initiates the grace period
immediately after recovering an error, which blocks further recovery
attempts until the grace period concludes. Since additional errors are
not generally expected during this short interval, any new error
reported during the grace period is not only rejected but also causes
the reporter to enter an error state that requires manual intervention.
This approach poses a problem in scenarios where a single root cause
triggers multiple related errors in quick succession - for example,
a PCI issue affecting multiple hardware queues. Because these errors
are closely related and occur rapidly, it is more effective to handle
them together rather than handling only the first one reported and
blocking any subsequent recovery attempts. Furthermore, setting the
reporter to an error state in this context can be misleading, as these
multiple errors are manifestations of a single underlying issue, making
it unlike the general case where additional errors are not expected
during the grace period.
To resolve this, introduce a configurable error burst period attribute
to the devlink health reporter. This period starts when the first error
is recovered and lasts for a user-defined duration. Once this error
burst period expires, the grace period begins. After the grace period
ends, a new reported error will start the same flow again.
Timeline summary:
----|--------|------------------------------/----------------------/--
error is error is error burst period grace period
reported recovered (recoveries allowed) (recoveries blocked)
With error burst period, create a time window during which recovery
attempts are permitted, allowing all reported errors to be handled
sequentially before the grace period starts. Once the grace period
begins, it prevents any further error recoveries until it ends.
When error burst period is set to 0, current behavior is preserved.
Design alternatives considered:
1. Recover all queues upon any error:
A brute-force approach that recovers all queues on any error.
While simple, it is overly aggressive and disrupts unaffected queues
unnecessarily. Also, because this is handled entirely within the
driver, it leads to a driver-specific implementation rather than a
generic one.
2. Per-queue reporter:
This design would isolate recovery handling per SQ or RQ, effectively
removing inter-dependencies between queues. While conceptually clean,
it introduces significant scalability challenges as the number of
queues grows, as well as synchronization challenges across multiple
reporters.
3. Error aggregation with delayed handling:
Errors arriving during the grace period are saved and processed after
it ends. While addressing the issue of related errors whose recovery
is aborted as grace period started, this adds complexity due to
synchronization needs and contradicts the assumption that no errors
should occur during a healthy system’s grace period. Also, this
breaks the important role of grace period in preventing an infinite
loop of immediate error detection following recovery. In such cases
we want to stop.
4. Allowing a fixed burst of errors before starting grace period:
Allows a set number of recoveries before the grace period begins.
However, it also requires limiting the error reporting window.
To keep the design simple, the burst threshold becomes redundant.
The error burst period design was chosen for its simplicity and
precision in addressing the problem at hand. It effectively captures
the temporal correlation of related errors and aligns with the original
intent of the grace period as a stabilization window where further
errors are unexpected, and if they do occur, they indicate an abnormal
system state.
Shahar Shitrit (5):
devlink: Move graceful period parameter to reporter ops
devlink: Move health reporter recovery abort logic to a separate
function
devlink: Introduce error burst period for health reporter
devlink: Make health reporter error burst period configurable
net/mlx5e: Set default error burst period for TX and RX reporters
Documentation/netlink/specs/devlink.yaml | 6 +
.../networking/devlink/devlink-health.rst | 2 +-
drivers/net/ethernet/amd/pds_core/main.c | 2 +-
.../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 2 +-
.../net/ethernet/huawei/hinic/hinic_devlink.c | 10 +-
.../net/ethernet/intel/ice/devlink/health.c | 3 +-
.../marvell/octeontx2/af/rvu_devlink.c | 32 +++--
.../mellanox/mlx5/core/diag/reporter_vnic.c | 2 +-
.../mellanox/mlx5/core/en/reporter_rx.c | 12 +-
.../mellanox/mlx5/core/en/reporter_tx.c | 13 ++-
.../net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/health.c | 41 ++++---
drivers/net/ethernet/mellanox/mlxsw/core.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_devlink.c | 10 +-
drivers/net/netdevsim/health.c | 4 +-
include/net/devlink.h | 15 ++-
include/uapi/linux/devlink.h | 2 +
net/devlink/health.c | 109 +++++++++++++-----
net/devlink/netlink_gen.c | 5 +-
19 files changed, 189 insertions(+), 85 deletions(-)
base-commit: fdbe93b7f0f86c943351ceab26c8fad548869f91
--
2.31.1
Powered by blists - more mailing lists