lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250824084354.533182-1-mbloch@nvidia.com>
Date: Sun, 24 Aug 2025 11:43:49 +0300
From: Mark Bloch <mbloch@...dia.com>
To: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David
 S. Miller" <davem@...emloft.net>
CC: Tariq Toukan <tariqt@...dia.com>, Leon Romanovsky <leon@...nel.org>,
	"Saeed Mahameed" <saeedm@...dia.com>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Gal Pressman <gal@...dia.com>, Mark Bloch
	<mbloch@...dia.com>
Subject: [PATCH net-next V4 0/5] Expose burst period for devlink health reporter

Hi,

This series by Shahar implements burst period in devlink health
reporter, and use it in mlx5e driver.

This is V4. Find previous versions here:
v3: https://lore.kernel.org/all/1755111349-416632-1-git-send-email-tariqt@nvidia.com/
v2: https://lore.kernel.org/all/1753390134-345154-1-git-send-email-tariqt@nvidia.com/
v1: https://lore.kernel.org/all/1752768442-264413-1-git-send-email-tariqt@nvidia.com/

Changelog:
v3->v4:
- Renamed error burst period to burst period throughout code and commit
  messages.
- Renamed devlink_health_reporter_burst_period_active() to
  devlink_health_reporter_in_burst().
- Added doc for health-reporter-burst-period attr in devlink.yaml.
– In __devlink_health_reporter_create(), split compound condition into
  two separate WARN_ON() checks for clarity and early return.
- Fixed indentations in devlink_health_reporter_ops doc.

v2->v3:
- Rebase.
- Rename feature: graceful period delay -> error burst period.

v1->v2:
- Rebase.
- Fix long attr name.
- Extend the cover letter.

Shahar writes:
--------------------------------------------------------------------------

Currently, the devlink health reporter initiates the grace period
immediately after recovering an error, which blocks further recovery
attempts until the grace period concludes. Since additional errors
are not generally expected during this short interval, any new error
reported during the grace period is not only rejected but also causes
the reporter to enter an error state that requires manual intervention.

This approach poses a problem in scenarios where a single root cause
triggers multiple related errors in quick succession - for example,
a PCI issue affecting multiple hardware queues. Because these errors
are closely related and occur rapidly, it is more effective to handle
them together rather than handling only the first one reported and
blocking any subsequent recovery attempts. Furthermore, setting the
reporter to an error state in this context can be misleading, as these
multiple errors are manifestations of a single underlying issue, making
it unlike the general case where additional errors are not expected
during the grace period.

To resolve this, introduce a configurable burst period attribute to the
devlink health reporter. This period starts when the first error
is recovered and lasts for a user-defined duration. Once this error
burst period expires, the grace period begins. After the grace period
ends, a new reported error will start the same flow again.

Timeline summary:

----|--------|------------------------------/----------------------/--
error is  error is      burst period             grace period
reported  recovered  (recoveries allowed)     (recoveries blocked)

With burst period, create a time window during which recovery attempts
are permitted, allowing all reported errors to be handled sequentially
before the grace period starts. Once the grace period begins, it
prevents any further error recoveries until it ends.

When burst period is set to 0, current behavior is preserved.

Design alternatives considered:

1. Recover all queues upon any error:
   A brute-force approach that recovers all queues on any error.
   While simple, it is overly aggressive and disrupts unaffected queues
   unnecessarily. Also, because this is handled entirely within the
   driver, it leads to a driver-specific implementation rather than a
   generic one.

2. Per-queue reporter:
   This design would isolate recovery handling per SQ or RQ, effectively
   removing interdependencies between queues. While conceptually clean,
   it introduces significant scalability challenges as the number of
   queues grows, as well as synchronization challenges across multiple
   reporters.

3. Error aggregation with delayed handling:
   Errors arriving during the grace period are saved and processed after
   it ends. While addressing the issue of related errors whose recovery
   is aborted as grace period started, this adds complexity due to
   synchronization needs and contradicts the assumption that no errors
   should occur during a healthy system’s grace period. Also, this
   breaks the important role of grace period in preventing an infinite
   loop of immediate error detection following recovery. In such cases
   we want to stop.

4. Allowing a fixed burst of errors before starting grace period:
   Allows a set number of recoveries before the grace period begins.
   However, it also requires limiting the error reporting window.
   To keep the design simple, the burst threshold becomes redundant.

The burst period design was chosen for its simplicity and precision in
addressing the problem at hand. It effectively captures the temporal
correlation of related errors and aligns with the original intent of
the grace period as a stabilization window where further errors are
unexpected, and if they do occur, they indicate an abnormal system
state.

Shahar Shitrit (5):
  devlink: Move graceful period parameter to reporter ops
  devlink: Move health reporter recovery abort logic to a separate function
  devlink: Introduce burst period for health reporter
  devlink: Make health reporter burst period configurable
  net/mlx5e: Set default burst period for TX and RX reporters

 Documentation/netlink/specs/devlink.yaml      |   7 ++
 .../networking/devlink/devlink-health.rst     |   2 +-
 drivers/net/ethernet/amd/pds_core/main.c      |   2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_devlink.c |   2 +-
 .../net/ethernet/huawei/hinic/hinic_devlink.c |  10 +-
 .../net/ethernet/intel/ice/devlink/health.c   |   3 +-
 .../marvell/octeontx2/af/rvu_devlink.c        |  32 +++--
 .../mellanox/mlx5/core/diag/reporter_vnic.c   |   2 +-
 .../mellanox/mlx5/core/en/reporter_rx.c       |  12 +-
 .../mellanox/mlx5/core/en/reporter_tx.c       |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  |  41 ++++---
 drivers/net/ethernet/mellanox/mlxsw/core.c    |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_devlink.c |   9 +-
 drivers/net/netdevsim/health.c                |   4 +-
 include/net/devlink.h                         |  14 ++-
 include/uapi/linux/devlink.h                  |   2 +
 net/devlink/health.c                          | 109 +++++++++++++-----
 net/devlink/netlink_gen.c                     |   5 +-
 19 files changed, 187 insertions(+), 85 deletions(-)


base-commit: b1c92cdf5af3198e8fbc1345a80e2a1dff386c02
-- 
2.34.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ