lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250817172710.4892-1-k@mgml.me>
Date: Mon, 18 Aug 2025 02:27:07 +0900
From: Kenta Akagi <k@...l.me>
To: Song Liu <song@...nel.org>, Yu Kuai <yukuai3@...wei.com>,
        Mariusz Tkaczyk <mtkaczyk@...nel.org>,
        Guoqing Jiang <jgq516@...il.com>
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
        Kenta Akagi <k@...l.me>
Subject: [PATCH v2 0/3] md/raid1,raid10: don't broken array on failfast metadata write fails

Changes from V1:
- Avoid setting MD_BROKEN instead of clearing it
- Add pr_crit() when setting MD_BROKEN
- Fix the message may shown after all rdevs failure:
  "Operation continuing on 0 devices"

A failfast bio, for example in the case of nvme-tcp,
will fail immediately if the connection to the target is
lost for a few seconds and the device enters a reconnecting
state - even though it would recover if given a few seconds.
This behavior is exactly as intended by the design of failfast.

However, md treats super_write operations fails with failfast as fatal.
For example, if an initiator - that is, a machine loading the md module -
loses all connections for a few seconds, the array becomes
broken and subsequent write is no longer possible.
This is the issue I am currently facing, and which this patch aims to fix.

The 1st patch changes the behavior on super_write MD_FAILFAST IO failures.
The 2nd and 3rd patches modify the output of pr_crit.

Kenta Akagi (3):
  md/raid1,raid10: don't broken array on failfast metadata write fails
  md/raid1,raid10: Add error message when setting MD_BROKEN
  md/raid1,raid10: Fix: Operation continuing on 0 devices.

 drivers/md/md.c     |  9 ++++++---
 drivers/md/md.h     |  7 ++++---
 drivers/md/raid1.c  | 26 ++++++++++++++++++++------
 drivers/md/raid10.c | 26 ++++++++++++++++++++------
 4 files changed, 50 insertions(+), 18 deletions(-)

-- 
2.50.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ