linux-kernel - [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when using FailFast

Message-ID: <20260105144025.12478-2-k@mgml.me>
Date: Mon,  5 Jan 2026 23:40:24 +0900
From: Kenta Akagi <k@...l.me>
To: Song Liu <song@...nel.org>, Yu Kuai <yukuai@...as.com>,
        Shaohua Li <shli@...com>, Mariusz Tkaczyk <mtkaczyk@...nel.org>,
        Xiao Ni <xni@...hat.com>
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
        Kenta Akagi <k@...l.me>
Subject: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when using FailFast

After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
if the error handler is called on the last rdev in RAID1 or RAID10,
the MD_BROKEN flag is set on that mddev. Once MD_BROKEN is set, write
bios to the md device fail with an I/O error.

This causes a problem when using FailFast.
The current FailFast implementation expects the array to keep
functioning even after md_error is called for the last rdev.
Furthermore, by design FailFast may call md_error on every rdev of the
md device: it calls md_error first and only then retries the I/O, even
when a retry on that rdev would have succeeded.

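To fix this issue, this commit ensures that, for RAID1 and RAID10,
MD_BROKEN is not set on the mddev when the last In_sync rdev has the
FailFast flag set and the mddev's fail_last_dev is off.

In that case md_error() neither sets MD_BROKEN nor marks the rdev
Faulty, so state_store() and set_disk_faulty() in md.c can no longer
rely on MD_BROKEN alone to detect a refused request. They now also
return -EBUSY when the rdev was not marked Faulty.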

This change is visible to userspace. After this commit, if the last
rdev has the FailFast flag and fail_last_dev is off, the mddev is never
marked broken, even if the failing bio was not a FailFast bio. However,
it is unlikely that any setup using FailFast expects the array to halt
when md_error is called on the last rdev.
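The md.c hunk keeps explicit requests to fail a device (the "faulty"
request handled by state_store() and the path through set_disk_faulty())
consistent with this. A hypothetical sketch of the -EBUSY decision before
and after the patch, where md_broken and rdev_faulty stand for the
MD_BROKEN and Faulty bits as they are after md_error() returns; the
function names are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Pre-patch check: only MD_BROKEN was consulted. */
static int faulty_result_old(bool md_broken, bool rdev_faulty)
{
	(void)rdev_faulty;  /* the rdev state was ignored */
	return md_broken ? -EBUSY : 0;
}

/* Patched check: also refuse when md_error() kept the rdev. */
static int faulty_result_new(bool md_broken, bool rdev_faulty)
{
	return (md_broken || !rdev_faulty) ? -EBUSY : 0;
}
```

For the last FailFast rdev with fail_last_dev off, md_error() sets
neither bit, so the old check would report success for a request md did
not act on; the new check returns -EBUSY, and an ordinary failure of a
redundant rdev (Faulty set, MD_BROKEN clear) still returns 0.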

Since FailFast is only implemented for RAID1 and RAID10, no changes are
needed for other personalities.

Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10")
Suggested-by: Xiao Ni <xni@...hat.com>
Signed-off-by: Kenta Akagi <k@...l.me>
---
 drivers/md/md.c     | 6 ++++--
 drivers/md/raid1.c  | 8 +++++++-
 drivers/md/raid10.c | 8 +++++++-
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6062e0deb616..f1745f8921fc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3050,7 +3050,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 	if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
 		md_error(rdev->mddev, rdev);
 
-		if (test_bit(MD_BROKEN, &rdev->mddev->flags))
+		if (test_bit(MD_BROKEN, &rdev->mddev->flags) ||
+		    !test_bit(Faulty, &rdev->flags))
 			err = -EBUSY;
 		else
 			err = 0;
@@ -7915,7 +7916,8 @@ static int set_disk_faulty(struct mddev *mddev, dev_t dev)
 		err =  -ENODEV;
 	else {
 		md_error(mddev, rdev);
-		if (test_bit(MD_BROKEN, &mddev->flags))
+		if (test_bit(MD_BROKEN, &mddev->flags) ||
+		    !test_bit(Faulty, &rdev->flags))
 			err = -EBUSY;
 	}
 	rcu_read_unlock();
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 592a40233004..459b34cd358b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1745,6 +1745,10 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev)
  *	- recovery is interrupted.
  *	- &mddev->degraded is bumped.
  *
+ * If the following conditions are met, @mddev never fails:
+ *	- The last In_sync @rdev has the &FailFast flag set.
+ *	- &mddev->fail_last_dev is off.
+ *
  * @rdev is marked as &Faulty excluding case when array is failed and
  * &mddev->fail_last_dev is off.
  */
@@ -1757,7 +1761,9 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 
 	if (test_bit(In_sync, &rdev->flags) &&
 	    (conf->raid_disks - mddev->degraded) == 1) {
-		set_bit(MD_BROKEN, &mddev->flags);
+		if (!test_bit(FailFast, &rdev->flags) ||
+		    mddev->fail_last_dev)
+			set_bit(MD_BROKEN, &mddev->flags);
 
 		if (!mddev->fail_last_dev) {
 			conf->recovery_disabled = mddev->recovery_disabled;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 14dcd5142eb4..b33149aa5b29 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1989,6 +1989,10 @@ static int enough(struct r10conf *conf, int ignore)
  *	- recovery is interrupted.
  *	- &mddev->degraded is bumped.
  *
+ * If the following conditions are met, @mddev never fails:
+ *	- The last In_sync @rdev has the &FailFast flag set.
+ *	- &mddev->fail_last_dev is off.
+ *
  * @rdev is marked as &Faulty excluding case when array is failed and
  * &mddev->fail_last_dev is off.
  */
@@ -2000,7 +2004,9 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 	spin_lock_irqsave(&conf->device_lock, flags);
 
 	if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) {
-		set_bit(MD_BROKEN, &mddev->flags);
+		if (!test_bit(FailFast, &rdev->flags) ||
+		    mddev->fail_last_dev)
+			set_bit(MD_BROKEN, &mddev->flags);
 
 		if (!mddev->fail_last_dev) {
 			spin_unlock_irqrestore(&conf->device_lock, flags);
-- 
2.50.1