linux-kernel - [PATCH 1/2] md/raid5: skip stripes with bad reads during reshape to avoid stall

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250125012702.18773-1-dougvj@dougvj.net>
Date: Fri, 24 Jan 2025 18:26:57 -0700
From: Doug V Johnson <dougvj@...gvj.net>
To: 
Cc: Doug Johnson <dougvj@...il.com>,
	Doug V Johnson <dougvj@...gvj.net>,
	Song Liu <song@...nel.org>,
	Yu Kuai <yukuai3@...wei.com>,
	linux-raid@...r.kernel.org (open list:SOFTWARE RAID (Multiple Disks) SUPPORT),
	linux-kernel@...r.kernel.org (open list)
Subject: [PATCH 1/2] md/raid5: skip stripes with bad reads during reshape to avoid stall

While adding an additional drive to a raid6 array, the reshape stalled
at about 13% complete and any I/O operations on the array hung,
creating an effective soft lock. The kernel reported a hung task in
mdXX_reshape thread and I had to use magic sysrq to recover as systemd
hung as well.

I first suspected an issue with one of the underlying block devices and
as precaution I recovered the data in read only mode to a new array, but
it turned out to be in the RAID layer as I was able to recreate the
issue from a superblock dump in sparse files.

After poking around some I discovered that I had somehow propagated the
bad block list to several devices in the array such that a few blocks
were unreable. The bad read reported correctly in userspace during
recovery, but it wasn't obvious that it was from a bad block list
metadata at the time and instead confirmed my bias suspecting hardware
issues

I was able to reproduce the issue with a minimal test case using small
loopback devices. I put a script for this in a github repository:

https://github.com/dougvj/md_badblock_reshape_stall_test

This patch handles bad reads during a reshape by unmarking the
STRIPE_EXPANDING and STRIPE_EXPAND_READY bits effectively skipping the
stripe and then reports the issue in dmesg.

Signed-off-by: Doug V Johnson <dougvj@...gvj.net>
---
 drivers/md/raid5.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5c79429acc64..0ae9ac695d8e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4987,6 +4987,14 @@ static void handle_stripe(struct stripe_head *sh)
 			handle_failed_stripe(conf, sh, &s, disks);
 		if (s.syncing + s.replacing)
 			handle_failed_sync(conf, sh, &s);
+		if (test_bit(STRIPE_EXPANDING, &sh->state)) {
+			pr_warn_ratelimited("md/raid:%s: read error during reshape at %lu",
+					    mdname(conf->mddev),
+					    (unsigned long)sh->sector);
+			/* Abort the current stripe */
+			clear_bit(STRIPE_EXPANDING, &sh->state);
+			clear_bit(STRIPE_EXPAND_READY, &sh->state);
+		}
 	}

 	/* Now we check to see if any write operations have recently
-- 
2.48.1