Message-ID: <20250125012702.18773-2-dougvj@dougvj.net>
Date: Fri, 24 Jan 2025 18:26:58 -0700
From: Doug V Johnson <dougvj@...gvj.net>
To: 
Cc: Doug Johnson <dougvj@...il.com>,
	Doug V Johnson <dougvj@...gvj.net>,
	Song Liu <song@...nel.org>,
	Yu Kuai <yukuai3@...wei.com>,
	linux-raid@...r.kernel.org (open list:SOFTWARE RAID (Multiple Disks) SUPPORT),
	linux-kernel@...r.kernel.org (open list)
Subject: [PATCH 2/2] md/raid5: warn when failing a read due to bad blocks metadata

When userspace receives a Buffer I/O error from a RAID device, it is
easy to suspect an underlying hardware failure or a similar issue.

To help point sysadmins in the right direction, report when a read
failed at least in part because of entries in the bad block list stored
in the device metadata.

There are real-world cases where bad block lists are accidentally
propagated or copied around, so this warning helps mitigate the
consequences.
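For reference, the recorded bad block lists can already be inspected
from userspace through sysfs (paths per Documentation/admin-guide/md.rst;
the loop below is just an illustrative sketch and assumes arrays are
named md*):

```shell
#!/bin/sh
# Print the kernel's recorded bad block list for every member device
# of every md array on the system. Each line of bad_blocks is
# "<first-sector> <length-in-sectors>".
for f in /sys/block/md*/md/dev-*/bad_blocks; do
	[ -e "$f" ] || continue	# glob did not match: no md arrays present
	echo "== $f =="
	cat "$f"
done
```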

Signed-off-by: Doug V Johnson <dougvj@...gvj.net>
---
 drivers/md/raid5.c | 10 +++++++++-
 drivers/md/raid5.h |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0ae9ac695d8e..5d80e9bcbd6f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3671,7 +3671,14 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
 			       sh->dev[i].sector + RAID5_STRIPE_SECTORS(conf)) {
 				struct bio *nextbi =
 					r5_next_bio(conf, bi, sh->dev[i].sector);
-
+				/* If we recorded bad blocks from the metadata
+				 * on any of the devices then report this to
+				 * userspace in case anyone might suspect
+				 * something more fundamental instead
+				 */
+				if (s->bad_blocks)
+					pr_warn_ratelimited("%s: failing read due to device bad block list\n",
+							    mdname(conf->mddev));
 				bio_io_error(bi);
 				bi = nextbi;
 			}
@@ -4682,6 +4689,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 		if (rdev) {
 			is_bad = rdev_has_badblock(rdev, sh->sector,
 						   RAID5_STRIPE_SECTORS(conf));
+			if (is_bad)
+				s->bad_blocks++;
 			if (s->blocked_rdev == NULL) {
 				if (is_bad < 0)
 					set_bit(BlockedBadBlocks, &rdev->flags);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index eafc6e9ed6ee..c755c321ae36 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -282,7 +282,7 @@ struct stripe_head_state {
 	 * read all devices, just the replacement targets.
 	 */
 	int syncing, expanding, expanded, replacing;
-	int locked, uptodate, to_read, to_write, failed, written;
+	int locked, uptodate, to_read, to_write, failed, written, bad_blocks;
 	int to_fill, compute, req_compute, non_overwrite;
 	int injournal, just_cached;
 	int failed_num[2];
-- 
2.48.1

