Message-ID: <e9585300-1666-ca71-8684-8824fe2ddaf1@huaweicloud.com>
Date: Fri, 12 Jul 2024 09:16:58 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Mateusz Jończyk <mat.jonczyk@...pl>,
linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: stable@...r.kernel.org, Song Liu <song@...nel.org>,
Paul Luse <paul.e.luse@...ux.intel.com>, Xiao Ni <xni@...hat.com>,
Mariusz Tkaczyk <mariusz.tkaczyk@...ux.intel.com>,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] md/raid1: set max_sectors during early return from
choose_slow_rdev()
Hi,
On 2024/07/12 4:23, Mateusz Jończyk wrote:
> Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
> when that drive has a write-mostly flag set. During such an attempt,
> the following assertion in bio_split() is hit:
>
> BUG_ON(sectors <= 0);
>
> Call Trace:
> ? bio_split+0x96/0xb0
> ? exc_invalid_op+0x53/0x70
> ? bio_split+0x96/0xb0
> ? asm_exc_invalid_op+0x1b/0x20
> ? bio_split+0x96/0xb0
> ? raid1_read_request+0x890/0xd20
> ? __call_rcu_common.constprop.0+0x97/0x260
> raid1_make_request+0x81/0xce0
> ? __get_random_u32_below+0x17/0x70
> ? new_slab+0x2b3/0x580
> md_handle_request+0x77/0x210
> md_submit_bio+0x62/0xa0
> __submit_bio+0x17b/0x230
> submit_bio_noacct_nocheck+0x18e/0x3c0
> submit_bio_noacct+0x244/0x670
>
> After investigation, it turned out that choose_slow_rdev() does not set
> the value of max_sectors in some cases and because of it,
> raid1_read_request calls bio_split with sectors == 0.
>
> Fix it by filling in this variable.
>
> This bug was introduced in
> commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> but apparently hidden until
> commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()")
> shortly thereafter.
>
> Cc: stable@...r.kernel.org # 6.9.x+
> Signed-off-by: Mateusz Jończyk <mat.jonczyk@...pl>
> Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> Cc: Song Liu <song@...nel.org>
> Cc: Yu Kuai <yukuai3@...wei.com>
> Cc: Paul Luse <paul.e.luse@...ux.intel.com>
> Cc: Xiao Ni <xni@...hat.com>
> Cc: Mariusz Tkaczyk <mariusz.tkaczyk@...ux.intel.com>
> Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@o2.pl/
>
> --
Thanks for the patch!
Reviewed-by: Yu Kuai <yukuai3@...wei.com>
BTW, do you plan to add a test for this case to the mdadm tests? If not,
I'll pick it up, just let me know.
Thanks,
Kuai
>
> Tested on both Linux 6.10 and 6.9.8.
>
> Inside a VM, mdadm testsuite for RAID1 on 6.10 did not find any problems:
> ./test --dev=loop --no-error --raidtype=raid1
> (on 6.9.8 there was one failure, caused by external bitmap support not
> compiled in).
>
> Notes:
> - I was reliably getting deadlocks when adding / removing devices
> on such an array - while the array was loaded with fsstress with 20
> concurrent processes. When the array was idle or loaded with fsstress
> with 8 processes, no such deadlocks happened in my tests.
> This occurred also on unpatched Linux 6.8.0 though, but not on
> 6.1.97-rc1, so this is likely an independent regression (to be
> investigated).
> - I was also getting deadlocks when adding / removing the bitmap on the
> array in similar conditions - this happened on Linux 6.1.97-rc1
> also though. fsstress with 8 concurrent processes did cause it only
> once during many tests.
> - in my testing, there was once a problem with hot adding an
> internal bitmap to the array:
> mdadm: Cannot add bitmap while array is resyncing or reshaping etc.
> mdadm: failed to set internal bitmap.
> even though no such reshaping was happening according to /proc/mdstat.
> This seems unrelated, though.
> ---
> drivers/md/raid1.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 7b8a71ca66dd..82f70a4ce6ed 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -680,6 +680,7 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
> len = r1_bio->sectors;
> read_len = raid1_check_read_range(rdev, this_sector, &len);
> if (read_len == r1_bio->sectors) {
> + *max_sectors = read_len;
> update_read_sectors(conf, disk, this_sector, read_len);
> return disk;
> }
>
> base-commit: 256abd8e550ce977b728be79a74e1729438b4948
>
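To illustrate the failure mode described in the commit message, here is a minimal user-space sketch in C. It is not the kernel code: every toy_* name, the TOY_* result codes, the "readable" parameter, and the assumption that the caller starts with max_sectors == 0 are hypothetical stand-ins chosen for illustration. It only models the pattern the patch addresses, an early return that leaves an output parameter unset so the caller attempts a zero-length split.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one r1bio field this sketch needs. */
struct toy_r1bio {
	int sectors;	/* length of the read request, in sectors */
};

/* Hypothetical outcomes of the simplified read path. */
enum toy_result {
	TOY_NO_DISK,	/* no device could serve the read */
	TOY_BAD_SPLIT,	/* split with sectors <= 0: where BUG_ON would fire */
	TOY_SPLIT,	/* request split to a positive length */
	TOY_WHOLE,	/* request served without splitting */
};

/* Toy range check: how many of the requested sectors are readable. */
static int toy_check_read_range(int readable, int requested)
{
	return requested < readable ? requested : readable;
}

/*
 * Simplified model of the device-selection helper with a single device.
 * With patched == false, the full-length early return leaves
 * *max_sectors untouched, which is the behaviour the patch fixes.
 */
int toy_choose_slow_rdev(const struct toy_r1bio *r1_bio, int readable,
			 bool patched, int *max_sectors)
{
	int read_len = toy_check_read_range(readable, r1_bio->sectors);

	if (read_len == r1_bio->sectors) {
		if (patched)
			*max_sectors = read_len;	/* the one-line fix */
		return 0;	/* index of the chosen device */
	}

	if (read_len > 0) {
		*max_sectors = read_len;	/* partial read: always set */
		return 0;
	}

	return -1;
}

/* Simplified caller: splits the request when max_sectors is shorter. */
enum toy_result toy_read_request(const struct toy_r1bio *r1_bio,
				 int readable, bool patched)
{
	int max_sectors = 0;	/* assumed initial value in the caller */
	int disk;

	assert(r1_bio->sectors > 0);

	disk = toy_choose_slow_rdev(r1_bio, readable, patched, &max_sectors);
	if (disk < 0)
		return TOY_NO_DISK;

	if (max_sectors < r1_bio->sectors)
		return max_sectors <= 0 ? TOY_BAD_SPLIT : TOY_SPLIT;

	return TOY_WHOLE;
}
```

In this model, a fully readable 8-sector request yields TOY_BAD_SPLIT when patched is false and TOY_WHOLE when it is true. A partially readable request yields TOY_SPLIT either way, since only the full-length early return skipped the assignment.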