Message-ID: <5da5e4c7-47d7-be71-0724-7b03af33324a@huaweicloud.com>
Date: Thu, 18 Sep 2025 14:58:35 +0800
From: Li Nan <linan666@...weicloud.com>
To: Kenta Akagi <k@...l.me>, linan666@...weicloud.com, song@...nel.org,
yukuai3@...wei.com, mtkaczyk@...nel.org, shli@...com, jgq516@...il.com
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 6/9] md/raid1,raid10: Fix missing retries Failfast
write bios on no-bbl rdevs
On 2025/9/17 21:33, Kenta Akagi wrote:
>
>
> On 2025/09/17 19:06, Li Nan wrote:
>>
>>
>> On 2025/9/15 11:42, Kenta Akagi wrote:
>>> In the current implementation, write failures are not retried on rdevs
>>> with badblocks disabled. This is because narrow_write_error, which issues
>>> retry bios, immediately returns when badblocks are disabled. As a result,
>>> a single write failure on such an rdev will immediately mark it as Faulty.
>>>
>>
>> IMO, there's no need to add extra logic for scenarios where badblocks
>> is not enabled. Do you have real-world scenarios where badblocks is
>> disabled?
>
> No, badblocks are enabled in my environment.
> I'm fine if it's not added, but I still think it's worth adding WARN_ON like:
>
> @@ -2553,13 +2554,17 @@ static bool narrow_write_error(struct r1bio *r1_bio, int i)
> fail = true;
> + WARN_ON( (bio->bi_opf & MD_FAILFAST) && (rdev->badblocks.shift < 0) );
> if (!narrow_write_error(r1_bio, m))
>
> What do you think?
>
How about this?
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2522,10 +2522,11 @@ static bool narrow_write_error(struct r1bio *r1_bio, int i)
bool ok = true;
if (rdev->badblocks.shift < 0)
- return false;
+ block_sectors = bdev_logical_block_size(rdev->bdev) >> 9;
+ else
+ block_sectors = roundup(1 << rdev->badblocks.shift,
+ bdev_logical_block_size(rdev->bdev) >> 9);
- block_sectors = roundup(1 << rdev->badblocks.shift,
- bdev_logical_block_size(rdev->bdev) >> 9);
sector = r1_bio->sector;
sectors = ((sector + block_sectors)
& ~(sector_t)(block_sectors - 1))
rdev_set_badblocks() checks shift too, so when badblocks is disabled the
retry writes are still issued, and the rdev is marked Faulty only if a retry
fails and setting badblocks then fails.
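
To make the arithmetic concrete, here is a minimal userspace sketch of the
block_sectors selection above. It is not kernel code: the helper names
(roundup_sectors, pick_block_sectors, first_chunk_sectors) are hypothetical,
and it assumes block_sectors ends up a power of two, which the mask in the
"sectors" computation relies on.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Round x up to the next multiple of y (userspace stand-in for roundup()). */
static sector_t roundup_sectors(sector_t x, sector_t y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Hypothetical helper: retry granularity in 512-byte sectors.
 * shift < 0 models an rdev with badblocks disabled; in that case
 * fall back to the logical block size instead of giving up.
 */
static sector_t pick_block_sectors(int shift, unsigned int logical_block_size)
{
	sector_t lbs_sectors = logical_block_size >> 9;

	if (shift < 0)
		return lbs_sectors;
	return roundup_sectors((sector_t)1 << shift, lbs_sectors);
}

/*
 * Length of the first retry chunk: from 'sector' up to the next
 * block_sectors boundary. Assumes block_sectors is a power of two.
 */
static sector_t first_chunk_sectors(sector_t sector, sector_t block_sectors)
{
	return ((sector + block_sectors) & ~(block_sectors - 1)) - sector;
}
```

With a 4096-byte logical block size and badblocks disabled, this gives
block_sectors = 8, so a failed write starting at sector 10 would first be
retried as a 6-sector chunk and then in 8-sector chunks.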
>
> Thanks,
> Akagi
>
>>> The retry mechanism appears to have been implemented under the assumption
>>> that a bad block is involved in the failure. However, the retry after
>>> an MD_FAILFAST write failure depends on this code, and a failfast write request
>>> may fail for reasons unrelated to bad blocks.
>>>
>>> Consequently, if failfast is enabled and badblocks are disabled on all
>>> rdevs, and all rdevs encounter a failfast write bio failure at the same
>>> time, no retries will occur and the entire array can be lost.
>>>
>>> This commit adds a path in narrow_write_error to retry writes even on rdevs
>>> where bad blocks are disabled, and failed bios marked with MD_FAILFAST will
>>> use this path. For non-failfast cases, the behavior remains unchanged: no
>>> retry writes are attempted to rdevs with bad blocks disabled.
>>>
>>> Fixes: 1919cbb23bf1 ("md/raid10: add failfast handling for writes.")
>>> Fixes: 212e7eb7a340 ("md/raid1: add failfast handling for writes.")
>>> Signed-off-by: Kenta Akagi <k@...l.me>
>>> ---
>>> drivers/md/raid1.c | 44 +++++++++++++++++++++++++++++---------------
>>> drivers/md/raid10.c | 37 ++++++++++++++++++++++++-------------
>>> 2 files changed, 53 insertions(+), 28 deletions(-)
>>> !test_bit(Faulty, &rdev->flags) &&
>>
>> --
>> Thanks,
>> Nan
>>
>>
>
>
>
> .
--
Thanks,
Nan