Message-ID: <0106019957d57c2f-0a8ed8f8-adad-4351-b86e-368f0bbd7069-000000@ap-northeast-1.amazonses.com>
Date: Wed, 17 Sep 2025 13:20:32 +0000
From: Kenta Akagi <k@...l.me>
To: linan666@...weicloud.com, song@...nel.org, yukuai3@...wei.com,
mtkaczyk@...nel.org, shli@...com, jgq516@...il.com
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org, k@...l.me
Subject: Re: [PATCH v4 5/9] md/raid1,raid10: Set R{1,10}BIO_Uptodate when
successful retry of a failed bio
Hi,
Thank you for reviewing.
On 2025/09/17 18:24, Li Nan wrote:
>
>
> On 2025/9/15 11:42, Kenta Akagi wrote:
>> In the current implementation, when a write bio fails, the retry flow
>> is as follows:
>> * In bi_end_io, e.g. raid1_end_write_request, R1BIO_WriteError is
>> set on the r1bio.
>> * The md thread calls handle_write_finished corresponding to this r1bio.
>> * Inside handle_write_finished, narrow_write_error is invoked.
>> * narrow_write_error rewrites the r1bio on a per-sector basis, marking
>> any failed sectors as badblocks. It returns true if all sectors succeed,
>> or if failed sectors are successfully recorded via rdev_set_badblocks.
>> It returns false if rdev_set_badblocks fails or if badblocks are disabled.
>> * handle_write_finished faults the rdev if it receives false from
>> narrow_write_error. Otherwise, it does nothing.
>>
>> This can cause a problem where an r1bio that succeeded on retry is
>> incorrectly reported as failed to the higher layer, for example in the
>> following case:
>> * Only one In_sync rdev exists, and
>> * The write bio initially failed but all retries in
>> narrow_write_error succeed.
>>
>> This commit ensures that if a write initially fails but all retries in
>> narrow_write_error succeed, R1BIO_Uptodate or R10BIO_Uptodate is set
>> and the higher layer receives a successful write status.
>>
>> Signed-off-by: Kenta Akagi <k@...l.me>
>> ---
>> drivers/md/raid1.c | 32 ++++++++++++++++++++++++++------
>> drivers/md/raid10.c | 21 +++++++++++++++++++++
>> 2 files changed, 47 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 8fff9dacc6e0..806f5cb33a8e 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -2517,6 +2517,21 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>> }
>> }
>> +/**
>> + * narrow_write_error() - Retry write and set badblock
>> + * @r1_bio: the r1bio containing the write error
>> + * @i: which device to retry
>> + *
>> + * Rewrites the bio, splitting it at the least common multiple of the logical
>> + * block size and the badblock size. Blocks that fail to be written are marked
>> + * as bad. If badblocks are disabled, no write is attempted and false is
>> + * returned immediately.
>> + *
>> + * Return:
>> + * * %true - all blocks were written or marked bad successfully
>> + * * %false - bbl disabled or
>> + * one or more blocks write failed and could not be marked bad
>> + */
>> static bool narrow_write_error(struct r1bio *r1_bio, int i)
>> {
>> struct mddev *mddev = r1_bio->mddev;
>> @@ -2614,9 +2629,9 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>> int m, idx;
>> bool fail = false;
>> - for (m = 0; m < conf->raid_disks * 2 ; m++)
>> + for (m = 0; m < conf->raid_disks * 2 ; m++) {
>> + struct md_rdev *rdev = conf->mirrors[m].rdev;
>> if (r1_bio->bios[m] == IO_MADE_GOOD) {
>> - struct md_rdev *rdev = conf->mirrors[m].rdev;
>> rdev_clear_badblocks(rdev,
>> r1_bio->sector,
>> r1_bio->sectors, 0);
>> @@ -2628,12 +2643,17 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>> */
>> fail = true;
>
> 'fail' should be false when re-write is successful.
Thanks. It seems there is no need to handle the r1bio via bio_end_io_list
when the re-write is successful.
I will fix it.
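
The corrected decision flow can be modeled as a small standalone sketch.
This is only an illustration of the intended logic, not kernel code; the
model_* names and the simplified parameters are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of narrow_write_error()'s return semantics: true when every
 * sector that failed to be rewritten was recorded as a badblock;
 * false when badblocks are disabled or recording a badblock failed. */
static bool model_narrow_write_error(bool bbl_enabled,
                                     int failed_sectors,
                                     int sectors_marked_bad_ok)
{
    if (!bbl_enabled)
        return false;
    return sectors_marked_bad_ok >= failed_sectors;
}

enum outcome { FAULT_RDEV, SET_UPTODATE, LEAVE_AS_IS };

/* After the patch: if narrow_write_error() succeeded and the retry left
 * no badblocks in the written range on a live In_sync rdev, the whole
 * write succeeded on retry and R1BIO_Uptodate can be set. */
static enum outcome model_handle_write_finished(bool nwe_ok,
                                                bool in_sync, bool faulty,
                                                bool range_has_badblock)
{
    if (!nwe_ok)
        return FAULT_RDEV;       /* md_error(conf->mddev, rdev) */
    if (in_sync && !faulty && !range_has_badblock)
        return SET_UPTODATE;     /* set_bit(R1BIO_Uptodate, ...) */
    return LEAVE_AS_IS;          /* some sectors were marked bad */
}
```

In this model the "fail" path only needs bio_end_io_list when the outcome
is not SET_UPTODATE, which matches the point above.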
>
>> if (!narrow_write_error(r1_bio, m))
>> - md_error(conf->mddev,
>> - conf->mirrors[m].rdev);
>> + md_error(conf->mddev, rdev);
>> /* an I/O failed, we can't clear the bitmap */
>> - rdev_dec_pending(conf->mirrors[m].rdev,
>> - conf->mddev);
>> + else if (test_bit(In_sync, &rdev->flags) &&
>> + !test_bit(Faulty, &rdev->flags) &&
>> + rdev_has_badblock(rdev,
>> + r1_bio->sector,
>> + r1_bio->sectors) == 0)
>
> Clear badblock and set R10BIO_Uptodate if rdev has badblock.
narrow_write_error returns true when the write succeeds, or when the write
fails but rdev_set_badblocks succeeds. Here, the code treats the re-write as
successful if there is no badblock in the sectors written by the r1_bio.
So we should not call rdev_clear_badblocks here.
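
To illustrate why the rdev_has_badblock() range check identifies a fully
successful retry, here is a toy, self-contained badblock table with a range
lookup. The bb_* names are hypothetical; the kernel's struct badblocks is a
packed, sorted, locked array, but the overlap semantics are the same:

```c
#include <assert.h>
#include <stddef.h>

/* Toy badblock table: each entry records a bad range of sectors. */
struct bb_entry { unsigned long long start; int len; };

static struct bb_entry bb_table[8];
static size_t bb_count;

/* Record [start, start+len) as bad, as rdev_set_badblocks() would. */
static void bb_set(unsigned long long start, int len)
{
    bb_table[bb_count].start = start;
    bb_table[bb_count].len = len;
    bb_count++;
}

/* Analogue of rdev_has_badblock(): nonzero iff [sector, sector+sectors)
 * overlaps any recorded badblock. When narrow_write_error() returned
 * true, a zero result here means no sector in the range was recorded as
 * bad, i.e. every sector was rewritten successfully on retry. */
static int bb_check(unsigned long long sector, int sectors)
{
    for (size_t i = 0; i < bb_count; i++) {
        unsigned long long s = bb_table[i].start;
        unsigned long long e = s + bb_table[i].len;

        if (sector < e && sector + sectors > s)
            return 1;
    }
    return 0;
}
```

A nonzero result means some sector in the range failed and was recorded;
clearing those badblocks at this point would discard that information, which
is why the check must not be paired with rdev_clear_badblocks.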
>
>> + set_bit(R1BIO_Uptodate, &r1_bio->state);
>> + rdev_dec_pending(rdev, conf->mddev);
>> }
>> + }
>> if (fail) {
>> spin_lock_irq(&conf->device_lock);
>> list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index b73af94a88b0..21c2821453e1 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -2809,6 +2809,21 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>> }
>> }
>> +/**
>> + * narrow_write_error() - Retry write and set badblock
>> + * @r10_bio: the r10bio containing the write error
>> + * @i: which device to retry
>> + *
>> + * Rewrites the bio, splitting it at the least common multiple of the logical
>> + * block size and the badblock size. Blocks that fail to be written are marked
>> + * as bad. If badblocks are disabled, no write is attempted and false is
>> + * returned immediately.
>> + *
>> + * Return:
>> + * * %true - all blocks were written or marked bad successfully
>> + * * %false - bbl disabled or
>> + * one or more blocks write failed and could not be marked bad
>> + */
>> static bool narrow_write_error(struct r10bio *r10_bio, int i)
>> {
>> struct bio *bio = r10_bio->master_bio;
>> @@ -2975,6 +2990,12 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
>> fail = true;
>> if (!narrow_write_error(r10_bio, m))
>> md_error(conf->mddev, rdev);
>> + else if (test_bit(In_sync, &rdev->flags) &&
>> + !test_bit(Faulty, &rdev->flags) &&
>> + rdev_has_badblock(rdev,
>> + r10_bio->devs[m].addr,
>> + r10_bio->sectors) == 0)
>
> Same as raid1.
I think it is the same as RAID1; we should not call rdev_clear_badblocks.
Thanks,
Akagi
>
>> + set_bit(R10BIO_Uptodate, &r10_bio->state);
>> rdev_dec_pending(rdev, conf->mddev);
>> }
>> bio = r10_bio->devs[m].repl_bio;
>
> --
> Thanks,
> Nan
>
>