Message-ID: <0106019957d57c2f-0a8ed8f8-adad-4351-b86e-368f0bbd7069-000000@ap-northeast-1.amazonses.com>
Date: Wed, 17 Sep 2025 13:20:32 +0000
From: Kenta Akagi <k@...l.me>
To: linan666@...weicloud.com, song@...nel.org, yukuai3@...wei.com,
mtkaczyk@...nel.org, shli@...com, jgq516@...il.com
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org, k@...l.me
Subject: Re: [PATCH v4 5/9] md/raid1,raid10: Set R{1,10}BIO_Uptodate when
successful retry of a failed bio
Hi,
Thank you for reviewing.
On 2025/09/17 18:24, Li Nan wrote:
>
>
> On 2025/9/15 11:42, Kenta Akagi wrote:
>> In the current implementation, when a write bio fails, the retry flow
>> is as follows:
>> * In bi_end_io, e.g. raid1_end_write_request, R1BIO_WriteError is
>> set on the r1bio.
>> * The md thread calls handle_write_finished corresponding to this r1bio.
>> * Inside handle_write_finished, narrow_write_error is invoked.
>> * narrow_write_error rewrites the r1bio on a per-sector basis, marking
>> any failed sectors as badblocks. It returns true if all sectors succeed,
>> or if failed sectors are successfully recorded via rdev_set_badblocks.
>> It returns false if rdev_set_badblocks fails or if badblocks are disabled.
>> * handle_write_finished faults the rdev if it receives false from
>> narrow_write_error. Otherwise, it does nothing.
>>
>> This can cause a problem where an r1bio that succeeded on retry is
>> incorrectly reported as failed to the higher layer, for example in the
>> following case:
>> * Only one In_sync rdev exists, and
>> * The write bio initially failed but all retries in
>> narrow_write_error succeed.
>>
>> This commit ensures that if a write initially fails but all retries in
>> narrow_write_error succeed, R1BIO_Uptodate or R10BIO_Uptodate is set
>> and the higher layer receives a successful write status.
>>
>> Signed-off-by: Kenta Akagi <k@...l.me>
>> ---
>> drivers/md/raid1.c | 32 ++++++++++++++++++++++++++------
>> drivers/md/raid10.c | 21 +++++++++++++++++++++
>> 2 files changed, 47 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 8fff9dacc6e0..806f5cb33a8e 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -2517,6 +2517,21 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>> }
>> }
>> +/**
>> + * narrow_write_error() - Retry write and set badblock
>> + * @r1_bio: the r1bio containing the write error
>> + * @i: which device to retry
>> + *
>> + * Rewrites the bio, splitting it at the least common multiple of the logical
>> + * block size and the badblock size. Blocks that fail to be written are marked
>> + * as bad. If badblocks are disabled, no write is attempted and false is
>> + * returned immediately.
>> + *
>> + * Return:
>> + * * %true - all blocks were written or marked bad successfully
>> + * * %false - bbl disabled or
>> + * one or more blocks write failed and could not be marked bad
>> + */
>> static bool narrow_write_error(struct r1bio *r1_bio, int i)
>> {
>> struct mddev *mddev = r1_bio->mddev;
>> @@ -2614,9 +2629,9 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>> int m, idx;
>> bool fail = false;
>> - for (m = 0; m < conf->raid_disks * 2 ; m++)
>> + for (m = 0; m < conf->raid_disks * 2 ; m++) {
>> + struct md_rdev *rdev = conf->mirrors[m].rdev;
>> if (r1_bio->bios[m] == IO_MADE_GOOD) {
>> - struct md_rdev *rdev = conf->mirrors[m].rdev;
>> rdev_clear_badblocks(rdev,
>> r1_bio->sector,
>> r1_bio->sectors, 0);
>> @@ -2628,12 +2643,17 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>> */
>> fail = true;
>
> 'fail' should be false when re-write is successful.
Thanks. It seems there is no need to handle the r1bio via bio_end_io_list
when the re-write is successful.
I will fix it.
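
The corrected decision flow can be modeled as a small standalone sketch.
This is only an illustration of the intended logic, not kernel code; the
model_* names and the simplified parameters are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of narrow_write_error()'s return semantics: true when every
 * sector that failed to be rewritten was recorded as a badblock;
 * false when badblocks are disabled or recording a badblock failed. */
static bool model_narrow_write_error(bool bbl_enabled,
                                     int failed_sectors,
                                     int sectors_marked_bad_ok)
{
    if (!bbl_enabled)
        return false;
    return sectors_marked_bad_ok >= failed_sectors;
}

enum outcome { FAULT_RDEV, SET_UPTODATE, LEAVE_AS_IS };

/* After the patch: if narrow_write_error() succeeded and the retry left
 * no badblocks in the written range on a live In_sync rdev, the whole
 * write succeeded on retry and R1BIO_Uptodate can be set. */
static enum outcome model_handle_write_finished(bool nwe_ok,
                                                bool in_sync, bool faulty,
                                                bool range_has_badblock)
{
    if (!nwe_ok)
        return FAULT_RDEV;       /* md_error(conf->mddev, rdev) */
    if (in_sync && !faulty && !range_has_badblock)
        return SET_UPTODATE;     /* set_bit(R1BIO_Uptodate, ...) */
    return LEAVE_AS_IS;          /* some sectors were marked bad */
}
```

In this model the "fail" path only needs bio_end_io_list when the outcome
is not SET_UPTODATE, which matches the point above.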
>
>> if (!narrow_write_error(r1_bio, m))
>> - md_error(conf->mddev,
>> - conf->mirrors[m].rdev);
>> + md_error(conf->mddev, rdev);
>> /* an I/O failed, we can't clear the bitmap */
>> - rdev_dec_pending(conf->mirrors[m].rdev,
>> - conf->mddev);
>> + else if (test_bit(In_sync, &rdev->flags) &&
>> + !test_bit(Faulty, &rdev->flags) &&
>> + rdev_has_badblock(rdev,
>> + r1_bio->sector,
>> + r1_bio->sectors) == 0)
>
> Clear badblock and set R10BIO_Uptodate if rdev has badblock.
narrow_write_error returns true when the write succeeds, or when the write
fails but rdev_set_badblocks succeeds. Here, the code treats the re-write as
successful if there is no badblock in the sectors written by the r1_bio.
So we should not call rdev_clear_badblocks here.
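
To illustrate why the rdev_has_badblock() range check identifies a fully
successful retry, here is a toy, self-contained badblock table with a range
lookup. The bb_* names are hypothetical; the kernel's struct badblocks is a
packed, sorted, locked array, but the overlap semantics are the same:

```c
#include <assert.h>
#include <stddef.h>

/* Toy badblock table: each entry records a bad range of sectors. */
struct bb_entry { unsigned long long start; int len; };

static struct bb_entry bb_table[8];
static size_t bb_count;

/* Record [start, start+len) as bad, as rdev_set_badblocks() would. */
static void bb_set(unsigned long long start, int len)
{
    bb_table[bb_count].start = start;
    bb_table[bb_count].len = len;
    bb_count++;
}

/* Analogue of rdev_has_badblock(): nonzero iff [sector, sector+sectors)
 * overlaps any recorded badblock. When narrow_write_error() returned
 * true, a zero result here means no sector in the range was recorded as
 * bad, i.e. every sector was rewritten successfully on retry. */
static int bb_check(unsigned long long sector, int sectors)
{
    for (size_t i = 0; i < bb_count; i++) {
        unsigned long long s = bb_table[i].start;
        unsigned long long e = s + bb_table[i].len;

        if (sector < e && sector + sectors > s)
            return 1;
    }
    return 0;
}
```

A nonzero result means some sector in the range failed and was recorded;
clearing those badblocks at this point would discard that information, which
is why the check must not be paired with rdev_clear_badblocks.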
>
>> + set_bit(R1BIO_Uptodate, &r1_bio->state);
>> + rdev_dec_pending(rdev, conf->mddev);
>> }
>> + }
>> if (fail) {
>> spin_lock_irq(&conf->device_lock);
>> list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index b73af94a88b0..21c2821453e1 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -2809,6 +2809,21 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>> }
>> }
>> +/**
>> + * narrow_write_error() - Retry write and set badblock
>> + * @r10_bio: the r10bio containing the write error
>> + * @i: which device to retry
>> + *
>> + * Rewrites the bio, splitting it at the least common multiple of the logical
>> + * block size and the badblock size. Blocks that fail to be written are marked
>> + * as bad. If badblocks are disabled, no write is attempted and false is
>> + * returned immediately.
>> + *
>> + * Return:
>> + * * %true - all blocks were written or marked bad successfully
>> + * * %false - bbl disabled or
>> + * one or more blocks write failed and could not be marked bad
>> + */
>> static bool narrow_write_error(struct r10bio *r10_bio, int i)
>> {
>> struct bio *bio = r10_bio->master_bio;
>> @@ -2975,6 +2990,12 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
>> fail = true;
>> if (!narrow_write_error(r10_bio, m))
>> md_error(conf->mddev, rdev);
>> + else if (test_bit(In_sync, &rdev->flags) &&
>> + !test_bit(Faulty, &rdev->flags) &&
>> + rdev_has_badblock(rdev,
>> + r10_bio->devs[m].addr,
>> + r10_bio->sectors) == 0)
>
> Same as raid1.
I think it is the same as RAID1; we should not call rdev_clear_badblocks.
Thanks,
Akagi
>
>> + set_bit(R10BIO_Uptodate, &r10_bio->state);
>> rdev_dec_pending(rdev, conf->mddev);
>> }
>> bio = r10_bio->devs[m].repl_bio;
>
> --
> Thanks,
> Nan
>
>