Message-ID: <f0ecb0b1-7963-42ed-a26d-9b155884abb6@oracle.com>
Date: Tue, 29 Oct 2024 12:05:19 +0000
From: John Garry <john.g.garry@...cle.com>
To: Yu Kuai <yukuai1@...weicloud.com>, axboe@...nel.dk, song@...nel.org,
        hch@....de
Cc: martin.petersen@...cle.com, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org, hare@...e.de,
        Johannes.Thumshirn@....com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v2 7/7] md/raid10: Handle bio_split() errors

On 29/10/2024 11:55, Yu Kuai wrote:
> Hi,
> 
> 在 2024/10/28 23:27, John Garry 写道:
>> Add proper bio_split() error handling. For any error, call
>> raid_end_bio_io() and return. Except for discard, where we end the bio
>> directly.
>>
>> Signed-off-by: John Garry <john.g.garry@...cle.com>
>> ---
>>   drivers/md/raid10.c | 47 ++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 46 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index f3bf1116794a..9c56b27b754a 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -1159,6 +1159,7 @@ static void raid10_read_request(struct mddev 
>> *mddev, struct bio *bio,
>>       int slot = r10_bio->read_slot;
>>       struct md_rdev *err_rdev = NULL;
>>       gfp_t gfp = GFP_NOIO;
>> +    int error;
>>       if (slot >= 0 && r10_bio->devs[slot].rdev) {
>>           /*
>> @@ -1206,6 +1207,10 @@ static void raid10_read_request(struct mddev 
>> *mddev, struct bio *bio,
>>       if (max_sectors < bio_sectors(bio)) {
>>           struct bio *split = bio_split(bio, max_sectors,
>>                             gfp, &conf->bio_split);
>> +        if (IS_ERR(split)) {
>> +            error = PTR_ERR(split);
>> +            goto err_handle;
>> +        }
>>           bio_chain(split, bio);
>>           allow_barrier(conf);
>>           submit_bio_noacct(bio);
>> @@ -1236,6 +1241,12 @@ static void raid10_read_request(struct mddev 
>> *mddev, struct bio *bio,
>>       mddev_trace_remap(mddev, read_bio, r10_bio->sector);
>>       submit_bio_noacct(read_bio);
>>       return;
>> +err_handle:
>> +    atomic_dec(&rdev->nr_pending);
> 
> I just realized that this is missed in the raid1 patch: read_balance()
> in raid1 increases nr_pending as well. :(

hmmm... I have the rdev_dec_pending() call for raid1 at the error label, 
which does the appropriate nr_pending dec, right? Or not?
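
For reference, rdev_dec_pending() in md.h is (going from memory here) just
the nr_pending drop paired with the recovery kick, so it should balance the
atomic_inc(&rdev->nr_pending) that read_balance() does when it picks a disk:

static inline void rdev_dec_pending(struct md_rdev *rdev, struct mddev *mddev)
{
        int faulty = test_bit(Faulty, &rdev->flags);

        /* if this was the last pending IO on a Faulty rdev, kick recovery */
        if (atomic_dec_and_test(&rdev->nr_pending) && faulty) {
                set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
                md_wakeup_thread(mddev->thread);
        }
}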

> 
>> +
>> +    bio->bi_status = errno_to_blk_status(error);
>> +    set_bit(R10BIO_Uptodate, &r10_bio->state);
>> +    raid_end_bio_io(r10_bio);
>>   }
>>   static void raid10_write_one_disk(struct mddev *mddev, struct r10bio 
>> *r10_bio,
>> @@ -1347,9 +1358,10 @@ static void raid10_write_request(struct mddev 
>> *mddev, struct bio *bio,
>>                    struct r10bio *r10_bio)
>>   {
>>       struct r10conf *conf = mddev->private;
>> -    int i;
>> +    int i, k;
>>       sector_t sectors;
>>       int max_sectors;
>> +    int error;
>>       if ((mddev_is_clustered(mddev) &&
>>            md_cluster_ops->area_resyncing(mddev, WRITE,
>> @@ -1482,6 +1494,10 @@ static void raid10_write_request(struct mddev 
>> *mddev, struct bio *bio,
>>       if (r10_bio->sectors < bio_sectors(bio)) {
>>           struct bio *split = bio_split(bio, r10_bio->sectors,
>>                             GFP_NOIO, &conf->bio_split);
>> +        if (IS_ERR(split)) {
>> +            error = PTR_ERR(split);
>> +            goto err_handle;
>> +        }
>>           bio_chain(split, bio);
>>           allow_barrier(conf);
>>           submit_bio_noacct(bio);
>> @@ -1503,6 +1519,25 @@ static void raid10_write_request(struct mddev 
>> *mddev, struct bio *bio,
>>               raid10_write_one_disk(mddev, r10_bio, bio, true, i);
>>       }
>>       one_write_done(r10_bio);
>> +    return;
>> +err_handle:
>> +    for (k = 0;  k < i; k++) {
>> +        struct md_rdev *rdev, *rrdev;
>> +
>> +        rdev = conf->mirrors[k].rdev;
>> +        rrdev = conf->mirrors[k].replacement;
> 
> This looks wrong, r10_bio->devs[k].devnum should be used to deference
> rdev from mirrors.

ok
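i.e. something along the lines of:

        int d = r10_bio->devs[k].devnum;

        rdev = conf->mirrors[d].rdev;
        rrdev = conf->mirrors[d].replacement;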

>> +
>> +        if (rdev)
>> +            rdev_dec_pending(conf->mirrors[k].rdev, mddev);
>> +        if (rrdev)
>> +            rdev_dec_pending(conf->mirrors[k].rdev, mddev);
> 
> This is not correct for now, for the case that rdev is all BB in the
> write range, continue will be reached in the loop and rrdev is skipped(
> This doesn't look correct to skip rrdev). However, I'll suggest to use:
> 
> int d = r10_bio->devs[k].devnum;
> if (r10_bio->devs[k].bio == NULL)

eh, should this be:
if (r10_bio->devs[k].bio != NULL)

>      rdev_dec_pending(conf->mirrors[d].rdev);
> if (r10_bio->devs[k].repl_bio == NULL)
>      rdev_dec_pending(conf->mirrors[d].replacement);
> 
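
So, just to check I follow, the error loop would end up something like this
(assuming the != NULL reading above is what you meant, and keeping the mddev
argument to rdev_dec_pending()):

err_handle:
        for (k = 0; k < i; k++) {
                int d = r10_bio->devs[k].devnum;

                /* only drop the reference for slots we actually set up */
                if (r10_bio->devs[k].bio)
                        rdev_dec_pending(conf->mirrors[d].rdev, mddev);
                if (r10_bio->devs[k].repl_bio)
                        rdev_dec_pending(conf->mirrors[d].replacement, mddev);

                r10_bio->devs[k].bio = NULL;
                r10_bio->devs[k].repl_bio = NULL;
        }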



> 
>> +        r10_bio->devs[k].bio = NULL;
>> +        r10_bio->devs[k].repl_bio = NULL;
>> +    }
>> +
>> +    bio->bi_status = errno_to_blk_status(error);
>> +    set_bit(R10BIO_Uptodate, &r10_bio->state);
>> +    raid_end_bio_io(r10_bio);

Thanks,
John
