Message-ID: <6325d143-b8f2-7287-201b-d3a2e53a556b@huaweicloud.com>
Date: Tue, 29 Oct 2024 19:30:06 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: John Garry <john.g.garry@...cle.com>, Yu Kuai <yukuai1@...weicloud.com>,
 axboe@...nel.dk, song@...nel.org, hch@....de
Cc: martin.petersen@...cle.com, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org, hare@...e.de,
 Johannes.Thumshirn@....com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v2 6/7] md/raid1: Handle bio_split() errors

Hi,

On 2024/10/29 16:45, John Garry wrote:
> On 29/10/2024 03:48, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/10/28 23:27, John Garry wrote:
>>> Add proper bio_split() error handling. For any error, call
>>> raid_end_bio_io() and return.
>>>
>>> For the case of an error in the write path, we need to undo the increment in
>>> the rdev pending count and NULLify the r1_bio->bios[] pointers.
>>>
>>> Signed-off-by: John Garry <john.g.garry@...cle.com>
>>> ---
>>>   drivers/md/raid1.c | 32 ++++++++++++++++++++++++++++++--
>>>   1 file changed, 30 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 6c9d24203f39..a10018282629 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -1322,7 +1322,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>>>       const enum req_op op = bio_op(bio);
>>>       const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC;
>>>       int max_sectors;
>>> -    int rdisk;
>>> +    int rdisk, error;
>>>       bool r1bio_existed = !!r1_bio;
>>>       /*
>>> @@ -1383,6 +1383,11 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>>>       if (max_sectors < bio_sectors(bio)) {
>>>           struct bio *split = bio_split(bio, max_sectors,
>>>                             gfp, &conf->bio_split);
>>> +
>>> +        if (IS_ERR(split)) {
>>> +            error = PTR_ERR(split);
>>> +            goto err_handle;
>>> +        }
>>>           bio_chain(split, bio);
>>>           submit_bio_noacct(bio);
>>>           bio = split;
>>> @@ -1410,6 +1415,12 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>>>       read_bio->bi_private = r1_bio;
>>>       mddev_trace_remap(mddev, read_bio, r1_bio->sector);
>>>       submit_bio_noacct(read_bio);
>>> +    return;
>>> +
>>> +err_handle:
>>> +    bio->bi_status = errno_to_blk_status(error);
>>> +    set_bit(R1BIO_Uptodate, &r1_bio->state);
>>> +    raid_end_bio_io(r1_bio);
>>>   }
>>>   static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>> @@ -1417,7 +1428,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>>   {
>>>       struct r1conf *conf = mddev->private;
>>>       struct r1bio *r1_bio;
>>> -    int i, disks;
>>> +    int i, disks, k, error;
>>>       unsigned long flags;
>>>       struct md_rdev *blocked_rdev;
>>>       int first_clone;
>>> @@ -1576,6 +1587,11 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>>       if (max_sectors < bio_sectors(bio)) {
>>>           struct bio *split = bio_split(bio, max_sectors,
>>>                             GFP_NOIO, &conf->bio_split);
>>> +
>>> +        if (IS_ERR(split)) {
>>> +            error = PTR_ERR(split);
>>> +            goto err_handle;
>>> +        }
>>>           bio_chain(split, bio);
>>>           submit_bio_noacct(bio);
>>>           bio = split;
>>> @@ -1660,6 +1676,18 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>>       /* In case raid1d snuck in to freeze_array */
>>>       wake_up_barrier(conf);
>>> +    return;
>>> +err_handle:
>>> +    for (k = 0; k < i; k++) {
>>> +        if (r1_bio->bios[k]) {
>>> +            rdev_dec_pending(conf->mirrors[k].rdev, mddev);
>>> +            r1_bio->bios[k] = NULL;
>>> +        }
>>> +    }
>>> +
>>> +    bio->bi_status = errno_to_blk_status(error);
>>> +    set_bit(R1BIO_Uptodate, &r1_bio->state);
>>> +    raid_end_bio_io(r1_bio);
> 
> Hi Kuai,
> 
>>
>> The error code being passed to the original bio looks good. However,
>> I really think badblocks should be handled somehow; it just doesn't make
>> sense to return an IO error to filesystems or the user when one underlying
>> disk contains BBs while the others are good.
> 
> Please be aware that this change is not for handling splits in atomic 
> writes. It is for the situation when a split fails for whatever reason - 
> likely a software bug.
> 
> For when atomic writes are supported for raid1, my plan is that an 
> atomic write over a region which covers a BB will error, i.e. goto 
> err_handle, like:
> 
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1514,6 +1514,12 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>                   break;
>               }
> 
> +            if (is_bad && bio->bi_opf & REQ_ATOMIC) {
> +                /* We just cannot atomically write this ... */
> +                err = -EIO;
> +                goto err_handle;
> +            }
> +
>               if (is_bad && first_bad <= r1_bio->sector) {
> 
> 
> I just think that if we try to atomically write a region which contains 
> BBs then we should error. Indeed, as I mentioned previously, I really 
> don't expect BBs on devices which support atomic writes. But we should 
> still handle it.
> 
Agreed.

> OTOH, if we did want to handle atomic writes to regions with BBs, we 
> could make a bigger effort and write the disks which don't have BBs 
> atomically (so that we don't split for those good disks). But this is 
> too complicated and does not achieve much.

Agreed.

> 
>>
>> Or is it guaranteed that an IO error from an atomic write won't hurt
>> anyone, i.e. the user will handle the error and retry with a non-atomic
>> write?
> 
> Yes, I think that the user could retry non-atomically for the same 
> write. Maybe returning a special error code could be useful for this.
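
To illustrate what that userspace retry could look like - a minimal
sketch, assuming the pwritev2() RWF_ATOMIC flag from recent kernels and
headers, and falling back on any failure since no dedicated error code
exists yet:

#define _GNU_SOURCE
#include <sys/uio.h>

/*
 * Try an atomic write first; if it fails (for example because the range
 * maps onto a raid1 leg with a badblock and cannot be written atomically),
 * retry as a plain write. A dedicated error code would let callers retry
 * only in that specific case; here any error triggers the fallback.
 */
static ssize_t write_maybe_atomic(int fd, const void *buf, size_t len,
				  off_t off)
{
	struct iovec iov = {
		.iov_base = (void *)buf,
		.iov_len  = len,
	};
	ssize_t ret;

	ret = pwritev2(fd, &iov, 1, off, RWF_ATOMIC);
	if (ret >= 0)
		return ret;

	return pwritev2(fd, &iov, 1, off, 0);
}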

And can you update the error path comment above when you add atomic write
support for raid1 and raid10?

Thanks,
Kuai

> 
> Thanks,
> John
> 
> .
> 

