Message-ID: <6325d143-b8f2-7287-201b-d3a2e53a556b@huaweicloud.com>
Date: Tue, 29 Oct 2024 19:30:06 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: John Garry <john.g.garry@...cle.com>, Yu Kuai <yukuai1@...weicloud.com>,
axboe@...nel.dk, song@...nel.org, hch@....de
Cc: martin.petersen@...cle.com, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org, hare@...e.de,
Johannes.Thumshirn@....com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v2 6/7] md/raid1: Handle bio_split() errors
Hi,
On 2024/10/29 16:45, John Garry wrote:
> On 29/10/2024 03:48, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/10/28 23:27, John Garry wrote:
>>> Add proper bio_split() error handling. For any error, call
>>> raid_end_bio_io() and return.
>>>
>>> For the case of an error in the write path, we need to undo the increment in
>>> the rdev pending count and NULLify the r1_bio->bios[] pointers.
>>>
>>> Signed-off-by: John Garry <john.g.garry@...cle.com>
>>> ---
>>> drivers/md/raid1.c | 32 ++++++++++++++++++++++++++++++--
>>> 1 file changed, 30 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 6c9d24203f39..a10018282629 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -1322,7 +1322,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>>> const enum req_op op = bio_op(bio);
>>> const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC;
>>> int max_sectors;
>>> - int rdisk;
>>> + int rdisk, error;
>>> bool r1bio_existed = !!r1_bio;
>>> /*
>>> @@ -1383,6 +1383,11 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>>> if (max_sectors < bio_sectors(bio)) {
>>> struct bio *split = bio_split(bio, max_sectors,
>>> gfp, &conf->bio_split);
>>> +
>>> + if (IS_ERR(split)) {
>>> + error = PTR_ERR(split);
>>> + goto err_handle;
>>> + }
>>> bio_chain(split, bio);
>>> submit_bio_noacct(bio);
>>> bio = split;
>>> @@ -1410,6 +1415,12 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>>> read_bio->bi_private = r1_bio;
>>> mddev_trace_remap(mddev, read_bio, r1_bio->sector);
>>> submit_bio_noacct(read_bio);
>>> + return;
>>> +
>>> +err_handle:
>>> + bio->bi_status = errno_to_blk_status(error);
>>> + set_bit(R1BIO_Uptodate, &r1_bio->state);
>>> + raid_end_bio_io(r1_bio);
>>> }
>>> static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>> @@ -1417,7 +1428,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>> {
>>> struct r1conf *conf = mddev->private;
>>> struct r1bio *r1_bio;
>>> - int i, disks;
>>> + int i, disks, k, error;
>>> unsigned long flags;
>>> struct md_rdev *blocked_rdev;
>>> int first_clone;
>>> @@ -1576,6 +1587,11 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>> if (max_sectors < bio_sectors(bio)) {
>>> struct bio *split = bio_split(bio, max_sectors,
>>> GFP_NOIO, &conf->bio_split);
>>> +
>>> + if (IS_ERR(split)) {
>>> + error = PTR_ERR(split);
>>> + goto err_handle;
>>> + }
>>> bio_chain(split, bio);
>>> submit_bio_noacct(bio);
>>> bio = split;
>>> @@ -1660,6 +1676,18 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>> /* In case raid1d snuck in to freeze_array */
>>> wake_up_barrier(conf);
>>> + return;
>>> +err_handle:
>>> + for (k = 0; k < i; k++) {
>>> + if (r1_bio->bios[k]) {
>>> + rdev_dec_pending(conf->mirrors[k].rdev, mddev);
>>> + r1_bio->bios[k] = NULL;
>>> + }
>>> + }
>>> +
>>> + bio->bi_status = errno_to_blk_status(error);
>>> + set_bit(R1BIO_Uptodate, &r1_bio->state);
>>> + raid_end_bio_io(r1_bio);
>
> Hi Kuai,
>
>>
>> Looks good that the error code is passed to the orig bio. However,
>> I really think badblocks should be handled somehow; it just doesn't make
>> sense to return an IO error to filesystems or the user if one underlying
>> disk contains BBs while the others are good.
>
> Please be aware that this change is not for handling splits in atomic
> writes. It is for the situation where a split fails for whatever reason -
> likely a software bug.
>
> For when atomic writes are supported for raid1, my plan is that an
> atomic write over a region which covers a BB will error, i.e. goto
> err_handle, like:
>
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1514,6 +1514,12 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> break;
> }
>
> + if (is_bad && bio->bi_opf & REQ_ATOMIC) {
> + /* We just cannot atomically write this ... */
> + err = -EIO;
> + goto err_handle;
> + }
> +
> if (is_bad && first_bad <= r1_bio->sector) {
>
>
> I just think that if we try to write a region atomically which contains
> BBs then we should error. Indeed, as I mentioned previously, I really
> don't expect BBs on devices which support atomic writes. But we should
> still handle it.
>
Agreed.
> OTOH, if we did want to handle atomic writes to regions with BBs, we
> could make a bigger effort and write the disks which don't have BBs
> atomically (so that we don't split for those good disks). But this is
> too complicated and does not achieve much.
Agreed.
>
>>
>> Or is it guaranteed that an IO error from an atomic write won't hurt anyone,
>> i.e. the user will handle this error and retry with a non-atomic write?
>
> Yes, I think that the user could retry non-atomically for the same
> write. Maybe returning a special error code could be useful for this.
And can you update the above error path comment when you add atomic write
support for raid1 and raid10?
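
As a rough illustration of that non-atomic retry (just a sketch, not part of
this patchset): assuming the write is issued from userspace with pwritev2()
and RWF_ATOMIC, and assuming the failure surfaces as EIO until a dedicated
error code exists, the fallback could look something like this. The helper
name and the EIO check are assumptions; only pwritev2() and RWF_ATOMIC are
existing interfaces.

#define _GNU_SOURCE
#include <errno.h>
#include <sys/uio.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040	/* from include/uapi/linux/fs.h, kernel >= 6.11 */
#endif

/* Hypothetical helper: try an atomic write, fall back to a plain write. */
static ssize_t write_atomic_or_fallback(int fd, const void *buf, size_t len,
					off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	ret = pwritev2(fd, &iov, 1, off, RWF_ATOMIC);
	if (ret >= 0 || errno != EIO)
		return ret;

	/*
	 * Assumption: EIO (or whatever special error code ends up being
	 * used) means the atomic write itself was rejected, e.g. a raid1
	 * member has badblocks in the range, so retry as an ordinary write.
	 */
	return pwritev2(fd, &iov, 1, off, 0);
}
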
Thanks,
Kuai
>
> Thanks,
> John
>