Message-ID: <6148a744-e62c-45f6-b273-772aaf51a2df@oracle.com>
Date: Mon, 23 Sep 2024 10:21:23 +0100
From: John Garry <john.g.garry@...cle.com>
To: Yu Kuai <yukuai1@...weicloud.com>, axboe@...nel.dk, hch@....de
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org, martin.petersen@...cle.com,
"yangerkun@...wei.com" <yangerkun@...wei.com>,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
On 23/09/2024 09:18, Yu Kuai wrote:
>>>
>>> We need a new branch in read_balance() to choose a rdev with full copy.
>>
>> Sure, I do realize that the mirror'ing personalities need more
>> sophisticated error handling changes (than what I presented).
>>
>> However, in raid1_read_request() we do the read_balance() and then the
>> bio_split() attempt. So what are you suggesting we do for the
>> bio_split() error? Is it to retry without the bio_split()?
>>
>> To me bio_split() should not fail. If it does, it is likely ENOMEM or
>> some other bug being exposed, so I am not sure that retrying with
>> skipping bio_split() is the right approach (if that is what you are
>> suggesting).
>
> bio_split_to_limits() is already called from md_submit_bio(), so here
> bio should only be split because of badblocks or resync. We have to
> return error for resync, however, for badblocks, we can still try to
> find a rdev without badblocks so bio_split() is not needed. And we need
> to retry and inform read_balance() to skip rdev with badblocks in this
> case.
>
> This can only happen if the full copy exists only on slow disks. This
> really is a corner case, and it is not related to the new error path
> introduced by atomic write. I don't mind this version for now, it is just
> something I noticed if bio_split() can fail.

Are you saying that the current badblocks handling needs some improvement,
e.g. initially trying to pick an rdev for which bio_split() is not needed?
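To make the question concrete, here is a minimal userspace sketch of the retry idea, not kernel code. Everything in it is hypothetical: struct read_rdev_model, model_read_balance(), model_read_request() and the split_fails flag are stand-ins invented for illustration, and the model assumes the lowest-numbered mirror is the preferred (fastest) one. The real read_balance() is considerably more involved.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an rdev; not the kernel's struct md_rdev. */
struct read_rdev_model {
	bool has_badblocks;	/* bad blocks overlap the requested range */
	int good_sectors;	/* readable sectors before the first bad block */
};

/*
 * Hypothetical stand-in for read_balance(). Returns the index of the chosen
 * mirror or -1. Lower indexes are assumed to be the preferred disks. With
 * skip_badblocks set, only mirrors holding a full copy are considered.
 */
static int model_read_balance(const struct read_rdev_model *rdevs, size_t n,
			      int sectors, bool skip_badblocks,
			      int *max_sectors)
{
	for (size_t i = 0; i < n; i++) {
		if (!rdevs[i].has_badblocks) {
			*max_sectors = sectors;
			return (int)i;
		}
		if (skip_badblocks || rdevs[i].good_sectors <= 0)
			continue;
		/* Partial copy: the read would have to be split. */
		*max_sectors = rdevs[i].good_sectors < sectors ?
			       rdevs[i].good_sectors : sectors;
		return (int)i;
	}
	return -1;
}

/*
 * Returns the number of sectors that would be read from *chosen, or a
 * negative errno. split_fails simulates bio_split() returning an error.
 */
static int model_read_request(const struct read_rdev_model *rdevs, size_t n,
			      int sectors, bool split_fails, int *chosen)
{
	int max_sectors = 0;
	int disk = model_read_balance(rdevs, n, sectors, false, &max_sectors);

	if (disk < 0)
		return -EIO;

	if (max_sectors < sectors && split_fails) {
		/* Split failed: retry, asking for a mirror with a full copy. */
		disk = model_read_balance(rdevs, n, sectors, true, &max_sectors);
		if (disk < 0)
			return -ENOMEM;
	}

	*chosen = disk;
	return max_sectors;
}
```

In this model the retry only helps when some mirror holds a full copy of the range; otherwise the split error is still returned to the caller.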

Apart from that, what about the change in raid10_write_request(), w.r.t.
error handling?

There, if bio_split() fails, I think that we need to do some tidy-up,
i.e. undo the increase in rdev->nr_pending made when looping over
conf->copies.
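As a rough illustration of that tidy-up, here is a minimal userspace sketch, not the actual raid10 code. struct write_rdev_model, model_bio_split() and model_write_request() are hypothetical names, and the model assumes every copy takes exactly one pending reference before the split is attempted, which is simpler than what raid10_write_request() really does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an rdev; only the pending counter is modelled. */
struct write_rdev_model {
	int nr_pending;
};

/* Hypothetical stand-in for bio_split(): 0 on success, -ENOMEM on failure. */
static int model_bio_split(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/*
 * Take a pending reference on every copy, then attempt the split. If the
 * split fails, drop every reference taken so far before returning the error,
 * so that the counters end up exactly as they were on entry.
 */
static int model_write_request(struct write_rdev_model *rdevs, size_t copies,
			       bool need_split, bool split_fails)
{
	size_t i;
	int err = 0;

	for (i = 0; i < copies; i++)
		rdevs[i].nr_pending++;

	if (need_split) {
		err = model_bio_split(split_fails);
		if (err)
			goto err_dec_pending;
	}

	/* The writes would be submitted here; completion drops the references. */
	return 0;

err_dec_pending:
	for (i = 0; i < copies; i++)
		rdevs[i].nr_pending--;
	return err;
}
```

The point of the single error label is that a failed split leaves no stale nr_pending references behind, whichever copy the loop had reached.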

BTW, feel free to comment on patch 6/6 for that.

Thanks,
John