Message-ID: <ea19f2f4-32e8-e551-c59d-19185da1be0a@huaweicloud.com>
Date: Thu, 24 Oct 2024 10:10:28 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: John Garry <john.g.garry@...cle.com>, Geoff Back <geoff@...onlair.co.uk>,
Yu Kuai <yukuai1@...weicloud.com>, axboe@...nel.dk, hch@....de
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org, martin.petersen@...cle.com,
"yangerkun@...wei.com" <yangerkun@...wei.com>,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
Hi,
On 2024/10/23 20:11, John Garry wrote:
> On 23/10/2024 12:46, Geoff Back wrote:
>>>> Yes, raid1/raid10 writes are the same. If you want to enable atomic
>>>> write for raid1/raid10, you must add a new branch to handle badblocks
>>>> now; otherwise, as long as one copy contains any badblocks, atomic
>>>> write will fail, while theoretically I think it can work.
>>> Can you please expand on what you mean by this last sentence, "I think
>>> it can work".
I mean that in this case, for the write IO, there is no need to split the
IO for the underlying disks that don't have BBs, hence atomic write can
still work. The current solution is to split the IO down to the range in
which none of the underlying disks has BBs.
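To make the difference between the two approaches concrete, here is a minimal user-space sketch, not kernel code. The types `bad_range` and `member_disk` and the helpers `clean_prefix()`, `shared_clean_prefix()` and `count_clean_disks()` are hypothetical names invented for illustration. The first helper pair models the current behaviour (shrink the write to the leading range that is clean on every disk); the last one models the alternative (send the unsplit write only to the disks that have no BBs in the range). Handling of a write that starts inside a bad range is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one recorded bad range on a member disk. */
struct bad_range {
    uint64_t start; /* first bad sector */
    uint64_t len;   /* number of bad sectors */
};

/* Hypothetical model of a member disk: only its list of bad ranges. */
struct member_disk {
    const struct bad_range *bad;
    size_t nr_bad;
};

/*
 * Number of sectors, counted from 'start', that are free of bad
 * blocks on this disk. The result is at most 'len'.
 */
static uint64_t clean_prefix(const struct member_disk *d,
                             uint64_t start, uint64_t len)
{
    uint64_t good = len;

    assert(len > 0);
    for (size_t i = 0; i < d->nr_bad; i++) {
        uint64_t bs = d->bad[i].start;
        uint64_t be = bs + d->bad[i].len;

        if (d->bad[i].len == 0 || be <= start || bs >= start + len)
            continue; /* empty range or no overlap with the write */

        /* Sectors before the bad range begins (0 if we start inside it). */
        uint64_t first = bs > start ? bs - start : 0;
        if (first < good)
            good = first;
    }
    return good;
}

/* Current approach (simplified): the leading range clean on ALL disks. */
static uint64_t shared_clean_prefix(const struct member_disk *disks,
                                    size_t nr_disks,
                                    uint64_t start, uint64_t len)
{
    uint64_t good = len;

    for (size_t i = 0; i < nr_disks; i++) {
        uint64_t g = clean_prefix(&disks[i], start, len);
        if (g < good)
            good = g;
    }
    return good;
}

/*
 * Alternative approach (simplified): count the disks that could take
 * the whole write unsplit because they have no BBs in the range.
 */
static size_t count_clean_disks(const struct member_disk *disks,
                                size_t nr_disks,
                                uint64_t start, uint64_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < nr_disks; i++)
        if (clean_prefix(&disks[i], start, len) == len)
            n++;
    return n;
}
```

With one disk holding a bad range at sectors 100..107 and a second disk with none, a 16-sector write at sector 96 is cut to 4 sectors by the first approach, while the second approach still finds one disk that can take all 16 sectors unsplit.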
>>>
>>> Indeed, IMO, chance of encountering a device with BBs and supporting
>>> atomic writes is low, so no need to try to make it work (if it were
>>> possible) - I think that we just report EIO.
If you want this, then make sure raid sets failfast together with atomic
write. This way a disk will simply be marked faulty on an IO error instead
of having BBs recorded, which ensures there are no BBs.
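A minimal user-space sketch of the policy described above, not kernel code. The struct `member_state` and the function `handle_write_error()` are hypothetical names, and the model assumes only what is stated here: with failfast a write error marks the disk faulty, without it the error is recorded as a bad range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified state of one member disk. */
struct member_state {
    bool faulty;          /* disk has been kicked out of the array */
    size_t nr_bad_ranges; /* number of recorded bad ranges */
};

/*
 * Model of the write-error policy: with failfast the disk goes
 * faulty and no bad range is recorded; without failfast the disk
 * stays in the array and the failed range is recorded as bad.
 */
static void handle_write_error(struct member_state *d, bool failfast)
{
    assert(d != NULL);
    if (failfast)
        d->faulty = true;
    else
        d->nr_bad_ranges++;
}
```

Under this model a disk that is always handled with failfast never accumulates bad ranges, so an atomic write never has to be split around them.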
>>>
>>> Thanks,
>>> John
>>>
>>>
>> Hi all,
>>
>> Looking at this from a different angle: what does the bad blocks system
>> actually gain in modern environments? All the physical storage devices
>> I can think of (including all HDDs and SSDs, NVME or otherwise) have
>> internal mechanisms for remapping faulty blocks, and therefore
>> unrecoverable blocks don't become visible to the Linux kernel level
>> until after the physical storage device has exhausted its internal
>> supply of replacement blocks. At that point the physical device is
>> already catastrophically failing, and in the case of SSDs will likely
>> have already transitioned to a read-only state. Using bad-blocks at the
>> kernel level to map around additional faulty blocks at this point does
>> not seem to me to have any benefit, and the device is unlikely to be
>> even marginally usable for any useful length of time at that point
>> anyway.
>>
>> It seems to me that the bad-blocks capability is a legacy from the
>> distant past when HDDs did not do internal block remapping and hence the
>> kernel could usefully keep a disk usable by mapping out individual
>> blocks in software.
>> If this is the case and there isn't some other way that bad-blocks is
>> still beneficial, might it be better to drop it altogether rather than
>> implementing complex code to work around its effects?
No, we can't just kill it, unless all disks behave like this: never
return an IO error while the disk is still accessible, and once an IO
error is returned, the disk is totally unusable. (This is what failfast
means in raid.)
Thanks,
Kuai
>
> I am not proposing to drop it. That is another topic.
>
> I am just saying that I don't expect BBs for a device which supports
> atomic writes. As such, the solution for that case is simple - for an
> atomic write which covers BBs in any rdev, just error that write.
>
>>
>> Of course I'm happy to be corrected if there's still a real benefit to
>> having it, just because I can't see one doesn't mean there isn't one.
>
> I don't know if there is really a BB support benefit for modern devices
> at all.
>
> Thanks,
> John
>
>