Message-ID: <be465913-80c7-762a-51f1-56021aa323dd@huaweicloud.com>
Date: Mon, 23 Sep 2024 17:38:24 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: John Garry <john.g.garry@...cle.com>, Yu Kuai <yukuai1@...weicloud.com>,
 axboe@...nel.dk, hch@....de
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-raid@...r.kernel.org, martin.petersen@...cle.com,
 "yangerkun@...wei.com" <yangerkun@...wei.com>,
 "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors

Hi,

On 2024/09/23 17:21, John Garry wrote:
> On 23/09/2024 09:18, Yu Kuai wrote:
>>>>
>>>> We need a new branch in read_balance() to choose a rdev with full copy.
>>>
>>> Sure, I do realize that the mirroring personalities need more 
>>> sophisticated error handling changes (than what I presented).
>>>
>>> However, in raid1_read_request() we do the read_balance() and then 
>>> the bio_split() attempt. So what are you suggesting we do for the 
>>> bio_split() error? Is it to retry without the bio_split()?
>>>
>>> To me bio_split() should not fail. If it does, it is likely ENOMEM or 
>>> some other bug being exposed, so I am not sure that retrying with 
>>> skipping bio_split() is the right approach (if that is what you are 
>>> suggesting).
>>
>> bio_split_to_limits() is already called from md_submit_bio(), so here
>> the bio should only be split because of badblocks or resync. We have to
>> return an error for resync; however, for badblocks, we can still try to
>> find an rdev without badblocks so that bio_split() is not needed. In
>> that case we need to retry and inform read_balance() to skip rdevs with
>> badblocks.
>>
>> This can only happen if the full copy exists only on slow disks. This
>> really is a corner case, and it is not related to your new error path
>> for atomic writes. I don't mind this version for now; it is just
>> something I noticed given that bio_split() can fail.
> 
> Are you saying that some improvement needs to be made to the current 
> code for badblocks handling, like initially try to skip bio_split()?
> 
> Apart from that, what about the change in raid10_write_request(), w.r.t 
> error handling?
> 
> There, if bio_split() fails, I think that we need to do some tidy-up,
> i.e. undo the increase in rdev->nr_pending made when looping over
> conf->copies.
> 
> BTW, feel free to comment in patch 6/6 for that.

Yes, raid1 and raid10 writes are the same. If you want to enable atomic
writes for raid1/raid10, you must add a new branch to handle badblocks now;
otherwise, as long as one copy contains any badblocks, the atomic write
will fail, while theoretically I think it could work.
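
[Editor's note: the retry idea from earlier in the thread, i.e. skipping
rdevs with badblocks during device selection so bio_split() is avoided,
could be sketched as below. The struct and helper are illustrative
stand-ins, not the real read_balance() from md/raid1.]

```c
#include <stdbool.h>

/* Model of the raid1 read case discussed above: if the chosen rdev has
 * bad blocks in the requested range (which would force a bio_split()),
 * retry device selection while skipping any rdev with bad blocks, and
 * only fall back to splitting when no clean full copy exists. */

struct rdev_pick {
    bool has_badblocks;  /* bad blocks overlap the requested range */
    bool is_full_copy;   /* holds a complete, in-sync copy of the range */
};

/* Return the index of a full copy without bad blocks, or -1 if every
 * full copy overlaps bad blocks and a split cannot be avoided (the
 * corner case where full copies exist only behind bad blocks). */
static int read_balance_skip_badblocks(const struct rdev_pick *rdevs,
                                       int nr)
{
    for (int i = 0; i < nr; i++) {
        if (rdevs[i].is_full_copy && !rdevs[i].has_badblocks)
            return i;
    }
    return -1;
}
```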

Thanks,
Kuai

> 
> Thanks,
> John
> 
> 

