linux-kernel - Re: [PATCH md-6.12 0/7] md: enhance faulty chekcing for blocked handling

Message-ID: <2c218cb8-c553-92f4-b203-54f02328d781@huaweicloud.com>
Date: Thu, 10 Oct 2024 20:38:35 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Mariusz Tkaczyk <mariusz.tkaczyk@...ux.intel.com>,
 Yu Kuai <yukuai1@...weicloud.com>
Cc: mariusz.tkaczyk@...el.com, song@...nel.org, linux-raid@...r.kernel.org,
 linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
 "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH md-6.12 0/7] md: enhance faulty chekcing for blocked
 handling

Hi,

On 2024/10/09 15:14, Mariusz Tkaczyk wrote:
> On Fri, 30 Aug 2024 15:27:14 +0800
> Yu Kuai <yukuai1@...weicloud.com> wrote:
> 
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> The lifetime of badblocks:
>>
>> - IO error: decide to record badblocks and set sb_flags;
>> - write IO finds that rdev has badblocks that are not yet acknowledged,
>> so this IO is blocked;
>> - daemon finds that sb_flags is set, updates the superblock and flushes
>> badblocks;
>> - write IO continues;
>>
>> The main idea is that badblocks are set in memory first. Until they are
>> acknowledged, new write requests must be blocked to prevent reading
>> old data after a power failure. This behaviour is not necessary if rdev
>> is faulty in the first place.
>>
>> Yu Kuai (7):
>>    md: add a new helper rdev_blocked()
>>    md: don't wait faulty rdev in md_wait_for_blocked_rdev()
>>    md: don't record new badblocks for faulty rdev
>>    md/raid1: factor out helper to handle blocked rdev from
>>      raid1_write_request()
>>    md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
>>    md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
>>    md/raid5: don't set Faulty rdev for blocked_rdev
>>
>>   drivers/md/md.c     |  8 +++--
>>   drivers/md/md.h     | 24 +++++++++++++++
>>   drivers/md/raid1.c  | 75 +++++++++++++++++++++++----------------------
>>   drivers/md/raid10.c | 40 +++++++++++-------------
>>   drivers/md/raid5.c  | 13 ++++----
>>   5 files changed, 92 insertions(+), 68 deletions(-)
>>
> 
> 
> Hi,
> We tested this patchset.
> 
> mdmon rework:
> https://github.com/md-raid-utilities/mdadm/pull/66
> 
> Kernel build torvalds/linux.git master:
> commit e32cde8d2bd7d251a8f9b434143977ddf13dcec6
> 
> I applied this patchset on top of that.
> 
> My tests proved that:
> - If only mdmon PR is applied - hangs are reproducible.
> - If only this patchset is applied - hangs are reproducible.
> - If both the kernel patchset and the mdmon rework are applied - hangs are
>    not reproducible (at least so far).
> 
> It was a tricky topic (I needed to deal with weird issues related to shared
> descriptors in mdmon).
> 
> Most importantly, no regression was detected.

Good to hear that, I'll send a V2 then. Normally this set will land in
v6.13, since it doesn't look like a kernel fix. :)

Thanks,
Kuai

> 
> Thanks,
> Mariusz
> 
> .
>
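
For readers following the thread, the badblocks lifetime described in the
quoted cover letter can be sketched as a small user-space model. This is an
illustration only, not code from the patchset: struct model_rdev, its fields
and the model_*() functions are all hypothetical names, and the real md code
tracks this state with rdev flags and sb_flags rather than plain booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names are illustrative, not the kernel's. */
struct model_rdev {
	bool faulty;            /* device has already been marked failed */
	bool unacked_badblocks; /* bad blocks recorded in memory only */
	bool sb_dirty;          /* superblock update pending (models sb_flags) */
};

/* An IO error records bad blocks in memory and requests a superblock update. */
static void model_record_badblocks(struct model_rdev *rdev)
{
	assert(rdev != NULL);
	if (rdev->faulty)
		return; /* nothing new is recorded for a faulty device */
	rdev->unacked_badblocks = true;
	rdev->sb_dirty = true;
}

/* Should a new write wait before being issued to this device? */
static bool model_write_must_wait(const struct model_rdev *rdev)
{
	assert(rdev != NULL);
	if (rdev->faulty)
		return false; /* a faulty device is not written to, so no wait */
	return rdev->unacked_badblocks;
}

/* The daemon writes the superblock, which acknowledges the bad blocks. */
static void model_daemon_flush(struct model_rdev *rdev)
{
	assert(rdev != NULL);
	if (!rdev->sb_dirty)
		return;
	rdev->sb_dirty = false;
	rdev->unacked_badblocks = false;
}
```

In this model a write to a healthy device waits between
model_record_badblocks() and model_daemon_flush(), while a device that is
already faulty never makes a write wait, which is the behaviour the cover
letter argues for.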