lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Sep 2023 11:08:12 +0200
From:   Donald Buczek <buczek@...gen.mpg.de>
To:     Dragan Stancevic <dragan@...ncevic.com>,
        Yu Kuai <yukuai1@...weicloud.com>, song@...nel.org
Cc:     guoqing.jiang@...ux.dev, it+raid@...gen.mpg.de,
        linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
        msmith626@...il.com, "yangerkun@...wei.com" <yangerkun@...wei.com>
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle"
 transition

On 9/5/23 3:54 PM, Dragan Stancevic wrote:
> On 9/4/23 22:50, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/08/30 9:36, Yu Kuai 写道:
>>> Hi,
>>>
>>> 在 2023/08/29 4:32, Dragan Stancevic 写道:
>>>
>>>> Just a followup on 6.1 testing. I tried reproducing this problem for 5 days with 6.1.42 kernel without your patches and I was not able to reproduce it.
>>
>> oops, I forgot that you need to backport this patch first to reporduce
>> this problem:
>>
>> https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/
>>
>> The patch fix the deadlock as well, but it introduce some regressions.

We've just got an unplanned lock up on "check" to "idle" transition with 6.1.52 after a few hours on a backup server. For the last 2 1/2 years we used the patch I originally proposed with multiple kernel versions [1]. But this no longer seems to be valid or maybe its even destructive in combination with the other changes.

But I totally lost track of the further development. As I understood, there are patches queue up in mainline, which might go into 6.1, too, but have not landed there which should fix the problem?

Can anyone give me exact references to the patches I'd need to apply to 6.1.52, so that I could probably fix my problem and also test the patches for you those on production systems with a load which tends to run into that problem easily?

Thanks

  Donald

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/

> Ha, jinx :) I was about to email you that I isolated that change with the testing over the weekend that made it more difficult to reproduce in 6.1 and that the original change must be reverted :)
> 
> 
> 
>>
>> Thanks,
>> Kuai
>>
>>>>
>>>> It seems that 6.1 has some other code that prevents this from happening.
>>>>
>>>
>>> I see that there are lots of patches for raid456 between 5.10 and 6.1,
>>> however, I remember that I used to reporduce the deadlock after 6.1, and
>>> it's true it's not easy to reporduce, see below:
>>>
>>> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
>>>
>>> My guess is that 6.1 is harder to reporduce than 5.10 due to some
>>> changes inside raid456.
>>>
>>> By the way, raid10 had a similiar deadlock, and can be fixed the same
>>> way, so it make sense to backport these patches.
>>>
>>> https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
>>>
>>> Thanks,
>>> Kuai
>>>
>>>
>>>> On 5.10 I can reproduce it within minutes to an hour.
>>>>
>>>
>>> .
>>>
>>
> 


-- 
Donald Buczek
buczek@...gen.mpg.de
Tel: +49 30 8413 1433

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ