linux-kernel - Re: [PATCH -next 2/3] md/raid10: convert resync_lock to use seqlock

Message-ID: <cd0170a7-86ab-ebb3-0aed-93b2e18be432@deltatee.com>
Date:   Thu, 1 Sep 2022 18:56:06 -0600
From:   Logan Gunthorpe <logang@...tatee.com>
To:     Guoqing Jiang <guoqing.jiang@...ux.dev>,
        Yu Kuai <yukuai1@...weicloud.com>, song@...nel.org
Cc:     linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
        yukuai3@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH -next 2/3] md/raid10: convert resync_lock to use seqlock



On 2022-09-01 18:49, Guoqing Jiang wrote:
> 
> 
> On 9/2/22 2:41 AM, Logan Gunthorpe wrote:
>> Hi,
>>
>> On 2022-08-29 07:15, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@...wei.com>
>>>
>>> Currently, wait_barrier() holds 'resync_lock' to read 'conf->barrier',
>>> and I/O can't be dispatched until 'barrier' is dropped.
>>>
>>> Since the barrier is rarely held, convert 'resync_lock' to a seqlock
>>> so that taking the lock can be avoided in the fast path.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>> I've found some lockdep issues starting with this patch in md-next while
>> running mdadm tests (specifically 00raid10 when run about 10 times in a
>> row).
>>
>> I've seen a couple of different lockdep errors. The first seems to be
>> reproducible on this patch; on subsequent patches it possibly changes
>> to the second. I'm not sure exactly.
> 
> That's why I said "try mdadm test suites too to avoid regression." ...
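The seqlock conversion described in the quoted commit message can be sketched as a userspace C11 analogue. This is a hypothetical illustration, not the code from the patch: `struct conf`, `conf_init()`, `set_barrier()` and `read_barrier()` are made-up names, an `atomic_flag` spinlock stands in for the kernel spinlock that serialises writers, and the memory ordering follows the usual C11 seqlock pattern. In-kernel code would instead use the `seqlock_t` helpers (`read_seqbegin()`/`read_seqretry()` on the read side, `write_seqlock()`/`write_sequnlock()` or their `_irq` variants on the write side).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the relevant part of the raid10 conf. */
struct conf {
    atomic_flag writer_lock;   /* serialises writers only */
    atomic_uint seq;           /* odd while a writer is updating */
    atomic_int barrier;        /* value the fast path wants to read */
};

void conf_init(struct conf *c)
{
    atomic_flag_clear(&c->writer_lock);
    atomic_init(&c->seq, 0);
    atomic_init(&c->barrier, 0);
}

/* Slow path (hypothetical name): writers take the lock and bump seq
 * to an odd value before the update and back to even after it. */
void set_barrier(struct conf *c, int delta)
{
    while (atomic_flag_test_and_set_explicit(&c->writer_lock,
                                             memory_order_acquire))
        ;                      /* spin until no other writer holds it */

    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);
    assert((s & 1u) == 0);     /* no other writer can be mid-update */
    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    int b = atomic_load_explicit(&c->barrier, memory_order_relaxed);
    atomic_store_explicit(&c->barrier, b + delta, memory_order_relaxed);

    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
    atomic_flag_clear_explicit(&c->writer_lock, memory_order_release);
}

/* Fast path (hypothetical name): read the barrier without taking the
 * writer lock; retry if a writer was active or finished meanwhile. */
int read_barrier(struct conf *c)
{
    for (;;) {
        unsigned s0 = atomic_load_explicit(&c->seq, memory_order_acquire);
        if (s0 & 1u)
            continue;          /* writer in progress, try again */
        int b = atomic_load_explicit(&c->barrier, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        unsigned s1 = atomic_load_explicit(&c->seq, memory_order_relaxed);
        if (s0 == s1)
            return b;          /* no writer ran: consistent snapshot */
    }
}
```

A caller on the I/O path would treat `read_barrier(&c) == 0` as "no barrier raised, proceed without the lock" and fall back to the locked, waiting path otherwise. The sketch deliberately omits the pending-I/O accounting and wait-queue handling that the real wait_barrier() has to do, which is where the locking subtleties (and lockdep reports) tend to live.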

You may have to run it multiple times; a single run tends not to catch
all errors. When I did the simple bisect, I had to loop the noted test
10 times to be sure I hit this every time.
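Looping a test until it fails, as described above, can be sketched with a small C helper. The function name `run_repeatedly()` is hypothetical, and the exact command for running mdadm's 00raid10 test is not given in this thread, so it is passed in as a plain shell command string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run `cmd` through the shell up to `runs` times.
 * Returns 0 if every run exits with status 0, otherwise the 1-based
 * index of the first run that failed (or could not be started). */
int run_repeatedly(const char *cmd, int runs)
{
    assert(runs > 0);
    for (int i = 1; i <= runs; i++) {
        int status = system(cmd);
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            fprintf(stderr, "run %d of %d failed: %s\n", i, runs, cmd);
            return i;
        }
    }
    return 0;
}
```

This only looks at the command's exit status. A lockdep splat is printed to the kernel log, so unless the test itself checks the log, the kernel messages should also be inspected (e.g. with dmesg) after the loop.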

Also, make sure all the debug options are enabled when you run it (see
the Kernel Hacking section in menuconfig). You won't hit this bug
without at least CONFIG_PROVE_LOCKING=y.

Logan