lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7edd5299-6b12-b8f1-046b-bccc8b0799b6@suse.com>
Date:   Mon, 26 Oct 2020 13:06:12 +0000
From:   Filipe Manana <fdmanana@...e.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     LKML <linux-kernel@...r.kernel.org>, Jan Kara <jack@...e.cz>,
        David Sterba <dsterba@...e.com>
Subject: Re: possible lockdep regression introduced by 4d004099a668 ("lockdep:
 Fix lockdep recursion")



On 26/10/20 12:55, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 11:56:03AM +0000, Filipe Manana wrote:
>>> That smells like the same issue reported here:
>>>
>>>   https://lkml.kernel.org/r/20201022111700.GZ2651@hirez.programming.kicks-ass.net
>>>
>>> Make sure you have commit:
>>>
>>>   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
>>>
>>> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
>>
>> Yes, CONFIG_DEBUG_PREEMPT is enabled.
> 
> Bummer :/
> 
>> I'll try with that commit and let you know, however it's gonna take a
>> few hours to build a kernel and run all fstests (on that test box it
>> takes over 3 hours) to confirm that fixes the issue.
> 
> *ouch*, 3 hours is painful. How long to make it sick with the current
> kernel? quicker I would hope?

If generic/068 triggers the bug, than it's about 1 hour. If that passes,
which rarely happens, then have to wait to get into generic/390, which
is over 2 hours.

It sucks that running those tests alone never trigger the issue, but
running all fstests (first btrfs specific ones, followed by the generic
ones) reliably triggers the bug, almost always at generic/068, when that
passes, it's triggered by generic/390. To confirm everything is ok, I
let all tests run (last generic is 612).


> 
>> Thanks for the quick reply!
> 
> Anyway, I don't think that commit can actually explain the issue :/
> 
> The false positive on lockdep_assert_held() happens when the recursion
> count is !0, however we _should_ be having IRQs disabled when
> lockdep_recursion > 0, so that should never be observable.
> 
> My hope was that DEBUG_PREEMPT would trigger on one of the
> __this_cpu_{inc,dec}(lockdep_recursion) instance, because that would
> then be a clear violation.
> 
> And you're seeing this on x86, right?

Right.
It's in a qemu vm on x86, with '-cpu host' passed to qemu and kvm enabled.

Thanks.


> 
> Let me puzzle moar..
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ