Message-ID: <7edd5299-6b12-b8f1-046b-bccc8b0799b6@suse.com>
Date: Mon, 26 Oct 2020 13:06:12 +0000
From: Filipe Manana <fdmanana@...e.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Jan Kara <jack@...e.cz>,
David Sterba <dsterba@...e.com>
Subject: Re: possible lockdep regression introduced by 4d004099a668 ("lockdep:
Fix lockdep recursion")
On 26/10/20 12:55, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 11:56:03AM +0000, Filipe Manana wrote:
>>> That smells like the same issue reported here:
>>>
>>> https://lkml.kernel.org/r/20201022111700.GZ2651@hirez.programming.kicks-ass.net
>>>
>>> Make sure you have commit:
>>>
>>> f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
>>>
>>> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
>>
>> Yes, CONFIG_DEBUG_PREEMPT is enabled.
>
> Bummer :/
>
>> I'll try with that commit and let you know, however it's gonna take a
>> few hours to build a kernel and run all fstests (on that test box it
>> takes over 3 hours) to confirm that fixes the issue.
>
> *ouch*, 3 hours is painful. How long to make it sick with the current
> kernel? quicker I would hope?
If generic/068 triggers the bug, then it's about 1 hour. If that passes,
which rarely happens, then I have to wait for it to get to generic/390,
which is over 2 hours.
It sucks that running those tests alone never triggers the issue, but
running all fstests (first the btrfs-specific ones, followed by the
generic ones) reliably triggers the bug, almost always at generic/068.
When that passes, it's triggered by generic/390. To confirm everything
is ok, I let all tests run (the last generic test is 612).
>
>> Thanks for the quick reply!
>
> Anyway, I don't think that commit can actually explain the issue :/
>
> The false positive on lockdep_assert_held() happens when the recursion
> count is !0, however we _should_ be having IRQs disabled when
> lockdep_recursion > 0, so that should never be observable.
>
> My hope was that DEBUG_PREEMPT would trigger on one of the
> __this_cpu_{inc,dec}(lockdep_recursion) instance, because that would
> then be a clear violation.
>
> And you're seeing this on x86, right?
Right.
It's in a qemu vm on x86, with '-cpu host' passed to qemu and kvm enabled.
Thanks.
>
> Let me puzzle moar..
>