[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171208152717.fx5w66wvyrfx6vrz@thunk.org>
Date: Fri, 8 Dec 2017 10:27:17 -0500
From: Theodore Ts'o <tytso@....edu>
To: Matthew Wilcox <willy@...radead.org>
Cc: Dave Chinner <david@...morbit.com>,
Matthew Wilcox <mawilcox@...rosoft.com>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
Jens Axboe <axboe@...nel.dk>,
Rehas Sachdeva <aquannie@...il.com>, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org,
linux-f2fs-devel@...ts.sourceforge.net,
linux-nilfs@...r.kernel.org, linux-btrfs@...r.kernel.org,
linux-xfs@...r.kernel.org, linux-usb@...r.kernel.org,
linux-kernel@...r.kernel.org, mingo@...nel.org,
byungchul.park@....com
Subject: Re: Lockdep is less useful than it was
On Thu, Dec 07, 2017 at 02:38:03PM -0800, Matthew Wilcox wrote:
> I think it was a mistake to force these on for everybody; they have a
> much higher false-positive rate than the rest of lockdep, so as you say
> forcing them on leads to fewer people using *any* of lockdep.
>
> The bug you're hitting isn't Byungchul's fault; it's an annotation
> problem. The same kind of annotation problem that we used to have with
> dozens of other places in the kernel which are now fixed.
The question is whose responsibility is it to annotate the code? On
another thread it was suggested it was ext4's responsibility to add
annotations to avoid the false positives --- never the mind the fact
that every single file system is going to have add annotations.
Also note that the documentation for how to add annotations is
***horrible***. It's mostly, "try to figure out how other people
added magic cargo cult code which is not well defined (look at the
definitions of lockdep_set_class, lockdep_set_class_and_name,
lockdep_set_class_and_subclass, and lockdep_set_subclass, and weep) in
other subsystems and hope and pray it works for you."
And the explanation when there are failures, either false positives,
or not, are completely opaque. For example:
[ 16.190198] ext4lazyinit/648 is trying to acquire lock:
[ 16.191201] ((gendisk_completion)1 << part_shift(NUMA_NO_NODE)){+.+.}, at: [<8a1ebe9d>] wait_for_completion_io+0x12/0x20
Just try to tell me that:
((gendisk_completion)1 << part_shift(NUMA_NO_NODE)){+.+.}
is human comprehensible with a straight face. And since the messages
don't even include the subclass/class/name key annotations, as lockdep
tries to handle things that are more and more complex, I'd argue it
has already crossed the boundary where unless you are a lockdep
developer, good luck trying to understand what is going on or how to
add annotations.
So if you are adding complexity to the kernel with the argument,
"lockdep will save us", I'm with Dave --- it's just not a believable
argument.
Cheers,
- Ted
Powered by blists - more mailing lists