Message-ID: <20201112061532.GA14554@X58A-UD3R>
Date: Thu, 12 Nov 2020 15:15:32 +0900
From: Byungchul Park <byungchul.park@....com>
To: Ingo Molnar <mingo@...nel.org>
Cc: torvalds@...ux-foundation.org, peterz@...radead.org,
mingo@...hat.com, will@...nel.org, linux-kernel@...r.kernel.org,
tglx@...utronix.de, rostedt@...dmis.org, joel@...lfernandes.org,
alexander.levin@...rosoft.com, daniel.vetter@...ll.ch,
chris@...is-wilson.co.uk, duyuyang@...il.com,
johannes.berg@...el.com, tj@...nel.org, tytso@....edu,
willy@...radead.org, david@...morbit.com, amir73il@...il.com,
bfields@...ldses.org, gregkh@...uxfoundation.org,
kernel-team@....com
Subject: Re: [RFC] Are you good with Lockdep?
On Wed, Nov 11, 2020 at 11:54:41AM +0100, Ingo Molnar wrote:
> > We cannot get reported other than the first one.
>
> Correct. Experience has shown that the overwhelming majority of
> lockdep reports are single-cause and single-report.
>
> This is an optimal approach, because after a decade of exorcising
> locking bugs from the kernel, lockdep is currently, most of the time,
I also think Lockdep has done a great job of exorcising almost all
locking bugs so far. I respect that.
> in 'steady-state', with there being no reports for the overwhelming
> majority of testcases, so the statistical probability of there being
> just one new report is by far the highest.
This is true if Lockdep is only for checking whether maintainers' trees
are ok, and if we totally ignore how a tool could help folks in the
middle of development, esp. when developing something complicated wrt.
synchronization.
But I don't agree if the tool is also meant to help while developing
something that could introduce many dependency issues.
> If on the other hand there's some bug in lockdep itself that causes
> excessive false positives, it's better to limit the number of reports
> to one per bootup, so that it's not seen as a nuisance debugging
> facility.
>
> Or if lockdep gets extended that causes multiple previously unreported
> (but very much real) bugs to be reported, it's *still* better to
> handle them one by one: because lockdep doesn't know whether it's real
Why do you think we cannot handle them one by one with multi-reporting?
We can still start with the first one, as we do with single-reporting.
That's also how we work, for example, when building the kernel or
something.
> > So the one who has introduced the first one should fix it as soon
> > as possible so that the other problems can be reported and fixed.
> > It will get even worse if it's a false positive because it's
> > worth nothing but only preventing reporting real ones.
>
> Since kernel development is highly distributed, and 90%+ of new
> commits get created in dozens of bigger and hundreds of smaller
> maintainer topic trees, the chance of getting two independent locking
> bugs in the same tree without the first bug being found & fixed is
> actually pretty low.
Again, this is true if Lockdep is for checking maintainers' trees only.
> linux-next offers several weeks/months advance integration testing to
> see whether the combination of maintainer trees causes
> problems/warnings.
Good for us.
> > That's why kernel developers are so sensitive to Lockdep's false
> > positive reporting - I would, too. But precisely speaking, it's a
> > problem of how Lockdep was designed and implemented, not false
> > positive itself. Annoying false positives - as WARN()'s messages are
> > annoying - should be fixed but we don't have to be as sensitive as we
> > are now if the tool keeps normally working even after reporting.
>
> I disagree, and even for WARN()s we are seeing a steady movement
> towards WARN_ON_ONCE(): exactly because developers are usually
> interested in the first warning primarily.
>
> Followup warnings are even marked 'tainted' by the kernel - if a bug
> happened we cannot trust the state of the kernel anymore, even if it
> seems otherwise functional. This is doubly true for lockdep, where
I definitely think so. An already tainted kernel is not a kernel we can
trust anymore. Again, IMO, a tool should help us not only in checking
almost-final trees but also in developing something. No?
> But for lockdep there's another concern: we do occasionally report
> bugs in locking facilities themselves. In that case it's imperative
> for all lockdep activity to cease & desist, so that we are able to get
> a log entry out before the kernel goes down potentially.
Sure. Makes sense.
> I.e. there's a "race to log the bug as quickly as possible", which is
> the other reason we shut down lockdep immediately. But once shut down,
Not sure I understand this part.
> all the lockdep data structures are hopelessly out of sync and it
> cannot be restarted reasonably.
Is it about tracking IRQ and IRQ-enabled state? That's exactly what I'd
like to point out. Or is there something else?
> Not sure I understand the "problem 2)" outlined here, but I'm looking
> forward to your patchset!
Thank you for the response.
Thanks,
Byungchul