linux-kernel - Re: [RFC] Are you good with Lockdep?

Message-ID: <20201112061532.GA14554@X58A-UD3R>
Date:   Thu, 12 Nov 2020 15:15:32 +0900
From:   Byungchul Park <byungchul.park@....com>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     torvalds@...ux-foundation.org, peterz@...radead.org,
        mingo@...hat.com, will@...nel.org, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, rostedt@...dmis.org, joel@...lfernandes.org,
        alexander.levin@...rosoft.com, daniel.vetter@...ll.ch,
        chris@...is-wilson.co.uk, duyuyang@...il.com,
        johannes.berg@...el.com, tj@...nel.org, tytso@....edu,
        willy@...radead.org, david@...morbit.com, amir73il@...il.com,
        bfields@...ldses.org, gregkh@...uxfoundation.org,
        kernel-team@....com
Subject: Re: [RFC] Are you good with Lockdep?

On Wed, Nov 11, 2020 at 11:54:41AM +0100, Ingo Molnar wrote:
> > We cannot get any reports other than the first one.
> 
> Correct. Experience has shown that the overwhelming majority of 
> lockdep reports are single-cause and single-report.
> 
> This is an optimal approach, because after a decade of exorcising 
> locking bugs from the kernel, lockdep is currently, most of the time, 

I also think Lockdep has been doing a great job exorcising almost all
locking bugs so far. I respect it.

> in 'steady-state', with there being no reports for the overwhelming 
> majority of testcases, so the statistical probability of there being 
> just one new report is by far the highest.

This is true if Lockdep is only for checking whether maintainers' trees
are OK, and if we completely ignore how a tool could help folks in the
middle of development, especially when developing something complicated
with respect to synchronization.

But I disagree when it comes to a tool helping while developing
something that could introduce many dependency issues.

> If on the other hand there's some bug in lockdep itself that causes 
> excessive false positives, it's better to limit the number of reports 
> to one per bootup, so that it's not seen as a nuisance debugging 
> facility.
> 
> Or if lockdep gets extended that causes multiple previously unreported 
> (but very much real) bugs to be reported, it's *still* better to 
> handle them one by one: because lockdep doesn't know whether it's real 

Why do you think we cannot handle them one by one with multi-reporting?
We can handle them starting with the first one, as we do with
single-reporting. That's also how we work when, for example, building
the kernel or something similar: we fix the reported errors starting
with the first one.
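To make the single-report vs. multi-report question concrete, here is a toy user-space model in C. It is not lockdep code: locks are small integers, only direct ABBA inversions between two locks are detected, and every name (`order_checker`, `checker_init`, `checker_acquire`) is hypothetical. In single-report mode it stops checking after the first report, roughly the way lockdep turns itself off; in multi-report mode it keeps checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: locks are small integers 0..MAX_LOCKS-1. */
#define MAX_LOCKS 8

struct order_checker {
    bool dep[MAX_LOCKS][MAX_LOCKS]; /* dep[a][b]: b was taken while a was held */
    bool single_report;             /* true: stop checking after the first report */
    bool enabled;                   /* cleared by the first report in single-report mode */
    int reports;                    /* number of inversions reported so far */
};

static void checker_init(struct order_checker *c, bool single_report)
{
    memset(c, 0, sizeof(*c));
    c->single_report = single_report;
    c->enabled = true;
}

/* Record "next acquired while held is held" and report an ABBA inversion
 * if the reverse order was seen earlier. Longer cycles are not checked. */
static void checker_acquire(struct order_checker *c, int held, int next)
{
    assert(held >= 0 && held < MAX_LOCKS && next >= 0 && next < MAX_LOCKS);

    if (!c->enabled)
        return;                     /* an earlier report turned checking off */

    if (c->dep[next][held]) {
        c->reports++;
        printf("possible deadlock: %d -> %d vs. %d -> %d\n",
               next, held, held, next);
        if (c->single_report)
            c->enabled = false;     /* mimic "one report, then go quiet" */
        return;                     /* keep the first-seen order in the graph */
    }
    c->dep[held][next] = true;
}
```

Feeding both modes the same trace containing two independent ABBA pairs, (0,1) and (2,3), the single-report checker reports only the first pair while the multi-report checker reports both. Either way, the first report is still the one to fix first.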

> >    So the one who has introduced the first one should fix it as soon 
> >    as possible so that the other problems can be reported and fixed. 
> >    It gets even worse if it's a false positive, because then it is
> >    worthless and only prevents real ones from being reported.
> 
> Since kernel development is highly distributed, and 90%+ of new 
> commits get created in dozens of bigger and hundreds of smaller 
> maintainer topic trees, the chance of getting two independent locking 
> bugs in the same tree without the first bug being found & fixed is 
> actually pretty low.

Again, this is true if Lockdep is for checking maintainers' trees only.

> linux-next offers several weeks/months advance integration testing to 
> see whether the combination of maintainer trees causes 
> problems/warnings.

Good for us.

> >    That's why kernel developers are so sensitive to Lockdep's false
> >    positive reporting - I would, too. But precisely speaking, it's a
> >    problem of how Lockdep was designed and implemented, not of the false
> >    positives themselves. Annoying false positives - as WARN()'s messages are
> >    annoying - should be fixed but we don't have to be as sensitive as we
> >    are now if the tool keeps normally working even after reporting.
> 
> I disagree, and even for WARN()s we are seeing a steady movement 
> towards WARN_ON_ONCE(): exactly because developers are usually 
> interested in the first warning primarily.
> 
> Followup warnings are even marked 'tainted' by the kernel - if a bug 
> happened we cannot trust the state of the kernel anymore, even if it 
> seems otherwise functional. This is doubly true for lockdep, where 

I definitely agree. An already tainted kernel is not a kernel we can
trust anymore. Again, IMO, a tool should help us not only in checking
almost-final trees but also while developing something. No?
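Ingo's point about WARN_ON_ONCE() and tainting can be illustrated with a small user-space sketch. The `TOY_*` macros, `toy_warn()`, `taint_flags` and `TOY_TAINT_WARN` below are hypothetical stand-ins, loosely in the spirit of the kernel's macros in include/asm-generic/bug.h but not a copy of them. A per-call-site static flag makes the "once" variant print at most one warning. The first warning sets a toy taint bit, so any later warning is printed as coming from an already-tainted state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy taint mask; the bit value is NOT the kernel's. */
#define TOY_TAINT_WARN (1UL << 0)
static unsigned long taint_flags;
static int warn_count;              /* warnings actually printed */

static void toy_warn(const char *file, int line)
{
    /* Warnings issued after an earlier one are flagged as tainted. */
    fprintf(stderr, "WARNING at %s:%d%s\n", file, line,
            (taint_flags & TOY_TAINT_WARN) ? " (tainted)" : "");
    taint_flags |= TOY_TAINT_WARN;  /* any warning taints the toy "kernel" */
    warn_count++;
}

/* Warn every time the condition holds. */
#define TOY_WARN_ON(cond) do {                          \
        if (cond)                                       \
            toy_warn(__FILE__, __LINE__);               \
    } while (0)

/* Warn at most once per call site: each expansion has its own flag. */
#define TOY_WARN_ON_ONCE(cond) do {                     \
        static bool warned_;                            \
        if ((cond) && !warned_) {                       \
            warned_ = true;                             \
            toy_warn(__FILE__, __LINE__);               \
        }                                               \
    } while (0)
```

A loop that trips `TOY_WARN_ON()` three times prints three warnings, while the same loop with `TOY_WARN_ON_ONCE()` prints one, because that single call site has one `warned_` flag. Unlike the kernel's `WARN_ON()`, these toy macros are statements and do not return the condition.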

> But for lockdep there's another concern: we do occasionally report 
> bugs in locking facilities themselves. In that case it's imperative 
> for all lockdep activity to cease & desist, so that we are able to get 
> a log entry out before the kernel goes down potentially.

Sure. Makes sense.

> I.e. there's a "race to log the bug as quickly as possible", which is 
> the other reason we shut down lockdep immediately. But once shut down, 

Not sure I understand this part.

> all the lockdep data structures are hopelessly out of sync and it 
> cannot be restarted reasonably.

Is it about tracking IRQ and IRQ-enabled state? That's exactly what I'd
like to point out. Or is there something else?
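Assuming the concern is IRQ-state tracking, as asked above, here is a toy model of why shadow state cannot simply be restarted. `struct toy_cpu` and its functions are hypothetical and much simpler than what lockdep actually tracks. A tracker mirrors the nesting depth of IRQ-disable calls, but only while tracking is on. If tracking is shut down after a report and later switched back on, every transition that happened in between is lost, and the shadow state no longer matches reality.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not lockdep code. */
struct toy_cpu {
    int real_depth;    /* how deeply IRQs are "really" disabled */
    int shadow_depth;  /* what the tracker believes */
    bool tracking;     /* shadow_depth is updated only while true */
};

static void toy_irq_disable(struct toy_cpu *c)
{
    c->real_depth++;
    if (c->tracking)
        c->shadow_depth++;
}

static void toy_irq_enable(struct toy_cpu *c)
{
    c->real_depth--;
    if (c->tracking)
        c->shadow_depth--;
}

/* The tracker's view is only trustworthy while this holds. */
static bool toy_in_sync(const struct toy_cpu *c)
{
    return c->real_depth == c->shadow_depth;
}
```

For example, disabling IRQs while tracking is on, shutting tracking off, enabling IRQs, and then switching tracking back on leaves `real_depth == 0` but `shadow_depth == 1`. The tracker would wrongly believe IRQs are still disabled, which is why a restart would need some way to resynchronize rather than just flipping tracking back on.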

> Not sure I understand the "problem 2)" outlined here, but I'm looking 
> forward to your patchset!

Thank you for the response.

Thanks,
Byungchul