lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y9/dTp/hQ5btWTEH@rowland.harvard.edu>
Date:   Sun, 5 Feb 2023 11:46:06 -0500
From:   Alan Stern <stern@...land.harvard.edu>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        USB list <linux-usb@...r.kernel.org>,
        Hillf Danton <hdanton@...a.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sun, Feb 05, 2023 at 10:31:56AM +0900, Tetsuo Handa wrote:
> On 2023/02/05 5:01, Alan Stern wrote:
> > On Sun, Feb 05, 2023 at 02:09:40AM +0900, Tetsuo Handa wrote:
> >> That is a declaration that driver developers are allowed to take it for granted
> >> that driver callback functions can behave as if dev->mutex is not held. 
> > 
> > No it isn't.  It is a declaration that driver developers must be extra 
> > careful because lockdep is unable to detect locking errors involving 
> > dev->mutex.
> 
> Driver developers are not always familiar with locks used by driver core,
> like your
> 
>   It's hard to figure out what's wrong from looking at the syzbot report.
>   What makes you think it is connected with dev->mutex?

You didn't answer this question.

>   At first glance, it seems that the ath6kl driver is trying to flush a
>   workqueue while holding a lock or mutex that is needed by one of the
>   jobs in the workqueue.  That's obviously never going to work, no matter
>   what sort of lockdep validation gets used.
> 
> comment indicates that you did not notice that dev->mutex was connected to
> this problem which involved ath6kl driver code and ath9k driver code and
> driver core code.

Of course I didn't.  There isn't enough information in the syzbot log 
for someone to recognize the connection if they aren't already familiar 
with the code in question.

> Core developers can't assume that driver developers are extra careful, as
> well as driver developers can't assume that core developers are familiar
> with locks used by individual drivers. We need to fill the gap.

Agreed.

> >> Some developers test their changes with lockdep enabled, and believe that their
> >> changes are correct because lockdep did not complain.
> >> https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174 is an example.
> > 
> > How do you know developers are making this mistake?  That example 
> > doesn't show anything of the sort; the commit which introduced the bug 
> > says nothing about lockdep.
> 
> The commit which introduced the bug cannot say something about lockdep, for
> lockdep validation is disabled and nobody noticed the possibility of deadlock
> until syzbot reports it as hung. Since the possibility of deadlock cannot be
> noticed until syzbot reports it as hung,

That isn't true at all.  There are lots of occasions when people realize 
that a deadlock might occur without seeing a report from lockdep or 
syzbot.  You just aren't aware of these occasions because the developer 
then fixes the code before submitting it.  But if you search through the 
mailing list archives, I'm sure you'll find plenty of examples where 
somebody criticizes a proposed patch on the grounds that it can cause a 
deadlock.

>  I assume that there are many similar
> deadlocks in the kernel that involves dev->mutex. How do you teach developers
> that they are making this mistake, without keeping lockdep validation enabled?

There probably are many similar deadlocks in the kernel.  There probably 
also are deadlocks (not involving dev->mutex) which lockdep could catch, 
but hasn't because the right combination of conditions hasn't occurred.

You teach developers about this the same way you teach them about 
anything else: Publishing information, talking to people, and putting 
comments in the kernel source code.

> By keeping lockdep validation disabled, you are declaring that driver developers
> need not to worry about dev->mutex rather than declaring that driver developers
> need to worry about dev->mutex.

That is a very peculiar thing to say.  How do you think people managed 
to deal with deadlocks in the kernel before lockdep was developed?  Do 
you think they said: "My testing didn't reveal any deadlocks, so the 
code must be perfect"?

Of course they didn't.  And now people simply need to realize that 
lockdep isn't perfect either.

And by the way, by disabling lockdep validation I am declaraing that 
enabling it would cause an overwhelming number of false positives, 
rendering lockdep useless (as you found out when you tried).  Not that 
driver developers don't have to worry about dev->mutex.

Alan Stern

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ