lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 25 Mar 2022 22:16:29 +0100
From:   Johannes Berg <>
To:     William McVicker <>
Cc:     Jakub Kicinski <>,,
        Marek Szyprowski <>,
        Kalle Valo <>,
        "David S. Miller" <>,,
        Amitkumar Karwar <>,
        Ganapathi Bhat <>,
        Xinming Hu <>,,
        Paolo Abeni <>
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd

On Fri, 2022-03-25 at 20:36 +0000, William McVicker wrote:
> I found that my wlan driver is using the vendor commands to create/delete NAN
> interfaces for this Android feature called Wi-Fi aware [1]. Basically, this
> features allows users to discover other nearby devices and allows them to
> connect directly with one another over a local network. 

Wait, why is it doing that? We actually support a NAN interface type
upstream :) It's not really quite fully fleshed out, but it could be?
Probably should be?

> Thread 1                         Thread 2
>  nl80211_pre_doit():
>    rtnl_lock()
>    wiphy_lock()                   nl80211_pre_doit():
>                                     rtnl_lock() // blocked by Thread 1
>  nl80211_vendor_cmd():
>    doit()
>      cfg80211_unregister_netdevice()
>    rtnl_unlock():
>      netdev_run_todo():
>        __rtnl_unlock()
>                                     <got RTNL lock>
>                                     wiphy_lock() // blocked by Thread 1
>        rtnl_lock(); // DEADLOCK
>  nl80211_post_doit():
>    wiphy_unlock();

Right, this is what I had discussed in my other mails.

Basically, you're actually doing (some form of) unregister_netdevice()
before rtnl_unlock().

Clearly this isn't possible in cfg80211 itself.

However, I couldn't entirely discount the possibility that this is

Thread 1                   Thread 2
 // list not empty now    
 // because of thread 2     rtnl_lock()


Given my other discussion with Jakub though, it seems that we can indeed
make sure that this cannot happen, and then this scenario is impossible
without the unregistration you're doing.

> Since I'm unlocking the RTNL inside nl80211_vendor_cmd() after calling doit()
> instead of waiting till post_doit(), I get into the situation you mentioned
> where the net_todo_list is not empty when calling rtnl_unlock. So I decided to
> drop the rtnl_unlock() in nl80211_vendor_cmd() and defer that until
> nl80211_post_doit() after calling wiphy_unlock(). With this change, I haven't
> been able to reproduce the deadlock. So it's possible that we aren't actually
> able to hit this deadlock in nl80211_pre_doit() with the existing code since,
> as you mentioned, one wouldn't be able to call unregister_netdevice() without
> having the RTNL lock.

Right, this is why I said earlier that actually adding a flag for vendor
commands to get the RTNL would be more complex - you'd have to basically
open-code pre_doit() and post_doit() in there and check the sub-command
flag at the very beginning and very end.


Powered by blists - more mailing lists