[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <976e8cf697c7e5bc3a752e758a484b69a058710a.camel@sipsolutions.net>
Date: Fri, 25 Mar 2022 22:16:29 +0100
From: Johannes Berg <johannes@...solutions.net>
To: William McVicker <willmcvicker@...gle.com>
Cc: Jakub Kicinski <kuba@...nel.org>, linux-wireless@...r.kernel.org,
Marek Szyprowski <m.szyprowski@...sung.com>,
Kalle Valo <kvalo@...eaurora.org>,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Amitkumar Karwar <amitkarwar@...il.com>,
Ganapathi Bhat <ganapathi.bhat@....com>,
Xinming Hu <huxinming820@...il.com>, kernel-team@...roid.com,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd
On Fri, 2022-03-25 at 20:36 +0000, William McVicker wrote:
>
> I found that my wlan driver is using the vendor commands to create/delete NAN
> interfaces for this Android feature called Wi-Fi aware [1]. Basically, this
> features allows users to discover other nearby devices and allows them to
> connect directly with one another over a local network.
>
Wait, why is it doing that? We actually support a NAN interface type
upstream :) It's not really quite fully fleshed out, but it could be?
Probably should be?
> Thread 1 Thread 2
> nl80211_pre_doit():
> rtnl_lock()
> wiphy_lock() nl80211_pre_doit():
> rtnl_lock() // blocked by Thread 1
> nl80211_vendor_cmd():
> doit()
> cfg80211_unregister_netdevice()
> rtnl_unlock():
> netdev_run_todo():
> __rtnl_unlock()
> <got RTNL lock>
> wiphy_lock() // blocked by Thread 1
> rtnl_lock(); // DEADLOCK
> nl80211_post_doit():
> wiphy_unlock();
Right, this is what I had discussed in my other mails.
Basically, you're actually doing (some form of) unregister_netdevice()
before rtnl_unlock().
Clearly this isn't possible in cfg80211 itself.
However, I couldn't entirely discount the possibility that this is
possible:
Thread 1 Thread 2
rtnl_lock()
unregister_netdevice()
__rtnl_unlock()
rtnl_lock()
wiphy_lock()
netdev_run_todo()
__rtnl_unlock()
// list not empty now
// because of thread 2 rtnl_lock()
rtnl_lock()
wiphy_lock()
** DEADLOCK **
Given my other discussion with Jakub though, it seems that we can indeed
make sure that this cannot happen, and then this scenario is impossible
without the unregistration you're doing.
> Since I'm unlocking the RTNL inside nl80211_vendor_cmd() after calling doit()
> instead of waiting till post_doit(), I get into the situation you mentioned
> where the net_todo_list is not empty when calling rtnl_unlock. So I decided to
> drop the rtnl_unlock() in nl80211_vendor_cmd() and defer that until
> nl80211_post_doit() after calling wiphy_unlock(). With this change, I haven't
> been able to reproduce the deadlock. So it's possible that we aren't actually
> able to hit this deadlock in nl80211_pre_doit() with the existing code since,
> as you mentioned, one wouldn't be able to call unregister_netdevice() without
> having the RTNL lock.
>
Right, this is why I said earlier that actually adding a flag for vendor
commands to get the RTNL would be more complex - you'd have to basically
open-code pre_doit() and post_doit() in there and check the sub-command
flag at the very beginning and very end.
johannes
Powered by blists - more mailing lists