[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220325134040.0d98835b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Fri, 25 Mar 2022 13:40:40 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Johannes Berg <johannes@...solutions.net>
Cc: William McVicker <willmcvicker@...gle.com>,
linux-wireless@...r.kernel.org,
Marek Szyprowski <m.szyprowski@...sung.com>,
Kalle Valo <kvalo@...eaurora.org>,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Amitkumar Karwar <amitkarwar@...il.com>,
Ganapathi Bhat <ganapathi.bhat@....com>,
Xinming Hu <huxinming820@...il.com>, kernel-team@...roid.com,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd
On Fri, 25 Mar 2022 18:01:30 +0100 Johannes Berg wrote:
> That's not a bad idea, but I think I wouldn't want to backport that, so
> separately :) I don't think that fundamentally changes the locking
> properties though.
>
>
> Couple of more questions I guess: First, are we assuming that the
> cfg80211 code *is* actually broken, even if it looks like nothing can
> cause the situation, due to the empty todo list?
Right.
> Given that we have rtnl_lock_unregistering() (and also
> rtnl_lock_unregistering_all()), it looks like we *do* in fact at least
> not want to make an assumption that no user of __rtnl_unlock() can have
> added a todo item.
>
> I mean, there's technically yet *another* thing we could do - something
> like this:
>
> [this doesn't compile, need to suitably make net_todo_list non-static]
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -95,6 +95,7 @@ void __rtnl_unlock(void)
>
> defer_kfree_skb_list = NULL;
>
> + WARN_ON(!list_empty(&net_todo_list));
> mutex_unlock(&rtnl_mutex);
>
> while (head) {
Yeah, I think we could do that.
> and actually that would allow us to get rid of rtnl_lock_unregistering()
> and rtnl_lock_unregistering_all() simply because we'd actually guarantee
> the invariant that when the RTNL is freshly locked, the list is empty
> (by guaranteeing that it's always empty when it's unlocked, since it can
> only be added to under RTNL).
TBH I don't know what you mean by rtnl_lock_unregistering(), I don't
have that in my tree. rtnl_lock_unregistering_all() can't hurt the case
we're talking about AFAICT.
Eric removed some of the netns / loopback dependencies in net-next,
make sure you pull!
> With some suitable commentary, that might also be a reasonable thing?
> __rtnl_unlock() is actually rather pretty rare, and not exported.
The main use for it seems to be re-locking before loading a module,
which TBH I have no idea why, is it just a cargo cult or a historical
thing :S I don't see how letting netdevs leave before _loading_
a module makes any difference whatsoever.
> However, if you don't like that ...
>
> I've been testing with this patch, to make lockdep complain:
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9933,6 +9933,11 @@ void netdev_run_todo(void)
> if (!list_empty(&list))
> rcu_barrier();
>
> +#ifdef CONFIG_LOCKDEP
> + rtnl_lock();
> + __rtnl_unlock();
> +#endif
> +
> while (!list_empty(&list)) {
> struct net_device *dev
> = list_first_entry(&list, struct net_device, todo_list);
>
>
> That causes lockdep to complain for cfg80211 even if the list *is* in
> fact empty.
>
> Would you be open to adding something like that? Perhaps if I don't just
> do the easy rtnl_lock/unlock, but try to find the corresponding lockdep-
> only things to do there, to cause lockdep to do things without really
> locking? OTOH, the locking overhead of the RTNL we just unlocked is
> probably minimal, vs. the actual work *lockdep* is doing to track all
> this ...
The WARN_ON() you suggested up front make perfect sense to me.
You can also take the definition of net_unlink_todo() out of
netdevice.h while at it because o_0
Powered by blists - more mailing lists