lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 25 Mar 2022 22:25:05 +0100
From:   Johannes Berg <johannes@...solutions.net>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     William McVicker <willmcvicker@...gle.com>,
        linux-wireless@...r.kernel.org,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Kalle Valo <kvalo@...eaurora.org>,
        "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
        Amitkumar Karwar <amitkarwar@...il.com>,
        Xinming Hu <huxinming820@...il.com>, kernel-team@...roid.com,
        Paolo Abeni <pabeni@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Cong Wang <cwang@...pensource.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd

On Fri, 2022-03-25 at 13:40 -0700, Jakub Kicinski wrote:
> On Fri, 25 Mar 2022 18:01:30 +0100 Johannes Berg wrote:
> > That's not a bad idea, but I think I wouldn't want to backport that, so
> > separately :) I don't think that fundamentally changes the locking
> > properties though.
> > 
> > 
> > Couple of more questions I guess: First, are we assuming that the
> > cfg80211 code *is* actually broken, even if it looks like nothing can
> > cause the situation, due to the empty todo list?
> 
> Right.

I guess that the below is basically saying "it's not really broken" :)

> > Given that we have rtnl_lock_unregistering() (and also
> > rtnl_lock_unregistering_all()), it looks like we *do* in fact at least
> > not want to make an assumption that no user of __rtnl_unlock() can have
> > added a todo item.
> > 
> > I mean, there's technically yet *another* thing we could do - something
> > like this:
> > 
> > [this doesn't compile, need to suitably make net_todo_list non-static]
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -95,6 +95,7 @@ void __rtnl_unlock(void)
> >  
> >         defer_kfree_skb_list = NULL;
> >  
> > +       WARN_ON(!list_empty(&net_todo_list));
> >         mutex_unlock(&rtnl_mutex);
> >  
> >         while (head) {
> 
> Yeah, I think we could do that.

It seems that would be simpler. Even if I might eventually still want to
do the cfg80211 change, but it would give us some confidence that this
really cannot be happening anywhere.


> TBH I don't know what you mean by rtnl_lock_unregistering(), I don't
> have that in my tree. rtnl_lock_unregistering_all() can't hurt the case
> we're talking about AFAICT.
> 
> Eric removed some of the netns / loopback dependencies in net-next, 
> make sure you pull!

Sorry, yeah, I was looking at an older tree where I was testing on in
the simulation environment - this disappeared in commit 8a4fc54b07d7
("net: get rid of rtnl_lock_unregistering()").

> > With some suitable commentary, that might also be a reasonable thing?
> > __rtnl_unlock() is actually rather pretty rare, and not exported.
> 
> The main use for it seems to be re-locking before loading a module,
> which TBH I have no idea why, is it just a cargo cult or a historical
> thing :S  I don't see how letting netdevs leave before _loading_ 
> a module makes any difference whatsoever.

Indeed.


> The WARN_ON() you suggested up front make perfect sense to me.
> You can also take the definition of net_unlink_todo() out of
> netdevice.h while at it because o_0

Heh indeed, what?

But (and now I'll CC even more people...) if we can actually have an
invariant that while RTNL is unlocked the todo list is empty, then we
also don't need rtnl_lock_unregistering_all(), and can remove the
netdev_unregistering_wq, etc., no?

IOW, I'm not sure why we needed commit 50624c934db1 ("net: Delay
default_device_exit_batch until no devices are unregistering v2"), but I
also have little doubt that we did.

Ah, no. This isn't about locking in this case, it's literally about
ensuring that free_netdev() has been called in netdev_run_todo()?

Which we don't care about in cfg80211 - we just care about the list
being empty so there's no chance we'll reacquire the RTNL.

johannes

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ