Message-ID: <20220209175558.3117342d@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Wed, 9 Feb 2022 17:55:58 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Xin Long <lucien.xin@...il.com>
Cc: network dev <netdev@...r.kernel.org>, davem@...emloft.net,
Eric Dumazet <edumazet@...gle.com>,
Ziyang Xuan <william.xuanziyang@...wei.com>
Subject: Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
On Wed, 9 Feb 2022 03:19:56 -0500 Xin Long wrote:
> Shuang Li reported a QinQ issue by simply doing:
>
> # ip link add dummy0 type dummy
> # ip link add link dummy0 name dummy0.1 type vlan id 1
> # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
> # rmmod 8021q
>
> unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1
How about we put this in a selftest under tools/testing/selftests/net/
or tools/testing/selftests/drivers/net/ ?
> On rmmod of 8021q, all vlan devs are deleted from their real_dev's vlan grp
> and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
> before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().
>
> When dummy0.1 is unregistered, dummy0.1.2 is not unregistered in the
> NETDEV_UNREGISTER event, as it's been deleted from dummy0.1's vlan grp. However,
> due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
> netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
> release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
> vlan_dev_free().
>
> This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in
> vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
> vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
> netdev_wait_allrefs().
>
> Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()")
As far as I understand this is pretty much a revert of the previous fix.
Note that netdevice_event_work_handler() as seen in the backtrace in the
commit message of the fix in question is called from a workqueue, so the
ordering of netdev notifications saves us from nothing here. We can't
start freeing state until all refs are gone.
I think a better fix would be to rewrite netdev_run_todo() to free the
netdevs in any order they become ready. That's gonna solve any
dependency problems and may even speed things up.
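
To make the idea concrete, here is a rough user-space sketch, not kernel
code. Everything in it (struct toy_dev, toy_free(), the two run_todo
helpers, toy_qinq_scenario()) is a made-up name for illustration, and the
refcounting is simplified: a device counts as ready once nobody else holds
a reference on it, and freeing a device drops the reference it holds on its
lower device. It only models the ordering problem from the QinQ report
above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a net_device sitting on the unregister todo list. */
struct toy_dev {
	const char *name;
	int refs;		/* references other devices hold on us */
	struct toy_dev *lower;	/* device we hold a reference on, or NULL */
	int freed;
};

/* Freeing a device releases the reference it holds on its lower device. */
static void toy_free(struct toy_dev *d)
{
	d->freed = 1;
	if (d->lower)
		d->lower->refs--;
	printf("freed %s\n", d->name);
}

/* Strict list order: stop at the first device that still has references.
 * The real code would sit and wait at that point. Returns how many were freed. */
static size_t run_todo_in_order(struct toy_dev **todo, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (todo[i]->refs != 0)
			break;
		toy_free(todo[i]);
	}
	return i;
}

/* Any order: keep sweeping the list and free whatever has become ready,
 * until a full sweep makes no progress. Returns how many were freed. */
static size_t run_todo_any_order(struct toy_dev **todo, size_t n)
{
	size_t freed = 0;
	int progress = 1;

	while (progress) {
		progress = 0;
		for (size_t i = 0; i < n; i++) {
			struct toy_dev *d = todo[i];

			if (!d->freed && d->refs == 0) {
				toy_free(d);
				freed++;
				progress = 1;
			}
		}
	}
	return freed;
}

/* The QinQ case: dummy0.1 is queued first, but dummy0.1.2 still holds it. */
static size_t toy_qinq_scenario(int any_order)
{
	struct toy_dev lower = { "dummy0.1", 1, NULL, 0 };
	struct toy_dev upper = { "dummy0.1.2", 0, &lower, 0 };
	struct toy_dev *todo[] = { &lower, &upper };

	return any_order ? run_todo_any_order(todo, 2)
			 : run_todo_in_order(todo, 2);
}
```

In this toy model toy_qinq_scenario(0) frees nothing, because dummy0.1 is at
the head of the list and still referenced, while toy_qinq_scenario(1) frees
dummy0.1.2 first and then dummy0.1. A real rewrite would also have to deal
with sleeping, notifier rebroadcasts and the case where nothing is ready,
none of which is modelled here.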