[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <308992f9-5c45-ea0c-7265-82c6976be45f@huawei.com>
Date: Sat, 16 Mar 2019 13:24:39 +0800
From: Zhiqiang Liu <liuzhiqiang26@...wei.com>
To: Stefano Brivio <sbrivio@...hat.com>,
Eric Dumazet <eric.dumazet@...il.com>
CC: David Miller <davem@...emloft.net>, <petrm@...lanox.com>,
<idosch@...lanox.com>, <sd@...asysnail.net>,
<mousuanming@...wei.com>, <netdev@...r.kernel.org>,
<mingfangsen@...wei.com>, <zhoukang7@...wei.com>,
<wangxiaogang3@...wei.com>
Subject: Re: [PATCH v2] vxlan: remove the redundant gro_cells_destroy()
calling.
:
> On Fri, 15 Mar 2019 14:26:10 -0700
> Eric Dumazet <eric.dumazet@...il.com> wrote:
>
>> On 03/15/2019 02:08 PM, Stefano Brivio wrote:
>>> On Fri, 15 Mar 2019 11:56:01 -0700
>>> Eric Dumazet <eric.dumazet@...il.com> wrote:
>>>
>>>> On 03/15/2019 11:02 AM, David Miller wrote:
>>>>> From: Eric Dumazet <eric.dumazet@...il.com>
>>>>> Date: Fri, 15 Mar 2019 09:06:25 -0700
>>>>>
>>>>>>
>>>>>>
>>>>>> On 03/15/2019 08:28 AM, Stefano Brivio wrote:
>>>>>>> On Fri, 15 Mar 2019 23:18:52 +0800
>>>>>>> Zhiqiang Liu <liuzhiqiang26@...wei.com> wrote:
>>>>>>>
>>>>>>>> In vxlan_destroy_tunnels func, unregister_netdevice_queue is called after
>>>>>>>> gro_cells_destroy func. However, in unregister_netdevice_queue func, the
>>>>>>>> gro_cells_destroy func will also call the gro_cells_destroy func as the
>>>>>>>> following routine:
>>>>>>>> unregister_netdevice_many() -> rollback_registered_many()
>>>>>>>> -> ndo_uninit() -> gro_cells_destroy()
>>>>>>>>
>>>>>>>
>>>>>>> NACK, please read my and Eric's comments to v1 -- giving me more than 23
>>>>>>> minutes to answer would have been a nice touch as well :)
>>>>>>>
>>>>>>
>>>>>> Sorry for the confusion, I forgot to add the question marks to my sentences.
>>>>>>
>>>>>> In fact, this is a bug fix, that we missed in the previous fix.
>>>>>>
>>>>>> Technically the bug is older.
>>>>>
>>>>> Please elaborate.
>>>>>
>>>>
>>>> Commit ad6c9986bcb62
>>>> ("vxlan: Fix GRO cells race condition between receive and link delete")
>>>>
>>>> fixed a race condition for the typical case a vxlan device is dismantled from the
>>>> current netns.
>>>>
>>>> But if a netns is dismantled, we call vxlan_destroy_tunnels()
>>>> to schedule a unregister_netdevice_queue() of all the vxlan tunnels
>>>> that are related to this netns.
>>>
>>> Won't that happen via ops_exit_list() only after synchronize_rcu() is
>>> called by cleanup_net(), though? Is there another path I missed?
>>
>> Just look at vxlan_destroy_tunnels().
>>
>> The call to gro_cells_destroy(&vxlan->gro_cells);
>> is done _before_
>> unregister_netdevice_queue(vxlan->dev, head);
>>
>> So packets can still fly, the RCU grace period has not yet started.
>
> Wait, what... :/ thanks for pointing that out, I guess it was too
> obvious for me to notice.
>
> Zhiqiang, could you maybe update the commit message with these two bits
> of information (the real issue explained by Eric, and the different
> Fixes: tag), and post v3?
>
> This would be an actual fix and not a clean-up, so it doesn't need to
> wait for net-next to re-open.
Thanks for your reply.
I have updated the commit message as suggested by Eric. Even though I have read
Documentation/networking/netdev-FAQ.rst as you mentioned. I am now still a little
confused about the subject-prefix of v3 (net or net-next). And David Miller saied
the net-next tree is CLOSED.
Could you help me check whether the following v3 patch is ok?
Subject: [PATCH net v3] vxlan: remove the redundant gro_cells_destroy() calling.
OR
Subject: [PATCH net-next v3] vxlan: remove the redundant gro_cells_destroy() calling.
Commit ad6c9986bcb62 ("vxlan: Fix GRO cells race condition between
receive and link delete") fixed a race condition for the typical case a vxlan
device is dismantled from the current netns. But if a netns is dismantled,
vxlan_destroy_tunnels() is called to schedule a unregister_netdevice_queue()
of all the vxlan tunnels that are related to this netns.
In vxlan_destroy_tunnels(), gro_cells_destroy() is called and finished before
unregister_netdevice_queue(). This means that the gro_cells_destroy() call is
done too soon, for the same reasons explained in above commit.
So we need to fully respect the RCU rules, and thus must remove the
gro_cells_destroy() call or risk use after-free.
Fixes: 58ce31cca1ff ("vxlan: GRO support at tunnel layer")
Signed-off-by: Suanming.Mou <mousuanming@...wei.com>
Suggested-by: Eric Dumazet <eric.dumazet@...il.com>
Reviewed-by: Stefano Brivio <sbrivio@...hat.com>
Reviewed-by: Zhiqiang Liu <liuzhiqiang26@...wei.com>
---
V1->V2:
- update the commit message suggeted by Eric Dumazet
- update Fixes: tag
drivers/net/vxlan.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 077f1b9f2761..d76dfed8d9bb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -4335,10 +4335,8 @@ static void vxlan_destroy_tunnels(struct net *net, struct list_head *head)
/* If vxlan->dev is in the same netns, it has already been added
* to the list by the previous loop.
*/
- if (!net_eq(dev_net(vxlan->dev), net)) {
- gro_cells_destroy(&vxlan->gro_cells);
+ if (!net_eq(dev_net(vxlan->dev), net))
unregister_netdevice_queue(vxlan->dev, head);
- }
}
for (h = 0; h < PORT_HASH_SIZE; ++h)
--
1.7.12.4
Powered by blists - more mailing lists