[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51C9A4E3.2060906@6wind.com>
Date: Tue, 25 Jun 2013 16:10:43 +0200
From: Nicolas Dichtel <nicolas.dichtel@...nd.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: netdev@...r.kernel.org, davem@...emloft.net, bcrl@...ck.org,
ravi.mlists@...il.com
Subject: Re: [RFC PATCH net-next 2/2] sit: add support of x-netns
Le 25/06/2013 00:42, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@...nd.com> writes:
>
>> Le 24/06/2013 21:28, Eric W. Biederman a écrit :
>>> Nicolas Dichtel <nicolas.dichtel@...nd.com> writes:
>>>
>>>> This patch allows to switch the netns when packet is encapsulated or
>>>> decapsulated. In other word, the encapsulated packet is received in a netns,
>>>> where the lookup is done to find the tunnel. Once the tunnel is found, the
>>>> packet is decapsulated and injecting into the corresponding interface which
>>>> stands to another netns.
>>>>
>>>> When one of the two netns is removed, the tunnel is destroyed.
>>>
>>> I don't see any fundamental problems with this code. There are bugs
>>> with the cleanup noted below.
>>>
>>> The primary sit interface is marked as NETNS_LOCAL which is good. A
>>> comment might be nice explaining the reasonsing but for code
>>> archeologists.
>> Ok.
>>
>>>
>>> Conditionally calling dev_cleanup_skb bugs me. The extra conditional
>>> looks like a maintenance hazard. Unless I have missed some subtle
>>> detail either we don't need the cleanup at all or actually it is a bug
>>> that we aren't scrubbing our packets as they progress through tunnels
>>> even in the same network namespace.
>>>
>>> Can we just make that function the skb scrubbing needed for packets to
>>> traverse a tunnel?
>>>
>>> My concern going into this was that we would get code that would break
>>> because it would not be tested enough. If we can remove the conditional
>>> from dev_cleanup_skb we won't have any code that is conditionally run
>>> and the logic looks simple enough not to bitrot in routine maintenance.
>> My idea was to have the same level of cleanup/scrubbing that when a packet is
>> sent from a netns to another netns through a veth. I cannot use
>> dev_forward_skb() because this function expects to have an ethernet header, it's
>> why I split it in the patch #1.
>>
>> If we leave all information attached to the skb, we may have, for example, an
>> skb with a conntrack from netns1 and a netdevice from netns2. It seems not safe,
>> but maybe I'm wrong. And in fact, the conntrack will not be created in the
>> second netns (nf_conntrack_in() => skb->nfct is not null and not a template =>
>> stats ignore++).
>> Another example is a socket from a netns and the netdevice or conntrack from
>> another netns.
>
> All of that I agree with.
>
> I just don't see any need to make that scrubbing/cleaning of the packet
> conditional.
>
> Semantically going through a tunnel is the same as crossing between
> network namespaces. So you can change
>
>>>> + if (tunnel->net != dev_net(tunnel->dev))
>>>> + dev_cleanup_skb(skb);
>
> to just:
>
> dev_cleanup_skb(skb);
>
>> I was thinking that when a packet enter a namespace, it must not be associated
>> to any object from the previous namespace, it should be like if we just receive
>> it on the host.
>
> Overall agree. Tunnels have the same properties.
>
> Which leads me to conclude either we are missing something or the
> current tunnel code is mildly buggy because it does not do this level of
> scrubbing.
I'm afraid to break an existing scenario, but you're probably right. Let's
remove this test.
Nicolas
>
> Eric
>
>>>> -static void __net_exit sit_destroy_tunnels(struct sit_net *sitn, struct list_head *head)
>>>> +static void __net_exit sit_destroy_tunnels(struct net *net,
>>>> + struct list_head *head)
>>>> {
>>>> - int prio;
>>>> + struct net_device *dev, *aux;
>>>>
>>>> - for (prio = 1; prio < 4; prio++) {
>>>> - int h;
>>>> - for (h = 0; h < HASH_SIZE; h++) {
>>>> - struct ip_tunnel *t;
>>>> -
>>>> - t = rtnl_dereference(sitn->tunnels[prio][h]);
>>>> - while (t != NULL) {
>>>> - unregister_netdevice_queue(t->dev, head);
>>>> - t = rtnl_dereference(t->next);
>>>> - }
>>>> - }
>>>> - }
>>>> + for_each_netdev_safe(net, dev, aux)
>>>> + if (dev->rtnl_link_ops &&
>>>> + !strcmp(dev->rtnl_link_ops->kind, "sit"))
>>>> + unregister_netdevice_queue(dev, head);
>>>
>>> This entire idiom change is a bit ugly, and it is wrong.
>>>
>>> You need to look for two classes of tunnels to take down. Tunnels that
>>> originate in net and tunnels whose netdevice is in net.
>>>
>>> For tunnles that reside in net you should be able to just compare the
>>> rtnl_link_ops pointer, rather than an ascii name.
>>>
>>> Tunnels that originate in this network namespace most definitely need to
>>> be taken down as among other things you wisely do not keep a reference
>>> count on the originating network namespace.
>> Yes sure. My beta version was doing the right things, but I change this code
>> before sending the patch :/
>
> Bahahaha! The dangers of the last minute cleanup.
>
> Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists