netdev - Re: [RFC PATCH net-next 2/2] sit: add support of x-netns

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 25 Jun 2013 16:10:43 +0200
From:	Nicolas Dichtel <nicolas.dichtel@...nd.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	netdev@...r.kernel.org, davem@...emloft.net, bcrl@...ck.org,
	ravi.mlists@...il.com
Subject: Re: [RFC PATCH net-next 2/2] sit: add support of x-netns

Le 25/06/2013 00:42, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@...nd.com> writes:
>
>> Le 24/06/2013 21:28, Eric W. Biederman a écrit :
>>> Nicolas Dichtel <nicolas.dichtel@...nd.com> writes:
>>>
>>>> This patch allows to switch the netns when packet is encapsulated or
>>>> decapsulated. In other word, the encapsulated packet is received in a netns,
>>>> where the lookup is done to find the tunnel. Once the tunnel is found, the
>>>> packet is decapsulated and injecting into the corresponding interface which
>>>> stands to another netns.
>>>>
>>>> When one of the two netns is removed, the tunnel is destroyed.
>>>
>>> I don't see any fundamental problems with this code.  There are bugs
>>> with the cleanup noted below.
>>>
>>> The primary sit interface is marked as NETNS_LOCAL which is good.  A
>>> comment might be nice explaining the reasonsing but for code
>>> archeologists.
>> Ok.
>>
>>>
>>> Conditionally calling dev_cleanup_skb bugs me.  The extra conditional
>>> looks like a maintenance hazard.   Unless I have missed some subtle
>>> detail either we don't need the cleanup at all or actually it is a bug
>>> that we aren't scrubbing our packets as they progress through tunnels
>>> even in the same network namespace.
>>>
>>> Can we just make that function the skb scrubbing needed for packets to
>>> traverse a tunnel?
>>>
>>> My concern going into this was that we would get code that would break
>>> because it would not be tested enough.  If we can remove the conditional
>>> from dev_cleanup_skb we won't have any code that is conditionally run
>>> and the logic looks simple enough not to bitrot in routine maintenance.
>> My idea was to have the same level of cleanup/scrubbing that when a packet is
>> sent from a netns to another netns through a veth. I cannot use
>> dev_forward_skb() because this function expects to have an ethernet header, it's
>> why I split it in the patch #1.
>>
>> If we leave all information attached to the skb, we may have, for example, an
>> skb with a conntrack from netns1 and a netdevice from netns2. It seems not safe,
>> but maybe I'm wrong. And in fact, the conntrack will not be created in the
>> second netns (nf_conntrack_in() => skb->nfct is not null and not a template =>
>> stats ignore++).
>> Another example is a socket from a netns and the netdevice or conntrack from
>> another netns.
>
> All of that I agree with.
>
> I just don't see any need to make that scrubbing/cleaning of the packet
> conditional.
>
> Semantically going through a tunnel is the same as crossing between
> network namespaces.  So you can change
>
>>>> +	if (tunnel->net != dev_net(tunnel->dev))
>>>> +		dev_cleanup_skb(skb);
>
> to just:
>
> 	dev_cleanup_skb(skb);
>
>> I was thinking that when a packet enter a namespace, it must not be associated
>> to any object from the previous namespace, it should be like if we just receive
>> it on the host.
>
> Overall agree.  Tunnels have the same properties.
>
> Which leads me to conclude either we are missing something or the
> current tunnel code is mildly buggy because it does not do this level of
> scrubbing.
I'm afraid to break an existing scenario, but you're probably right. Let's 
remove this test.


Nicolas

>
> Eric
>
>>>> -static void __net_exit sit_destroy_tunnels(struct sit_net *sitn, struct list_head *head)
>>>> +static void __net_exit sit_destroy_tunnels(struct net *net,
>>>> +					   struct list_head *head)
>>>>    {
>>>> -	int prio;
>>>> +	struct net_device *dev, *aux;
>>>>
>>>> -	for (prio = 1; prio < 4; prio++) {
>>>> -		int h;
>>>> -		for (h = 0; h < HASH_SIZE; h++) {
>>>> -			struct ip_tunnel *t;
>>>> -
>>>> -			t = rtnl_dereference(sitn->tunnels[prio][h]);
>>>> -			while (t != NULL) {
>>>> -				unregister_netdevice_queue(t->dev, head);
>>>> -				t = rtnl_dereference(t->next);
>>>> -			}
>>>> -		}
>>>> -	}
>>>> +	for_each_netdev_safe(net, dev, aux)
>>>> +		if (dev->rtnl_link_ops &&
>>>> +		    !strcmp(dev->rtnl_link_ops->kind, "sit"))
>>>> +			unregister_netdevice_queue(dev, head);
>>>
>>> This entire idiom change is a bit ugly, and it is wrong.
>>>
>>> You need to look for two classes of tunnels to take down.  Tunnels that
>>> originate in net and tunnels whose netdevice is in net.
>>>
>>> For tunnles that reside in net you should be able to just compare the
>>> rtnl_link_ops pointer, rather than an ascii name.
>>>
>>> Tunnels that originate in this network namespace most definitely need to
>>> be taken down as among other things you wisely do not keep a reference
>>> count on the originating network namespace.
>> Yes sure. My beta version was doing the right things, but I change this code
>> before sending the patch :/
>
> Bahahaha!  The dangers of the last minute cleanup.
>
> Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html