Date:	Sun, 10 Jan 2016 11:49:49 +0100
From:	Thomas Graf <tgraf@...g.ch>
To:	Hannes Frederic Sowa <hannes@...essinduktion.org>
Cc:	Jesse Gross <jesse@...nel.org>, David Wragg <david@...ve.works>,
	David Miller <davem@...emloft.net>, dev@...nvswitch.org,
	Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [ovs-dev] [PATCH net 0/2] vxlan: Set a large MTU on ovs-created
 vxlan devices

On 01/08/16 at 10:29pm, Hannes Frederic Sowa wrote:
> On 07.01.2016 19:40, Thomas Graf wrote:
> >I think you are worried about an ICMP error from a hop which does not
> >decrement TTL. I think that's a good point and I think we should only
> >send an ICMP error if the TTL is decremented in the action list of
> >the flow for which we have seen a MTU based drop (or TTL=0).
> 
> Also agreed: ovs must act in routing mode, but at the same time it must
> have an IP address on the path. I think this is actually the problem.
> 
> Currently we have no way to feedback an error in current configurations with
> ovs sitting in another namespace for e.g. docker containers:
> 
> We traverse a net namespace, so we drop skb->sk; we don't hold any socket
> reference with which to enqueue a PtB (Packet too Big) error to the
> original socket.
> 
> We mostly use netif_rx_internal, which queues the skb on the backlog, so
> we can't signal an error over the call stack either.
> 
> And ovs does not necessarily have an IP address as the first hop of the
> namespace or the virtual machine, so it cannot know a valid IP address
> with which to reply, no?

[your last statement moved up here:]
> If we are doing L3 forwarding into a tunnel, this is absolutely correct and
> can be easily done.

OK, I can see where you are going with this. I was assuming pure
virtual networks due to the context of these patches.

So an ICMP error is always either UDP encapsulated or delivered directly
to a veth or tap which runs in its own netns, or to a VM whose IP stack
operates exclusively in the context of the virtual network. The stack of
the OVS host never gets to see the actual ICMPs, and rp_filter never
comes into play.

In such a context, the virtual router IPs are typically programmed
into the flow table because they are only valid in the virtual network
context; assigning them to the OVS bridge would be wrong, as the bridge
represents the underlay context.
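For illustration, "programmed into the flow table" means the virtual
router address lives only in OpenFlow entries, never on the bridge
device. A hypothetical example (bridge name, address and priority are
made up; the punt-to-controller action stands in for whatever handles
router-bound traffic):

```shell
# Hypothetical: handle packets for the virtual router IP 10.0.0.1 in
# br0's flow table instead of assigning the address to br0 itself,
# which would leak a virtual-network address into the underlay context.
ovs-ofctl add-flow br0 "priority=100,ip,nw_dst=10.0.0.1,actions=controller"
```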

The virtual router address is known in the flow context of the virtual
network, though, and can be given to the icmp_send() variant.

Can you elaborate a bit on your container scenario? Is it OVS running
in the host netns with veth pairs bridging into the container netns?

Shouldn't that be solved by the above, given that the ICMPs sent back
by the local OVS are perfectly valid in the IP stack context of the
container netns?
