Message-ID: <56902A3D.4090508@stressinduktion.org>
Date: Fri, 8 Jan 2016 22:29:33 +0100
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: Thomas Graf <tgraf@...g.ch>
Cc: Jesse Gross <jesse@...nel.org>, David Wragg <david@...ve.works>,
David Miller <davem@...emloft.net>, dev@...nvswitch.org,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [ovs-dev] [PATCH net 0/2] vxlan: Set a large MTU on ovs-created
vxlan devices
On 07.01.2016 19:40, Thomas Graf wrote:
> On 01/07/16 at 06:50pm, Hannes Frederic Sowa wrote:
>> On 07.01.2016 18:21, Thomas Graf wrote:
>>> On 01/07/16 at 08:35am, Jesse Gross wrote:
>>>> On Thu, Jan 7, 2016 at 3:49 AM, Thomas Graf <tgraf@...g.ch> wrote:
>>>>> A simple start could be to add a new return code for > MTU drops in
>>>>> the dev_queue_xmit() path and check for NET_XMIT_DROP_MTU in
>>>>> ovs_vport_send() and emit proper ICMPs.
>>>>
>>>> That could be interesting. The problem in the past was making sure
>>>> that ICMPs that are generated fit in the virtual network appropriately
>>>> - right addresses, etc. This requires either spoofing addresses or
>>>> some additional knowledge about the topology that we don't currently
>>>> have in the kernel.
>>>
>>> Are you worried about emitting an ICMP with a source which is not
>>> a local host address?
>>
>> We have uRPF enabled for IPv4 by default on all kernels. Thus, if we
>> generate an IPv4 ICMP error packet back, it must have a source address
>> which the receiving kernel considers valid. Valid means that a packet
>> sent to that source address would leave via the same interface the
>> ICMP error came in on.
>
> Agreed. I think this is given though as we would reverse the addresses
> as icmp_send() already does:
>
> saddr = iph->daddr;
>
>>> Can't we just use icmp_send() in the context of the inner header and
>>> feed it to the flow table to send it back? It should be the same as
>>> for ip_forward().
>>
>> The bridge's ip address often has no valid path as seen from the end host
>> system receiving the icmp error, because the openvswitch is not really part
>> of the L3 forwarding chain.
>
> I don't think the IP of the bridge ever comes into play. It shouldn't.
> I'm not even sure what could be considered the address of the bridge
> ;-)
Yes, exactly. :)
>
>> Faking the address from the packet (e.g. using the destination address of
>> the original packet) will make traceroute go nuts.
>
> I think you are worried about an ICMP error from a hop which does not
> decrement TTL. I think that's a good point and I think we should only
> send an ICMP error if the TTL is decremented in the action list of
> the flow for which we have seen a MTU based drop (or TTL=0).
Also agreed: ovs must act in routing mode, but at the same time it must
have an IP address on the path. I think this is actually the problem.
Currently we have no way to feed back an error in configurations with
ovs sitting in another namespace, e.g. for docker containers:
We traverse a net namespace, so we drop skb->sk and don't hold any
socket reference with which to enqueue a PtB (Packet too Big) error to
the original socket.
We mostly use netif_rx_internal, which queues the skb on the backlog, so
we can't signal an error up the call stack either.
And ovs does not necessarily have an IP address as the first hop of the
namespace or the virtual machine, so it cannot know a valid IP address
with which to reply, no?
> I don't really see a difference between ip_forward(), some
> sophisticated tc action or OVS. As soon as they decrement TTL and
> perform L3 forwarding, they should send out ICMP errors to allow
> for proper PMTU.
Yes, but depending on the IP configuration, those ICMPs will then be
dropped by the reverse path filter.
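For reference, the strict reverse path filter (rp_filter=1) can be
modelled roughly like this (a toy Python sketch with a hypothetical
routing table, nothing like the kernel's actual fib_validate_source()):

```python
import ipaddress

# Hypothetical routing table of the host receiving the ICMP error:
# (prefix, outgoing interface).  Names are examples only.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/24"), "eth0"),
    (ipaddress.ip_network("0.0.0.0/0"), "eth1"),
]

def route_lookup(addr):
    """Longest-prefix match; returns the outgoing interface."""
    addr = ipaddress.ip_address(addr)
    matches = [(net, dev) for net, dev in ROUTES if addr in net]
    return max(matches, key=lambda e: e[0].prefixlen)[1]

def strict_urpf_ok(src_addr, in_dev):
    # Strict mode: the route back to the packet's source address must
    # leave via the device the packet arrived on; otherwise it is
    # dropped, which is what happens to an ICMP error sent with a
    # source the receiver routes elsewhere.
    return route_lookup(src_addr) == in_dev

strict_urpf_ok("10.0.0.5", "eth0")  # passes: source routes back out eth0
strict_urpf_ok("10.0.0.5", "eth1")  # fails: dropped by rp_filter=1
```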
>> Normally ethernet devices don't return ICMP error messages. E.g. a
>> broken jumbo frame configuration just leads to silent packet loss,
>> because the packet is discarded before a router can handle it. Thus,
>> in the case of a local ovs installation, it would be best if the
>> error were already transported back to the client application via the
>> network call stack. This might be very difficult in case we enqueue
>> the packet to a backlog queue and reschedule softirqs. Probably we
>> need some way of faking source addresses from bridges now.... :/
>
> I think the major complication comes from the assumption that OVS is
> a bridge. This is not necessarily the case as stated above. If a flow
> is doing L3 forwarding, we should send ICMPs as expected from a
> router.
If we are doing L3 forwarding into a tunnel, this is absolutely correct
and can be easily done.
Bye,
Hannes