[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9fcc388-4722-02c6-e3dc-b68893c6cc15@stressinduktion.org>
Date: Fri, 19 Aug 2016 11:20:40 +0200
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: Shmulik Ladkani <shmulik.ladkani@...il.com>
Cc: David Miller <davem@...emloft.net>, wenxu@...oud.cn,
kuznet@....inr.ac.ru, jmorris@...ei.org, kaber@...sh.net,
yoshfuji@...ux-ipv6.org, netdev@...r.kernel.org,
wenx05124561@....com
Subject: Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen
exceeds MTU, allow segmentation for gre tunneled skbs
On 19.08.2016 09:26, Shmulik Ladkani wrote:
> On Mon, 15 Aug 2016 14:16:39 +0300 Shmulik Ladkani <shmulik.ladkani@...il.com> wrote:
>> On Fri, 12 Aug 2016 13:11:50 +0200, hannes@...essinduktion.org wrote:
>>> I really would not like to see this expanded to gre and other protocols.
>>> All switches drop packets where the packets are exceeding the MTU,
>>> bridges and also openvswitch should behave the same.
>>>
>>> Unfortunately we already had this loophole in the kernel that vxlan udp
>>> output path could fragment the packet again, even in case of switches.
>>> But this stopped working for GSO packets, which violates another rule in
>>> the kernel, GSO should always be transparent and user space should never
>>> have to care if a packet is GSO or not.
>>>
>>> Because we couldn't a) roll back the change that we fragment packets in
>>> UDP output paths and b) should not violate GSO transparency rule, I
>>> strongly believed it would be better too only change the kernel in a way
>>> that it transparently works with GSO, too. If we argue that a VTEP is
>>> its own UDP endpoint which is set up after the bridge, I still can sleep
>>> well. :)
>>>
>>> My understanding was that GRE failed consistently, GSO as well as
>>> non-GSO packets are dropped, which would be the correct behavior for me.
>>> I don't want to change this. A good argument against this would be if we
>>> violate the GSO transparency rule again. But when I looked into the code
>>> I couldn't see that.
>>
>> I completely agree with your arguments.
>>
>> I think we may run into the same GSO vs Non-GSO anomaly if one uses
>> a "nopmtudisc" tunnel, or a gre tunnel in "collect_md" mode, where the
>> encapsulating iphdr 'df' is derived from 'tun_flags&TUNNEL_DONT_FRAGMENT'
>> (e.g. in case DF is not set).
>>
>> I suspect OvS's vport-gre does exactly that, so I assume this is the
>> reason why the change was suggested.
>>
>> Maybe we can change our criteria in the following manner:
>>
>> - if (skb_iif && proto == IPPROTO_UDP) {
>> + if (skb_iif && !(df & htons(IP_DF))) {
>> IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
>>
>> This way, any tunnel explicitly providing DF is NOT allowed to
>> further fragment the resulting segments (leading to tx segments being
>> dropped).
>> Others tunnels, that do not care (e.g. vxlan and geneve, and probably
>> ovs vport-gre, or other ovs encap vports, in df_default=false mode),
>> will behave same for gso and non-gso.
>>
>> WDYT? Am I missing something here?
>>
>
> ping..
I am really not sure...
Probably we have no other choice. Bridges caring about df bit set is
rather unusual. I wonder if it might not be more sensitive to actually
add a sysctl for that or make it depending on some per-tunnel
configuration which can be updated with netlink?
Bye,
Hannes
Powered by blists - more mailing lists