Message-ID: <057f100e-2b80-f831-0a22-8d2dfe5529bd@ucloud.cn>
Date: Thu, 29 Oct 2020 10:30:50 +0800
From: wenxu <wenxu@...oud.cn>
To: Jakub Kicinski <kuba@...nel.org>, David Ahern <dsahern@...il.com>
Cc: netdev@...r.kernel.org, Stefano Brivio <sbrivio@...hat.com>,
David Ahern <dsahern@...nel.org>
Subject: Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without
TUNNEL_DONT_FRAGMENT flags
On 10/27/2020 11:55 PM, Jakub Kicinski wrote:
> On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:
>>> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
>>> support for directly bridged IP packets")? Sounds like non-UDP tunnels
>>> need the same treatment to make PMTUD work.
>>>
>>> RFC2003 seems to clearly forbid ignoring the inner DF:
>> I was looking at this patch Sunday night. To me it seems odd that
>> packets flowing through the overlay affect decisions in the underlay
>> which meant I agree with the proposed change.
> The RFC was probably written before we invented terms like underlay
> and overlay, and still considered tunneling to be an inefficient hack ;)
>
>> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
>> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
>> based on the inner header? Or, if you agree with RFC 2003 and the DF
>> should be propagated outer to inner, then it seems like the df reset
>> needs to be moved up before the call to tnl_update_pmtu
> Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
> PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
> generated based on the inner headers.
>
> We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt,
> but IMHO the default should not be violating the RFC.
If we add TUNNEL_IGNORE_DF to lwt, should the IGNORE_DF and DONT_FRAGMENT
flags be mutually exclusive? Or should DONT_FRAGMENT take priority over
IGNORE_DF?
Also, the kernel is inconsistent across tunnel devices. The geneve and
vxlan tunnels (which don't transmit through ip_md_tunnel_xmit) in lwt mode
set the outer DF based only on TUNNEL_DONT_FRAGMENT.
This was also the behavior of the gre device before it switched to
ip_md_tunnel_xmit in the following patch:
962924f ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit
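
To make the difference concrete, here is a minimal user-space sketch of the
two outer-DF policies discussed in this thread. It is not kernel code: the
function names, the SK_TUNNEL_DONT_FRAGMENT value and the boolean parameters
are made up for illustration, and the first function only models the
"inherit the inner DF when the flag is not set" behavior as it is described
above for ip_md_tunnel_xmit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tunnel flag; not the kernel's value. */
#define SK_TUNNEL_DONT_FRAGMENT 0x1u

/*
 * Policy A (assumed model of the ip_md_tunnel_xmit path discussed here):
 * the flag forces the outer DF; otherwise the outer DF is copied from
 * the inner IPv4 header.
 */
static bool outer_df_inherit_inner(unsigned int tun_flags,
                                   bool inner_is_ipv4, bool inner_df)
{
    if (tun_flags & SK_TUNNEL_DONT_FRAGMENT)
        return true;
    return inner_is_ipv4 && inner_df;
}

/*
 * Policy B (geneve/vxlan in lwt mode, as described above): the outer DF
 * depends only on the flag; the inner header is never consulted.
 */
static bool outer_df_flag_only(unsigned int tun_flags)
{
    return (tun_flags & SK_TUNNEL_DONT_FRAGMENT) != 0;
}

/* The two policies differ only when the flag is clear and the inner DF is set. */
static void df_policy_selfcheck(void)
{
    assert(outer_df_inherit_inner(0, true, true) != outer_df_flag_only(0));
    assert(outer_df_inherit_inner(0, true, false) == outer_df_flag_only(0));
    assert(outer_df_inherit_inner(SK_TUNNEL_DONT_FRAGMENT, true, false) ==
           outer_df_flag_only(SK_TUNNEL_DONT_FRAGMENT));
}
```

The only case where the two disagree is an inner IPv4 packet with DF set
and no TUNNEL_DONT_FRAGMENT flag on the tunnel, which is the over-MTU case
this patch is about.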