Message-ID: <20201029181419.0931f7ab@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
Date: Thu, 29 Oct 2020 18:14:19 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: wenxu <wenxu@...oud.cn>
Cc: David Ahern <dsahern@...il.com>, netdev@...r.kernel.org,
Stefano Brivio <sbrivio@...hat.com>,
David Ahern <dsahern@...nel.org>
Subject: Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without
TUNNEL_DONT_FRAGMENT flags
On Thu, 29 Oct 2020 10:30:50 +0800 wenxu wrote:
> On 10/27/2020 11:55 PM, Jakub Kicinski wrote:
> > On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:
> >>> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
> >>> support for directly bridged IP packets")? Sounds like non-UDP tunnels
> >>> need the same treatment to make PMTUD work.
> >>>
> >>> RFC2003 seems to clearly forbid ignoring the inner DF:
> >> I was looking at this patch Sunday night. To me it seems odd that
> >> packets flowing through the overlay affect decisions in the underlay
> >> which meant I agree with the proposed change.
> > The RFC was probably written before we invented terms like underlay
> > and overlay, and still considered tunneling to be an inefficient hack ;)
> >
> >> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
> >> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
> >> based on the inner header? Or, if you agree with RFC 2003 and the DF
> >> should be propagated inner to outer, then it seems like the df reset
> >> needs to be moved up before the call to tnl_update_pmtu.
> > Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
> > PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
> > generated based on the inner headers.
> >
> > We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt,
> > but IMHO the default should not be violating the RFC.
>
> If we add TUNNEL_IGNORE_DF to lwt, should the two flags, IGNORE_DF and
> DONT_FRAGMENT, be mutually exclusive? Or does DONT_FRAGMENT take priority
> over IGNORE_DF?
>
> Also, the kernel is inconsistent across tunnel devices. geneve and vxlan
> tunnels (which don't transmit via ip_md_tunnel_xmit) in lwt mode set the
> outer DF based only on TUNNEL_DONT_FRAGMENT.
>
> This was also the behavior of the gre device before it switched to
> ip_md_tunnel_xmit in the following patch:
>
> 962924f ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit

Ah, that's a lot more convincing. I was looking at the Fixes tag you
provided, but it seems like Fixes should really point at the commit you
mention here.

Please mention the change in GRE behavior and the discrepancy between
handling of DF by different tunnels in the commit message and repost.
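
For readers following the thread, the two outer-DF behaviors under discussion
can be modeled roughly as below. This is a standalone sketch, not kernel code:
the enum, the function name outer_df() and the host-order IP_DF constant are
assumptions made for illustration, and the real logic in ip_md_tunnel_xmit and
in the geneve/vxlan transmit paths has more cases than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Don't Fragment bit of the IPv4 frag_off field (host byte order in this sketch). */
#define IP_DF 0x4000

/* Hypothetical policy names, used only for this illustration. */
enum df_policy {
	DF_FROM_TUNNEL_FLAG_ONLY, /* outer DF follows TUNNEL_DONT_FRAGMENT only */
	DF_INHERIT_INNER          /* additionally copy the inner DF (RFC 2003 style) */
};

/* Return the DF bit to place in the outer header of one encapsulated packet. */
static uint16_t outer_df(enum df_policy policy, bool tunnel_dont_fragment,
			 uint16_t inner_frag_off)
{
	uint16_t df = tunnel_dont_fragment ? IP_DF : 0;

	/* Under the inheriting policy, an inner DF forces the outer DF on. */
	if (policy == DF_INHERIT_INNER && (inner_frag_off & IP_DF))
		df = IP_DF;

	return df;
}
```

With the tunnel flag clear and an inner packet that has DF set, the first
policy leaves the outer DF clear (so an over-MTU packet can be fragmented in
the underlay), while the second sets it and relies on PMTUD instead.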