Message-ID: <437077cc-8c3c-79de-3475-6c717001d8ae@gmail.com>
Date: Tue, 4 Aug 2020 07:54:37 -0600
From: David Ahern <dsahern@...il.com>
To: Stefano Brivio <sbrivio@...hat.com>,
"David S. Miller" <davem@...emloft.net>
Cc: Florian Westphal <fw@...len.de>, Aaron Conole <aconole@...hat.com>,
Numan Siddique <nusiddiq@...hat.com>,
Jakub Kicinski <kuba@...nel.org>,
Pravin B Shelar <pshelar@....org>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
Nikolay Aleksandrov <nikolay@...ulusnetworks.com>,
Lourdes Pedrajas <lu@...o.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v2 2/6] tunnels: PMTU discovery support for
directly bridged IP packets
On 8/3/20 11:53 PM, Stefano Brivio wrote:
> It's currently possible to bridge Ethernet tunnels carrying IP
> packets directly to external interfaces without assigning them
> addresses and routes on the bridged network itself: this is the case
> for UDP tunnels bridged with a standard bridge or by Open vSwitch.
>
> PMTU discovery is currently broken with those configurations, because
> the encapsulation effectively decreases the MTU of the link, and
> while we are able to account for this using PMTU discovery on the
> lower layer, we don't have a way to relay ICMP or ICMPv6 messages
> needed by the sender, because we don't have valid routes to it.
>
> On the other hand, as a tunnel endpoint, we can't fragment packets
> as a general approach: this is for instance clearly forbidden for
> VXLAN by RFC 7348, section 4.3:
>
>     VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may
>     fragment encapsulated VXLAN packets due to the larger frame size.
>     The destination VTEP MAY silently discard such VXLAN fragments.
>
> The same paragraph recommends that the MTU over the physical network
> accommodates encapsulations, but this isn't a practical option for
> complex topologies, especially for typical Open vSwitch use cases.
>
> Further, it states that:
>
>     Other techniques like Path MTU discovery (see [RFC1191] and
>     [RFC1981]) MAY be used to address this requirement as well.
>
> Now, PMTU discovery already works for routed interfaces: we get
> route exceptions created by the encapsulation devices as they receive
> ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
> we already rebuild those messages with the appropriate MTU and route
> them back to the sender.
>
> Add the missing bits for bridged cases:
>
> - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
>   to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
>   RFC 4443 section 2.4 for ICMPv6. This function is already called by
>   UDP tunnels
>
> - a new function generating those ICMP or ICMPv6 replies. We can't
>   reuse icmp_send() and icmp6_send() as we don't see the sender as a
>   valid destination. This doesn't need to be generic, as we don't
>   cover any other type of ICMP errors given that we only provide an
>   encapsulation function to the sender
>
> While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
> we might receive GSO buffers here, and the passed headroom already
> includes the inner MAC length, so we don't have to account for it
> a second time (that would imply three MAC headers on the wire, but
> there are just two).
>
> This issue became visible while bridging IPv6 packets with 4500 bytes
> of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
> bytes of encapsulation headroom, we would advertise MTU as 3950, and
> we would reject fragmented IPv6 datagrams of 3958 bytes size on the
> wire. We're exclusively dealing with network MTU here, though, so we
> could get Ethernet frames up to 3964 octets in that case.
>
> v2:
> - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
> - split IPv4/IPv6 functions (David Ahern)
>
> Signed-off-by: Stefano Brivio <sbrivio@...hat.com>
> ---
> drivers/net/bareudp.c | 5 +-
> drivers/net/geneve.c | 5 +-
> drivers/net/vxlan.c | 4 +-
> include/net/dst.h | 10 --
> include/net/ip_tunnels.h | 2 +
> net/ipv4/ip_tunnel_core.c | 244 ++++++++++++++++++++++++++++++++++++++
> 6 files changed, 254 insertions(+), 16 deletions(-)
>
Much easier to follow
Reviewed-by: David Ahern <dsahern@...il.com>
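
For anyone checking the MTU figures in the quoted commit message, here is a
rough sketch of the arithmetic. The function names are hypothetical and none
of this is taken from the patch. It assumes a 14-byte Ethernet header, and it
assumes the 3958-byte figure comes from the largest non-final IPv6 fragment
that fits in a 3950-byte MTU (fragment data is a multiple of 8 octets); that
derivation is a reading of the numbers, not something the message states.

```python
ETH_HLEN = 14        # Ethernet header length in bytes (assumed, no VLAN tag)
IPV6_HLEN = 40       # fixed IPv6 header
IPV6_FRAG_HLEN = 8   # IPv6 Fragment extension header


def advertised_mtu(pmtu: int, headroom: int) -> int:
    """Network-layer MTU left for the inner packet after encapsulation."""
    return pmtu - headroom


def max_frame_len(l3_len: int) -> int:
    """Length of the inner Ethernet frame carrying an l3_len-byte packet."""
    return l3_len + ETH_HLEN


def largest_ipv6_fragment(mtu: int) -> int:
    """Largest non-final IPv6 fragment (headers included) fitting in mtu.

    Fragment data must be a multiple of 8 octets, so round down.
    """
    data = (mtu - IPV6_HLEN - IPV6_FRAG_HLEN) // 8 * 8
    return IPV6_HLEN + IPV6_FRAG_HLEN + data


mtu = advertised_mtu(4000, 50)
print(mtu)                                        # 3950
print(max_frame_len(mtu))                         # 3964
print(max_frame_len(largest_ipv6_fragment(mtu)))  # 3958
```

With these assumptions, a 3958-byte frame carries a 3944-byte IPv6 fragment,
which is within the advertised MTU of 3950 and so should not be rejected.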