Message-Id: <20220518210548.2296546-1-zenczykowski@gmail.com>
Date: Wed, 18 May 2022 14:05:48 -0700
From: Maciej Żenczykowski <zenczykowski@...il.com>
To: Maciej Żenczykowski <maze@...gle.com>
Cc: Linux Network Development Mailing List <netdev@...r.kernel.org>,
Lorenzo Colitti <lorenzo@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Lina Wang <lina.wang@...iatek.com>,
Steffen Klassert <steffen.klassert@...unet.com>
Subject: [PATCH] xfrm: do not set IPv4 DF flag when encapsulating IPv6 frames <= 1280 bytes.
From: Maciej Żenczykowski <maze@...gle.com>
One may want to have DF set on large packets to support discovering
path mtu and limiting the size of generated packets (hence not
setting the XFRM_STATE_NOPMTUDISC tunnel flag), while still
supporting networks that are incapable of carrying even minimal
sized IPv6 frames (post encapsulation).
Having IPv4 Don't Frag bit set on encapsulated IPv6 frames that
are not larger than the minimum IPv6 mtu of 1280 isn't useful,
because the resulting ICMP Fragmentation Required error isn't
actionable (even assuming you receive it): IPv6 will not
drop its path mtu below 1280 anyway. While the IPv4 stack
could prefrag the packets post encap, this requires the ICMP
error to be successfully delivered and causes a loss of the
original IPv6 frame (thus requiring a retransmit and latency
hit). Luckily with IPv4 if we simply don't set the DF flag,
we'll just make further fragmenting the packets some other
router's problem.
We'll still learn the correct IPv4 path mtu through encapsulation
of larger IPv6 frames.
I'm still not convinced this patch is entirely sufficient to make
everything happy... but I don't see how it could possibly
make things worse.
See also recent:
4ff2980b6bd2 'xfrm: fix tunnel model fragmentation behavior'
and friends
Bug: 203183943
Cc: Lorenzo Colitti <lorenzo@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>
Cc: Lina Wang <lina.wang@...iatek.com>
Cc: Steffen Klassert <steffen.klassert@...unet.com>
Signed-off-by: Maciej Zenczykowski <maze@...gle.com>
---
net/xfrm/xfrm_output.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index d4935b3b9983..555ab35cd119 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -273,6 +273,7 @@ static int xfrm4_beet_encap_add(struct xfrm_state *x, struct sk_buff *skb)
  */
 static int xfrm4_tunnel_encap_add(struct xfrm_state *x, struct sk_buff *skb)
 {
+	bool small_ipv6 = (skb->protocol == htons(ETH_P_IPV6)) && (skb->len <= IPV6_MIN_MTU);
 	struct dst_entry *dst = skb_dst(skb);
 	struct iphdr *top_iph;
 	int flags;
@@ -303,7 +304,7 @@ static int xfrm4_tunnel_encap_add(struct xfrm_state *x, struct sk_buff *skb)
 	if (flags & XFRM_STATE_NOECN)
 		IP_ECN_clear(top_iph);
 
-	top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) ?
+	top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) || small_ipv6 ?
 		0 : (XFRM_MODE_SKB_CB(skb)->frag_off & htons(IP_DF));
 
 	top_iph->ttl = ip4_dst_hoplimit(xfrm_dst_child(dst));
--
2.36.1.124.g0e6072fb45-goog
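
For readers who want to poke at the decision outside the kernel, here is a
rough userspace sketch of the DF selection the patch ends up with. It is
not kernel code: the function name, the SKETCH_* constants and the
host-byte-order handling are all assumptions made for illustration (the
real code works on struct sk_buff and network-order fields, and the flag
value below is only a stand-in for XFRM_STATE_NOPMTUDISC).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define SKETCH_IPV6_MIN_MTU      1280u   /* minimum IPv6 link MTU */
#define SKETCH_IP_DF             0x4000u /* Don't Fragment bit, host order */
#define SKETCH_STATE_NOPMTUDISC  0x4u    /* stand-in for the xfrm state flag */

/*
 * Return the frag_off value (host order) for the outer IPv4 header.
 *
 * flags          - tunnel state flags (hypothetical encoding)
 * inner_is_ipv6  - true if the encapsulated frame is IPv6
 * inner_len      - length of the inner frame before encapsulation
 * inner_frag_off - frag_off recorded for the inner frame
 */
uint16_t sketch_outer_frag_off(unsigned int flags, bool inner_is_ipv6,
			       size_t inner_len, uint16_t inner_frag_off)
{
	/* An IPv6 frame at or below 1280 bytes can never shrink further. */
	bool small_ipv6 = inner_is_ipv6 && inner_len <= SKETCH_IPV6_MIN_MTU;

	/* No PMTU discovery requested, or DF would be pointless: clear it. */
	if ((flags & SKETCH_STATE_NOPMTUDISC) || small_ipv6)
		return 0;

	/* Otherwise propagate only the DF bit from the inner frame. */
	return inner_frag_off & SKETCH_IP_DF;
}
```

With this sketch, a 1280 byte IPv6 frame goes out without DF, while a 1281
byte one keeps it, so larger frames still drive IPv4 path mtu discovery.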