[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250617144017.82931-3-maxim@isovalent.com>
Date: Tue, 17 Jun 2025 16:40:01 +0200
From: Maxim Mikityanskiy <maxtram95@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Ahern <dsahern@...nel.org>,
Nikolay Aleksandrov <razor@...ckwall.org>
Cc: netdev@...r.kernel.org,
Maxim Mikityanskiy <maxim@...valent.com>
Subject: [PATCH RFC net-next 02/17] net/ipv6: Drop HBH for BIG TCP on TX side
From: Maxim Mikityanskiy <maxim@...valent.com>
BIG TCP IPv6 inserts a hop-by-hop extension header to indicate the real
IPv6 payload length when it doesn't fit into the 16-bit field in the
IPv6 header itself. While it helps tools parse the packet, it also
requires every driver that supports TSO and BIG TCP to remove this
8-byte extension header. It might not sound that bad until we try to
apply it to tunneled traffic. Currently, the drivers don't attempt to
strip HBH if skb->encapsulation = 1. Moreover, trying to do so would
require dissecting different tunnel protocols and making corresponding
adjustments on case-by-case basis, which would slow down the fastpath
(potentially also requiring adjusting checksums in outer headers).
At the same time, BIG TCP IPv4 doesn't insert any extra headers and just
calculates the payload length from skb->len, significantly simplifying
implementing BIG TCP for tunnels.
Stop inserting HBH when building BIG TCP GSO SKBs.
Signed-off-by: Maxim Mikityanskiy <maxim@...valent.com>
---
include/linux/ipv6.h | 1 -
net/ipv6/ip6_output.c | 20 +++-----------------
2 files changed, 3 insertions(+), 18 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index bb53eca1f218..87d3723ce520 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -174,7 +174,6 @@ struct inet6_skb_parm {
#define IP6SKB_L3SLAVE 64
#define IP6SKB_JUMBOGRAM 128
#define IP6SKB_SEG6 256
-#define IP6SKB_FAKEJUMBO 512
#define IP6SKB_MULTIPATH 1024
};
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7bd29a9ff0db..446de2f37d8c 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -185,8 +185,7 @@ ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
static int ip6_finish_output_gso(struct net *net, struct sock *sk,
struct sk_buff *skb, unsigned int mtu)
{
- if (!(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
- !skb_gso_validate_network_len(skb, mtu))
+ if (!skb_gso_validate_network_len(skb, mtu))
return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
return ip6_finish_output2(net, sk, skb);
@@ -273,8 +272,6 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
struct dst_entry *dst = skb_dst(skb);
struct net_device *dev = dst->dev;
struct inet6_dev *idev = ip6_dst_idev(dst);
- struct hop_jumbo_hdr *hop_jumbo;
- int hoplen = sizeof(*hop_jumbo);
unsigned int head_room;
struct ipv6hdr *hdr;
u8 proto = fl6->flowi6_proto;
@@ -282,7 +279,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
int hlimit = -1;
u32 mtu;
- head_room = sizeof(struct ipv6hdr) + hoplen + LL_RESERVED_SPACE(dev);
+ head_room = sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dev);
if (opt)
head_room += opt->opt_nflen + opt->opt_flen;
@@ -309,19 +306,8 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
&fl6->saddr);
}
- if (unlikely(seg_len > IPV6_MAXPLEN)) {
- hop_jumbo = skb_push(skb, hoplen);
-
- hop_jumbo->nexthdr = proto;
- hop_jumbo->hdrlen = 0;
- hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
- hop_jumbo->tlv_len = 4;
- hop_jumbo->jumbo_payload_len = htonl(seg_len + hoplen);
-
- proto = IPPROTO_HOPOPTS;
+ if (unlikely(seg_len > IPV6_MAXPLEN))
seg_len = 0;
- IP6CB(skb)->flags |= IP6SKB_FAKEJUMBO;
- }
skb_push(skb, sizeof(struct ipv6hdr));
skb_reset_network_header(skb);
--
2.49.0
Powered by blists - more mailing lists