Message-ID: <20250923134742.1399800-1-maxtram95@gmail.com>
Date: Tue, 23 Sep 2025 16:47:25 +0300
From: Maxim Mikityanskiy <maxtram95@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Ahern <dsahern@...nel.org>,
Nikolay Aleksandrov <razor@...ckwall.org>
Cc: netdev@...r.kernel.org,
tcpdump-workers@...ts.tcpdump.org,
Guy Harris <gharris@...ic.net>,
Michael Richardson <mcr@...delman.ca>,
Denis Ovsienko <denis@...ienko.info>,
Xin Long <lucien.xin@...il.com>,
Maxim Mikityanskiy <maxim@...valent.com>
Subject: [PATCH net-next 00/17] BIG TCP for UDP tunnels
From: Maxim Mikityanskiy <maxim@...valent.com>
This series adds support for BIG TCP IPv4/IPv6 workloads for VXLAN
and GENEVE. It consists of two parts:
01-11: Remove the hop-by-hop header for BIG TCP IPv6 to align it with BIG TCP IPv4.
12-17: Fix up things that prevent BIG TCP from working with tunnels.
There are a few places that assume skb->len is smaller than 64k and/or
store it in 16-bit fields, truncating the length. The first step to
enable BIG TCP with VXLAN and GENEVE tunnels
is to patch those places to handle bigger lengths properly (patches
12-17). This is enough to make IPv4 in IPv4 work with BIG TCP, but when
either the outer or the inner protocol is IPv6, the current BIG TCP code
inserts a hop-by-hop extension header that stores the actual 32-bit
length of the packet. This additional hop-by-hop header turns out to be
problematic for encapsulated cases, because:
1. The drivers don't strip it, and they'd all need to know the structure
of each tunnel protocol in order to strip it correctly.
2. Even if (1) is implemented, it would be an additional performance
penalty per aggregated packet.
3. The skb_gso_validate_network_len check is skipped in
ip6_finish_output_gso when IP6SKB_FAKEJUMBO is set, but it seems that it
would make sense to do the actual validation, just taking into account
the length of the HBH header. Once support for tunnels is added, it
becomes trickier, because there may be one or two HBH headers, depending
on whether it's IPv6 in IPv6 or not.
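For reference, the HBH header in question carries the RFC 2675 Jumbo Payload
option and is 8 bytes long. Below is a minimal userspace C sketch of that
layout, loosely modeled on the kernel's struct hop_jumbo_hdr. It is not kernel
code: struct hbh_jumbo, hbh_jumbo_fill() and net_len_without_hbh() are
hypothetical names used only to show the extra bytes that drivers would have
to strip and that a length validation would have to discount per HBH header.

```c
#include <arpa/inet.h>  /* htonl */
#include <assert.h>     /* static_assert */
#include <stdint.h>

#define HBH_TLV_JUMBO 0xC2u  /* RFC 2675 Jumbo Payload option type */

/* Hypothetical mirror of a Hop-by-Hop header that carries only the
 * Jumbo Payload option: 8 bytes total, no padding needed. */
struct hbh_jumbo {
	uint8_t  nexthdr;            /* header that follows, e.g. 6 for TCP */
	uint8_t  hdrlen;             /* in 8-octet units, excluding the first 8 */
	uint8_t  tlv_type;           /* HBH_TLV_JUMBO */
	uint8_t  tlv_len;            /* option data length: 4 bytes */
	uint32_t jumbo_payload_len;  /* network byte order */
};

static_assert(sizeof(struct hbh_jumbo) == 8, "HBH jumbo header must be 8 bytes");

/* Fill the HBH header for an L4 segment of l4_len bytes. Per RFC 2675 the
 * jumbo length excludes the fixed IPv6 header but includes the HBH header;
 * the IPv6 payload_len field itself is set to 0 by the sender. */
static void hbh_jumbo_fill(struct hbh_jumbo *h, uint8_t nexthdr, uint32_t l4_len)
{
	h->nexthdr = nexthdr;
	h->hdrlen = 0;
	h->tlv_type = HBH_TLV_JUMBO;
	h->tlv_len = 4;
	h->jumbo_payload_len = htonl((uint32_t)sizeof(*h) + l4_len);
}

/* Network-layer length left after discounting n_hbh jumbo HBH headers
 * (one for plain IPv6, possibly two for IPv6 in IPv6), i.e. what a
 * validation step would have to compute before comparing against an MTU. */
static uint32_t net_len_without_hbh(uint32_t net_len, unsigned int n_hbh)
{
	return net_len - n_hbh * (uint32_t)sizeof(struct hbh_jumbo);
}
```

With one HBH header the overhead is 8 bytes per aggregated packet; with two
(the IPv6-in-IPv6 case) it is 16 bytes.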
At the same time, having an HBH header to store the 32-bit length is not
strictly necessary, as BIG TCP IPv4 doesn't do anything like this and
just restores the length from skb->len. The same thing can be done for
BIG TCP IPv6 (patches 01-11). Removing HBH from BIG TCP would significantly
simplify the implementation and align it with BIG TCP IPv4.
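To make the "restore the length from skb->len" idea concrete, here is a
minimal userspace C sketch. It assumes a flat buffer whose total size stands
in for skb->len, a known offset of the network header, and that an oversized
length field is left as 0 on GSO packets. store_len16(),
ipv6_payload_len_sketch() and ipv4_tot_len_sketch() are hypothetical names;
they are not the payload_len helpers introduced by this series.

```c
#include <stdint.h>

#define IPV6_HDR_LEN 40u  /* fixed IPv6 header size */

/* Storing a >64k length in a 16-bit header field silently wraps it
 * modulo 65536, which is the truncation problem described above. */
static uint16_t store_len16(uint32_t len)
{
	return (uint16_t)len;
}

/* Hypothetical: IPv6 payload_len excludes the 40-byte fixed header.
 * Assumption: a 0 in the header on a GSO packet means "too big to
 * encode", so derive the value from the buffer length instead. */
static uint32_t ipv6_payload_len_sketch(uint32_t skb_len, uint32_t net_off,
					uint16_t hdr_payload_len, int is_gso)
{
	if (hdr_payload_len || !is_gso)
		return hdr_payload_len;
	return skb_len - net_off - IPV6_HDR_LEN;
}

/* Hypothetical: IPv4 tot_len includes the IP header itself, so the
 * fallback only subtracts the offset of the network header. */
static uint32_t ipv4_tot_len_sketch(uint32_t skb_len, uint32_t net_off,
				    uint16_t hdr_tot_len, int is_gso)
{
	if (hdr_tot_len || !is_gso)
		return hdr_tot_len;
	return skb_len - net_off;
}
```

For example, storing 100000 in a 16-bit field yields 34464, while the
buffer-based fallbacks recover 100000 (IPv6 payload behind a 14-byte
Ethernet header) or 100020 (IPv4 total length including a 20-byte header).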
A trivial tcpdump PR for IPv6 is pending here [0]. While the tcpdump
committers seem to be actively contributing code to the repository,
community PRs appear to stay stuck for a long time. We checked with
Xin Long regarding BIG TCP IPv4, and it turned out that back then only
GUESS_TSO was added to the CFLAGS definition in the Fedora distro spec
file. In any case, we have Cc'ed Guy Harris et al. (tcpdump maintainer/
committer) here to see if he could help with unblocking [0].
Thanks all!
[0] https://github.com/the-tcpdump-group/tcpdump/pull/1329
Daniel Borkmann (1):
geneve: Enable BIG TCP packets
Maxim Mikityanskiy (16):
net/ipv6: Introduce payload_len helpers
net/ipv6: Drop HBH for BIG TCP on TX side
net/ipv6: Drop HBH for BIG TCP on RX side
net/ipv6: Remove jumbo_remove step from TX path
net/mlx5e: Remove jumbo_remove step from TX path
net/mlx4: Remove jumbo_remove step from TX path
ice: Remove jumbo_remove step from TX path
bnxt_en: Remove jumbo_remove step from TX path
gve: Remove jumbo_remove step from TX path
net: mana: Remove jumbo_remove step from TX path
net/ipv6: Remove HBH helpers
net: Enable BIG TCP with partial GSO
udp: Support gro_ipv4_max_size > 65536
udp: Validate UDP length in udp_gro_receive
udp: Set length in UDP header to 0 for big GSO packets
vxlan: Enable BIG TCP packets
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 21 -----
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 3 -
drivers/net/ethernet/intel/ice/ice_txrx.c | 3 -
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 42 ++--------
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 75 +++---------------
drivers/net/ethernet/microsoft/mana/mana_en.c | 3 -
drivers/net/geneve.c | 2 +
drivers/net/vxlan/vxlan_core.c | 2 +
include/linux/ipv6.h | 21 ++++-
include/net/ipv6.h | 79 -------------------
include/net/netfilter/nf_tables_ipv6.h | 4 +-
net/bridge/br_netfilter_ipv6.c | 2 +-
net/bridge/netfilter/nf_conntrack_bridge.c | 4 +-
net/core/dev.c | 6 +-
net/core/gro.c | 2 -
net/core/skbuff.c | 10 +--
net/ipv4/udp.c | 5 +-
net/ipv4/udp_offload.c | 12 ++-
net/ipv4/udp_tunnel_core.c | 2 +-
net/ipv6/ip6_input.c | 2 +-
net/ipv6/ip6_offload.c | 36 +--------
net/ipv6/ip6_output.c | 20 +----
net/ipv6/ip6_udp_tunnel.c | 2 +-
net/ipv6/output_core.c | 7 +-
net/netfilter/ipvs/ip_vs_xmit.c | 2 +-
net/netfilter/nf_conntrack_ovs.c | 2 +-
net/netfilter/nf_log_syslog.c | 2 +-
net/sched/sch_cake.c | 2 +-
28 files changed, 84 insertions(+), 289 deletions(-)
--
2.50.1