Message-ID: <20250617144017.82931-1-maxim@isovalent.com>
Date: Tue, 17 Jun 2025 16:39:59 +0200
From: Maxim Mikityanskiy <maxtram95@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Ahern <dsahern@...nel.org>,
Nikolay Aleksandrov <razor@...ckwall.org>
Cc: netdev@...r.kernel.org,
Maxim Mikityanskiy <maxim@...valent.com>
Subject: [PATCH RFC net-next 00/17] BIG TCP for UDP tunnels
This series consists of two parts that will be submitted separately:
01-11: Remove hop-by-hop header for BIG TCP IPv6.
12-17: Fix up things that prevent BIG TCP from working with tunnels.
I kept them both here for the sake of the big picture.
There are a few places that make assumptions about skb->len being
smaller than 64k and/or that store it in 16-bit fields, trimming the
length. The first step to enable BIG TCP with VXLAN and GENEVE tunnels
is to patch those places to handle bigger lengths properly (patches
12-17). This is enough to make IPv4 in IPv4 work with BIG TCP, but when
either the outer or the inner protocol is IPv6, the current BIG TCP code
inserts a hop-by-hop extension header that stores the actual 32-bit
length of the packet.
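To illustrate the truncation problem, here is a minimal user-space sketch. It is not code from the series: struct hdr16 and both store helpers are hypothetical names, and writing 0 for oversized packets is only an assumed convention, modelled on the "Set length in UDP header to 0 for big GSO packets" patch title.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for any header with a 16-bit length field. */
struct hdr16 {
	uint16_t len;
};

/* Naive store: silently keeps only the low 16 bits of the length. */
static void store_len_naive(struct hdr16 *h, uint32_t pkt_len)
{
	h->len = (uint16_t)pkt_len;
}

/*
 * Checked store (assumed convention): lengths that do not fit are
 * written as 0, so the real length must be taken from elsewhere,
 * e.g. from the buffer length.
 */
static void store_len_checked(struct hdr16 *h, uint32_t pkt_len)
{
	h->len = pkt_len > UINT16_MAX ? 0 : (uint16_t)pkt_len;
}
```

With a 70000-byte packet, the naive store leaves 4464 (70000 - 65536) in the field, a plausible but wrong length, while the checked store leaves 0.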
This additional hop-by-hop header turns out to be problematic for encapsulated
cases, because:
1. The drivers don't strip it, and they'd all need to know the structure
of each tunnel protocol in order to strip it correctly.
2. Even if (1) were implemented, the stripping would add a performance
penalty per aggregated packet.
3. The skb_gso_validate_network_len check is skipped in
ip6_finish_output_gso when IP6SKB_FAKEJUMBO is set, but it seems that it
would make sense to do the actual validation, just taking into account
the length of the HBH header. When the support for tunnels is added, it
becomes trickier, because there may be one or two HBH headers, depending
on whether it's IPv6 in IPv6 or not.
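Point 3 can be sketched roughly as follows. This is an illustration under assumptions, not the kernel's validation code: enum l3, hbh_overhead() and seg_fits_mtu() are hypothetical, the 8-byte size is that of a hop-by-hop header carrying only the jumbo payload option, and the check assumes the HBH headers are stripped before the packet reaches the wire.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tag for the L3 protocol of the outer or inner header. */
enum l3 { L3_IPV4, L3_IPV6 };

/* Hop-by-hop header holding just the jumbo payload option: 8 bytes. */
#define HBH_JUMBO_LEN 8u

/* Total HBH bytes present in a tunneled BIG TCP packet: 0, 8 or 16. */
static unsigned int hbh_overhead(enum l3 outer, enum l3 inner)
{
	unsigned int n = 0;

	if (outer == L3_IPV6)
		n += HBH_JUMBO_LEN;
	if (inner == L3_IPV6)
		n += HBH_JUMBO_LEN;
	return n;
}

/*
 * Assumed validation: discount the HBH bytes (they never hit the wire)
 * before comparing a segment's network length against the MTU.
 */
static bool seg_fits_mtu(unsigned int seg_len, unsigned int hbh,
			 unsigned int mtu)
{
	if (seg_len < hbh)
		return false;
	return seg_len - hbh <= mtu;
}
```

The amount to discount depends on both the outer and the inner protocol, which is what makes the tunneled case trickier than the plain one.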
At the same time, having an HBH header to store the 32-bit length is not
strictly necessary, as BIG TCP IPv4 doesn't do anything like this and
just restores the length from skb->len. The same thing can be done for
BIG TCP IPv6 (patches 01-11).
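The idea of restoring the length from skb->len can be sketched in user space like this. It is a simplified model rather than the helpers added by the series: struct pkt, pkt_payload_len() and pkt_set_payload_len() are hypothetical names, byte order is ignored, and len is assumed to be measured from the start of the IPv6 header.

```c
#include <assert.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40u	/* fixed IPv6 header size */

/* Hypothetical, simplified view of a packet buffer. */
struct pkt {
	uint32_t len;		/* bytes from the IPv6 header onward */
	uint16_t payload_len;	/* header field; 0 when it cannot fit */
};

/* Setter: store the payload length, or 0 if it exceeds 16 bits. */
static void pkt_set_payload_len(struct pkt *p, uint32_t payload)
{
	p->len = payload + IPV6_HDR_LEN;
	p->payload_len = payload > UINT16_MAX ? 0 : (uint16_t)payload;
}

/* Getter: trust the header field unless it is 0, then use the buffer. */
static uint32_t pkt_payload_len(const struct pkt *p)
{
	if (p->payload_len)
		return p->payload_len;
	assert(p->len >= IPV6_HDR_LEN);
	return p->len - IPV6_HDR_LEN;
}
```

No extension header is needed in this model: the 32-bit length lives only in the buffer metadata, as it does for BIG TCP IPv4.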
The only reason we keep inserting HBH seems to be the tools that parse
the packets, but the above drawbacks seem to outweigh this, and the
tools can be patched (as they already need to be in order to parse
BIG TCP IPv4). I have a patch for tcpdump.
Removing HBH from BIG TCP would significantly simplify the
implementation and align it with BIG TCP IPv4.
Daniel Borkmann (1):
geneve: Enable BIG TCP packets
Maxim Mikityanskiy (16):
net/ipv6: Introduce payload_len helpers
net/ipv6: Drop HBH for BIG TCP on TX side
net/ipv6: Drop HBH for BIG TCP on RX side
net/ipv6: Remove jumbo_remove step from TX path
net/mlx5e: Remove jumbo_remove step from TX path
net/mlx4: Remove jumbo_remove step from TX path
ice: Remove jumbo_remove step from TX path
bnxt_en: Remove jumbo_remove step from TX path
gve: Remove jumbo_remove step from TX path
net: mana: Remove jumbo_remove step from TX path
net/ipv6: Remove HBH helpers
net: Enable BIG TCP with partial GSO
udp: Support gro_ipv4_max_size > 65536
udp: Validate UDP length in udp_gro_receive
udp: Set length in UDP header to 0 for big GSO packets
vxlan: Enable BIG TCP packets
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 21 -----
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 3 -
drivers/net/ethernet/intel/ice/ice_txrx.c | 3 -
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 42 ++--------
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 75 +++---------------
drivers/net/ethernet/microsoft/mana/mana_en.c | 3 -
drivers/net/geneve.c | 2 +
drivers/net/vxlan/vxlan_core.c | 2 +
include/linux/ipv6.h | 21 ++++-
include/net/ipv6.h | 79 -------------------
include/net/netfilter/nf_tables_ipv6.h | 4 +-
net/bridge/br_netfilter_ipv6.c | 2 +-
net/bridge/netfilter/nf_conntrack_bridge.c | 4 +-
net/core/dev.c | 3 +-
net/core/gro.c | 2 -
net/core/skbuff.c | 10 +--
net/ipv4/udp.c | 5 +-
net/ipv4/udp_offload.c | 12 ++-
net/ipv4/udp_tunnel_core.c | 2 +-
net/ipv6/ip6_input.c | 2 +-
net/ipv6/ip6_offload.c | 36 +--------
net/ipv6/ip6_output.c | 20 +----
net/ipv6/ip6_udp_tunnel.c | 2 +-
net/ipv6/output_core.c | 7 +-
net/netfilter/ipvs/ip_vs_xmit.c | 2 +-
net/netfilter/nf_conntrack_ovs.c | 2 +-
net/netfilter/nf_log_syslog.c | 2 +-
net/sched/sch_cake.c | 2 +-
28 files changed, 83 insertions(+), 287 deletions(-)
--
2.49.0