[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADvbK_ftLCTfmj=Z5yhuatt5eOvxuf=sxbduwdjK4mfuw=4wVw@mail.gmail.com>
Date: Mon, 17 Mar 2025 15:30:40 -0400
From: Xin Long <lucien.xin@...il.com>
To: Mark Bloch <mbloch@...dia.com>
Cc: Steffen Klassert <steffen.klassert@...unet.com>, Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Christian Hopps <chopps@...n.net>, Cosmin Ratiu <cratiu@...dia.com>, Yael Chemla <ychemla@...dia.com>,
wangfe <wangfe@...gle.com>, Dragos Tatulea <dtatulea@...dia.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net] xfrm: Force software GSO only in tunnel mode
On Mon, Mar 17, 2025 at 6:32 AM Mark Bloch <mbloch@...dia.com> wrote:
>
> From: Cosmin Ratiu <cratiu@...dia.com>
>
> The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel
> mode. Unfortunately, it is slightly broader than necessary, as it also
> severely affects performance for Geneve + IPSec transport mode over a
> device capable of both HW GSO and IPSec crypto offload. In this case,
> xfrm_output unnecessarily triggers software GSO instead of letting the
> HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over
> a back-2-back pair of NICs with MTU 1500, the performance was observed
> to be up to 6x worse when doing software GSO compared to leaving it to
> the hardware.
>
> This commit makes xfrm_output only trigger software GSO in crypto
> offload cases for already encapsulated packets in tunnel mode, as not
> doing so would then cause the inner tunnel skb->inner_networking_header
> to be overwritten and break software GSO for that packet later if the
> device turns out to not be capable of HW GSO.
>
For UDP tunnels, there are two types:
- ENCAP_TYPE_ETHER encaps an ether packet (e.g., VXLAN, Geneve).
- ENCAP_TYPE_IPPROTO encaps an ipproto packet (e.g., SCTP over UDP).
When performing GSO via skb_udp_tunnel_segment():
- ENCAP_TYPE_ETHER relies on inner_network_header to locate the
network header.
- ENCAP_TYPE_IPPROTO relies on inner_transport_header to locate
the transport header.
However, both IPsec transport and tunnel modes modify
inner_transport_header. This patch raises a concern that GSO may
not work correctly for ENCAP_TYPE_IPPROTO UDP tunnels over IPsec
in transport mode.
> Taking a closer look at the conditions for the original bug, to better
> understand the reasons for this change:
> - vxlan_build_skb -> iptunnel_handle_offloads sets inner_protocol and
> inner network header.
> - then, udp_tunnel_xmit_skb -> ip_tunnel_xmit adds outer transport and
> network headers.
> - later in the xmit path, xfrm_output -> xfrm_outer_mode_output ->
> xfrm4_prepare_output -> xfrm4_tunnel_encap_add overwrites the inner
> network header with the one set in ip_tunnel_xmit before adding the
> second outer header.
> - __dev_queue_xmit -> validate_xmit_skb checks whether GSO segmentation
> needs to happen based on dev features. In the original bug, the hw
> couldn't segment the packets, so skb_gso_segment was invoked.
> - deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment
> tries to use the wrong inner network header, expecting the one set in
> iptunnel_handle_offloads but getting the one set by xfrm instead.
> - a bit later, ipv6_gso_segment accesses the wrong memory based on that
> wrong inner network header.
>
> With the new change, the original bug (or similar ones) cannot happen
> again, as xfrm will now trigger software GSO before applying a tunnel.
> This concern doesn't exist in packet offload mode, when the HW adds
> encapsulation headers. For the non-offloaded packets (crypto in SW),
> software GSO is still done unconditionally in the else branch.
>
> Fixes: a204aef9fd77 ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output")
> Signed-off-by: Cosmin Ratiu <cratiu@...dia.com>
> Reviewed-by: Dragos Tatulea <dtatulea@...dia.com>
> Reviewed-by: Yael Chemla <ychemla@...dia.com>
> Signed-off-by: Mark Bloch <mbloch@...dia.com>
> ---
> net/xfrm/xfrm_output.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index f7abd42c077d..42f1ca513879 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -758,7 +758,7 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
> skb->encapsulation = 1;
>
> if (skb_is_gso(skb)) {
> - if (skb->inner_protocol)
> + if (skb->inner_protocol && x->props.mode == XFRM_MODE_TUNNEL)
> return xfrm_output_gso(net, sk, skb);
>
> skb_shinfo(skb)->gso_type |= SKB_GSO_ESP;
> --
> 2.34.1
>
Powered by blists - more mailing lists