Message-ID: <CADg4-L_SNAKy3mBn7ssq7uw0_+wZ_=zyqrQ23yVTOo5J7z7Q=g@mail.gmail.com>
Date: Mon, 28 Jul 2025 18:01:16 -0700
From: Christoph Paasch <cpaasch@...nai.com>
To: Gal Pressman <gal@...dia.com>
Cc: Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>, 
	Leon Romanovsky <leon@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, 
	linux-rdma@...r.kernel.org
Subject: Re: [PATCH net v2] net/mlx5: Correctly set gso_size when LRO is used

On Mon, Jul 28, 2025 at 10:57 AM Christoph Paasch <cpaasch@...nai.com> wrote:
>
> Hi!
>
> On Mon, Jul 28, 2025 at 7:36 AM Gal Pressman <gal@...dia.com> wrote:
>>
>> On 15/07/2025 23:20, Christoph Paasch via B4 Relay wrote:
>> > From: Christoph Paasch <cpaasch@...nai.com>
>> >
>> > gso_size is expected by the networking stack to be the size of the
>> > payload (thus, not including ethernet/IP/TCP-headers). However, cqe_bcnt
>> > is the full sized frame (including the headers). Dividing cqe_bcnt by
>> > lro_num_seg will then give incorrect results.
>> >
>> > For example, running a bpftrace higher up in the TCP-stack
>> > (tcp_event_data_recv), we commonly have gso_size set to 1450 or 1451 even
>> > though in reality the payload was only 1448 bytes.
>> >
>> > This can have unintended consequences:
>> > - In tcp_measure_rcv_mss() len will be for example 1450, but rcv_mss
>> > will be 1448 (because tp->advmss is 1448). Thus, we will always
>> > recompute scaling_ratio each time an LRO-packet is received.
>> > - In tcp_gro_receive(), it will interfere with the decision whether or
>> > not to flush and thus potentially result in less gro'ed packets.
>> >
>> > So, we need to discount the protocol headers from cqe_bcnt so we can
>> > actually divide the payload by lro_num_seg to get the real gso_size.
>> >
>> > v2:
>> >  - Use "(unsigned char *)tcp + tcp->doff * 4 - skb->data" to compute header-len
>> >    (Tariq Toukan <tariqt@...dia.com>)
>> >  - Improve commit-message (Gal Pressman <gal@...dia.com>)
>> >
>> > Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
>> > Signed-off-by: Christoph Paasch <cpaasch@...nai.com>
>>
>> Hi Christoph,
>>
>> This commit results in hw csum failures [1] when running iperf tcp
>> traffic with LRO enabled for a few seconds.
>>
>> I don't think the patch is wrong, but I suspect it exposes a new flow in
>> GRO which we did not exercise before.
>>
>> I am still debugging this, maybe you have some ideas as well. If the
>> debug takes too long I recommend submitting a revert until the issue is
>> properly resolved.
>
>
> I'm looking into it. I can reproduce it indeed and when disabling GRO the issue goes away.

The diff below fixes it. The problem is that, because gso_segs is not
set, NAPI_GRO_CB()->count is 0 when processing LRO'ed packets.
So, if a non-LRO'ed packet is followed by an LRO'ed packet, the first
one will have NAPI_GRO_CB()->count set to 1 and the next one set to 0
(in dev_gro_receive()).

This means we end up in gro_complete() with count == 1 and thus don't
call inet_gro_complete().

I'm still unclear why this only fails much later, when checking the
checksum, but I'm sure the diff below fixes it (and it also gets rid
of all packet loss, so throughput goes up).

Will submit a proper patch tomorrow.


----
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 7462514c7f3d..da3e340c99b7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1567,6 +1567,7 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
  unsigned int hdrlen = mlx5e_lro_update_hdr(skb, cqe, cqe_bcnt);

  skb_shinfo(skb)->gso_size = DIV_ROUND_UP(cqe_bcnt - hdrlen, lro_num_seg);
+ skb_shinfo(skb)->gso_segs = lro_num_seg;
  /* Subtract one since we already counted this as one
  * "regular" packet in mlx5e_complete_rx_cqe()
  */
