lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <adb592ee-2f2f-4208-bd4d-0c41bb319ff1@nvidia.com>
Date: Wed, 11 Feb 2026 18:12:05 +0100
From: Dragos Tatulea <dtatulea@...dia.com>
To: Amery Hung <ameryhung@...il.com>, Christoph Paasch <cpaasch@...nai.com>
Cc: Gal Pressman <gal@...dia.com>, Saeed Mahameed <saeedm@...dia.com>,
 Tariq Toukan <tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>,
 Leon Romanovsky <leon@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 John Fastabend <john.fastabend@...il.com>,
 Stanislav Fomichev <sdf@...ichev.me>, netdev@...r.kernel.org,
 linux-rdma@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH net-next v5 2/2] net/mlx5: Avoid copying payload to the
 skb's linear part

Hi Christoph,

On 10.09.25 21:09, Amery Hung wrote:
> On Wed, Sep 10, 2025 at 1:36 PM Christoph Paasch <cpaasch@...nai.com> wrote:

[...]

>>>>>>> This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
>>>>>>> LRO enabled):
>>>>>>>
>>>>>>> BEFORE:
>>>>>>> =======
>>>>>>> (netserver pinned to core receiving interrupts)
>>>>>>> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>>>>>>>  87380  16384 262144    60.01    32547.82
>>>>>>>
>>>>>>> (netserver pinned to adjacent core receiving interrupts)
>>>>>>> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>>>>>>>  87380  16384 262144    60.00    52531.67
>>>>>>>
>>>>>>> AFTER:
>>>>>>> ======
>>>>>>> (netserver pinned to core receiving interrupts)
>>>>>>> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>>>>>>>  87380  16384 262144    60.00    52896.06
>>>>>>>
>>>>>>> (netserver pinned to adjacent core receiving interrupts)
>>>>>>>  $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>>>>>>>  87380  16384 262144    60.00    85094.90
>>>>>>>
We still want this nice optimization. Seems like last discussion was
stuck on reading the header length for XDP. See below for sugession.

[...]
>>>>>>> @@ -2123,6 +2125,9 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>>>>>>>                                 pagep->frags++;
>>>>>>>                         while (++pagep < frag_page);
>>>>>>>                 }
>>>>>>> +
>>>>>>> +               headlen = eth_get_headlen(rq->netdev, mxbuf->xdp.data, headlen);
>>>>>>> +
>>>>>>
>>>>>> The size of mxbuf->xdp.data is most likely not headlen here.
>>>>>>
>>>>>> The driver currently generates a xdp_buff with empty linear data, pass
>>>>>> it to the xdp program and assumes the layout If the xdp program does
>>>>>> not change the layout of the xdp_buff through bpf_xdp_adjust_head() or
>>>>>> bpf_xdp_adjust_tail(). The assumption is not correct and I am working
>>>>>> on a fix. But, if we keep that assumption for now, mxbuf->xdp.data
>>>>>> will not contain any headers or payload. The thing that you try to do
>>>>>> probably should be:
>>>>>>
>>>>>>         skb_frag_t *frag = &sinfo->frags[0];
>>>>>>
>>>>>>         headlen = eth_get_headlen(rq->netdev, skb_frag_address(frag),
>>>>>> skb_frag_size(frag));
>>>>
>>>> So, when I look at the headlen I get, it is correct (even with my old
>>>> code using mxbuf->xdp.data).
>>>>
>>>> To make sure I test the right thing, which scenario would
>>>> mxbuf->xdp.data not contain any headers or payload ? What do I need to
>>>> do to reproduce that ?
>>>
>>> A quick look at the code, could it be that
>>> skb_flow_dissect_flow_keys_basic() returns false so that
>>> eth_get_headlen() always returns sizeof(*eth)?
>>
>> No, the headlen values were correct (meaning, it was the actual length
>> of the headers):
>>
> 
> Another possibility is that mxbuf->xdp is reused and not zeroed
> between calls to mlx5e_skb_from_cqe_mpwrq_nonlinear(). The stale
> headers might have been written to mxbuf->xdp.data before the XDP is
> attached.
> 
> I am not sure what exactly happens, but my main question remains. when
> the XDP program is attached and does nothing, the linear data will be
> empty, so what is eth_get_headlen() parsing here...?
> 

AFAIU there could be 2 cases here:

1) XDP program doesn't modify the XDP buffer so the header is in the
  first frag,
2) XDP program modifies the XDP buffer and pulls some len bytes into the
  linear part.

So how about always reading headlen from first frag *before* the XDP
program execution. And afterwards calculate the new headlen for
__pskb_pull_tail() based on previous headlen - len? Like this (untested):

```
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2159,8 +2159,12 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
        if (prog) {
                u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
+               skb_frag_t *frag = &sinfo->frags[0];
                u32 len;
 
+               headlen = eth_get_headlen(rq->netdev, skb_frag_address(frag),
+                                         skb_frag_size(frag));
+
                if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
                        if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT,
                                                 rq->flags)) {
@@ -2208,8 +2212,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
                                pagep->frags++;
                        while (++pagep < frag_page);
 
-                       headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
-                                       skb->data_len);
+                       headlen = min_t(u16, headlen - len, skb->data_len);
                        __pskb_pull_tail(skb, headlen);
                }
        } else {
```

Amery, do you agree?

Would you like to re-send the patch after the merge window closes?
If you are busy we can also pick it up ourselves, do the required
modifications and send it keeping your authorship and the existing tags.

Thanks,
Dragos



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ