linux-kernel - Re: [PATCH net-next] net/mlx5e: Reuse per-RQ XDP buffer to avoid stack zeroing overhead

Message-ID: <09377c1a-dac5-487d-9fc1-d973b20b04dd@kernel.org>
Date: Fri, 16 May 2025 16:43:54 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Tariq Toukan <ttoukan.linux@...il.com>,
 Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski
 <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew+netdev@...n.ch>,
 Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>,
 Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 John Fastabend <john.fastabend@...il.com>,
 Network Development <netdev@...r.kernel.org>, linux-rdma@...r.kernel.org,
 LKML <linux-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
 Moshe Shemesh <moshe@...dia.com>, Mark Bloch <mbloch@...dia.com>,
 Gal Pressman <gal@...dia.com>, Carolina Jubran <cjubran@...dia.com>,
 Sebastiano Miano <mianosebastiano@...il.com>,
 Samuel Dobron <sdobron@...hat.com>
Subject: Re: [PATCH net-next] net/mlx5e: Reuse per-RQ XDP buffer to avoid
 stack zeroing overhead

On 16/05/2025 15.47, Tariq Toukan wrote:
> 
> 
> On 15/05/2025 3:26, Alexei Starovoitov wrote:
>> On Wed, May 14, 2025 at 1:04 PM Tariq Toukan <tariqt@...dia.com> wrote:
>>>
>>> From: Carolina Jubran <cjubran@...dia.com>
>>>
>>> CONFIG_INIT_STACK_ALL_ZERO introduces a performance cost by
>>> zero-initializing all stack variables on function entry. The mlx5 XDP
>>> RX path previously allocated a struct mlx5e_xdp_buff on the stack per
>>> received CQE, resulting in measurable performance degradation under
>>> this config.
>>>
>>> This patch reuses a mlx5e_xdp_buff stored in the mlx5e_rq struct,
>>> avoiding per-CQE stack allocations and repeated zeroing.
>>>
>>> With this change, XDP_DROP and XDP_TX performance matches that of
>>> kernels built without CONFIG_INIT_STACK_ALL_ZERO.
>>>
>>> Performance was measured on a ConnectX-6Dx using a single RX channel
>>> (1 CPU at 100% usage) at ~50 Mpps. The baseline results were taken from
>>> net-next-6.15.
>>>
>>> Stack zeroing disabled:
>>> - XDP_DROP:
>>>      * baseline:                     31.47 Mpps
>>>      * baseline + per-RQ allocation: 32.31 Mpps (+2.68%)
>>>

31.47 Mpps = 31.77 nanosec per packet
32.31 Mpps = 30.95 nanosec per packet
Improvement:  0.82 nanosec faster

>>> - XDP_TX:
>>>      * baseline:                     12.41 Mpps
>>>      * baseline + per-RQ allocation: 12.95 Mpps (+4.30%)
>>

The XDP_TX numbers are actually lower than I expected.
Hmm... I wonder if we regressed here(?)

12.41 Mpps = 80.58 nanosec per packet
12.95 Mpps = 77.22 nanosec per packet
Improvement:  3.36 nanosec faster

>> Looks good, but where are these gains coming from ?
>> The patch just moves mxbuf from stack to rq.
>> The number of operations should really be the same.
>>
> 
> I guess it's cache related. Hot/cold areas, alignments, movement of 
> other fields in the mlx5e_rq structure...

The improvement for XDP_DROP (see calc above) is so small in nanosec
that it is hard to measure accurately and stably on any system.

The improvement for XDP_TX is above 2 nanosec, which looks like an 
actual improvement...

>>> Stack zeroing enabled:
>>> - XDP_DROP:
>>>      * baseline:                     24.32 Mpps
>>>      * baseline + per-RQ allocation: 32.27 Mpps (+32.7%)
>>
>> This part makes sense.
> 

Yes, this makes sense as it is a measurable improvement.

24.32 Mpps = 41.12 nanosec per packet
32.27 Mpps = 30.99 nanosec per packet
Improvement: 10.13 nanosec faster
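
For anyone who wants to redo the per-packet calculations in this reply, here is a minimal sketch. It assumes the usual conversion (1 Mpps = 10^6 packets/sec, so time per packet in nanosec = 1000 / Mpps). The function names are made up for illustration and are not part of any tool; the Mpps values are the ones quoted from the patch description.

```python
def ns_per_packet(mpps: float) -> float:
    """Convert a packet rate in Mpps to time spent per packet in nanoseconds."""
    # 1 Mpps = 1e6 packets/sec, so 1e9 ns / (mpps * 1e6) = 1000 / mpps
    return 1000.0 / mpps


def saving_ns(baseline_mpps: float, patched_mpps: float) -> float:
    """Nanoseconds saved per packet when going from baseline to patched rate."""
    return ns_per_packet(baseline_mpps) - ns_per_packet(patched_mpps)


# (baseline Mpps, baseline + per-RQ allocation Mpps), as quoted in the patch
results = {
    "XDP_DROP, stack zeroing disabled": (31.47, 32.31),
    "XDP_TX,   stack zeroing disabled": (12.41, 12.95),
    "XDP_DROP, stack zeroing enabled": (24.32, 32.27),
}

for name, (base, patched) in results.items():
    print(f"{name}: {ns_per_packet(base):.2f} ns -> "
          f"{ns_per_packet(patched):.2f} ns "
          f"(saves {saving_ns(base, patched):.2f} ns per packet)")
```

Looking at nanosec per packet rather than Mpps makes the deltas comparable across tests, since the same absolute saving shows up as a larger percentage the faster the baseline already is.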

Acked-by: Jesper Dangaard Brouer <hawk@...nel.org>

--Jesper