Message-ID: <CAADnVQLSMvk3uuzTCjqQKXs6hbZH9-_XeYo2Uvu2uHAiYrnkog@mail.gmail.com>
Date: Wed, 14 May 2025 17:26:22 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Tariq Toukan <tariqt@...dia.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
Andrew Lunn <andrew+netdev@...n.ch>, Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>, Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>, John Fastabend <john.fastabend@...il.com>,
Network Development <netdev@...r.kernel.org>, linux-rdma@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Moshe Shemesh <moshe@...dia.com>, Mark Bloch <mbloch@...dia.com>, Gal Pressman <gal@...dia.com>,
Carolina Jubran <cjubran@...dia.com>
Subject: Re: [PATCH net-next] net/mlx5e: Reuse per-RQ XDP buffer to avoid stack zeroing overhead
On Wed, May 14, 2025 at 1:04 PM Tariq Toukan <tariqt@...dia.com> wrote:
>
> From: Carolina Jubran <cjubran@...dia.com>
>
> CONFIG_INIT_STACK_ALL_ZERO introduces a performance cost by
> zero-initializing all stack variables on function entry. The mlx5 XDP
> RX path previously allocated a struct mlx5e_xdp_buff on the stack per
> received CQE, resulting in measurable performance degradation under
> this config.
>
> This patch reuses a mlx5e_xdp_buff stored in the mlx5e_rq struct,
> avoiding per-CQE stack allocations and repeated zeroing.
>
> With this change, XDP_DROP and XDP_TX performance matches that of
> kernels built without CONFIG_INIT_STACK_ALL_ZERO.
>
> Performance was measured on a ConnectX-6Dx using a single RX channel
> (1 CPU at 100% usage) at ~50 Mpps. The baseline results were taken from
> net-next-6.15.
>
> Stack zeroing disabled:
> - XDP_DROP:
> * baseline: 31.47 Mpps
> * baseline + per-RQ allocation: 32.31 Mpps (+2.68%)
>
> - XDP_TX:
> * baseline: 12.41 Mpps
> * baseline + per-RQ allocation: 12.95 Mpps (+4.30%)

Looks good, but where are these gains coming from?
The patch just moves mxbuf from the stack to the rq.
The number of operations should really be the same.

> Stack zeroing enabled:
> - XDP_DROP:
> * baseline: 24.32 Mpps
> * baseline + per-RQ allocation: 32.27 Mpps (+32.7%)

This part makes sense.
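
For reference, here is a minimal userspace sketch of the before/after pattern the patch describes. It is not the mlx5 code. `struct mxbuf_sketch`, `struct rq_sketch`, `prepare()`, `handle_cqe_stack()` and `handle_cqe_rq()` are hypothetical stand-ins, and the struct layouts are illustrative. The `memset()` only roughly emulates the zeroing that CONFIG_INIT_STACK_ALL_ZERO inserts for an uninitialized stack struct, because the real compiler may elide stores that are later overwritten. Apart from that zeroing, both variants perform the same field writes per CQE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rq_sketch; /* forward declaration for the back-pointer below */

/* Hypothetical stand-in for struct xdp_buff (real struct has more fields). */
struct xdp_buff_sketch {
	void *data;
	void *data_end;
	void *data_hard_start;
	uint32_t frame_sz;
};

/* Hypothetical stand-in for struct mlx5e_xdp_buff. */
struct mxbuf_sketch {
	struct xdp_buff_sketch xdp;
	void *cqe;               /* per-packet completion entry */
	struct rq_sketch *rq;    /* owning RQ */
	uint8_t scratch[64];     /* illustrative extra state that makes zeroing cost more */
};

/* Hypothetical stand-in for struct mlx5e_rq with an embedded, reused buffer. */
struct rq_sketch {
	struct mxbuf_sketch mxbuf;
	unsigned long packets;
};

/* Write every field the XDP path reads for this packet. */
static void prepare(struct mxbuf_sketch *mx, struct rq_sketch *rq,
		    void *cqe, char *pkt, size_t len)
{
	assert(pkt != NULL && len > 0);
	mx->xdp.data_hard_start = pkt;
	mx->xdp.data = pkt;
	mx->xdp.data_end = pkt + len;
	mx->xdp.frame_sz = (uint32_t)len;
	mx->cqe = cqe;
	mx->rq = rq;
}

/* Before: one struct per CQE on the stack, zeroed on every call. */
static size_t handle_cqe_stack(struct rq_sketch *rq, void *cqe,
			       char *pkt, size_t len)
{
	struct mxbuf_sketch mx;

	memset(&mx, 0, sizeof(mx)); /* emulates CONFIG_INIT_STACK_ALL_ZERO */
	prepare(&mx, rq, cqe, pkt, len);
	rq->packets++;
	return (size_t)((char *)mx.xdp.data_end - (char *)mx.xdp.data);
}

/* After: reuse the per-RQ buffer; no per-CQE zeroing, same field writes. */
static size_t handle_cqe_rq(struct rq_sketch *rq, void *cqe,
			    char *pkt, size_t len)
{
	struct mxbuf_sketch *mx = &rq->mxbuf;

	prepare(mx, rq, cqe, pkt, len);
	rq->packets++;
	return (size_t)((char *)mx->xdp.data_end - (char *)mx->xdp.data);
}
```

One design consequence of the per-RQ variant, at least in this sketch: fields left over from the previous packet persist, so everything the XDP path reads must be rewritten for each CQE, which is what `prepare()` does here.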