netdev - Re: [PATCH net-next 2/2] net/mlx5: Avoid copying payload to the skb's linear part

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CADg4-L-J2k3Aj5vyAD2+mnTtcvkwt4J9JX4JSbbHyhuARno+Bg@mail.gmail.com>
Date: Mon, 14 Jul 2025 15:06:32 -0700
From: Christoph Paasch <cpaasch@...nai.com>
To: Gal Pressman <gal@...dia.com>
Cc: Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>, 
	Mark Bloch <mbloch@...dia.com>, Andrew Lunn <andrew+netdev@...n.ch>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, linux-rdma@...r.kernel.org, 
	netdev@...r.kernel.org
Subject: Re: [PATCH net-next 2/2] net/mlx5: Avoid copying payload to the skb's
 linear part

On Mon, Jul 14, 2025 at 6:59 AM Gal Pressman <gal@...dia.com> wrote:
>
> On 14/07/2025 2:33, Christoph Paasch via B4 Relay wrote:
> > From: Christoph Paasch <cpaasch@...nai.com>
> >
> > mlx5e_skb_from_cqe_mpwrq_nonlinear() copies MLX5E_RX_MAX_HEAD (256)
> > bytes from the page-pool to the skb's linear part. Those 256 bytes
> > include part of the payload.
> >
> > When attempting to do GRO in skb_gro_receive, if headlen > data_offset
> > (and skb->head_frag is not set), we end up aggregating packets in the
> > frag_list.
> >
> > This is of course not good when we are CPU-limited. Also causes a worse
> > skb->len/truesize ratio,...
> >
> > So, let's avoid copying parts of the payload to the linear part. The
> > goal here is to err on the side of caution and prefer to copy too little
> > instead of copying too much (because once it has been copied over, we
> > trigger the above described behavior in skb_gro_receive).
> >
> > So, we can do a rough estimate of the header-space by looking at
> > cqe_l3/l4_hdr_type and kind of do a lower-bound estimate. This is now
> > done in mlx5e_cqe_get_min_hdr_len(). We always assume that TCP timestamps
> > are present, as that's the most common use-case.
> >
> > That header-len is then used in mlx5e_skb_from_cqe_mpwrq_nonlinear for
> > the headlen (which defines what is being copied over). We still
> > allocate MLX5E_RX_MAX_HEAD for the skb so that if the networking stack
> > needs to call pskb_may_pull() later on, we don't need to reallocate
> > memory.
> >
> > This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
> > LRO enabled):
> >
> > BEFORE:
> > =======
> > (netserver pinned to core receiving interrupts)
> > $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
> >  87380  16384 262144    60.01    32547.82
> >
> > (netserver pinned to adjacent core receiving interrupts)
> > $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
> >  87380  16384 262144    60.00    52531.67
> >
> > AFTER:
> > ======
> > (netserver pinned to core receiving interrupts)
> > $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
> >  87380  16384 262144    60.00    52896.06
> >
> > (netserver pinned to adjacent core receiving interrupts)
> >  $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
> >  87380  16384 262144    60.00    85094.90
> >
> > Signed-off-by: Christoph Paasch <cpaasch@...nai.com>
>
> Cool change, thanks!
>
> This patch doesn't take encapsulated packets into account, where the
> l3/l4 indications apply for the inner packet, while you assume outer.

Yes - my goal really is to avoid copying the inner packet's payload as
that is what will "break" GRO.

Alternatively, if I can extract all the necessary info out of the cqe,
to know the real header-size, I can use that as well.

> Also, for encapsulated packets we will *always* have to pull data into
> the linear part, which might overshadow the improvement you're trying to
> achieve?

Yes, the mlx-driver will end up copying less but later on in the stack
we may have to do the slow-path in pskb_may_pull(). I would hope that
is less of an impact (given the malloc'ed size does not change and
thus we end up just copying bytes we anyways would have copied
previously).
But, let me set up some tunnelling and measure the impact.

Thanks,
Christoph