Message-ID: <049ed5bc-5ee8-4fb3-944f-bd2a2116ba21@intel.com>
Date: Tue, 11 Mar 2025 18:22:06 +0100
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Mina Almasry <almasrymina@...gle.com>
CC: <intel-wired-lan@...ts.osuosl.org>, Michal Kubiak
<michal.kubiak@...el.com>, Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>, Przemek Kitszel
<przemyslaw.kitszel@...el.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David
S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, "Jakub
Kicinski" <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, "Alexei
Starovoitov" <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
"Jesper Dangaard Brouer" <hawk@...nel.org>, John Fastabend
<john.fastabend@...il.com>, Simon Horman <horms@...nel.org>,
<bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 01/16] libeth: convert to netmem
From: Mina Almasry <almasrymina@...gle.com>
Date: Wed, 5 Mar 2025 16:13:32 -0800
> On Wed, Mar 5, 2025 at 8:23 AM Alexander Lobakin
> <aleksander.lobakin@...el.com> wrote:
>>
>> Back when the libeth Rx core was initially written, devmem was a draft
>> and netmem_ref didn't exist in the mainline. Now that it's here, make
>> libeth MP-agnostic before introducing any new code or any new library
>> users.
[...]
>> /* Very rare, but possible case. The most common reason:
>> * the last fragment contained FCS only, which was then
>> * stripped by the HW.
>> */
>> if (unlikely(!len)) {
>> - libeth_rx_recycle_slow(page);
>> + libeth_rx_recycle_slow(netmem);
>
> I think before this patch this would have expanded to:
>
> page_pool_put_full_page(pool, page, true);
>
> But now I think it expands to:
>
> page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false);
>
> Is the switch from true to false intentional? Is this a slow path so
> it doesn't matter?
Intentional. The unlikely() means it's already the slow path.
libeth_rx_recycle() is inline, while _slow() is not, and I don't want
the slow path to be inlined.
When I was originally writing the code changed here, I didn't pay much
attention to that, but I've since altered my approach: now I try to put
anything slow out of line so it doesn't waste object code.
Also, some time ago I changed Page Pool's approach to deciding whether
it can recycle buffers directly. Previously, if `allow_direct` was
false, it always fell back to the ptr_ring; now, even when
`allow_direct` is false, it still checks whether the buffer can be
recycled directly.
[...]
>> @@ -3122,16 +3122,20 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr,
>> struct libeth_fqe *buf, u32 data_len)
>> {
>> u32 copy = data_len <= L1_CACHE_BYTES ? data_len : ETH_HLEN;
>> + struct page *hdr_page, *buf_page;
>> const void *src;
>> void *dst;
>>
>> - if (!libeth_rx_sync_for_cpu(buf, copy))
>> + if (unlikely(netmem_is_net_iov(buf->netmem)) ||
>> + !libeth_rx_sync_for_cpu(buf, copy))
>> return 0;
>>
>
> I could not immediately understand why you need a netmem_is_net_iov
> check here. libeth_rx_sync_for_cpu will delegate to
> page_pool_dma_sync_netmem_for_cpu which should do the right thing
> regardless of whether the netmem is a page or net_iov, right? Is this
> to save some cycles?
If the payload buffer is a net_iov, the kernel doesn't have access to
it. This means this W/A can't be performed (see the memcpy() below the
check). That's why I exit early explicitly.
libeth_rx_sync_for_cpu() returns false only if the size is zero.
netmem_is_net_iov() is under unlikely() here, because when using devmem,
you explicitly configure flow steering, so that only TCP/UDP/whatever
frames will land on this queue. Such frames are split correctly by
idpf's HW.
I need this W/A because, unfortunately, this HW places the whole frame
into the payload buffer when it's not TCP/UDP/... (see the comment
above this function).
For example, it even does so for ICMP, although the HW is fully aware
of the ICMP format. If I were a HW designer of this NIC, I'd instead
try putting the whole frame into the header buffer, not the payload
one, and in general do header split for all known packet types, not
just TCP/UDP/... But meh, a different story.
>
> --
> Thanks,
> Mina
Thanks!
Olek