[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izOCgFhNRfJwyfr9hT0hCee3yH3hJ53F7Y29c8wnp-pzyg@mail.gmail.com>
Date: Tue, 11 Mar 2025 10:43:35 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>
Cc: intel-wired-lan@...ts.osuosl.org, Michal Kubiak <michal.kubiak@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>, Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>, Simon Horman <horms@...nel.org>, bpf@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next 01/16] libeth: convert to netmem
On Tue, Mar 11, 2025 at 10:23 AM Alexander Lobakin
<aleksander.lobakin@...el.com> wrote:
>
> From: Mina Almasry <almasrymina@...gle.com>
> Date: Wed, 5 Mar 2025 16:13:32 -0800
>
> > On Wed, Mar 5, 2025 at 8:23 AM Alexander Lobakin
> > <aleksander.lobakin@...el.com> wrote:
> >>
> >> Back when the libeth Rx core was initially written, devmem was a draft
> >> and netmem_ref didn't exist in the mainline. Now that it's here, make
> >> libeth MP-agnostic before introducing any new code or any new library
> >> users.
>
> [...]
>
> >> /* Very rare, but possible case. The most common reason:
> >> * the last fragment contained FCS only, which was then
> >> * stripped by the HW.
> >> */
> >> if (unlikely(!len)) {
> >> - libeth_rx_recycle_slow(page);
> >> + libeth_rx_recycle_slow(netmem);
> >
> > I think before this patch this would have expanded to:
> >
> > page_pool_put_full_page(pool, page, true);
> >
> > But now I think it expands to:
> >
> > page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false);
> >
> > Is the switch from true to false intentional? Is this a slow path so
> > it doesn't matter?
>
> Intentional. unlikely() means it's slowpath already. libeth_rx_recycle()
> is inline, while _slow() is not. I don't want slowpath to be inlined.
> While I was originally writing the code changed here, I didn't pay much
> attention to that, but since then I altered my approach and now try to
> put anything slow out of line to not waste object code.
>
> Also, some time ago I changed PP's approach to decide whether it can
> recycle buffers directly or not. Previously, if that `allow_direct` was
> false, it was always falling back to ptr_ring, but now if `allow_direct`
> is false, it still checks whether it can be recycled directly.
>
Thanks, yes I forgot about that.
> [...]
>
> >> @@ -3122,16 +3122,20 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr,
> >> struct libeth_fqe *buf, u32 data_len)
> >> {
> >> u32 copy = data_len <= L1_CACHE_BYTES ? data_len : ETH_HLEN;
> >> + struct page *hdr_page, *buf_page;
> >> const void *src;
> >> void *dst;
> >>
> >> - if (!libeth_rx_sync_for_cpu(buf, copy))
> >> + if (unlikely(netmem_is_net_iov(buf->netmem)) ||
> >> + !libeth_rx_sync_for_cpu(buf, copy))
> >> return 0;
> >>
> >
> > I could not immediately understand why you need a netmem_is_net_iov
> > check here. libeth_rx_sync_for_cpu will delegate to
> > page_pool_dma_sync_netmem_for_cpu which should do the right thing
> > regardless of whether the netmem is a page or net_iov, right? Is this
> > to save some cycles?
>
> If the payload buffer is net_iov, the kernel doesn't have access to it.
> This means, this W/A can't be performed (see memcpy() below the check).
> That's why I exit early explicitly.
> libeth_rx_sync_for_cpu() returns false only if the size is zero.
>
> netmem_is_net_iov() is under unlikely() here, because when using devmem,
> you explicitly configure flow steering, so that only TCP/UDP/whatever
> frames will land on this queue. Such frames are split correctly by
> idpf's HW.
> I need this WA because let's say unfortunately this HW places the whole
> frame to the payload buffer when it's not TCP/UDP/... (see the comment
> above this function).
> For example, it even does so for ICMP, although HW is fully aware of the
> ICMP format. If I was a HW designer of this NIC, I'd instead try putting
> the whole frame to the header buffer, not the payload one. And in
> general, do header split for all known packet types, not just TCP/UDP/..
> But meh... A different story.
>
Makes sense. FWIW:
Reviewed-by: Mina Almasry <almasrymina@...gle.com>
--
Thanks,
Mina
Powered by blists - more mailing lists