linux-kernel - Re: [PATCH iwl-next 11/12] idpf: convert header split mode to libeth + napi_build



Message-ID: <667458e38c879_2b190d294f5@willemb.c.googlers.com.notmuch>
Date: Thu, 20 Jun 2024 12:29:23 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: intel-wired-lan@...ts.osuosl.org, 
 Tony Nguyen <anthony.l.nguyen@...el.com>, 
 "David S. Miller" <davem@...emloft.net>, 
 Eric Dumazet <edumazet@...gle.com>, 
 Jakub Kicinski <kuba@...nel.org>, 
 Paolo Abeni <pabeni@...hat.com>, 
 Mina Almasry <almasrymina@...gle.com>, 
 nex.sw.ncis.osdt.itp.upstreaming@...el.com, 
 netdev@...r.kernel.org, 
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH iwl-next 11/12] idpf: convert header split mode to libeth
 + napi_build_skb()

Alexander Lobakin wrote:
> From: Willem De Bruijn <willemdebruijn.kernel@...il.com>
> Date: Mon, 17 Jun 2024 14:13:07 -0400
> 
> > Alexander Lobakin wrote:
> >> From: Willem De Bruijn <willemdebruijn.kernel@...il.com>
> >> Date: Thu, 30 May 2024 09:46:46 -0400
> >>
> >>> Alexander Lobakin wrote:
> >>>> Currently, idpf uses the following model for the header buffers:
> >>>>
> >>>> * buffers are allocated via dma_alloc_coherent();
> >>>> * when receiving, napi_alloc_skb() is called and then the header is
> >>>>   copied to the newly allocated linear part.
> >>>>
> >>>> This is far from optimal as DMA coherent zone is slow on many systems
> >>>> and memcpy() neutralizes the idea and benefits of the header split. 
> >>>
> >>> In the previous revision this assertion was called out, as we have
> >>> lots of experience with the existing implementation and with a previous
> >>> one based on dynamic allocation that performed much worse. You would
> >>
> >> napi_build_skb() is not a dynamic allocation. On the contrary,
> >> napi_alloc_skb() from the current implementation actually *is* a dynamic
> >> allocation. It allocates a page frag for every header buffer each time.
> >>
> >> Page Pool refills header buffers from its pool of recycled frags.
> >> Plus, on x86_64, truesize of a header buffer is 1024, meaning it picks
> >> a new page from the pool every 4th buffer. During the testing of common
> >> workloads, I had literally zero new page allocations, as the skb core
> >> recycles frags from skbs back to the pool.
> >>
> >> IOW, the current version you're defending actually performs more dynamic
> >> allocations on hotpath than this one ¯\_(ツ)_/¯
> >>
> >> (I explained all this several times already)
> >>
> >>> share performance numbers in the next revision
> >>
> >> I can't share absolute numbers externally, only percentages.
> >>
> >> I shared before/after % in the cover letter. Every test yielded more
> >> Mpps after this change, esp. non-XDP_PASS ones when you don't have
> >> networking stack overhead.
> > 
> > This is the main concern: AF_XDP has no existing users, but TCP/IP is
> > used in production environments. So we cannot risk TCP/IP regressions
> > in favor of somewhat faster AF_XDP. Secondary is that a functional
> > implementation of AF_XDP soon with optimizations later is preferable
> > over the fastest solution later.
> 
> I have perf numbers before-after for all the common workloads and I see
> only improvements there.

Good. That was the request, and, as a reminder, it did not come only from me.

> Do you have any to prove that this change
> introduces regressions?

I have no data yet. We can run some tests on your github series too.
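
For readers following the arithmetic in the quoted text (a header buffer with a truesize of 1024 taking a new page from the pool every 4th buffer), here is a minimal sketch of that calculation. It assumes 4 KiB pages, which the thread implies for x86_64 but does not state; the function names are hypothetical and are not part of idpf, libeth, or Page Pool. It only counts how many pages a run of header buffers would span, and does not model the frag recycling described above.

```python
# Illustrative arithmetic only; not kernel code.
PAGE_SIZE = 4096      # assumption: 4 KiB pages
HDR_TRUESIZE = 1024   # truesize of one header buffer, as quoted in the thread


def frags_per_page(page_size=PAGE_SIZE, truesize=HDR_TRUESIZE):
    """Number of header buffers that fit into a single page."""
    return page_size // truesize


def pages_spanned(num_buffers, page_size=PAGE_SIZE, truesize=HDR_TRUESIZE):
    """Pages needed to back num_buffers header buffers, ignoring recycling."""
    per_page = frags_per_page(page_size, truesize)
    return -(-num_buffers // per_page)  # ceiling division


if __name__ == "__main__":
    print(frags_per_page())     # buffers carved out of each page
    print(pages_spanned(256))   # pages backing 256 header buffers
```

With these assumed values, a new page is needed only once per four header buffers; whether that page is freshly allocated or recycled depends on the pool state, which this sketch does not cover.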