lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Fri, 16 Jun 2023 11:07:12 +0800
From: Liang Chen <liangchen.linux@...il.com>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: Jakub Kicinski <kuba@...nel.org>, brouer@...hat.com, hawk@...nel.org, 
	ilias.apalodimas@...aro.org, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com, 
	pabeni@...hat.com
Subject: Re: [PATCH net-next] page pool: not return page to alloc cache during
 pool destruction

On Thu, Jun 15, 2023 at 10:00 PM Jesper Dangaard Brouer
<jbrouer@...hat.com> wrote:
>
>
>
> On 15/06/2023 06.20, Jakub Kicinski wrote:
> > On Thu, 15 Jun 2023 09:36:45 +0800 Liang Chen wrote:
> >> When destroying a page pool, the alloc cache and recycle ring are emptied.
> >> If there are inflight pages, the retry process will periodically check the
> >> recycle ring for recently returned pages, but not the alloc cache (alloc
> >> cache is only emptied once). As a result, any pages returned to the alloc
> >> cache after the page pool destruction will be stuck there and cause the
> >> retry process to continuously look for inflight pages and report warnings.
> >>
> >> To safeguard against this situation, any pages returning to the alloc cache
> >> after pool destruction should be prevented.
> >
> > Let's hear from the page pool maintainers but I think the driver
> > is supposed to prevent allocations while pool is getting destroyed.
>
> Yes, this is a driver API violation. Direct returns (allow_direct) can
> only happen from drivers RX path, e.g while driver is active processing
> packets (in NAPI).  When driver is shutting down a page_pool, it MUST
> have stopped RX path and NAPI (napi_disable()) before calling
> page_pool_destroy()  Thus, this situation cannot happen and if it does
> it is a driver bug.
>
> > Perhaps we can add DEBUG_NET_WARN_ON_ONCE() for this condition to
> > prevent wasting cycles in production builds?
> >
>
> For this page_pool code path ("allow_direct") it is extremely important
> we avoid wasting cycles in production.  As this is used for XDP_DROP
> use-cases for 100Gbit/s NICs.
>
> At 100Gbit/s with 64 bytes Ethernet frames (84 on wire), the wirespeed
> is 148.8Mpps which gives CPU 6.72 nanosec to process each packet.
> The microbench[1] shows (below signature) that page_pool_alloc_pages() +
> page_pool_recycle_direct() cost 4.041 ns (or 14 cycles(tsc)).
> Thus, for this code fast-path every cycle counts.
>
> In practice PCIe transactions/sec seems limit total system to 108Mpps
> (with multiple RX-queues + descriptor compression) thus 9.26 nanosec to
> process each packet. Individual hardware RX queues seems be limited to
> around 36Mpps thus 27.77 nanosec to process each packet.
>
> Adding a DEBUG_NET_WARN_ON_ONCE will be annoying as I like to run my
> testlab kernels with CONFIG_DEBUG_NET, which will change this extreme
> fash-path slightly (adding some unlikely's affecting code layout to the
> mix).
>
> Question to Liang Chen: Did you hit this bug in practice?
>
> --Jesper
>

Yeah, we hit this problem while implementing page pool support for
virtio_net driver, where we only enable page pool for xdp path, i.e.
turning on/off page pool when xdp is enabled/disabled. The problem
turns up when the xdp program is uninstalled, and there are still
inflight page pool page buffers. Then napi is enabled again, the
driver starts to process those inflight page pool buffers. So we will
need to be aware of the state of the page pool (if it is being
destructed) while returning the pages back. That's what motivated us
to add this check to __page_pool_put_page.

Thanks,
Liang



> CPU E5-1650 v4 @ 3.60GHz
>   tasklet_page_pool01_fast_path Per elem:  14 cycles(tsc)  4.041 ns
>   tasklet_page_pool02_ptr_ring  Per elem:  49 cycles(tsc) 13.622 ns
>   tasklet_page_pool03_slow      Per elem: 162 cycles(tsc) 45.198 ns
>
> [1]
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ