lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b28b0e3e-87e4-5a02-c172-2d1424405a5a@redhat.com>
Date:   Thu, 15 Jun 2023 16:00:13 +0200
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        Liang Chen <liangchen.linux@...il.com>
Cc:     brouer@...hat.com, hawk@...nel.org, ilias.apalodimas@...aro.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com
Subject: Re: [PATCH net-next] page pool: not return page to alloc cache during
 pool destruction



On 15/06/2023 06.20, Jakub Kicinski wrote:
> On Thu, 15 Jun 2023 09:36:45 +0800 Liang Chen wrote:
>> When destroying a page pool, the alloc cache and recycle ring are emptied.
>> If there are inflight pages, the retry process will periodically check the
>> recycle ring for recently returned pages, but not the alloc cache (alloc
>> cache is only emptied once). As a result, any pages returned to the alloc
>> cache after the page pool destruction will be stuck there and cause the
>> retry process to continuously look for inflight pages and report warnings.
>>
>> To safeguard against this situation, any pages returning to the alloc cache
>> after pool destruction should be prevented.
> 
> Let's hear from the page pool maintainers but I think the driver
> is supposed to prevent allocations while pool is getting destroyed.

Yes, this is a driver API violation. Direct returns (allow_direct) can
only happen from drivers RX path, e.g while driver is active processing
packets (in NAPI).  When driver is shutting down a page_pool, it MUST
have stopped RX path and NAPI (napi_disable()) before calling
page_pool_destroy()  Thus, this situation cannot happen and if it does
it is a driver bug.

> Perhaps we can add DEBUG_NET_WARN_ON_ONCE() for this condition to
> prevent wasting cycles in production builds?
> 

For this page_pool code path ("allow_direct") it is extremely important
we avoid wasting cycles in production.  As this is used for XDP_DROP
use-cases for 100Gbit/s NICs.

At 100Gbit/s with 64 bytes Ethernet frames (84 on wire), the wirespeed
is 148.8Mpps which gives CPU 6.72 nanosec to process each packet.
The microbench[1] shows (below signature) that page_pool_alloc_pages() +
page_pool_recycle_direct() cost 4.041 ns (or 14 cycles(tsc)).
Thus, for this code fast-path every cycle counts.

In practice PCIe transactions/sec seems limit total system to 108Mpps
(with multiple RX-queues + descriptor compression) thus 9.26 nanosec to
process each packet. Individual hardware RX queues seems be limited to
around 36Mpps thus 27.77 nanosec to process each packet.

Adding a DEBUG_NET_WARN_ON_ONCE will be annoying as I like to run my
testlab kernels with CONFIG_DEBUG_NET, which will change this extreme
fash-path slightly (adding some unlikely's affecting code layout to the
mix).

Question to Liang Chen: Did you hit this bug in practice?

--Jesper

CPU E5-1650 v4 @ 3.60GHz
  tasklet_page_pool01_fast_path Per elem:  14 cycles(tsc)  4.041 ns
  tasklet_page_pool02_ptr_ring  Per elem:  49 cycles(tsc) 13.622 ns
  tasklet_page_pool03_slow      Per elem: 162 cycles(tsc) 45.198 ns

[1] 
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ