Date: Tue, 4 Apr 2023 08:53:36 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Jakub Kicinski <kuba@...nel.org>, <davem@...emloft.net>
CC: <netdev@...r.kernel.org>, <edumazet@...gle.com>,
<pabeni@...hat.com>, <hawk@...nel.org>,
<ilias.apalodimas@...aro.org>
Subject: Re: [RFC net-next 1/2] page_pool: allow caching from safely localized
NAPI
On 2023/3/31 12:39, Jakub Kicinski wrote:
> Recent patches to mlx5 mentioned a regression when moving from
> driver local page pool to only using the generic page pool code.
> Page pool has two recycling paths: (1) a direct one, which runs in
> the safe NAPI context (basically the consumer context, so producing
> can be lockless); and (2) one via a ptr_ring, which takes a spin
> lock because the freeing can happen from any CPU; producer
> and consumer may run concurrently.
>
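For reference, a condensed sketch of those two paths (hand-simplified
from net/core/page_pool.c; DMA-sync and refcount handling elided, so
this is not the exact upstream code):

	static void __page_pool_put_page(struct page_pool *pool,
					 struct page *page, bool allow_direct)
	{
		/* (1) consumer/NAPI context: lockless per-pool cache */
		if (allow_direct && page_pool_recycle_in_cache(page, pool))
			return;

		/* (2) any context: ptr_ring, producer side takes the
		 * spin lock
		 */
		if (ptr_ring_produce_bh(&pool->ring, page))
			page_pool_return_page(pool, page);	/* ring full */
	}
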
> Since the page pool code was added, Eric introduced a revised version
> of deferred skb freeing. TCP skbs are now usually returned to the CPU
> which allocated them, and freed in softirq context. This places the
> freeing (producing of pages back to the pool) enticingly close to
> the allocation (consumer).
>
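For context, the deferred-freeing path looks roughly like this
(hand-simplified from skb_attempt_defer_free() in net/core/skbuff.c,
from memory - details may differ upstream):

	void skb_attempt_defer_free(struct sk_buff *skb)
	{
		int cpu = skb->alloc_cpu;
		struct softnet_data *sd = &per_cpu(softnet_data, cpu);

		/* queue the skb on the defer list of the CPU that
		 * allocated it ...
		 */
		spin_lock(&sd->defer_lock);
		skb->next = sd->defer_list;
		sd->defer_list = skb;
		sd->defer_count++;
		spin_unlock(&sd->defer_lock);

		/* ... and poke that CPU so the list is flushed from
		 * its NET_RX softirq context
		 */
		smp_call_function_single_async(cpu, &sd->defer_csd);
	}
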
> If we can prove that we're freeing in the same softirq context in which
> the consumer NAPI will run - lockless use of the cache is perfectly fine,
> no need for the lock.
>
> Let drivers link the page pool to a NAPI instance. If the NAPI instance
> is scheduled on the same CPU on which we're freeing - place the pages
> in the direct cache.
>
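A minimal sketch of what that ownership check could look like (the
field and helper names here are my guesses, not necessarily what the
patch uses):

	/* Direct recycling is only safe while the pool's NAPI is
	 * scheduled on (owned by) the local CPU - then the NAPI poll
	 * cannot run concurrently with the free.
	 */
	static bool page_pool_napi_local(const struct page_pool *pool)
	{
		const struct napi_struct *napi = READ_ONCE(pool->p.napi);

		return napi &&
		       READ_ONCE(napi->list_owner) == smp_processor_id();
	}
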
> With that and a patched bnxt (XDP enabled to engage the page pool, sigh,
> bnxt really needs page pool work :() I see a 2.6% perf boost with
> a TCP stream test (app on a different physical core than the softirq).
>
> The CPU use of relevant functions decreases as expected:
>
> page_pool_refill_alloc_cache 1.17% -> 0%
> _raw_spin_lock 2.41% -> 0.98%
>
> Only consider the lockless path to be safe when NAPI is scheduled
> - in practice this should cover the majority, if not all, of steady-state
> workloads. It's usually the NAPI kicking in that causes the skb flush.
Interesting.
I wonder if we can make this more generic by adding the skb to a per-NAPI
list instead of sd->defer_list, so that we can always rely on the NAPI
being kicked to flush skbs, the way net_tx_action() does it for
sd->completion_queue, instead of relying on the softirq kick? Something
like the hypothetical sketch below.
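
Entirely hypothetical - there is no napi->defer_list or
napi->defer_lock today:

	void skb_defer_free_to_napi(struct napi_struct *napi,
				    struct sk_buff *skb)
	{
		/* hypothetical per-NAPI fields mirroring sd->defer_* */
		spin_lock(&napi->defer_lock);
		skb->next = napi->defer_list;
		napi->defer_list = skb;
		spin_unlock(&napi->defer_lock);

		/* make sure the NAPI is kicked so its poll flushes the
		 * list, like net_tx_action() drains sd->completion_queue
		 * (cross-CPU scheduling details elided)
		 */
		napi_schedule(napi);
	}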
And it seems we already know which NAPI a specific socket is bound to
through the busy-poll mechanism; could we reuse that to release an skb
to the NAPI bound to that socket? A rough lookup sketch follows.
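
The binding busy polling already tracks is sk->sk_napi_id; the lookup
could be along these lines (note napi_by_id() is currently static in
net/core/dev.c, so it would have to be exposed somehow):

	unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
	struct napi_struct *napi = NULL;

	rcu_read_lock();
	if (napi_id >= MIN_NAPI_ID)
		napi = napi_by_id(napi_id);
	/* ... hand the skb to @napi's defer list ... */
	rcu_read_unlock();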
>
> The main case we'll miss out on is when application runs on the same
> CPU as NAPI. In that case we don't use the deferred skb free path.
> We could disable softirqs on that path, too... maybe?
>