Message-ID: <c433e59d-1262-656e-b08c-30adfc12edcc@gmail.com>
Date:   Wed, 12 Apr 2023 09:41:16 +0300
From:   Tariq Toukan <ttoukan.linux@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc:     netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        hawk@...nel.org, ilias.apalodimas@...aro.org,
        linyunsheng@...wei.com, Dragos Tatulea <dtatulea@...dia.com>
Subject: Re: [PATCH net-next 0/3] page_pool: allow caching from safely
 localized NAPI

Hi,

Happy holidays! I was traveling and couldn't review the RFCs.

Dragos and I had this task in our plans after our recent page pool work 
in mlx5e. We're so glad to see this patchset! :)

On 11/04/2023 23:17, Jakub Kicinski wrote:
> I went back to the explicit "are we in NAPI" method, mostly
> because I don't like having both around :( (even tho I maintain
> that in_softirq() && !in_hardirq() is as safe, as softirqs do
> not nest).
> 
> Still returning the skbs to a CPU, tho, not to the NAPI instance.
> I reckon we could create a small refcounted struct per NAPI instance
> which would allow sockets and other users to hold a persistent
> and safe reference. But that's a bigger change, and I get 90+%
> recycling thru the cache with just these patches (for RR and
> streaming tests with 100% CPU use it's almost 100%).
> 
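
Just to make sure we read this part right: the direct-recycle decision
boils down to "is this skb being freed from the softirq context that
currently owns the pool's NAPI", roughly like the sketch below. This is
only our reading of the cover letter, not a copy of the patch; the napi
pointer in the pool params and the list_owner bookkeeping are assumed
here.

static bool pp_napi_local(const struct page_pool *pool, bool napi_safe)
{
	/* Assumed: the driver registered its NAPI in the pool params and
	 * list_owner records the CPU the NAPI is currently scheduled on.
	 */
	const struct napi_struct *napi = READ_ONCE(pool->p.napi);

	return napi_safe && napi &&
	       READ_ONCE(napi->list_owner) == smp_processor_id();
}

Anything that doesn't pass this test simply falls back to the ptr_ring
path, so correctness never depends on the producer context.
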
> Some numbers for streaming test with 100% CPU use (from previous version,
> but really they perform the same):
> 
>                           HW-GRO              page=page
>                        before      after     before      after
> recycle:
> cached:                     0  138669686          0  150197505
> cache_full:                 0     223391          0      74582
> ring:               138551933    9997191  149299454          0
> ring_full:                  0        488       3154     127590
> released_refcnt:            0          0          0          0
> 

Impressive.

Dragos tested your RFC v1.
He can test this version as well; we expect the same effect.

> alloc:
> fast:               136491361  148615710  146969587  150322859
> slow:                     1772       1799        144        105
> slow_high_order:            0          0          0          0
> empty:                    1772       1799        144        105
> refill:                2165245     156302    2332880       2128
> waive:                      0          0          0          0
> 

General note:
For fragmented page-pool pages, the decision whether to go through the 
cache or the ring depends only on the last releasing thread and its 
context.

Our in-driver deferred release trick is still beneficial in many cases 
even in the presence of this idea. The sets of cases improved by each 
idea intersect, but are not identical.
That's why we decided to go with both solutions working together rather 
than only one.
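
To illustrate with a hypothetical helper (pp_put_frag is made up for
this discussion, only the shape of the decision matters): intermediate
frees merely drop a fragment reference, and the final one both returns
the page and picks cache vs. ring based on whatever context it happens
to run in.

static void pp_put_frag(struct page_pool *pool, struct page *page,
			bool napi_safe)
{
	/* Earlier fragment owners only decrement the frag count. */
	if (page_pool_defrag_page(page, 1))
		return;

	/* Only the last owner gets here; its context (napi_safe or not)
	 * alone decides between the lockless cache and the ptr_ring.
	 */
	page_pool_put_full_page(pool, page, napi_safe);
}

Our deferred release aims to keep that final free inside the driver's
own NAPI poll, so the direct path is hit even when the last reference
would otherwise be dropped elsewhere (e.g. by a socket on another CPU).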

> v1:
>   - rename the arg in_normal_napi -> napi_safe
>   - also allow recycling in __kfree_skb_defer()
> rfcv2: https://lore.kernel.org/all/20230405232100.103392-1-kuba@kernel.org/
> 
> Jakub Kicinski (3):
>    net: skb: plumb napi state thru skb freeing paths
>    page_pool: allow caching from safely localized NAPI
>    bnxt: hook NAPIs to page pools
> 
>   Documentation/networking/page_pool.rst    |  1 +
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c |  1 +
>   include/linux/netdevice.h                 |  3 ++
>   include/linux/skbuff.h                    | 20 +++++++----
>   include/net/page_pool.h                   |  3 +-
>   net/core/dev.c                            |  3 ++
>   net/core/page_pool.c                      | 15 ++++++--
>   net/core/skbuff.c                         | 42 ++++++++++++-----------
>   8 files changed, 58 insertions(+), 30 deletions(-)
> 
