Message-ID: <c433e59d-1262-656e-b08c-30adfc12edcc@gmail.com>
Date:   Wed, 12 Apr 2023 09:41:16 +0300
From:   Tariq Toukan <ttoukan.linux@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc:     netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        hawk@...nel.org, ilias.apalodimas@...aro.org,
        linyunsheng@...wei.com, Dragos Tatulea <dtatulea@...dia.com>
Subject: Re: [PATCH net-next 0/3] page_pool: allow caching from safely
 localized NAPI

Hi,

Happy holidays! I was traveling and couldn't review the RFCs.

Dragos and I had this task in our plans after our recent page pool work 
in mlx5e. We're so glad to see this patchset! :)

On 11/04/2023 23:17, Jakub Kicinski wrote:
> I went back to the explicit "are we in NAPI" method, mostly
> because I don't like having both around :( (even tho I maintain
> that in_softirq() && !in_hardirq() is as safe, as softirqs do
> not nest).
> 
> Still returning the skbs to a CPU, tho, not to the NAPI instance.
> I reckon we could create a small refcounted struct per NAPI instance
> which would allow sockets and other users to hold a persistent
> and safe reference. But that's a bigger change, and I get 90+%
> recycling thru the cache with just these patches (for RR and
> streaming tests with 100% CPU use it's almost 100%).
> 
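
Just to make sure we read this part right: the direct-recycle decision
boils down to "is this skb being freed from the softirq context that
currently owns the pool's NAPI", roughly like the sketch below. This is
only our reading of the cover letter, not a copy of the patch; the napi
pointer in the pool params and the list_owner bookkeeping are assumed
here.

static bool pp_napi_local(const struct page_pool *pool, bool napi_safe)
{
	/* Assumed: the driver registered its NAPI in the pool params and
	 * list_owner records the CPU the NAPI is currently scheduled on.
	 */
	const struct napi_struct *napi = READ_ONCE(pool->p.napi);

	return napi_safe && napi &&
	       READ_ONCE(napi->list_owner) == smp_processor_id();
}

Anything that doesn't pass this test simply falls back to the ptr_ring
path, so correctness never depends on the producer context.
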
> Some numbers for streaming test with 100% CPU use (from previous version,
> but really they perform the same):
> 
>                           HW-GRO              page=page
>                        before      after     before      after
> recycle:
> cached:                     0  138669686          0  150197505
> cache_full:                 0     223391          0      74582
> ring:               138551933    9997191  149299454          0
> ring_full:                  0        488       3154     127590
> released_refcnt:            0          0          0          0
> 

Impressive.

Dragos tested your RFC v1.
He can test this version as well; we expect the same effect.

> alloc:
> fast:               136491361  148615710  146969587  150322859
> slow:                     1772       1799        144        105
> slow_high_order:            0          0          0          0
> empty:                    1772       1799        144        105
> refill:                2165245     156302    2332880       2128
> waive:                      0          0          0          0
> 

General note:
For fragmented page-pool pages, the decision whether to go through the 
cache or the ring depends only on the last releasing thread and its 
context.

Our in-driver deferred release trick is still beneficial in many cases 
even in the presence of this idea. The sets of cases improved by each 
idea intersect, but are not identical.
That's why we decided to go with both solutions working together rather 
than only one.
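
To illustrate with a hypothetical helper (pp_put_frag is made up for
this discussion, only the shape of the decision matters): intermediate
frees merely drop a fragment reference, and the final one both returns
the page and picks cache vs. ring based on whatever context it happens
to run in.

static void pp_put_frag(struct page_pool *pool, struct page *page,
			bool napi_safe)
{
	/* Earlier fragment owners only decrement the frag count. */
	if (page_pool_defrag_page(page, 1))
		return;

	/* Only the last owner gets here; its context (napi_safe or not)
	 * alone decides between the lockless cache and the ptr_ring.
	 */
	page_pool_put_full_page(pool, page, napi_safe);
}

Our deferred release aims to keep that final free inside the driver's
own NAPI poll, so the direct path is hit even when the last reference
would otherwise be dropped elsewhere (e.g. by a socket on another CPU).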

> v1:
>   - rename the arg in_normal_napi -> napi_safe
>   - also allow recycling in __kfree_skb_defer()
> rfcv2: https://lore.kernel.org/all/20230405232100.103392-1-kuba@kernel.org/
> 
> Jakub Kicinski (3):
>    net: skb: plumb napi state thru skb freeing paths
>    page_pool: allow caching from safely localized NAPI
>    bnxt: hook NAPIs to page pools
> 
>   Documentation/networking/page_pool.rst    |  1 +
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c |  1 +
>   include/linux/netdevice.h                 |  3 ++
>   include/linux/skbuff.h                    | 20 +++++++----
>   include/net/page_pool.h                   |  3 +-
>   net/core/dev.c                            |  3 ++
>   net/core/page_pool.c                      | 15 ++++++--
>   net/core/skbuff.c                         | 42 ++++++++++++-----------
>   8 files changed, 58 insertions(+), 30 deletions(-)
> 
