Message-ID: <3dd7675fcca2ce4d7ca93643d7507219bd7fa825.camel@nvidia.com>
Date:   Fri, 14 Apr 2023 13:09:30 +0000
From:   Dragos Tatulea <dtatulea@...dia.com>
To:     "kuba@...nel.org" <kuba@...nel.org>
CC:     "davem@...emloft.net" <davem@...emloft.net>,
        "alexander.duyck@...il.com" <alexander.duyck@...il.com>,
        "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
        "linyunsheng@...wei.com" <linyunsheng@...wei.com>,
        "hawk@...nel.org" <hawk@...nel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "ttoukan@...dia.com" <ttoukan@...dia.com>
Subject: Re: [PATCH net-next v2 0/3] page_pool: allow caching from safely
 localized NAPI

On Wed, 2023-04-12 at 21:26 -0700, Jakub Kicinski wrote:
> I went back to the explicit "are we in NAPI method", mostly
> because I don't like having both around :( (even tho I maintain
> that in_softirq() && !in_hardirq() is as safe, as softirqs do
> not nest).
> 
> Still returning the skbs to a CPU, tho, not to the NAPI instance.
> I reckon we could create a small refcounted struct per NAPI instance
> which would allow sockets and other users to hold a persistent
> and safe reference. But that's a bigger change, and I get 90+%
> recycling thru the cache with just these patches (for RR and
> streaming tests with 100% CPU use it's almost 100%).
> 
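(Side note for anyone following along: as I read patch 2, the direct-recycle
decision boils down to roughly the check below. This is my simplified
paraphrase of the series, not the exact upstream code.)

/* Paraphrased from the series: a page may go straight into the pool's
 * lockless cache only when the skb was freed on a NAPI-safe path
 * (napi_consume_skb()/__kfree_skb_defer()) and the pool's NAPI instance
 * is currently owned by this CPU, so the consumer cannot race with us.
 */
static bool can_recycle_direct(const struct page_pool *pool, bool napi_safe)
{
	const struct napi_struct *napi = pool->p.napi;

	return napi_safe && napi &&
	       READ_ONCE(napi->list_owner) == smp_processor_id();
}
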
> Some numbers for streaming test with 100% CPU use (from previous
> version,
> but really they perform the same):
> 
>                 HW-GRO                          page=page
>                 before          after           before          after
> recycle:
> cached:                 0       138669686               0       150197505
> cache_full:             0          223391               0           74582
> ring:           138551933         9997191       149299454               0
> ring_full:              0             488            3154          127590
> released_refcnt:        0               0               0               0
> 
> alloc:
> fast:           136491361       148615710       146969587       150322859
> slow:                1772            1799             144             105
> slow_high_order:        0               0               0               0
> empty:               1772            1799             144             105
> refill:           2165245          156302         2332880            2128
> waive:                  0               0               0               0
> 
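(Reading the counters above: in the HW-GRO "after" column, 138669686 recycles
hit the cache vs. 9997191 going back through the ring, so roughly 93% of
recycling goes through the cache; in the page=page "after" column the ring
counter is 0, so effectively 100%. That matches the 90+% / almost-100% figures
quoted above.)
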
I enabled this on the mlx5 driver and am seeing the following page_pool
cache usage improvements for a single-stream iperf test case:

- For 1500 MTU and legacy rq, a 20% improvement in cache usage.

- For 9K MTU, a 33-40% improvement in page_pool cache usage for both
striding and legacy rq (depending on whether the app runs on the same
core as the rq or not).

One thing to note is that the page_pool cache seems to get filled more
often for striding rq now, which is something we could potentially
improve on the mlx5 side.
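
For reference, hooking the RQ's NAPI to the page pool follows the same
pattern as the bnxt patch in this series. A rough sketch of the mlx5 side
(the helper name is illustrative and the exact mlx5 field names may differ;
in practice this happens where we already create the pool) looks like this:

static int mlx5e_rq_create_page_pool(struct mlx5e_rq *rq, u32 pool_size)
{
	/* Simplified sketch; mirrors what bnxt does in this series. */
	struct page_pool_params pp_params = { 0 };

	pp_params.order		= 0;
	pp_params.pool_size	= pool_size;
	pp_params.nid		= dev_to_node(rq->pdev);
	pp_params.dev		= rq->pdev;
	pp_params.dma_dir	= rq->buff.map_dir;
	/* New with this series: tell the pool which NAPI instance owns it,
	 * so frees happening in that NAPI's context can recycle into the
	 * lockless cache directly.
	 */
	pp_params.napi		= rq->cq.napi;

	rq->page_pool = page_pool_create(&pp_params);
	if (IS_ERR(rq->page_pool)) {
		int err = PTR_ERR(rq->page_pool);

		rq->page_pool = NULL;
		return err;
	}
	return 0;
}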

Regarding my earlier comment:

> After enabling this in the mlx5 driver, there is already improved
> page_pool cache usage for our test with the application running on 
> the same CPU with the receive queue NAPI (0 -> 98 % cache usage).

I was testing without the deferred release optimizations that we did in
the mlx5 driver.

> v2:
>  - minor commit message fixes (patch 1)
> v1:
> https://lore.kernel.org/all/20230411201800.596103-1-kuba@kernel.org/
>  - rename the arg in_normal_napi -> napi_safe
>  - also allow recycling in __kfree_skb_defer()
> rfcv2:
> https://lore.kernel.org/all/20230405232100.103392-1-kuba@kernel.org/
> 
> Jakub Kicinski (3):
>   net: skb: plumb napi state thru skb freeing paths
>   page_pool: allow caching from safely localized NAPI
>   bnxt: hook NAPIs to page pools
> 
>  Documentation/networking/page_pool.rst    |  1 +
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |  1 +
>  include/linux/netdevice.h                 |  3 ++
>  include/linux/skbuff.h                    | 20 +++++++----
>  include/net/page_pool.h                   |  3 +-
>  net/core/dev.c                            |  3 ++
>  net/core/page_pool.c                      | 15 ++++++--
>  net/core/skbuff.c                         | 42 ++++++++++++-----------
>  8 files changed, 58 insertions(+), 30 deletions(-)
> 

Tested-by: Dragos Tatulea <dtatulea@...dia.com>
