netdev - Re: Created benchmarks modules for page_pool

Message-ID: <20200122104205.GA569175@apalos.home>
Date:   Wed, 22 Jan 2020 12:42:05 +0200
From:   Ilias Apalodimas <ilias.apalodimas@...aro.org>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Matteo Croce <mcroce@...hat.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Created benchmarks modules for page_pool

Hi Jesper, 

On Tue, Jan 21, 2020 at 05:09:45PM +0100, Jesper Dangaard Brouer wrote:
> Hi Ilias and Lorenzo, (Cc others + netdev)
> 
> I've created two benchmarks modules for page_pool.
> 
> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
> [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_cross_cpu.c
> 
> I think we/you could actually use this as part of your presentation[3]?

I think we can mention this as part of the improvements we can offer, alongside
native SKB recycling.

> 
> The first benchmark[1] illustrates/measures what happens when page_pool
> alloc and free/return happen on the same CPU.  Here there are 3 modes
> of operation with different performance characteristics.
> 
> Fast_path NAPI recycle (XDP_DROP use-case)
>  - cost per elem: 15 cycles(tsc) 4.437 ns
> 
> Recycle via ptr_ring
>  - cost per elem: 48 cycles(tsc) 13.439 ns
> 
> Failed recycle, return to page-allocator
>  - cost per elem: 256 cycles(tsc) 71.169 ns
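
A rough way to read the three same-CPU figures above is to normalise them against the fast path. The sketch below is not part of the original thread; the names `SAME_CPU`, `implied_ghz` and `cost_relative_to` are hypothetical, and the only inputs are the cycle and nanosecond values quoted above. Since the reported cycle counts are whole numbers, the implied clock rate is only a coarse estimate.

```python
# Same-CPU figures copied from the quoted benchmark output: (cycles, ns).
SAME_CPU = {
    "fast_path NAPI recycle": (15, 4.437),
    "recycle via ptr_ring": (48, 13.439),
    "failed recycle": (256, 71.169),
}


def implied_ghz(cycles, ns):
    """Clock rate implied by one sample: cycles per nanosecond equals GHz."""
    return cycles / ns


def cost_relative_to(results, baseline):
    """Per-element ns cost of every mode divided by the baseline mode's cost."""
    base_ns = results[baseline][1]
    return {mode: ns / base_ns for mode, (_cycles, ns) in results.items()}


if __name__ == "__main__":
    ratios = cost_relative_to(SAME_CPU, "fast_path NAPI recycle")
    for mode, (cycles, ns) in SAME_CPU.items():
        print(f"{mode:24s} ~{implied_ghz(cycles, ns):.2f} GHz implied, "
              f"{ratios[mode]:.1f}x the fast path")
```

On these numbers, recycling via ptr_ring costs about 3x the fast path, and a failed recycle about 16x.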
> 
> 
> The second benchmark[2] measures what happens cross-CPU.  It is
> primarily the concurrent return-path that I want to capture, as this
> is page_pool's weak spot, whose performance we/I need to improve.
> Hint: when SKBs use page_pool return, this will happen more often.
> It is a little more tricky to get a proper measurement, as we want to
> observe the case where the return-path isn't stalling/waiting on pages
> to return.
> 
> - 1 CPU returning  , cost per elem: 110 cycles(tsc)   30.709 ns
> - 2 concurrent CPUs, cost per elem: 989 cycles(tsc)  274.861 ns
> - 3 concurrent CPUs, cost per elem: 2089 cycles(tsc) 580.530 ns
> - 4 concurrent CPUs, cost per elem: 2339 cycles(tsc) 649.984 ns
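
The cross-CPU figures above are easier to compare as a slowdown relative to the single returning CPU. This is a minimal sketch, not part of the original thread; `CROSS_CPU_NS` and `slowdown_vs_single` are hypothetical names, and the code makes no assumption about whether "cost per elem" is measured per CPU or in aggregate. It only divides the quoted nanosecond values.

```python
# Cross-CPU figures copied from the quoted output:
# number of concurrently returning CPUs -> ns per element.
CROSS_CPU_NS = {1: 30.709, 2: 274.861, 3: 580.530, 4: 649.984}


def slowdown_vs_single(ns_by_cpus):
    """Per-element cost at each CPU count divided by the 1-CPU cost."""
    base = ns_by_cpus[1]
    return {n: ns / base for n, ns in sorted(ns_by_cpus.items())}


if __name__ == "__main__":
    for n, factor in slowdown_vs_single(CROSS_CPU_NS).items():
        print(f"{n} returning CPU(s): {factor:.1f}x the single-CPU cost")
```

With the quoted values, the per-element cost grows to roughly 9x, 19x and 21x the single-CPU cost for 2, 3 and 4 concurrent CPUs.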

Interesting, I'll try to have a look at the code and maybe run them on my armv8
board.

Thanks!
/Ilias
> 
> [3] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>