Message-ID: <20200121170945.41e58f32@carbon>
Date:   Tue, 21 Jan 2020 17:09:45 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Lorenzo Bianconi <lorenzo.bianconi@...hat.com>
Cc:     brouer@...hat.com, Saeed Mahameed <saeedm@...lanox.com>,
        Matteo Croce <mcroce@...hat.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Created benchmarks modules for page_pool

Hi Ilias and Lorenzo, (Cc others + netdev)

I've created two benchmark modules for page_pool.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
[2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_cross_cpu.c

I think we/you could actually use these as part of your presentation[3]?

The first benchmark[1] illustrates/measures what happens when page_pool
alloc and free/return happen on the same CPU.  Here there are 3 modes
of operation with different performance characteristics.

Fast_path NAPI recycle (XDP_DROP use-case)
 - cost per elem: 15 cycles(tsc) 4.437 ns

Recycle via ptr_ring
 - cost per elem: 48 cycles(tsc) 13.439 ns

Failed recycle, return to page-allocator
 - cost per elem: 256 cycles(tsc) 71.169 ns
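
As a quick cross-check of the three figures above, the sketch below
derives the clock rate implied by each cycles/ns pair and the cost of
each mode relative to the fast path.  It is not part of the benchmark
modules; the helper names (implied_ghz, relative_cost) are made up for
illustration, and the only inputs are the numbers quoted above.  The
cycle counts are whole numbers, so the smallest one likely carries the
largest rounding error.

```python
# Figures copied from the results above: (mode, cycles(tsc), nanoseconds).
results = [
    ("fast-path NAPI recycle", 15, 4.437),
    ("recycle via ptr_ring", 48, 13.439),
    ("failed recycle, page-allocator", 256, 71.169),
]


def implied_ghz(cycles, ns):
    """Cycles per nanosecond, i.e. the clock rate in GHz implied by one result."""
    return cycles / ns


def relative_cost(rows):
    """Cost of each mode in cycles, as a multiple of the first (fast-path) row."""
    base_cycles = rows[0][1]
    return [(mode, cycles / base_cycles) for mode, cycles, _ in rows]


if __name__ == "__main__":
    for mode, cycles, ns in results:
        print(f"{mode:32s} {cycles:4d} cycles {ns:8.3f} ns "
              f"~{implied_ghz(cycles, ns):.2f} GHz")
    for mode, ratio in relative_cost(results):
        print(f"{mode:32s} {ratio:5.1f}x the fast path")
```

Read this way, the ptr_ring path costs roughly 3x the fast path, and
falling back to the page-allocator roughly 17x.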


The second benchmark[2] measures what happens cross-CPU.  It is
primarily the concurrent return-path that I want to capture, as this
is page_pool's weak spot, whose performance we/I need to improve.
Hint: when SKBs use page_pool return, this will happen more often.
It is a little more tricky to get a proper measurement, as we want to
observe the case where the return-path isn't stalling/waiting on pages
to return.

- 1 CPU returning  , cost per elem: 110 cycles(tsc)   30.709 ns
- 2 concurrent CPUs, cost per elem: 989 cycles(tsc)  274.861 ns
- 3 concurrent CPUs, cost per elem: 2089 cycles(tsc) 580.530 ns
- 4 concurrent CPUs, cost per elem: 2339 cycles(tsc) 649.984 ns
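
To make the scaling in the list above easier to read, here is a small
sketch that expresses each per-element cost as a multiple of the
single-CPU case.  It only post-processes the numbers quoted above; the
function name slowdown is hypothetical and nothing here comes from the
benchmark module itself.

```python
# Figures copied from the list above: CPUs returning -> (cycles(tsc), ns).
measurements = {
    1: (110, 30.709),
    2: (989, 274.861),
    3: (2089, 580.530),
    4: (2339, 649.984),
}


def slowdown(data):
    """Per-element cost in ns, relative to the single-CPU return case."""
    base_ns = data[1][1]
    return {cpus: ns / base_ns for cpus, (_, ns) in data.items()}


if __name__ == "__main__":
    for cpus, factor in slowdown(measurements).items():
        print(f"{cpus} CPU(s) returning: {factor:5.2f}x the single-CPU cost")
```

Going from one to two returning CPUs already costs roughly 9x per
element, and four CPUs roughly 21x.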

[3] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
