Message-ID: <20200122130932.0209cb27@carbon>
Date: Wed, 22 Jan 2020 13:09:32 +0100
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Ilias Apalodimas <ilias.apalodimas@...aro.org>
Cc: Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
Saeed Mahameed <saeedm@...lanox.com>,
Matteo Croce <mcroce@...hat.com>,
Tariq Toukan <tariqt@...lanox.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
brouer@...hat.com
Subject: Re: Created benchmarks modules for page_pool
On Wed, 22 Jan 2020 12:42:05 +0200
Ilias Apalodimas <ilias.apalodimas@...aro.org> wrote:
> Hi Jesper,
>
> On Tue, Jan 21, 2020 at 05:09:45PM +0100, Jesper Dangaard Brouer wrote:
> > Hi Ilias and Lorenzo, (Cc others + netdev)
> >
> > I've created two benchmarks modules for page_pool.
> >
> > [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
> > [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_cross_cpu.c
> >
> > I think we/you could actually use this as part of your presentation[3]?
>
> I think we can mention this as part of the improvements we can offer,
> alongside native SKB recycling.
Yes, but note that the cross-CPU return benchmark shows that
we/page_pool are too slow...
> >
> > The first benchmark[1] illustrates/measures what happens when page_pool
> > alloc and free/return happen on the same CPU. Here there are 3
> > modes of operation with different performance characteristics.
> >
> > Fast_path NAPI recycle (XDP_DROP use-case)
> > - cost per elem: 15 cycles(tsc) 4.437 ns
> >
> > Recycle via ptr_ring
> > - cost per elem: 48 cycles(tsc) 13.439 ns
> >
> > Failed recycle, return to page-allocator
> > - cost per elem: 256 cycles(tsc) 71.169 ns
> >
> >
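To give a feel for what the first benchmark exercises, here is a rough
sketch of the fast-path loop (not the actual bench_page_pool_simple.c
code, and without the time_bench wrapping that produces the cycles/ns
numbers; the page_pool_params values are just illustrative):

  #include <linux/err.h>
  #include <linux/numa.h>
  #include <net/page_pool.h>

  static int run_napi_recycle_loop(u64 loops)
  {
          struct page_pool_params pp_params = {
                  .order     = 0,
                  .pool_size = 1024,
                  .nid       = NUMA_NO_NODE,
          };
          struct page_pool *pp = page_pool_create(&pp_params);
          u64 i;

          if (IS_ERR(pp))
                  return PTR_ERR(pp);

          for (i = 0; i < loops; i++) {
                  struct page *page = page_pool_alloc_pages(pp, GFP_ATOMIC);

                  if (!page)
                          break;
                  /* Fast-path: pretend to be in NAPI context (XDP_DROP
                   * case), so the page goes back into the lockless
                   * pool->alloc cache instead of the ptr_ring.
                   */
                  page_pool_recycle_direct(pp, page);
          }
          page_pool_destroy(pp);
          return 0;
  }
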
> > The second benchmark[2] measures what happens cross-CPU.  It is
> > primarily the concurrent return-path that I want to capture, as this
> > is page_pool's weak spot, which we/I need to improve the performance of.
> > Hint: when SKBs use page_pool return, this will happen more often.
> > It is a little trickier to get a proper measurement, as we want to
> > observe the case where the return-path isn't stalling/waiting on pages
> > to return.
> >
> > - 1 CPU returning , cost per elem: 110 cycles(tsc) 30.709 ns
> > - 2 concurrent CPUs, cost per elem: 989 cycles(tsc) 274.861 ns
> > - 3 concurrent CPUs, cost per elem: 2089 cycles(tsc) 580.530 ns
> > - 4 concurrent CPUs, cost per elem: 2339 cycles(tsc) 649.984 ns
Found and fixed a small bug, thus a re-run of the cross_cpu bench numbers:
- 2 concurrent CPUs, cost per elem: 462 cycles(tsc) 128.502 ns
- 3 concurrent CPUs, cost per elem: 1992 cycles(tsc) 553.507 ns
- 4 concurrent CPUs, cost per elem: 2323 cycles(tsc) 645.389 ns
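
For reference, the contention the cross_cpu bench is hitting is the
ptr_ring producer side.  Rough sketch below (hypothetical helper names,
not the actual bench_page_pool_cross_cpu.c; note the page_pool_put_page()
signature has changed between kernel versions) of how returner threads
get bound to CPUs and return pages that were allocated elsewhere:

  #include <linux/kthread.h>
  #include <net/page_pool.h>

  struct returner_arg {
          struct page_pool *pp;
          struct page **pages;    /* pre-allocated on the producer CPU */
          u64 nr_pages;
  };

  static int pp_returner_thread(void *data)
  {
          struct returner_arg *arg = data;
          u64 i;

          for (i = 0; i < arg->nr_pages; i++) {
                  /* Not in NAPI context, so no direct recycle: this takes
                   * the ptr_ring producer lock, which is what the
                   * concurrent CPUs end up contending on.
                   */
                  page_pool_put_page(arg->pp, arg->pages[i], false);
          }
          return 0;
  }

  static void start_returner_on_cpu(int cpu, struct returner_arg *arg)
  {
          struct task_struct *tsk;

          tsk = kthread_create(pp_returner_thread, arg, "pp_return/%d", cpu);
          if (IS_ERR(tsk))
                  return;
          kthread_bind(tsk, cpu);
          wake_up_process(tsk);
  }
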
> Interesting, I'll try to have a look at the code and maybe run them on
> my armv8 board.
That would be great, but we/you will have to fix up the Intel-specific ASM
instructions in time_bench.c (which we already discussed on IRC).
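One (untested) way to make the cycle-counter reads portable for the
armv8 run is to fall back to the generic get_cycles() outside x86,
instead of the inline rdtsc asm.  A sketch, not how time_bench.c is
structured today:

  #include <linux/timex.h>        /* get_cycles() */
  #ifdef CONFIG_X86
  #include <asm/msr.h>            /* rdtsc_ordered() */
  #endif

  static inline u64 bench_read_cycles(void)
  {
  #ifdef CONFIG_X86
          return rdtsc_ordered();         /* keep the TSC-based path */
  #else
          /* arm64: get_cycles() reads the arch counter (cntvct_el0);
           * it runs at the arch-timer frequency, not the CPU clock,
           * so the ns numbers derived from ktime stay the trustworthy
           * part of the output.
           */
          return get_cycles();
  #endif
  }
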
> >
> > [3] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer