Message-ID: <fda4dfdc-7ef8-4c75-8d29-a33a621fd703@kernel.org>
Date: Tue, 10 Sep 2024 13:27:02 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Yunsheng Lin <linyunsheng@...wei.com>,
Mina Almasry <almasrymina@...gle.com>
Cc: ilias.apalodimas@...aro.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Shuah Khan <shuah@...nel.org>,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH net-next] page_pool: add a test module for page_pool
On 10/09/2024 12.46, Yunsheng Lin wrote:
> On 2024/9/10 1:28, Mina Almasry wrote:
>> On Mon, Sep 9, 2024 at 2:25 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>>
>>> The testing is done by ensuring that the page allocated from
>>> the page_pool instance is pushed into a ptr_ring instance in
>>> a kthread/napi bound to a specified cpu, and a kthread/napi
>>> bound to a specified cpu will pop the page from the ptr_ring
>>> and free it back to the page_pool.
>>>
>>> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
>>
>> It seems this test has a correctness part and a performance part.
>> For the performance test, Jesper has out of tree tests for the
>> page_pool:
>> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
>>
>> I have these rebased on top of net-next and use them to verify devmem
>> & memory-provider performance:
>> https://github.com/mina/linux/commit/07fd1c04591395d15d83c07298b4d37f6b56157f
>
> Yes, I used that testing ko too when adding frag API support for
> page_pool.
>
> The main issue I remember was that it only supports x86 :(
>
Yes, because I've added ASM code for reading the TSC counter in a very
precise manner. Given that we run many iterations, I don't think we
need such a precise reading. I guess it can simply be replaced with
get_cycles() or get_cycles64(). Then it should work on all archs.
The code already supports wall-clock time via ktime_get() (specifically
ktime_get_real_ts64()).
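As a userspace sketch of why per-read precision matters little, assuming POSIX clock_gettime() as a stand-in for in-kernel clocks such as ktime_get_ns() or get_cycles(): the clock is read once before and once after a long loop, so timer overhead and resolution are divided by the iteration count. The helpers now_ns(), dummy_op() and bench_loop() are hypothetical and not part of the prototype-kernel code.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload standing in for one alloc/free round trip. */
static volatile uint64_t sink;

static void dummy_op(uint64_t i)
{
	sink += i;
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	/* CLOCK_MONOTONIC is not affected by wall-clock adjustments. */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Time 'loops' iterations with a single pair of clock reads and return
 * the average cost per iteration in nanoseconds. A double is returned so
 * that sub-ns averages are not truncated to zero. Timer overhead and
 * resolution are amortized over 'loops', so a coarse, portable clock is
 * good enough.
 */
static double bench_loop(uint64_t loops)
{
	uint64_t start, stop, i;

	assert(loops > 0);	/* avoid dividing by zero below */
	start = now_ns();
	for (i = 0; i < loops; i++)
		dummy_op(i);
	stop = now_ns();

	return (double)(stop - start) / (double)loops;
}
```

With enough iterations, even a clock with microsecond granularity yields a stable ns/op figure, which is why the arch-specific TSC read is not essential.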
>>
>> My preference here (for the performance part) is to upstream the
>> out-of-tree tests that Jesper (and probably others) are using, rather
>> than adding a new performance test that is not as battle-hardened.
>
> I looked through the out-of-tree tests again; it seems we can take the
> best of both.
> For Jesper's ko:
> It seems we can prefill the pool, as pp_fill_ptr_ring() does
> in bench_page_pool_simple.c, to avoid noise from the page allocator.
>
>
> For the ko in this patch:
> It uses NAPI instead of a tasklet mimicking the NAPI context, supports
> PP_FLAG_DMA_MAP flag testing, and returns '-EAGAIN' from module_init()
> so that perf stat can be used to collect and calculate performance data.
>
My bench doesn't return a negative number on module load, because I used
perf record, and to see symbols decoded with perf report, I needed the
module to stay loaded.
I started on reading the PMU counters [1] around the bench loop; it works
if you enable the PMU counters yourself/manually, but I never finished
that work.
[1]
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/include/linux/time_bench.h#L195-L209
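The prefill suggestion (in the spirit of pp_fill_ptr_ring()) can be illustrated with a single-threaded userspace sketch. struct pool, struct ring and all helpers below are hypothetical toys: the ring is not the kernel's lockless ptr_ring, the pool is not page_pool, and malloc() stands in for the page allocator. The measured loop follows the alloc, push, pop, free-back cycle from the patch description, minus the cross-CPU part.

```c
#include <assert.h>
#include <stdlib.h>

#define RING_SIZE  256	/* hypothetical; must be a power of two */
#define POOL_CACHE 128	/* hypothetical per-pool cache size */
#define OBJ_SIZE   4096	/* stand-in for one page */

/* Single-threaded pointer ring; NOT the kernel's lockless ptr_ring. */
struct ring {
	void *slot[RING_SIZE];
	unsigned int head;	/* next slot to produce into */
	unsigned int tail;	/* next slot to consume from */
};

static int ring_produce(struct ring *r, void *obj)
{
	if (r->head - r->tail == RING_SIZE)
		return -1;	/* full */
	r->slot[r->head++ & (RING_SIZE - 1)] = obj;
	return 0;
}

static void *ring_consume(struct ring *r)
{
	if (r->head == r->tail)
		return NULL;	/* empty */
	return r->slot[r->tail++ & (RING_SIZE - 1)];
}

/* Toy pool: recycles freed objects, falls back to malloc() when empty. */
struct pool {
	void *cache[POOL_CACHE];
	int count;		/* objects currently cached */
	unsigned long slow;	/* allocations that hit malloc() */
};

static void *pool_alloc(struct pool *p)
{
	if (p->count > 0)
		return p->cache[--p->count];	/* fast path: recycle */
	p->slow++;
	return malloc(OBJ_SIZE);		/* slow path: "page allocator" */
}

static void pool_free(struct pool *p, void *obj)
{
	if (p->count < POOL_CACHE)
		p->cache[p->count++] = obj;
	else
		free(obj);
}

/* Prefill: allocate n objects up front and return them to the cache. */
static void pool_prefill(struct pool *p, int n)
{
	void *tmp[POOL_CACHE];
	int i;

	if (n > POOL_CACHE)
		n = POOL_CACHE;
	for (i = 0; i < n; i++)
		tmp[i] = pool_alloc(p);
	for (i = 0; i < n; i++)
		pool_free(p, tmp[i]);
}

/*
 * Measured loop: alloc -> push into ring -> pop -> free back to the pool.
 * Returns how many allocations fell through to malloc() during the loop.
 */
static unsigned long run_loop(struct pool *p, struct ring *r, long loops)
{
	unsigned long slow_before = p->slow;
	long i;

	for (i = 0; i < loops; i++) {
		void *obj = pool_alloc(p);
		int ret;

		assert(obj != NULL);
		ret = ring_produce(r, obj);
		assert(ret == 0);	/* ring is drained every iteration */
		(void)ret;
		pool_free(p, ring_consume(r));
	}
	return p->slow - slow_before;
}
```

With a cold pool, the first iteration of run_loop() falls through to malloc(); after pool_prefill(), every iteration is served from the cache, so the measured loop no longer includes allocator cost.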
> Are there other test cases or better practices that we can learn from
> Jesper's out-of-tree ko?
>
I created a time_bench.c module [2] that other modules [3] can use to
more easily reuse the benchmarking code.
[2]
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench.c
[3]
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
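The reuse idea can be sketched in userspace as a tiny shared harness: each benchmark supplies a callback that runs its own loop and returns the number of iterations it completed, and the shared code does the timing and the per-op arithmetic. The names struct bench_rec, bench_fn, bench_run() and bench_counter() are hypothetical and are not the actual time_bench.c API; clock_gettime() again stands in for the in-kernel clock.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-run record filled in by the shared harness. */
struct bench_rec {
	const char *name;
	uint64_t loops;		/* iterations requested */
	uint64_t done;		/* iterations the callback reports */
	uint64_t ns;		/* elapsed time for the whole loop */
};

/* A benchmark runs 'loops' iterations on 'data' and returns how many it did. */
typedef uint64_t (*bench_fn)(uint64_t loops, void *data);

static uint64_t mono_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Shared harness: time one callback and print the average ns/op. */
static struct bench_rec bench_run(const char *name, bench_fn fn,
				  uint64_t loops, void *data)
{
	struct bench_rec rec = { .name = name, .loops = loops };
	uint64_t start;

	assert(fn != NULL);
	start = mono_ns();
	rec.done = fn(loops, data);
	rec.ns = mono_ns() - start;

	if (rec.done)
		printf("%s: %llu loops, %.3f ns/op\n", name,
		       (unsigned long long)rec.done,
		       (double)rec.ns / (double)rec.done);
	return rec;
}

/* Example client benchmark: increments a counter passed via 'data'. */
static uint64_t bench_counter(uint64_t loops, void *data)
{
	volatile uint64_t *cnt = data;
	uint64_t i;

	for (i = 0; i < loops; i++)
		(*cnt)++;
	return loops;
}
```

Keeping the timing in one place means each client only writes its loop body, and all results are reported in the same format.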
--Jesper