Message-ID: <3de1b8a3-ae4f-492f-969d-bc6f2c145d09@huawei.com>
Date: Mon, 9 Dec 2024 19:42:51 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Shuah Khan
<skhan@...uxfoundation.org>, Andrew Morton <akpm@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH net-next v2 00/10] Replace page_frag with page_frag_cache
(Part-2)
On 2024/12/9 5:34, Alexander Duyck wrote:
...
>>
>> Performance validation for part2:
>> 1. Using micro-benchmark ko added in patch 1 to test aligned and
>> non-aligned API performance impact for the existing users, there
>> seems to be about 20% performance degradation for refactoring
>> page_frag to support the new API, which seems to nullify most of
>> the performance gain in [3] of part1.
>
> So if I am understanding correctly then this is showing a 20%
> performance degradation with this patchset. I would argue that it is
> significant enough that it would be a blocking factor for this patch
> set. I would suggest bisecting the patch set to identify where the
> performance degradation has been added and see what we can do to
> resolve it, and if nothing else document it in that patch so we can
> identify the root cause for the slowdown.
The only patch in this patchset that affects the performance of the existing API
seems to be patch 1; applying only patch 1 shows roughly the same ~20% performance
degradation as applying the whole patchset does:
mm: page_frag: some minor refactoring before adding new API
And the cause seems to be the increase in binary size shown below, as the
performance degradation did not seem to change much when I tried inlining
__page_frag_cache_commit_noref() by moving it to the header file:
./scripts/bloat-o-meter vmlinux_orig vmlinux
add/remove: 3/2 grow/shrink: 5/0 up/down: 920/-500 (420)
Function                                  old     new   delta
__page_frag_cache_prepare                   -     500    +500
__napi_alloc_frag_align                    68     180    +112
__netdev_alloc_skb                        488     596    +108
napi_alloc_skb                            556     624     +68
__netdev_alloc_frag_align                 196     252     +56
svc_tcp_sendmsg                           340     376     +36
__page_frag_cache_commit_noref              -      32     +32
e843419@...6_0000bd47_30                    -       8      +8
e843419@...9_000044ee_684                   8       -      -8
__page_frag_alloc_align                   492       -    -492
Total: Before=34719207, After=34719627, chg +0.00%
./scripts/bloat-o-meter page_frag_test_orig.ko page_frag_test.ko
add/remove: 0/0 grow/shrink: 2/0 up/down: 78/0 (78)
Function                                  old     new   delta
page_frag_push_thread                     508     580     +72
__UNIQUE_ID_vermagic367                    67      73      +6
Total: Before=4582, After=4660, chg +1.70%
Patch 1 refactors the common code out of __page_frag_alloc_va_align() into
__page_frag_cache_prepare() and __page_frag_cache_commit(), so that the new
API can make use of it as much as possible.
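To make that more concrete, below is a rough user-space sketch of the pattern;
it is not the kernel code, and the names frag_cache/frag_prepare/frag_commit
are made up for illustration. The idea is that the refill path stays out of
line while the per-caller bookkeeping is a static inline in the header, so the
common case only pays one function call:

/*
 * User-space illustration only (not the kernel code) of the prepare/commit
 * split: the path that may have to refill the cache stays out of line, while
 * the trivial bookkeeping every caller runs is a static inline.
 */
#include <stdlib.h>
#include <stdio.h>

#define FRAG_PAGE_SIZE 4096u

struct frag_cache {
	char *page;             /* backing buffer ("page")          */
	unsigned int offset;    /* next free byte within the buffer */
};

/* Out-of-line path: refill if needed, return the candidate fragment
 * without consuming it yet. */
static void *frag_prepare(struct frag_cache *fc, unsigned int size)
{
	if (size > FRAG_PAGE_SIZE)
		return NULL;

	if (!fc->page || fc->offset + size > FRAG_PAGE_SIZE) {
		free(fc->page);
		fc->page = malloc(FRAG_PAGE_SIZE);
		if (!fc->page)
			return NULL;
		fc->offset = 0;
	}
	return fc->page + fc->offset;
}

/* Inline fast path: callers commit exactly what they ended up using. */
static inline void frag_commit(struct frag_cache *fc, unsigned int used)
{
	fc->offset += used;
}

int main(void)
{
	struct frag_cache fc = { 0 };
	void *buf = frag_prepare(&fc, 256);

	if (buf) {
		/* ... fill at most 256 bytes here ... */
		frag_commit(&fc, 256);
	}
	printf("offset now %u\n", fc.offset);
	free(fc.page);
	return 0;
}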
Any better idea for reusing the common code while avoiding the performance
degradation as much as possible?
>
>> 2. Use the below netcat test case, there seems to be some minor
>> performance gain for replacing 'page_frag' with 'page_frag_cache'
>> using the new page_frag API after this patchset.
>> server: taskset -c 32 nc -l -k 1234 > /dev/null
>> client: perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234
>
> This test would barely touch the page pool. The fact is most of the
I am guessing you meant page_frag here?
> overhead for this would likely be things like TCP latency and data
> copy much more than the page allocation. As such fluctuations here are
> likely not related to your changes.
But it does tell us that the replacement does not seem to cause an obvious
regression, right?
I tried using a smaller MTU to amplify the impact of page allocation, and it
showed a similar result.
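For reference, the smaller-MTU run was along the lines of the below on the
loopback device; the exact value is just an example, the point is only to
force more, smaller allocations than the 64K loopback default:
ip link set dev lo mtu 1500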