Message-ID: <3de1b8a3-ae4f-492f-969d-bc6f2c145d09@huawei.com>
Date: Mon, 9 Dec 2024 19:42:51 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Shuah Khan
	<skhan@...uxfoundation.org>, Andrew Morton <akpm@...ux-foundation.org>,
	Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH net-next v2 00/10] Replace page_frag with page_frag_cache
 (Part-2)

On 2024/12/9 5:34, Alexander Duyck wrote:

...

>>
>> Performance validation for part2:
>> 1. Using micro-benchmark ko added in patch 1 to test aligned and
>>    non-aligned API performance impact for the existing users, there
>>    seems to be about 20% performance degradation for refactoring
>>    page_frag to support the new API, which seems to nullify most of
>>    the performance gain in [3] of part1.
> 
> So if I am understanding correctly then this is showing a 20%
> performance degradation with this patchset. I would argue that it is
> significant enough that it would be a blocking factor for this patch
> set. I would suggest bisecting the patch set to identify where the
> performance degradation has been added and see what we can do to
> resolve it, and if nothing else document it in that patch so we can
> identify the root cause for the slowdown.

The only patch in this patchset affecting the performance of the existing API
seems to be patch 1; applying only patch 1 shows the same ~20% performance
degradation as applying the whole patchset does:
mm: page_frag: some minor refactoring before adding new API

And the cause seems to be the binary size increase shown below, as the
performance degradation did not seem to change much when I tried inlining
__page_frag_cache_commit_noref() by moving it into the header file:

./scripts/bloat-o-meter vmlinux_orig vmlinux
add/remove: 3/2 grow/shrink: 5/0 up/down: 920/-500 (420)
Function                                     old     new   delta
__page_frag_cache_prepare                      -     500    +500
__napi_alloc_frag_align                       68     180    +112
__netdev_alloc_skb                           488     596    +108
napi_alloc_skb                               556     624     +68
__netdev_alloc_frag_align                    196     252     +56
svc_tcp_sendmsg                              340     376     +36
__page_frag_cache_commit_noref                 -      32     +32
e843419@...6_0000bd47_30                       -       8      +8
e843419@...9_000044ee_684                      8       -      -8
__page_frag_alloc_align                      492       -    -492
Total: Before=34719207, After=34719627, chg +0.00%

./scripts/bloat-o-meter page_frag_test_orig.ko page_frag_test.ko
add/remove: 0/0 grow/shrink: 2/0 up/down: 78/0 (78)
Function                                     old     new   delta
page_frag_push_thread                        508     580     +72
__UNIQUE_ID_vermagic367                       67      73      +6
Total: Before=4582, After=4660, chg +1.70%

Patch 1 refactors the common code out of __page_frag_alloc_va_align() into
__page_frag_cache_prepare() and __page_frag_cache_commit(), so that the new
API can reuse it as much as possible.

Is there a better way to reuse the common code while still avoiding the
performance degradation?

> 
>> 2. Use the below netcat test case, there seems to be some minor
>>    performance gain for replacing 'page_frag' with 'page_frag_cache'
>>    using the new page_frag API after this patchset.
>>    server: taskset -c 32 nc -l -k 1234 > /dev/null
>>    client: perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234
> 
> This test would barely touch the page pool. The fact is most of the

I am guessing you meant page_frag here?

> overhead for this would likely be things like TCP latency and data
> copy much more than the page allocation. As such fluctuations here are
> likely not related to your changes.

But it does tell us that the replacement does not seem to cause an
obvious regression, right?

I also tried using a smaller MTU to amplify the impact of page allocation;
it showed a similar result.
