Message-ID: <CAKgT0UdUVo6ujupoo-hdrW95XOGQLCDzd+rHGUVB6_SEmvqFHg@mail.gmail.com>
Date: Mon, 28 Oct 2024 08:30:45 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Shuah Khan <skhan@...uxfoundation.org>, Andrew Morton <akpm@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH net-next v23 0/7] Replace page_frag with page_frag_cache (Part-1)
On Mon, Oct 28, 2024 at 5:00 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> This is part 1 of "Replace page_frag with page_frag_cache",
> which mainly contains refactoring and optimization of the
> page_frag API implementation in preparation for the replacement.
>
> As discussed in [1], it is better to target the net-next tree
> to get more testing, as all the callers of the page_frag API
> are in networking, and the chance of conflicting with the MM
> tree seems low since the page_frag implementation is quite
> self-contained.
>
> After [2], there are still two implementations for page frag:
>
> 1. mm/page_alloc.c: the net stack uses it on the rx path,
>    with 'struct page_frag_cache' and the main API being
>    page_frag_alloc_align().
> 2. net/core/sock.c: the net stack uses it on the tx path,
>    with 'struct page_frag' and the main API being
>    skb_page_frag_refill().
>
> This patchset tries to unify the page frag implementation
> by replacing page_frag with page_frag_cache for sk_page_frag()
> first. The net_high_order_alloc_disable_key used by the
> implementation in net/core/sock.c doesn't seem to matter that
> much now, as high-order pages are also supported on the pcp
> lists:
> commit 44042b449872 ("mm/page_alloc: allow high-order pages to
> be stored on the per-cpu lists")
>
> As the change is mostly related to networking, it targets
> net-next. The rest of the page_frag usage will be replaced
> in a follow-up patchset.
>
> After this patchset:
> 1. Unify the page frag implementation by taking the best of
>    the two existing implementations: we are able to save some
>    space for 'page_frag_cache' API users, and avoid 'get_page()'
>    for old 'page_frag' API users.
> 2. Future bug fixes and performance work can be done in one
>    place, improving the maintainability of the page_frag
>    implementation.
>
> Kernel image size change:
> Linux Kernel total | text data bss
> ------------------------------------------------------
> after 45250307 | 27274279 17209996 766032
> before 45254134 | 27278118 17209984 766032
> delta -3827 | -3839 +12 +0
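The delta row above can be checked against the before/after rows directly; a minimal sketch (numbers copied verbatim from the table):

```python
# Sanity-check the kernel image size deltas quoted in the table above
# (section sizes in bytes, taken directly from the cover letter).
before = {"text": 27278118, "data": 17209984, "bss": 766032, "total": 45254134}
after = {"text": 27274279, "data": 17209996, "bss": 766032, "total": 45250307}

# Per-column delta: after minus before.
delta = {k: after[k] - before[k] for k in before}
print(delta)  # {'text': -3839, 'data': 12, 'bss': 0, 'total': -3827}

# The per-section deltas must add up to the total delta.
assert delta["text"] + delta["data"] + delta["bss"] == delta["total"]
```

The arithmetic is consistent: the ~3.8 KB text saving slightly outweighs the 12-byte data growth.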
>
> Performance validation:
> 1. Using the micro-benchmark ko added in patch 1 to test the
>    aligned and non-aligned API performance impact for the
>    existing users, there is no noticeable performance
>    degradation. Instead we seem to have a major performance
>    boost for both the aligned and non-aligned API after
>    switching to ptr_ring for testing: about 200% and 10%
>    improvement respectively on an arm64 server, as below.
>
> 2. Using the netcat test case below, we also see a minor
>    performance boost from replacing 'page_frag' with
>    'page_frag_cache' after this patchset.
> server: taskset -c 32 nc -l -k 1234 > /dev/null
> client: perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234
>
> In order to avoid performance noise as much as possible, the
> testing is done on a system without any other load, with enough
> iterations to show the data is stable; the complete testing log
> is below:
>
> perf stat -r 200 -- insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000
> perf stat -r 200 -- insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1
> taskset -c 32 nc -l -k 1234 > /dev/null
> perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234
>
> *After* this patchset:
>
> Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):
>
> 17.758393 task-clock (msec) # 0.004 CPUs utilized ( +- 0.51% )
> 5 context-switches # 0.293 K/sec ( +- 0.65% )
> 0 cpu-migrations # 0.008 K/sec ( +- 17.21% )
> 74 page-faults # 0.004 M/sec ( +- 0.12% )
> 46128650 cycles # 2.598 GHz ( +- 0.51% )
> 60810511 instructions # 1.32 insn per cycle ( +- 0.04% )
> 14764914 branches # 831.433 M/sec ( +- 0.04% )
> 19281 branch-misses # 0.13% of all branches ( +- 0.13% )
>
> 4.240273854 seconds time elapsed ( +- 0.13% )
>
> Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):
>
> 17.348690 task-clock (msec) # 0.019 CPUs utilized ( +- 0.66% )
> 5 context-switches # 0.310 K/sec ( +- 0.84% )
> 0 cpu-migrations # 0.009 K/sec ( +- 16.55% )
> 74 page-faults # 0.004 M/sec ( +- 0.11% )
> 45065287 cycles # 2.598 GHz ( +- 0.66% )
> 60755389 instructions # 1.35 insn per cycle ( +- 0.05% )
> 14747865 branches # 850.085 M/sec ( +- 0.05% )
> 19272 branch-misses # 0.13% of all branches ( +- 0.13% )
>
> 0.935251375 seconds time elapsed ( +- 0.07% )
>
> Performance counter stats for 'taskset -c 0 head -c 20G /dev/zero' (200 runs):
>
> 16626.042731 task-clock (msec) # 0.607 CPUs utilized ( +- 0.03% )
> 3291020 context-switches # 0.198 M/sec ( +- 0.05% )
> 1 cpu-migrations # 0.000 K/sec ( +- 0.50% )
> 85 page-faults # 0.005 K/sec ( +- 0.16% )
> 30581044838 cycles # 1.839 GHz ( +- 0.05% )
> 34962744631 instructions # 1.14 insn per cycle ( +- 0.01% )
> 6483883671 branches # 389.984 M/sec ( +- 0.02% )
> 99624551 branch-misses # 1.54% of all branches ( +- 0.17% )
>
> 27.370305077 seconds time elapsed ( +- 0.01% )
>
>
> *Before* this patchset:
>
> Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):
>
> 21.587934 task-clock (msec) # 0.005 CPUs utilized ( +- 0.72% )
> 6 context-switches # 0.281 K/sec ( +- 0.28% )
> 1 cpu-migrations # 0.047 K/sec ( +- 0.50% )
> 73 page-faults # 0.003 M/sec ( +- 0.12% )
> 56080697 cycles # 2.598 GHz ( +- 0.72% )
> 61605150 instructions # 1.10 insn per cycle ( +- 0.05% )
> 14950196 branches # 692.526 M/sec ( +- 0.05% )
> 19410 branch-misses # 0.13% of all branches ( +- 0.18% )
>
> 4.603530546 seconds time elapsed ( +- 0.11% )
>
> Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):
>
> 20.988297 task-clock (msec) # 0.006 CPUs utilized ( +- 0.81% )
> 7 context-switches # 0.316 K/sec ( +- 0.54% )
> 1 cpu-migrations # 0.048 K/sec ( +- 0.70% )
> 73 page-faults # 0.003 M/sec ( +- 0.11% )
> 54512166 cycles # 2.597 GHz ( +- 0.81% )
> 61440941 instructions # 1.13 insn per cycle ( +- 0.08% )
> 14906043 branches # 710.207 M/sec ( +- 0.08% )
> 19927 branch-misses # 0.13% of all branches ( +- 0.17% )
>
> 3.438041238 seconds time elapsed ( +- 1.11% )
>
> Performance counter stats for 'taskset -c 0 head -c 20G /dev/zero' (200 runs):
>
> 17364.040855 task-clock (msec) # 0.624 CPUs utilized ( +- 0.02% )
> 3340375 context-switches # 0.192 M/sec ( +- 0.06% )
> 1 cpu-migrations # 0.000 K/sec
> 85 page-faults # 0.005 K/sec ( +- 0.15% )
> 32077623335 cycles # 1.847 GHz ( +- 0.03% )
> 35121047596 instructions # 1.09 insn per cycle ( +- 0.01% )
> 6519872824 branches # 375.481 M/sec ( +- 0.02% )
> 101877022 branch-misses # 1.56% of all branches ( +- 0.14% )
>
> 27.842745343 seconds time elapsed ( +- 0.02% )
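The claimed improvements can be recomputed from the elapsed times reported by perf above; a quick sketch (the improvement() helper is just illustrative, not part of the patchset):

```python
# Recompute the improvements from the before/after elapsed times
# (seconds) quoted in the perf stat output above.
def improvement(before, after):
    """Percentage improvement of 'after' relative to 'before'."""
    return (before - after) / after * 100.0

nonaligned = improvement(4.603530546, 4.240273854)   # ~8.6%, the "about 10%"
aligned = improvement(3.438041238, 0.935251375)      # ~268%, the "about 200%"
netcat = improvement(27.842745343, 27.370305077)     # ~1.7% minor boost

print(f"non-aligned: {nonaligned:.1f}%")
print(f"aligned:     {aligned:.1f}%")
print(f"netcat:      {netcat:.1f}%")
```

Note the aligned-case gap (3.44 s -> 0.94 s) is what the reply below asks about: it is much larger than the cycle counts alone would suggest.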
>
>
Are these actually the numbers for this patch set? It seems like you
have been using the same numbers for the last several releases. I can
understand the "before" numbers being mostly the same, but since we
have factored out the refactoring portion, the "after" numbers should
have deviated, as I find it highly unlikely they are exactly the same,
down to the nanosecond, as in the previous patch set.
Also, it wouldn't hurt to have an explanation for the 3.4->0.9 second
performance change, as the samples don't seem to match up with the
elapsed time data.