[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87y1gsrx32.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 27 Sep 2023 13:42:25 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Mel Gorman <mgorman@...hsingularity.net>,
Miaohe Lin <linmiaohe@...wei.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Zi Yan <ziy@...dia.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/6] mm: page_alloc: remove pcppage migratetype caching
Johannes Weiner <hannes@...xchg.org> writes:
> The idea behind the cache is to save get_pageblock_migratetype()
> lookups during bulk freeing. A microbenchmark suggests this isn't
> helping, though. The pcp migratetype can get stale, which means that
> bulk freeing has an extra branch to check if the pageblock was
> isolated while on the pcp.
>
> While the variance overlaps, the cache write and the branch seem to
> make this a net negative. The following test allocates and frees
> batches of 10,000 pages (~3x the pcp high marks to trigger flushing):
>
> Before:
> 8,668.48 msec task-clock # 99.735 CPUs utilized ( +- 2.90% )
> 19 context-switches # 4.341 /sec ( +- 3.24% )
> 0 cpu-migrations # 0.000 /sec
> 17,440 page-faults # 3.984 K/sec ( +- 2.90% )
> 41,758,692,473 cycles # 9.541 GHz ( +- 2.90% )
> 126,201,294,231 instructions # 5.98 insn per cycle ( +- 2.90% )
> 25,348,098,335 branches # 5.791 G/sec ( +- 2.90% )
> 33,436,921 branch-misses # 0.26% of all branches ( +- 2.90% )
>
> 0.0869148 +- 0.0000302 seconds time elapsed ( +- 0.03% )
>
> After:
> 8,444.81 msec task-clock # 99.726 CPUs utilized ( +- 2.90% )
> 22 context-switches # 5.160 /sec ( +- 3.23% )
> 0 cpu-migrations # 0.000 /sec
> 17,443 page-faults # 4.091 K/sec ( +- 2.90% )
> 40,616,738,355 cycles # 9.527 GHz ( +- 2.90% )
> 126,383,351,792 instructions # 6.16 insn per cycle ( +- 2.90% )
> 25,224,985,153 branches # 5.917 G/sec ( +- 2.90% )
> 32,236,793 branch-misses # 0.25% of all branches ( +- 2.90% )
>
> 0.0846799 +- 0.0000412 seconds time elapsed ( +- 0.05% )
>
> A side effect is that this also ensures that pages whose pageblock
> gets stolen while on the pcplist end up on the right freelist and we
> don't perform potentially type-incompatible buddy merges (or skip
> merges when we shouldn't), whis is likely beneficial to long-term
> fragmentation management, although the effects would be harder to
> measure. Settle for simpler and faster code as justification here.
I suspected the PCP allocating/freeing path may be influenced (that is,
allocating/freeing batch is less than PCP high). So I tested
one-process will-it-scale/page_fault1 with sysctl
percpu_pagelist_high_fraction=8. So pages will be allocated/freed
from/to PCP only. The test results are as follows,
Before:
will-it-scale.1.processes 618364.3 (+- 0.075%)
perf-profile.children.get_pfnblock_flags_mask 0.13 (+- 9.350%)
After:
will-it-scale.1.processes 616512.0 (+- 0.057%)
perf-profile.children.get_pfnblock_flags_mask 0.41 (+- 22.44%)
The change isn't large: -0.3%. Perf profiling shows the cycles% of
get_pfnblock_flags_mask() increases.
--
Best Regards,
Huang, Ying
Powered by blists - more mailing lists