[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87355dlzd3.fsf@oracle.com>
Date: Thu, 06 Apr 2023 10:00:40 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ankur Arora <ankur.a.arora@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
willy@...radead.org, mgorman@...e.de, rostedt@...dmis.org,
tglx@...utronix.de, vincent.guittot@...aro.org, jon.grimm@....com,
bharata@....com, boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH 9/9] x86/clear_huge_page: make clear_contig_region()
preemptible
Peter Zijlstra <peterz@...radead.org> writes:
> On Sun, Apr 02, 2023 at 10:22:33PM -0700, Ankur Arora wrote:
>> clear_contig_region() can be used to clear up to a huge-page (2MB/1GB)
>> chunk Allow preemption in the irqentry_exit path to make sure we don't
>> hold on to the CPU for an arbitrarily long period.
>>
>> Performance: vm-scalability/case-anon-w-seq-hugetlb mmaps an anonymous
>> hugetlb-2mb region, and then writes sequentially to the region, demand
>> faulting pages on the way.
>>
>> This test, with a CONFIG_VOLUNTARY config shows the effects of this
>> change: stime drops (~18% on Icelakex, ~5% on Milan), while the utime
>> goes up (~15% on Icelakex, ~13% on Milan.)
>>
>> *Icelakex* mm/clear_huge_page x86/clear_huge_page change
>> (mem=4GB/task, tasks=128)
>>
>> stime 293.02 +- .49% 239.39 +- .83% -18.30%
>> utime 440.11 +- .28% 508.74 +- .60% +15.59%
>> wall-clock 5.96 +- .33% 6.27 +-2.23% + 5.20%
>>
>>
>>
>> *Milan* mm/clear_huge_page x86/clear_huge_page change
>> (mem=1GB/task, tasks=512)
>>
>> stime 490.95 +- 3.55% 466.90 +- 4.79% - 4.89%
>> utime 276.43 +- 2.85% 311.97 +- 5.15% +12.85%
>> wall-clock 3.74 +- 6.41% 3.58 +- 7.82% - 4.27%
>>
>> The drop in stime is due to REP; STOS being more efficient for bigger
>> extents. The increase in utime is due to cache effects of that change:
>> mm/clear_huge_page() clears page-at-a-time, while narrowing towards the
>> faulting page; while x86/clear_huge_page only optimizes for cache
>> locality in the local neighbourhood of the faulting address.
>>
>> This effect on utime is visible via the increased L1-dcache-load-misses
>> and LLC-load* and an increased backend boundedness for perf user-stat
>> --all-user on Icelakex. The effect is slight but given the heavy cache
>> pressure generated by the test, shows up in the drop in user IPC:
>>
>> - 9,455,243,414,829 instructions # 2.75 insn per cycle ( +- 14.14% ) (46.17%)
>> - 2,367,920,864,112 L1-dcache-loads # 1.054 G/sec ( +- 14.14% ) (69.24%)
>> - 42,075,182,813 L1-dcache-load-misses # 2.96% of all L1-dcache accesses ( +- 14.14% ) (69.24%)
>> - 20,365,688 LLC-loads # 9.064 K/sec ( +- 13.98% ) (69.24%)
>> - 890,382 LLC-load-misses # 7.18% of all LL-cache accesses ( +- 14.91% ) (69.24%)
>>
>> + 9,467,796,660,698 instructions # 2.37 insn per cycle ( +- 14.14% ) (46.16%)
>> + 2,369,973,307,561 L1-dcache-loads # 1.027 G/sec ( +- 14.14% ) (69.24%)
>> + 42,155,621,201 L1-dcache-load-misses # 2.96% of all L1-dcache accesses ( +- 14.14% ) (69.24%)
>> + 22,116,300 LLC-loads # 9.588 K/sec ( +- 14.20% ) (69.24%)
>> + 1,355,607 LLC-load-misses # 10.29% of all LL-cache accesses ( +- 15.49% ) (69.25%)
>>
>> Given the fact that the stime improves for all loads using this path,
>> while the utime drop is load dependent add this change.
>
> Either I really need sleep, or *NONE* of the above is actually relevant
> to what the patch below actually does!
Yeah, you are right about the relevance.
I wanted to provide two sets of stats: the raw memory BW stats and the
relevant vm-scalability workload. The vm-scalability workload needs a
more reasonable scheduling model than what's present until this patch
and so it seemed to make sense to put here for that reason.
But yeah it doesn't really fit here.
> The above talks about the glories of using large clears, while the patch
> allows reschedules which are about latency.
Yeah, let me find a more reasonable way to present these.
Ankur
>> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
>> ---
>> arch/x86/mm/hugetlbpage.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>> index 4294b77c4f18..c8564b0552e5 100644
>> --- a/arch/x86/mm/hugetlbpage.c
>> +++ b/arch/x86/mm/hugetlbpage.c
>> @@ -158,7 +158,17 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>> static void clear_contig_region(struct page *page, unsigned long vaddr,
>> unsigned int npages)
>> {
>> + might_sleep();
>> +
>> + /*
>> + * We might be clearing a large region.
>> + * Allow rescheduling.
>> + */
>> + allow_resched();
>> clear_user_pages(page_address(page), vaddr, page, npages);
>> + disallow_resched();
>> +
>> + cond_resched();
>> }
>>
>> void clear_huge_page(struct page *page,
>> --
>> 2.31.1
>>
--
ankur
Powered by blists - more mailing lists