Message-ID: <8b661116-0a85-4928-91ed-3c01ebbf8d39@bytedance.com>
Date: Wed, 29 Jan 2025 01:06:55 +0800
From: Qi Zheng <zhengqi.arch@...edance.com>
To: David Hildenbrand <david@...hat.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...ux.intel.com>, Andy Lutomirski
<luto@...nel.org>, Catalin Marinas <catalin.marinas@....com>,
David Rientjes <rientjes@...gle.com>, Hugh Dickins <hughd@...gle.com>,
Jann Horn <jannh@...gle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Matthew Wilcox <willy@...radead.org>, Mel Gorman <mgorman@...e.de>,
Muchun Song <muchun.song@...ux.dev>, Peter Xu <peterx@...hat.com>,
Will Deacon <will@...nel.org>, Zach O'Keefe <zokeefe@...gle.com>,
Dan Carpenter <dan.carpenter@...aro.org>, Rik van Riel <riel@...riel.com>
Subject: Re: [linus:master] [x86] 4817f70c25: stress-ng.mmapaddr.ops_per_sec
63.0% regression
Hi,
On 2025/1/28 21:42, David Hildenbrand wrote:
> On 28.01.25 14:28, Peter Zijlstra wrote:
>> On Tue, Jan 28, 2025 at 12:39:51PM +0100, David Hildenbrand wrote:
>>> On 28.01.25 12:31, Peter Zijlstra wrote:
>>
>>>>> I recall a recent series to select MMU_GATHER_RCU_TABLE_FREE on x86
>>>>> unconditionally (@Peter, @Rik).
>>>>
>>>> Those changes should not have made it to Linus yet.
>>>>
>>>> /me updates git and checks...
>>>>
>>>> nope, nothing changed there ... yet
>>>
>>> Sorry, I wasn't quite clear. CONFIG_PT_RECLAIM made it upstream, which has
>>> "select MMU_GATHER_RCU_TABLE_FREE" in kconfig.
>>>
>>> So I'm wondering if the degradation we see in this report is due to
>>> MMU_GATHER_RCU_TABLE_FREE being selected by CONFIG_PT_RECLAIM, and we'd get
>>> the same result (degradation) when unconditionally enabling
>>> MMU_GATHER_RCU_TABLE_FREE.
>>
>> Ah, yes, but a RHEL based config (as is the case here) should already
>> have it selected due to PARAVIRT.
>
> Ah, right. Most distros will just have it enabled either way.
>
> But that would then mean that MMU_GATHER_RCU_TABLE_FREE is not the cause
> for the regression here, and something else is going wrong.
>
I did reproduce the performance regression using the following test
program:
stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --mmapaddr 64
The results are as follows:
1) Enable CONFIG_PT_RECLAIM
stress-ng: info: [826] dispatching hogs: 64 mmapaddr
stress-ng: info: [826] successful run completed in 60.29s (1 min, 0.29 secs)
stress-ng: info: [826] stressor    bogo ops real time  usr time  sys time   bogo ops/s      bogo ops/s
stress-ng: info: [826]                         (secs)    (secs)    (secs)  (real time)  (usr+sys time)
stress-ng: info: [826] mmapaddr    17233711     60.01    238.47   1128.46    287178.92        12607.60
stress-ng: info: [826] for a 60.29s run time:
stress-ng: info: [826] 1447.07s available CPU time
stress-ng: info: [826] 238.85s user time ( 16.51%)
stress-ng: info: [826] 1128.87s system time ( 78.01%)
stress-ng: info: [826] 1367.72s total time ( 94.52%)
stress-ng: info: [826] load average: 48.64 20.73 7.82
2) Disable CONFIG_PT_RECLAIM
stress-ng: info: [704] dispatching hogs: 64 mmapaddr
stress-ng: info: [704] successful run completed in 60.05s (1 min, 0.05 secs)
stress-ng: info: [704] stressor    bogo ops real time  usr time  sys time   bogo ops/s      bogo ops/s
stress-ng: info: [704]                         (secs)    (secs)    (secs)  (real time)  (usr+sys time)
stress-ng: info: [704] mmapaddr    28440843     60.02    343.93   1090.70    473882.98        19824.51
stress-ng: info: [704] for a 60.05s run time:
stress-ng: info: [704] 1441.23s available CPU time
stress-ng: info: [704] 344.30s user time ( 23.89%)
stress-ng: info: [704] 1091.12s system time ( 75.71%)
stress-ng: info: [704] 1435.42s total time ( 99.60%)
stress-ng: info: [704] load average: 40.03 11.51 3.96
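As a rough cross-check of the two runs above, the relative drop can be computed from the reported "bogo ops/s (real time)" values. This is only a sketch: the helper name relative_drop and the variable names are made up for illustration and are not part of stress-ng.

```python
def relative_drop(baseline: float, measured: float) -> float:
    """Return the percentage by which `measured` falls below `baseline`."""
    return (baseline - measured) / baseline * 100.0


# "bogo ops/s (real time)" values copied from the two runs above.
ops_pt_reclaim_off = 473882.98  # run 2), CONFIG_PT_RECLAIM disabled
ops_pt_reclaim_on = 287178.92   # run 1), CONFIG_PT_RECLAIM enabled

drop = relative_drop(ops_pt_reclaim_off, ops_pt_reclaim_on)
print(f"throughput drop with CONFIG_PT_RECLAIM enabled: {drop:.1f}%")
```

The result is comparable to the robot's figure in direction only, since the two test setups are not necessarily the same.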
Then I found that after enabling CONFIG_PT_RECLAIM, there was an
additional perf hotspot function:
16.35% [kernel] [k] _raw_spin_unlock_irqrestore
9.09% [kernel] [k] clear_page_rep
6.92% [kernel] [k] do_syscall_64
3.76% [kernel] [k] _raw_spin_lock
3.27% [kernel] [k] __slab_free
2.07% [kernel] [k] rcu_cblist_dequeue
1.94% [kernel] [k] flush_tlb_mm_range
1.87% [kernel] [k] lruvec_stat_mod_folio.part.130
1.79% [kernel] [k] get_page_from_freelist
1.61% [kernel] [k] tlb_remove_table_rcu
1.58% [kernel] [k] kmem_cache_alloc_noprof
1.43% [kernel] [k] mtree_range_walk
And its call stack is as follows:
bpftrace -e 'k:_raw_spin_unlock_irqrestore {@[kstack,comm]=count();} interval:s:1 {exit();}'
@[
_raw_spin_unlock_irqrestore+5
free_one_page+85
rcu_do_batch+424
rcu_core+401
handle_softirqs+204
irq_exit_rcu+208
sysvec_apic_timer_interrupt+113
asm_sysvec_apic_timer_interrupt+26
_raw_spin_unlock_irqrestore+29
get_page_from_freelist+2014
__alloc_frozen_pages_noprof+364
alloc_pages_mpol+123
alloc_pages_noprof+14
get_free_pages_noprof+17
__x64_sys_mincore+141
do_syscall_64+98
entry_SYSCALL_64_after_hwframe+118
, stress-ng-mmapa]: 2283
@[
_raw_spin_unlock_irqrestore+5
get_page_from_freelist+2014
__alloc_frozen_pages_noprof+364
alloc_pages_mpol+123
alloc_pages_noprof+14
pte_alloc_one+30
__pte_alloc+42
do_pte_missing+2499
__handle_mm_fault+1862
handle_mm_fault+195
__get_user_pages+690
populate_vma_page_range+127
__mm_populate+159
vm_mmap_pgoff+329
do_syscall_64+98
entry_SYSCALL_64_after_hwframe+118
, stress-ng-mmapa]: 2443
@[
_raw_spin_unlock_irqrestore+5
get_page_from_freelist+2014
__alloc_frozen_pages_noprof+364
alloc_pages_mpol+123
alloc_pages_noprof+14
get_free_pages_noprof+17
__x64_sys_mincore+141
do_syscall_64+98
entry_SYSCALL_64_after_hwframe+118
, stress-ng-mmapa]: 5184
@[
_raw_spin_unlock_irqrestore+5
free_one_page+85
tlb_remove_table_rcu+140
rcu_do_batch+424
rcu_core+401
handle_softirqs+204
irq_exit_rcu+208
sysvec_apic_timer_interrupt+113
asm_sysvec_apic_timer_interrupt+26
_raw_spin_unlock_irqrestore+29
get_page_from_freelist+2014
__alloc_frozen_pages_noprof+364
alloc_pages_mpol+123
alloc_pages_noprof+14
get_free_pages_noprof+17
__x64_sys_mincore+141
do_syscall_64+98
entry_SYSCALL_64_after_hwframe+118
, stress-ng-mmapa]: 5301
@Error looking up stack id 4294967279 (pid -1): -1
[, stress-ng-mmapa]: 53366
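To get a rough sense of how the resolved samples split, the counts above can be grouped by whether the stack passes through rcu_do_batch. This is a sketch only: the grouping rule and the abbreviated frame tuples are assumptions made for illustration, and only the sample counts are copied from the bpftrace output.

```python
# (identifying frames, sample count) for each resolved stack above.
# The frame tuples are abbreviated by hand; they are not bpftrace output.
# The unresolved stack (id 4294967279, 53366 samples) is left out because
# its frames are unknown.
resolved_stacks = [
    (("free_one_page", "rcu_do_batch", "__x64_sys_mincore"), 2283),
    (("get_page_from_freelist", "pte_alloc_one", "vm_mmap_pgoff"), 2443),
    (("get_page_from_freelist", "__x64_sys_mincore"), 5184),
    (("free_one_page", "tlb_remove_table_rcu", "rcu_do_batch",
      "__x64_sys_mincore"), 5301),
]

# Samples whose stack includes the RCU callback batch (page freeing in softirq).
rcu_samples = sum(n for frames, n in resolved_stacks if "rcu_do_batch" in frames)
# Samples taken purely on the page allocation path.
other_samples = sum(n for frames, n in resolved_stacks if "rcu_do_batch" not in frames)
total = rcu_samples + other_samples

print(f"via rcu_do_batch: {rcu_samples} of {total} resolved samples "
      f"({rcu_samples / total:.1%})")
print(f"allocation path only: {other_samples} of {total} resolved samples")
```

Because most samples belong to the stack that could not be resolved, this split only describes the resolved minority.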
It seems to be related to CONFIG_MMU_GATHER_RCU_TABLE_FREE?
I will continue to investigate further.
Thanks!