Message-ID: <Z1FjemRX/GjP5EVS@xsang-OptiPlex-9020>
Date: Thu, 5 Dec 2024 16:25:30 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: Yu Zhao <yuzhao@...gle.com>
CC: Yin Fengwei <fengwei.yin@...el.com>, <oe-lkp@...ts.linux.dev>,
<lkp@...el.com>, <linux-kernel@...r.kernel.org>, Andrew Morton
<akpm@...ux-foundation.org>, Ryan Roberts <ryan.roberts@....com>, "David
Hildenbrand" <david@...hat.com>, Kefeng Wang <wangkefeng.wang@...wei.com>,
Matthew Wilcox <willy@...radead.org>, Minchan Kim <minchan@...nel.org>,
Vishal Moola <vishal.moola@...il.com>, Yang Shi <shy828301@...il.com>,
<linux-mm@...ck.org>, <oliver.sang@...el.com>
Subject: Re: [linus:master] [madvise] 2f406263e3:
stress-ng.mremap.ops_per_sec 6.7% regression

Hi Yu Zhao,

On Mon, Dec 02, 2024 at 03:24:18PM -0700, Yu Zhao wrote:
> Hi Oliver,
>
> On Fri, Nov 29, 2024 at 12:50 AM kernel test robot
> <oliver.sang@...el.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 6.7% regression of stress-ng.mremap.ops_per_sec on:
> >
> >
> > commit: 2f406263e3e954aa24c1248edcfa9be0c1bb30fa ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [still regression on fix commit cc864ebba5f612ce2960e7e09322a193e8fda0d7]
> >
> > testcase: stress-ng
> > config: x86_64-rhel-8.3
> > compiler: gcc-12
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> >
> > nr_threads: 100%
> > testtime: 60s
> > test: mremap
> > cpufreq_governor: performance
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202411291513.ad55672a-lkp@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20241129/202411291513.ad55672a-lkp@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> > gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mremap/stress-ng/60s
> >
> > commit:
> > 6867c7a332 ("mm: multi-gen LRU: don't spin during memcg release")
> > 2f406263e3 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
>
> The .config you attached shows CONFIG_LRU_GEN_ENABLED is NOT set for
> LKP. So this regression can't be from the first commit above.

By the "first commit above", do you mean 6867c7a332? Sorry for the
confusion: 6867c7a332 is actually the parent of 2f406263e3, and we reported
this regression against 2f406263e3.

>
> Also, I asked you a few times if it's possible to set it to 'y'. It'd
> be great if we could do that :)

Thanks a lot for the information! We recently updated our config from the
x86_64-rhel-8.3 above to x86_64-rhel-9.4, which does set
CONFIG_LRU_GEN_ENABLED (the config is attached).

We then reran the test, and a similar regression remains: still a 7.2%
regression on 2f406263e3 when compared to its parent 6867c7a332.

Just FYI. If there is another config or a patch you'd like us to test,
please let us know. Thanks a lot!

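For context when reading the numbers below: the bisected commit's title says
it stops using mapcount() against a large folio for the sharing check. A
userspace analogue of that trade-off looks roughly like this; all types and
helpers here are hypothetical stand-ins, not kernel API, and only model the
mapcount arithmetic:

/*
 * Userspace analogue of the sharing check changed by 2f406263e3.
 * It models why a total-mapcount test misclassifies an exclusively
 * mapped large folio as shared, while a one-subpage estimate does not.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NR_PAGES 4	/* pretend this is a 4-subpage large folio */

struct folio { int mapcount[NR_PAGES]; };

/* old-style check: "shared" if the summed mapcount is not 1 */
static bool shared_by_total_mapcount(const struct folio *f)
{
	long sum = 0;

	for (size_t i = 0; i < NR_PAGES; i++)
		sum += f->mapcount[i];
	return sum != 1;
}

/* new-style check: estimate sharing from the first subpage only */
static bool shared_by_estimate(const struct folio *f)
{
	return f->mapcount[0] != 1;
}

int main(void)
{
	/* a single process maps every subpage exactly once */
	struct folio exclusive = { .mapcount = { 1, 1, 1, 1 } };

	/* the total mapcount is 4, so the old check calls this folio
	 * "shared" and MADV_COLD/MADV_PAGEOUT would skip it; the
	 * estimate correctly lets the pageout path proceed */
	printf("old=%d new=%d\n",
	       shared_by_total_mapcount(&exclusive),
	       shared_by_estimate(&exclusive));
	return 0;
}

If that reading is right, the commit lets MADV_PAGEOUT actually process
exclusively mapped large folios it previously skipped, which would fit the
thp_split_page and folio_lruvec_lock_irq increases in the data below.
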
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mremap/stress-ng/60s
commit:
6867c7a332 ("mm: multi-gen LRU: don't spin during memcg release")
2f406263e3 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
6867c7a3320669cb 2f406263e3e954aa24c1248edcf
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
38.47 ± 4% +4.8 43.29 ± 3% mpstat.cpu.all.sys%
380.40 ± 27% +134.4% 891.60 ± 26% perf-c2c.HITM.local
75.50 ± 58% +166.0% 200.80 ± 21% perf-c2c.HITM.remote
2.308e+08 -7.2% 2.141e+08 ± 2% time.minor_page_faults
1494 ± 4% +12.9% 1686 ± 3% time.system_time
2163 ± 3% -8.9% 1971 ± 3% time.user_time
367631 -7.2% 341043 ± 2% stress-ng.mremap.ops
6127 -7.2% 5683 ± 2% stress-ng.mremap.ops_per_sec
2.308e+08 -7.2% 2.141e+08 ± 2% stress-ng.time.minor_page_faults
1494 ± 4% +12.9% 1686 ± 3% stress-ng.time.system_time
2163 ± 3% -8.9% 1971 ± 3% stress-ng.time.user_time
0.02 ± 10% +139.7% 0.04 ± 45% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.03 ± 39% +139.1% 0.08 ± 71% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.02 ± 7% +273.7% 0.07 ± 87% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.03 ± 12% +159.2% 0.07 ± 91% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.14 ± 52% +1075.0% 1.66 ± 79% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
2.54 ± 17% +104.7% 5.20 ± 21% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.21 ± 2% +17.3% 0.25 ± 13% perf-sched.total_sch_delay.average.ms
1248 ± 17% +50.5% 1879 ± 16% perf-sched.wait_and_delay.count.__cond_resched.shrink_folio_list.reclaim_folio_list.reclaim_pages.madvise_cold_or_pageout_pte_range
2.177e+08 -7.3% 2.019e+08 ± 2% proc-vmstat.numa_hit
2.176e+08 -7.3% 2.018e+08 ± 2% proc-vmstat.numa_local
4.04e+08 -7.2% 3.749e+08 ± 2% proc-vmstat.pgalloc_normal
2.312e+08 -7.2% 2.146e+08 ± 2% proc-vmstat.pgfault
4.038e+08 -7.2% 3.747e+08 ± 2% proc-vmstat.pgfree
343815 -7.2% 318954 ± 2% proc-vmstat.thp_deferred_split_page
367694 -7.2% 341106 ± 2% proc-vmstat.thp_fault_alloc
23844 +285.8% 91982 ± 49% proc-vmstat.thp_split_page
367723 -7.2% 341134 ± 2% proc-vmstat.thp_split_pmd
23844 -7.2% 22116 ± 3% proc-vmstat.thp_swpout_fallback
24.78 ± 2% -5.1% 23.52 perf-stat.i.MPKI
1.12e+09 ± 2% -7.0% 1.041e+09 perf-stat.i.cache-misses
1.663e+09 -6.7% 1.551e+09 ± 2% perf-stat.i.cache-references
4.33 +2.1% 4.42 perf-stat.i.cpi
194.59 ± 3% -19.7% 156.24 ± 8% perf-stat.i.cpu-migrations
4.484e+10 -2.1% 4.389e+10 perf-stat.i.instructions
0.23 -2.0% 0.23 perf-stat.i.ipc
24.97 ± 2% -5.1% 23.71 perf-stat.overall.MPKI
4.34 +2.1% 4.43 perf-stat.overall.cpi
174.07 ± 2% +7.5% 187.04 perf-stat.overall.cycles-between-cache-misses
0.23 -2.0% 0.23 perf-stat.overall.ipc
1.102e+09 ± 2% -7.1% 1.024e+09 perf-stat.ps.cache-misses
1.637e+09 -6.8% 1.526e+09 ± 2% perf-stat.ps.cache-references
190.96 ± 3% -19.6% 153.53 ± 8% perf-stat.ps.cpu-migrations
4.412e+10 -2.1% 4.319e+10 perf-stat.ps.instructions
2.709e+12 -2.1% 2.651e+12 perf-stat.total.instructions
0.96 ± 3% +0.2 1.14 ± 15% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.do_munmap
1.81 ± 3% +0.4 2.18 ± 16% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
1.74 ± 3% +0.4 2.12 ± 16% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
0.71 ± 8% +0.4 1.11 ± 34% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
0.83 ± 6% +1.3 2.10 ± 51% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range
0.86 ± 6% +1.3 2.15 ± 50% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range
0.85 ± 6% +1.3 2.14 ± 50% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range
0.98 ± 5% +1.3 2.32 ± 48% perf-profile.calltrace.cycles-pp.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range
0.00 +2.0 1.97 ± 78% perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.madvise_cold
0.00 +2.0 1.97 ± 78% perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_range.madvise_cold.madvise_vma_behavior
0.00 +2.0 1.97 ± 78% perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_range.madvise_cold.madvise_vma_behavior.do_madvise
0.00 +2.0 1.97 ± 78% perf-profile.calltrace.cycles-pp.walk_page_range.madvise_cold.madvise_vma_behavior.do_madvise.__x64_sys_madvise
0.00 +2.0 1.98 ± 78% perf-profile.calltrace.cycles-pp.madvise_cold.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
0.56 ± 2% -0.0 0.51 ± 3% perf-profile.children.cycles-pp.kmem_cache_alloc
0.36 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.kthread
0.36 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.ret_from_fork
0.26 ± 2% -0.0 0.24 ± 3% perf-profile.children.cycles-pp.vm_area_dup
0.34 ± 6% +0.1 0.47 ± 12% perf-profile.children.cycles-pp.folio_deactivate
0.06 ± 20% +0.1 0.19 ± 41% perf-profile.children.cycles-pp.__free_one_page
0.26 ± 10% +0.1 0.40 ± 16% perf-profile.children.cycles-pp.free_unref_page_list
0.08 ± 14% +0.2 0.24 ± 46% perf-profile.children.cycles-pp.free_pcppages_bulk
0.23 ± 2% +0.2 0.40 ± 42% perf-profile.children.cycles-pp.rmap_walk_anon
0.41 ± 2% +0.2 0.58 ± 22% perf-profile.children.cycles-pp.lru_gen_del_folio
0.07 ± 5% +0.2 0.25 ± 59% perf-profile.children.cycles-pp.page_counter_uncharge
0.09 ± 4% +0.2 0.29 ± 55% perf-profile.children.cycles-pp.uncharge_batch
0.00 +0.2 0.22 ± 80% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
2.14 ± 3% +0.4 2.54 ± 15% perf-profile.children.cycles-pp.tlb_finish_mmu
2.01 ± 3% +0.4 2.42 ± 15% perf-profile.children.cycles-pp.tlb_batch_pages_flush
2.06 ± 3% +0.4 2.48 ± 15% perf-profile.children.cycles-pp.release_pages
2.34 ± 3% +0.7 3.04 ± 21% perf-profile.children.cycles-pp.folio_batch_move_lru
0.86 ± 5% +1.3 2.15 ± 50% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.86 ± 6% +1.3 2.15 ± 50% perf-profile.children.cycles-pp.folio_lruvec_lock_irq
0.98 ± 5% +1.4 2.33 ± 48% perf-profile.children.cycles-pp.folio_isolate_lru
0.41 ± 5% +1.6 1.98 ± 78% perf-profile.children.cycles-pp.madvise_cold
0.00 +2.3 2.28 ±104% perf-profile.children.cycles-pp.__page_cache_release
0.00 +2.5 2.47 ±103% perf-profile.children.cycles-pp.__folio_put
0.10 ± 7% +2.6 2.75 ± 97% perf-profile.children.cycles-pp.__split_huge_page
0.11 ± 8% +2.8 2.88 ± 97% perf-profile.children.cycles-pp.split_huge_page_to_list
2.15 ± 6% +3.1 5.29 ± 60% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.16 ± 4% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.kmem_cache_alloc
0.18 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.walk_pmd_range
0.14 ± 2% +0.0 0.16 ± 5% perf-profile.self.cycles-pp.madvise_cold_or_pageout_pte_range
0.42 ± 5% +0.1 0.50 ± 6% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.06 ± 18% +0.1 0.18 ± 43% perf-profile.self.cycles-pp.__free_one_page
0.29 ± 2% +0.2 0.45 ± 27% perf-profile.self.cycles-pp.lru_gen_del_folio
0.07 ± 6% +0.2 0.24 ± 59% perf-profile.self.cycles-pp.page_counter_uncharge
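
For reference, the workload under test: stress-ng's mremap stressor
repeatedly resizes an anonymous mapping, and the profile above shows
madvise_cold/madvise_pageout in the hot path. A minimal standalone sketch of
that syscall pattern follows; this is not stress-ng's actual source, and the
sizes, iteration count, and the MADV_PAGEOUT call are illustrative
assumptions:

/*
 * Minimal sketch of the pattern exercised here: resize an anonymous
 * mapping with mremap() and poke the madvise() path seen in the
 * profile.  NOT stress-ng's source; details are illustrative.
 */
#define _GNU_SOURCE		/* for mremap() */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t big = 4UL << 20, small = big / 2;	/* 4 MiB <-> 2 MiB */
	size_t cur = big;
	char *p = mmap(NULL, cur, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 1, cur);	/* fault in (possibly THP-backed) pages */

	for (int i = 0; i < 1000; i++) {
		size_t next = (cur == big) ? small : big;
		char *np = mremap(p, cur, next, MREMAP_MAYMOVE);

		if (np == MAP_FAILED) {
			perror("mremap");
			return 1;
		}
		p = np;
		cur = next;
		/* the path the bisected commit touches */
		madvise(p, cur, MADV_PAGEOUT);
	}
	munmap(p, cur);
	return 0;
}
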
[-- Attachment: config-6.5.0-rc4-00026-g2f406263e3e9 (text/plain, 229808 bytes) --]