Message-ID: <Z1FjemRX/GjP5EVS@xsang-OptiPlex-9020>
Date: Thu, 5 Dec 2024 16:25:30 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: Yu Zhao <yuzhao@...gle.com>
CC: Yin Fengwei <fengwei.yin@...el.com>, <oe-lkp@...ts.linux.dev>,
	<lkp@...el.com>, <linux-kernel@...r.kernel.org>, Andrew Morton
	<akpm@...ux-foundation.org>, Ryan Roberts <ryan.roberts@....com>, "David
 Hildenbrand" <david@...hat.com>, Kefeng Wang <wangkefeng.wang@...wei.com>,
	Matthew Wilcox <willy@...radead.org>, Minchan Kim <minchan@...nel.org>,
	Vishal Moola <vishal.moola@...il.com>, Yang Shi <shy828301@...il.com>,
	<linux-mm@...ck.org>, <oliver.sang@...el.com>
Subject: Re: [linus:master] [madvise] 2f406263e3:
 stress-ng.mremap.ops_per_sec 6.7% regression

Hi, Yu Zhao,

On Mon, Dec 02, 2024 at 03:24:18PM -0700, Yu Zhao wrote:
> Hi Oliver,
> 
> On Fri, Nov 29, 2024 at 12:50 AM kernel test robot
> <oliver.sang@...el.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 6.7% regression of stress-ng.mremap.ops_per_sec on:
> >
> >
> > commit: 2f406263e3e954aa24c1248edcfa9be0c1bb30fa ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [still regression on fix commit cc864ebba5f612ce2960e7e09322a193e8fda0d7]
> >
> > testcase: stress-ng
> > config: x86_64-rhel-8.3
> > compiler: gcc-12
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> >
> >         nr_threads: 100%
> >         testtime: 60s
> >         test: mremap
> >         cpufreq_governor: performance
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202411291513.ad55672a-lkp@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20241129/202411291513.ad55672a-lkp@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> >   gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mremap/stress-ng/60s
> >
> > commit:
> >   6867c7a332 ("mm: multi-gen LRU: don't spin during memcg release")
> >   2f406263e3 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> 
> The .config you attached shows CONFIG_LRU_GEN_ENABLED is NOT set for
> LKP. So this regression can't be from the first commit above.

Is the "first commit above" you mentioned 6867c7a332?
Sorry for the confusion: 6867c7a332 is actually the parent of 2f406263e3, and we
reported this regression against 2f406263e3.

> 
> Also, I asked you a few times if it's possible to set it to 'y'. It'd
> be great if we could do that :)

Thanks a lot for the information! We recently updated our config from the above
x86_64-rhel-8.3 to x86_64-rhel-9.4, which does contain
CONFIG_LRU_GEN_ENABLED (the config is attached).

We then reran the test, and there still seems to be a similar regression
(still a 7.2% regression on 2f406263e3 when compared to its parent 6867c7a332).

Just FYI. If you have another config or a patch you'd like us to test, please let us know.

thanks a lot!

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mremap/stress-ng/60s

commit: 
  6867c7a332 ("mm: multi-gen LRU: don't spin during memcg release")
  2f406263e3 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")

6867c7a3320669cb 2f406263e3e954aa24c1248edcf
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     38.47 ±  4%      +4.8       43.29 ±  3%  mpstat.cpu.all.sys%
    380.40 ± 27%    +134.4%     891.60 ± 26%  perf-c2c.HITM.local
     75.50 ± 58%    +166.0%     200.80 ± 21%  perf-c2c.HITM.remote
 2.308e+08            -7.2%  2.141e+08 ±  2%  time.minor_page_faults
      1494 ±  4%     +12.9%       1686 ±  3%  time.system_time
      2163 ±  3%      -8.9%       1971 ±  3%  time.user_time
    367631            -7.2%     341043 ±  2%  stress-ng.mremap.ops
      6127            -7.2%       5683 ±  2%  stress-ng.mremap.ops_per_sec
 2.308e+08            -7.2%  2.141e+08 ±  2%  stress-ng.time.minor_page_faults
      1494 ±  4%     +12.9%       1686 ±  3%  stress-ng.time.system_time
      2163 ±  3%      -8.9%       1971 ±  3%  stress-ng.time.user_time
      0.02 ± 10%    +139.7%       0.04 ± 45%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.03 ± 39%    +139.1%       0.08 ± 71%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.02 ±  7%    +273.7%       0.07 ± 87%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.03 ± 12%    +159.2%       0.07 ± 91%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.14 ± 52%   +1075.0%       1.66 ± 79%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      2.54 ± 17%    +104.7%       5.20 ± 21%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.21 ±  2%     +17.3%       0.25 ± 13%  perf-sched.total_sch_delay.average.ms
      1248 ± 17%     +50.5%       1879 ± 16%  perf-sched.wait_and_delay.count.__cond_resched.shrink_folio_list.reclaim_folio_list.reclaim_pages.madvise_cold_or_pageout_pte_range
 2.177e+08            -7.3%  2.019e+08 ±  2%  proc-vmstat.numa_hit
 2.176e+08            -7.3%  2.018e+08 ±  2%  proc-vmstat.numa_local
  4.04e+08            -7.2%  3.749e+08 ±  2%  proc-vmstat.pgalloc_normal
 2.312e+08            -7.2%  2.146e+08 ±  2%  proc-vmstat.pgfault
 4.038e+08            -7.2%  3.747e+08 ±  2%  proc-vmstat.pgfree
    343815            -7.2%     318954 ±  2%  proc-vmstat.thp_deferred_split_page
    367694            -7.2%     341106 ±  2%  proc-vmstat.thp_fault_alloc
     23844          +285.8%      91982 ± 49%  proc-vmstat.thp_split_page
    367723            -7.2%     341134 ±  2%  proc-vmstat.thp_split_pmd
     23844            -7.2%      22116 ±  3%  proc-vmstat.thp_swpout_fallback
     24.78 ±  2%      -5.1%      23.52        perf-stat.i.MPKI
  1.12e+09 ±  2%      -7.0%  1.041e+09        perf-stat.i.cache-misses
 1.663e+09            -6.7%  1.551e+09 ±  2%  perf-stat.i.cache-references
      4.33            +2.1%       4.42        perf-stat.i.cpi
    194.59 ±  3%     -19.7%     156.24 ±  8%  perf-stat.i.cpu-migrations
 4.484e+10            -2.1%  4.389e+10        perf-stat.i.instructions
      0.23            -2.0%       0.23        perf-stat.i.ipc
     24.97 ±  2%      -5.1%      23.71        perf-stat.overall.MPKI
      4.34            +2.1%       4.43        perf-stat.overall.cpi
    174.07 ±  2%      +7.5%     187.04        perf-stat.overall.cycles-between-cache-misses
      0.23            -2.0%       0.23        perf-stat.overall.ipc
 1.102e+09 ±  2%      -7.1%  1.024e+09        perf-stat.ps.cache-misses
 1.637e+09            -6.8%  1.526e+09 ±  2%  perf-stat.ps.cache-references
    190.96 ±  3%     -19.6%     153.53 ±  8%  perf-stat.ps.cpu-migrations
 4.412e+10            -2.1%  4.319e+10        perf-stat.ps.instructions
 2.709e+12            -2.1%  2.651e+12        perf-stat.total.instructions
      0.96 ±  3%      +0.2        1.14 ± 15%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.do_munmap
      1.81 ±  3%      +0.4        2.18 ± 16%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.74 ±  3%      +0.4        2.12 ± 16%  perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      0.71 ±  8%      +0.4        1.11 ± 34%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
      0.83 ±  6%      +1.3        2.10 ± 51%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range
      0.86 ±  6%      +1.3        2.15 ± 50%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range
      0.85 ±  6%      +1.3        2.14 ± 50%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range
      0.98 ±  5%      +1.3        2.32 ± 48%  perf-profile.calltrace.cycles-pp.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range
      0.00            +2.0        1.97 ± 78%  perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.madvise_cold
      0.00            +2.0        1.97 ± 78%  perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_range.madvise_cold.madvise_vma_behavior
      0.00            +2.0        1.97 ± 78%  perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_range.madvise_cold.madvise_vma_behavior.do_madvise
      0.00            +2.0        1.97 ± 78%  perf-profile.calltrace.cycles-pp.walk_page_range.madvise_cold.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      0.00            +2.0        1.98 ± 78%  perf-profile.calltrace.cycles-pp.madvise_cold.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
      0.56 ±  2%      -0.0        0.51 ±  3%  perf-profile.children.cycles-pp.kmem_cache_alloc
      0.36 ±  3%      -0.0        0.34 ±  3%  perf-profile.children.cycles-pp.kthread
      0.36 ±  3%      -0.0        0.34 ±  3%  perf-profile.children.cycles-pp.ret_from_fork
      0.26 ±  2%      -0.0        0.24 ±  3%  perf-profile.children.cycles-pp.vm_area_dup
      0.34 ±  6%      +0.1        0.47 ± 12%  perf-profile.children.cycles-pp.folio_deactivate
      0.06 ± 20%      +0.1        0.19 ± 41%  perf-profile.children.cycles-pp.__free_one_page
      0.26 ± 10%      +0.1        0.40 ± 16%  perf-profile.children.cycles-pp.free_unref_page_list
      0.08 ± 14%      +0.2        0.24 ± 46%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.23 ±  2%      +0.2        0.40 ± 42%  perf-profile.children.cycles-pp.rmap_walk_anon
      0.41 ±  2%      +0.2        0.58 ± 22%  perf-profile.children.cycles-pp.lru_gen_del_folio
      0.07 ±  5%      +0.2        0.25 ± 59%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.09 ±  4%      +0.2        0.29 ± 55%  perf-profile.children.cycles-pp.uncharge_batch
      0.00            +0.2        0.22 ± 80%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
      2.14 ±  3%      +0.4        2.54 ± 15%  perf-profile.children.cycles-pp.tlb_finish_mmu
      2.01 ±  3%      +0.4        2.42 ± 15%  perf-profile.children.cycles-pp.tlb_batch_pages_flush
      2.06 ±  3%      +0.4        2.48 ± 15%  perf-profile.children.cycles-pp.release_pages
      2.34 ±  3%      +0.7        3.04 ± 21%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.86 ±  5%      +1.3        2.15 ± 50%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.86 ±  6%      +1.3        2.15 ± 50%  perf-profile.children.cycles-pp.folio_lruvec_lock_irq
      0.98 ±  5%      +1.4        2.33 ± 48%  perf-profile.children.cycles-pp.folio_isolate_lru
      0.41 ±  5%      +1.6        1.98 ± 78%  perf-profile.children.cycles-pp.madvise_cold
      0.00            +2.3        2.28 ±104%  perf-profile.children.cycles-pp.__page_cache_release
      0.00            +2.5        2.47 ±103%  perf-profile.children.cycles-pp.__folio_put
      0.10 ±  7%      +2.6        2.75 ± 97%  perf-profile.children.cycles-pp.__split_huge_page
      0.11 ±  8%      +2.8        2.88 ± 97%  perf-profile.children.cycles-pp.split_huge_page_to_list
      2.15 ±  6%      +3.1        5.29 ± 60%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.16 ±  4%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.kmem_cache_alloc
      0.18 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.walk_pmd_range
      0.14 ±  2%      +0.0        0.16 ±  5%  perf-profile.self.cycles-pp.madvise_cold_or_pageout_pte_range
      0.42 ±  5%      +0.1        0.50 ±  6%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.06 ± 18%      +0.1        0.18 ± 43%  perf-profile.self.cycles-pp.__free_one_page
      0.29 ±  2%      +0.2        0.45 ± 27%  perf-profile.self.cycles-pp.lru_gen_del_folio
      0.07 ±  6%      +0.2        0.24 ± 59%  perf-profile.self.cycles-pp.page_counter_uncharge



View attachment "config-6.5.0-rc4-00026-g2f406263e3e9" of type "text/plain" (229808 bytes)
