Message-ID: <202310071416.df82eed7-oliver.sang@intel.com>
Date:   Sat, 7 Oct 2023 15:08:48 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Yang Shi <yang@...amperecomputing.com>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Oscar Salvador" <osalvador@...e.de>,
        Rafael Aquini <aquini@...hat.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        David Rientjes <rientjes@...gle.com>, <linux-mm@...ck.org>,
        <ying.huang@...el.com>, <feng.tang@...el.com>,
        <fengwei.yin@...el.com>, <oliver.sang@...el.com>
Subject: [linus:master] [mm]  24526268f4:  stress-ng.numa.ops_per_sec 4.7%
 improvement



Hello,

kernel test robot noticed a 4.7% improvement of stress-ng.numa.ops_per_sec on:


commit: 24526268f4e38c9ec0c4a30de4f37ad2a2a84e47 ("mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 36 threads, 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
parameters:

	nr_threads: 1
	testtime: 60s
	class: cpu
	test: numa
	cpufreq_governor: performance


In addition, the commit also has a significant impact on the following test:

+------------------+-------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.numa.ops_per_sec 4.5% improvement                                          |
| test machine     | 36 threads, 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory |
| test parameters  | class=os                                                                                        |
|                  | cpufreq_governor=performance                                                                    |
|                  | disk=1HDD                                                                                       |
|                  | fs=ext4                                                                                         |
|                  | nr_threads=1                                                                                    |
|                  | test=numa                                                                                       |
|                  | testtime=60s                                                                                    |
+------------------+-------------------------------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231007/202310071416.df82eed7-oliver.sang@intel.com

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  cpu/gcc-12/performance/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/numa/stress-ng/60s

commit: 
  45120b1574 ("mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()")
  24526268f4 ("mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified")

45120b15743fa7c0 24526268f4e38c9ec0c4a30de4f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    272.18 ± 77%     -99.9%       0.31 ±220%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
      1089            +4.7%       1141        stress-ng.numa.ops
     18.16            +4.7%      19.01        stress-ng.numa.ops_per_sec
     20387            +5.2%      21456        stress-ng.time.involuntary_context_switches
 2.173e+09            +3.6%  2.251e+09        perf-stat.i.branch-instructions
      0.50            -3.5%       0.48        perf-stat.i.cpi
 1.865e+09            +3.6%  1.932e+09        perf-stat.i.dTLB-loads
  1.06e+10            +3.4%  1.096e+10        perf-stat.i.instructions
      2.02            +3.8%       2.10        perf-stat.i.ipc
    130.34            +3.1%     134.39        perf-stat.i.metric.M/sec
      0.50            -3.6%       0.49        perf-stat.overall.cpi
      1.99            +3.7%       2.06        perf-stat.overall.ipc
 2.139e+09            +3.6%  2.216e+09        perf-stat.ps.branch-instructions
 1.836e+09            +3.6%  1.901e+09        perf-stat.ps.dTLB-loads
 1.043e+10            +3.4%  1.079e+10        perf-stat.ps.instructions
 6.597e+11            +3.4%  6.822e+11        perf-stat.total.instructions
     17.43 ±  5%      -1.9       15.50 ±  2%  perf-profile.calltrace.cycles-pp.queue_folios_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
     18.49 ±  4%      -1.9       16.61 ±  2%  perf-profile.calltrace.cycles-pp.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range
     19.07 ±  4%      -1.8       17.25 ±  2%  perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range
     19.67 ±  4%      -1.8       17.86 ±  2%  perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.migrate_to_node
      3.76 ±  4%      -0.4        3.33 ±  9%  perf-profile.calltrace.cycles-pp.mt_find.find_vma.queue_pages_test_walk.walk_page_range.migrate_to_node
      3.94 ±  4%      -0.4        3.53 ±  8%  perf-profile.calltrace.cycles-pp.find_vma.queue_pages_test_walk.walk_page_range.migrate_to_node.do_migrate_pages
     17.60 ±  4%      -1.9       15.71 ±  2%  perf-profile.children.cycles-pp.queue_folios_pte_range
     18.50 ±  4%      -1.9       16.63 ±  2%  perf-profile.children.cycles-pp.walk_pmd_range
     19.11 ±  4%      -1.8       17.29 ±  2%  perf-profile.children.cycles-pp.walk_pud_range
     19.69 ±  4%      -1.8       17.88 ±  2%  perf-profile.children.cycles-pp.walk_p4d_range
     20.79 ±  4%      -1.8       19.02 ±  3%  perf-profile.children.cycles-pp.__walk_page_range
      0.08 ± 19%      +0.1        0.15 ± 17%  perf-profile.children.cycles-pp.rcu_all_qs
      0.27 ±  9%      +0.1        0.35 ± 13%  perf-profile.children.cycles-pp.__cond_resched
     11.70 ±  6%      -1.9        9.84        perf-profile.self.cycles-pp.queue_folios_pte_range
      2.01 ± 10%      -0.3        1.72 ±  6%  perf-profile.self.cycles-pp.vm_normal_folio
      0.14 ± 20%      +0.1        0.22 ± 16%  perf-profile.self.cycles-pp.__cond_resched


***************************************************************************************************
lkp-csl-d02: 36 threads, 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/numa/stress-ng/60s

commit: 
  45120b1574 ("mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()")
  24526268f4 ("mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified")

45120b15743fa7c0 24526268f4e38c9ec0c4a30de4f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1023 ± 22%     -42.3%     590.75 ± 35%  sched_debug.cpu.nr_switches.min
      1096            +4.5%       1145        stress-ng.numa.ops
     18.26            +4.5%      19.08        stress-ng.numa.ops_per_sec
     20712 ±  2%      +4.6%      21663        stress-ng.time.involuntary_context_switches
      6.57 ± 17%      -1.4        5.17 ± 12%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      5.55 ± 15%      -1.0        4.55 ±  9%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      4.37 ± 17%      -0.8        3.60 ±  8%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      4.32 ± 17%      -0.7        3.57 ±  8%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      2.54 ± 17%      -0.5        2.08 ± 10%  perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      0.20 ± 28%      -0.1        0.13 ± 27%  perf-profile.children.cycles-pp.irqtime_account_irq
      0.13 ± 19%      +0.1        0.20 ± 24%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
 2.068e+09            +3.7%  2.143e+09        perf-stat.i.branch-instructions
      0.55            -0.0        0.52        perf-stat.i.branch-miss-rate%
  12019422            -4.1%   11526701        perf-stat.i.branch-misses
      0.50            -3.5%       0.48        perf-stat.i.cpi
 1.767e+09            +3.6%   1.83e+09        perf-stat.i.dTLB-loads
 1.009e+10            +3.5%  1.044e+10        perf-stat.i.instructions
     19534            +2.4%      20010        perf-stat.i.instructions-per-iTLB-miss
      2.03            +3.7%       2.11        perf-stat.i.ipc
    123.98            +3.1%     127.81        perf-stat.i.metric.M/sec
      0.58            -0.0        0.54        perf-stat.overall.branch-miss-rate%
      0.49            -3.6%       0.48        perf-stat.overall.cpi
     17843            +2.3%      18252        perf-stat.overall.instructions-per-iTLB-miss
      2.02            +3.7%       2.10        perf-stat.overall.ipc
 2.035e+09            +3.7%   2.11e+09        perf-stat.ps.branch-instructions
  11834693            -4.1%   11344043        perf-stat.ps.branch-misses
 1.739e+09            +3.6%  1.801e+09        perf-stat.ps.dTLB-loads
    497472            +1.6%     505490        perf-stat.ps.iTLB-loads
 9.932e+09            +3.5%  1.028e+10        perf-stat.ps.instructions
 6.277e+11            +3.7%  6.512e+11        perf-stat.total.instructions





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
