Message-ID: <202311231611.59bca875-oliver.sang@intel.com>
Date:   Fri, 24 Nov 2023 09:44:44 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Raghavendra K T <raghavendra.kt@....com>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>, <ying.huang@...el.com>,
        <feng.tang@...el.com>, <fengwei.yin@...el.com>,
        <aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>,
        <oliver.sang@...el.com>
Subject: [tip:sched/core] [sched/numa]  84db47ca71:
 autonuma-benchmark.numa01_THREAD_ALLOC.seconds -46.2% improvement



Hello,

kernel test robot noticed a -46.2% improvement of autonuma-benchmark.numa01_THREAD_ALLOC.seconds on:
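(The headline figure is the relative change of the two per-commit means from the comparison table below: 78.26 s on the parent commit vs 42.10 s on the patched commit. As a quick sanity check, the arithmetic is:)

```python
# Percent change as reported by the robot: (new - old) / old * 100.
# Values are the numa01_THREAD_ALLOC.seconds means from the table below.
old = 78.26  # seconds, parent commit d6111cf45c
new = 42.10  # seconds, patched commit 84db47ca71
change = (new - old) / old * 100
print(round(change, 1))  # -> -46.2 (fewer seconds, i.e. a 46.2% speedup)
```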


commit: 84db47ca7146d7bd00eb5cf2b93989a971c84650 ("sched/numa: Fix mm numa_scan_seq based unconditional scan")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core

testcase: autonuma-benchmark
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	iterations: 4x
	test: numa01_THREAD_ALLOC
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->
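(A note on reading the tables that follow: each side shows the mean over the 4 iterations, optionally annotated with "± N%", the run-to-run stddev expressed as a percent of that mean; "%change" compares the two means. For example, converting the "42.10 ± 5%" headline entry to an absolute spread is plain arithmetic:)

```python
# "42.10 +- 5%" means: mean of 42.10 s with a relative stddev of 5%.
mean = 42.10        # patched-commit mean, seconds (headline row)
rel_stddev = 0.05   # the "+- 5%" annotation, as a fraction of the mean
abs_stddev = mean * rel_stddev  # absolute stddev in seconds (~2.1 s)
print(abs_stddev)
```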


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231123/202311231611.59bca875-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit: 
  d6111cf45c ("sched: Use WRITE_ONCE() for p->on_rq")
  84db47ca71 ("sched/numa: Fix mm numa_scan_seq based unconditional scan")

d6111cf45c578728 84db47ca7146d7bd00eb5cf2b93 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1424           -21.6%       1117 ±  2%  uptime.boot
      0.02 ± 38%    +139.7%       0.05 ± 20%  vmstat.procs.b
      0.01 ± 15%      +0.0        0.01 ±  9%  mpstat.cpu.all.iowait%
      0.09 ±  2%      -0.0        0.07 ±  2%  mpstat.cpu.all.soft%
      1.84            +0.4        2.24 ±  4%  mpstat.cpu.all.sys%
      9497 ± 17%     +37.1%      13024 ± 10%  turbostat.C1
 3.161e+08           -20.8%  2.503e+08 ±  2%  turbostat.IRQ
      8.86 ±  8%      -1.7        7.16 ± 14%  turbostat.PKG_%
    646.52            +2.9%     665.41        turbostat.PkgWatt
     52.74           +32.6%      69.93        turbostat.RAMWatt
    258.20           -16.3%     216.21 ±  2%  autonuma-benchmark.numa01.seconds
     78.26           -46.2%      42.10 ±  5%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
      1381           -22.6%       1069 ±  2%  autonuma-benchmark.time.elapsed_time
      1381           -22.6%       1069 ±  2%  autonuma-benchmark.time.elapsed_time.max
   1090459 ±  2%     -23.1%     838693 ±  3%  autonuma-benchmark.time.involuntary_context_switches
    286141           -23.6%     218671 ±  2%  autonuma-benchmark.time.user_time
      0.00 ±223%  +23983.3%       0.24 ±110%  perf-sched.sch_delay.avg.ms.__cond_resched.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
      0.01 ±223%  +1.3e+05%      15.11 ±179%  perf-sched.sch_delay.max.ms.__cond_resched.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
      3.35 ± 31%     -70.7%       0.98 ±149%  perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
    167.50 ± 32%     -69.8%      50.67 ±102%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      8.50 ± 39%     -80.4%       1.67 ±223%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1.22 ±134%    +215.3%       3.83 ± 30%  perf-sched.wait_time.avg.ms.__cond_resched.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
      1.60 ±136%   +2231.7%      37.39 ±156%  perf-sched.wait_time.max.ms.__cond_resched.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
   1578414            -1.4%    1556033        proc-vmstat.nr_anon_pages
     74441 ± 14%     +85.7%     138261 ±  6%  proc-vmstat.numa_hint_faults
     41582 ± 22%     +91.1%      79461 ±  9%  proc-vmstat.numa_hint_faults_local
     34327 ±  6%    +148.2%      85213 ±  2%  proc-vmstat.numa_huge_pte_updates
   5578006            -8.1%    5123719 ±  4%  proc-vmstat.numa_local
   4649420 ±  4%    +126.1%   10511526 ±  3%  proc-vmstat.numa_pages_migrated
  17706681 ±  6%    +147.2%   43778740 ±  2%  proc-vmstat.numa_pte_updates
   6749037           -10.7%    6027035        proc-vmstat.pgfault
   4649420 ±  4%    +126.1%   10511526 ±  3%  proc-vmstat.pgmigrate_success
    236153           -14.4%     202067 ±  3%  proc-vmstat.pgreuse
      9057 ±  4%    +126.3%      20497 ±  3%  proc-vmstat.thp_migration_success
  30217875 ±  2%     -20.4%   24043125        proc-vmstat.unevictable_pgs_scanned
     10.73 ±  5%    +125.3%      24.17 ± 17%  perf-stat.i.MPKI
 2.427e+08            +4.0%  2.523e+08        perf-stat.i.branch-instructions
     24.45            +3.2       27.63        perf-stat.i.cache-miss-rate%
  14364771 ±  5%    +122.0%   31896201 ± 18%  perf-stat.i.cache-misses
  37679065           +70.9%   64408862 ± 10%  perf-stat.i.cache-references
    545.38            -2.9%     529.43        perf-stat.i.cpi
    221.07            +7.7%     238.10        perf-stat.i.cpu-migrations
    156883 ±  2%     -36.8%      99195        perf-stat.i.cycles-between-cache-misses
 3.331e+08            +3.3%  3.443e+08        perf-stat.i.dTLB-loads
   1031040            +2.6%    1057642        perf-stat.i.dTLB-store-misses
 1.877e+08            +3.5%  1.942e+08        perf-stat.i.dTLB-stores
  1.24e+09            +3.7%  1.286e+09        perf-stat.i.instructions
      0.00 ± 12%     +54.1%       0.00 ± 39%  perf-stat.i.ipc
      2.07           +12.9%       2.33 ±  2%  perf-stat.i.metric.M/sec
      5074 ±  2%     +12.7%       5718        perf-stat.i.minor-faults
     43.56 ±  2%      +4.0       47.59 ±  3%  perf-stat.i.node-load-miss-rate%
    519245 ±  3%     +57.1%     815750 ±  3%  perf-stat.i.node-load-misses
      5074 ±  2%     +12.7%       5718        perf-stat.i.page-faults
     10.68 ±  4%    +124.9%      24.01 ± 18%  perf-stat.overall.MPKI
     37.27 ±  5%     +11.6       48.91 ±  7%  perf-stat.overall.cache-miss-rate%
    504.52            -5.1%     479.03        perf-stat.overall.cpi
     47358 ±  4%     -56.6%      20561 ± 17%  perf-stat.overall.cycles-between-cache-misses
      0.00            +5.4%       0.00        perf-stat.overall.ipc
     42.32 ±  7%      +8.3       50.63 ±  6%  perf-stat.overall.node-load-miss-rate%
 2.384e+08            +4.2%  2.486e+08        perf-stat.ps.branch-instructions
  13020509 ±  5%    +133.4%   30395642 ± 17%  perf-stat.ps.cache-misses
  34948633 ±  2%     +76.6%   61721520 ±  9%  perf-stat.ps.cache-references
    218.01            +7.5%     234.41        perf-stat.ps.cpu-migrations
 3.285e+08            +3.6%  3.402e+08        perf-stat.ps.dTLB-loads
   1021092            +2.9%    1050584        perf-stat.ps.dTLB-store-misses
 1.845e+08            +3.7%  1.914e+08        perf-stat.ps.dTLB-stores
 1.219e+09            +3.9%  1.267e+09        perf-stat.ps.instructions
      4707           +14.8%       5406        perf-stat.ps.minor-faults
    502656 ±  3%     +63.9%     823962 ±  4%  perf-stat.ps.node-load-misses
      4707           +14.8%       5406        perf-stat.ps.page-faults
 1.686e+12           -19.1%  1.363e+12 ±  2%  perf-stat.total.instructions
 1.824e+08 ±  2%     -27.1%   1.33e+08 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.avg
 1.869e+08 ±  2%     -26.9%  1.366e+08 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.max
 1.498e+08 ±  6%     -27.4%  1.087e+08 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.min
   3892383 ±  9%     -20.8%    3081639 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.81 ±  5%     -11.7%       0.72 ±  6%  sched_debug.cfs_rq:/.h_nr_running.min
      4130 ±  3%     +33.5%       5516 ± 14%  sched_debug.cfs_rq:/.load_avg.max
 1.824e+08 ±  2%     -27.1%   1.33e+08 ±  4%  sched_debug.cfs_rq:/.min_vruntime.avg
 1.869e+08 ±  2%     -26.9%  1.366e+08 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
 1.498e+08 ±  6%     -27.4%  1.087e+08 ±  6%  sched_debug.cfs_rq:/.min_vruntime.min
   3892382 ±  9%     -20.8%    3081638 ±  7%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.81 ±  5%     -11.7%       0.72 ±  6%  sched_debug.cfs_rq:/.nr_running.min
     25.69 ± 11%     +33.1%      34.20 ± 20%  sched_debug.cfs_rq:/.removed.util_avg.max
    804.13 ±  6%     -13.2%     697.93 ±  5%  sched_debug.cfs_rq:/.runnable_avg.min
    642.08 ±  7%     -18.7%     522.15 ±  5%  sched_debug.cfs_rq:/.util_avg.min
     30.15 ± 67%   +1148.5%     376.47 ±  4%  sched_debug.cfs_rq:/.util_est_enqueued.avg
    490.48 ± 19%    +140.0%       1177 ± 10%  sched_debug.cfs_rq:/.util_est_enqueued.max
     77.81 ± 47%    +298.2%     309.83 ±  6%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
    840536 ±  3%     -29.6%     592026 ±  6%  sched_debug.cpu.avg_idle.min
    516622 ±  5%     -16.1%     433228 ±  3%  sched_debug.cpu.avg_idle.stddev
    713848 ±  2%     -24.3%     540516 ±  4%  sched_debug.cpu.clock.avg
    715060 ±  2%     -24.3%     541264 ±  4%  sched_debug.cpu.clock.max
    712575 ±  2%     -24.3%     539699 ±  4%  sched_debug.cpu.clock.min
    714.85 ±  8%     -38.1%     442.30 ± 10%  sched_debug.cpu.clock.stddev
    705718 ±  2%     -24.3%     534516 ±  4%  sched_debug.cpu.clock_task.avg
    708387 ±  2%     -24.3%     536010 ±  4%  sched_debug.cpu.clock_task.max
    686460 ±  2%     -24.5%     518080 ±  4%  sched_debug.cpu.clock_task.min
      2041 ± 12%     -32.5%       1377 ±  7%  sched_debug.cpu.clock_task.stddev
     23332 ±  3%     -20.1%      18646 ±  5%  sched_debug.cpu.curr->pid.avg
     26909           -15.7%      22694 ±  2%  sched_debug.cpu.curr->pid.max
     16993 ± 13%     -33.7%      11263 ± 21%  sched_debug.cpu.curr->pid.min
   1393930 ±  3%     -14.1%    1197373 ±  3%  sched_debug.cpu.max_idle_balance_cost.max
    154458 ±  5%     -12.8%     134638 ±  3%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ±  7%     -37.3%       0.00 ± 10%  sched_debug.cpu.next_balance.stddev
      0.82 ±  6%     -12.6%       0.72 ±  9%  sched_debug.cpu.nr_running.min
      7472 ±  2%     -19.9%       5982 ±  3%  sched_debug.cpu.nr_switches.avg
      2879 ±  6%     -15.1%       2445 ±  7%  sched_debug.cpu.nr_switches.min
      5925 ±  5%     -14.3%       5080 ±  4%  sched_debug.cpu.nr_switches.stddev
      6.07 ±  9%     +39.6%       8.48 ±  8%  sched_debug.cpu.nr_uninterruptible.stddev
    712557 ±  2%     -24.3%     539685 ±  4%  sched_debug.cpu_clk
    711344 ±  2%     -24.3%     538474 ±  4%  sched_debug.ktime
      0.15 ± 78%    +111.4%       0.31 ± 33%  sched_debug.rt_rq:.rt_time.avg
     32.86 ± 78%    +111.4%      69.47 ± 33%  sched_debug.rt_rq:.rt_time.max
      2.19 ± 78%    +111.4%       4.63 ± 33%  sched_debug.rt_rq:.rt_time.stddev
    713438 ±  2%     -24.2%     540572 ±  4%  sched_debug.sched_clk
      3.34 ± 33%      -1.5        1.81 ± 18%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.45 ± 39%      -1.1        0.36 ±101%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      1.45 ± 39%      -1.1        0.36 ±101%  perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      1.37 ± 38%      -1.0        0.34 ±101%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      3.04 ± 32%      -1.4        1.60 ± 18%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      2.78 ± 29%      -1.3        1.44 ± 17%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      2.37 ± 28%      -1.2        1.16 ± 23%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      1.95 ± 28%      -1.0        0.97 ± 13%  perf-profile.children.cycles-pp.task_mm_cid_work
      2.17 ± 29%      -1.0        1.19 ± 16%  perf-profile.children.cycles-pp.task_work_run
      0.69 ± 30%      -0.3        0.38 ± 49%  perf-profile.children.cycles-pp.khugepaged
      0.67 ± 31%      -0.3        0.37 ± 49%  perf-profile.children.cycles-pp.khugepaged_scan_mm_slot
      0.67 ± 31%      -0.3        0.37 ± 49%  perf-profile.children.cycles-pp.hpage_collapse_scan_pmd
      0.38 ± 27%      -0.1        0.26 ± 23%  perf-profile.children.cycles-pp.security_file_permission
      0.34 ± 27%      -0.1        0.23 ± 23%  perf-profile.children.cycles-pp.apparmor_file_permission
      0.26 ± 48%      -0.1        0.15 ± 19%  perf-profile.children.cycles-pp.dup_task_struct
      0.24 ± 35%      -0.1        0.15 ± 17%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.11 ± 51%      -0.1        0.04 ± 75%  perf-profile.children.cycles-pp.__vmalloc_node_range
      0.14 ± 33%      -0.1        0.08 ± 14%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.10 ± 32%      -0.0        0.06 ± 19%  perf-profile.children.cycles-pp.move_page_tables
      0.02 ±142%      +0.1        0.10 ± 29%  perf-profile.children.cycles-pp.task_tick_fair
      0.02 ±223%      +0.1        0.13 ± 56%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.04 ±113%      +0.2        0.20 ± 48%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
      0.00            +0.2        0.16 ± 56%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.04 ±107%      +0.2        0.25 ± 42%  perf-profile.children.cycles-pp.scheduler_tick
      0.18 ± 79%      +0.2        0.38 ± 40%  perf-profile.children.cycles-pp.__do_sys_wait4
      0.18 ± 80%      +0.2        0.38 ± 40%  perf-profile.children.cycles-pp.kernel_wait4
      0.17 ± 76%      +0.2        0.38 ± 40%  perf-profile.children.cycles-pp.do_wait
      0.00            +0.3        0.26 ± 77%  perf-profile.children.cycles-pp.intel_idle
      0.06 ±104%      +0.3        0.32 ± 44%  perf-profile.children.cycles-pp.update_process_times
      0.06 ±106%      +0.3        0.33 ± 46%  perf-profile.children.cycles-pp.tick_sched_handle
      0.07 ± 81%      +0.3        0.36 ± 48%  perf-profile.children.cycles-pp.tick_nohz_highres_handler
      0.18 ± 54%      +0.4        0.62 ± 48%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.21 ± 50%      +0.6        0.76 ± 47%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.22 ± 51%      +0.6        0.79 ± 46%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.68 ± 44%      +1.1        1.80 ± 60%  perf-profile.children.cycles-pp.update_sg_lb_stats
      1.46 ± 52%      +1.2        2.62 ± 45%  perf-profile.children.cycles-pp.__schedule
      0.72 ± 46%      +1.2        1.89 ± 60%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.72 ± 46%      +1.2        1.89 ± 60%  perf-profile.children.cycles-pp.find_busiest_group
      0.03 ±141%      +1.3        1.31 ± 75%  perf-profile.children.cycles-pp.start_secondary
      0.01 ±223%      +1.3        1.36 ± 80%  perf-profile.children.cycles-pp.cpuidle_enter
      0.01 ±223%      +1.3        1.36 ± 80%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.77 ± 44%      +1.4        2.15 ± 62%  perf-profile.children.cycles-pp.load_balance
      0.20 ± 61%      +1.4        1.64 ± 77%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.02 ±142%      +1.5        1.50 ± 80%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.04 ±146%      +1.6        1.62 ± 80%  perf-profile.children.cycles-pp.newidle_balance
      0.03 ±141%      +1.6        1.66 ± 79%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
      0.03 ±141%      +1.6        1.66 ± 79%  perf-profile.children.cycles-pp.cpu_startup_entry
      0.03 ±141%      +1.6        1.66 ± 79%  perf-profile.children.cycles-pp.do_idle
      1.94 ± 28%      -1.0        0.95 ± 13%  perf-profile.self.cycles-pp.task_mm_cid_work
      0.13 ± 31%      -0.1        0.08 ± 17%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.00            +0.3        0.26 ± 77%  perf-profile.self.cycles-pp.intel_idle
      0.67 ± 44%      +1.1        1.76 ± 60%  perf-profile.self.cycles-pp.update_sg_lb_stats




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
