Date:   Thu, 10 Dec 2020 16:18:59 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Valentin Schneider <valentin.schneider@....com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
        feng.tang@...el.com, zhengjun.xing@...el.com,
        aubrey.li@...ux.intel.com, yu.c.chen@...el.com
Subject: [sched/hotplug]  2558aacff8:  will-it-scale.per_thread_ops -1.6%
 regression


Greetings,

FYI, we noticed a -1.6% regression of will-it-scale.per_thread_ops due to commit:


commit: 2558aacff8586699bcd248b406febb28b0a25de2 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/migrate-disable


in testcase: will-it-scale
on test machine: 144 threads Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
with the following parameters:

	nr_task: 100%
	mode: thread
	test: sched_yield
	cpufreq_governor: performance
	ucode: 0x700001e

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
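
For orientation, the sched_yield testcase in thread mode amounts to every thread calling sched_yield() in a tight loop while the completed iterations are counted. The sketch below imitates that shape in Python. It is not the will-it-scale source (which is written in C); the function name `run_yield_loop` and its parameters are hypothetical, and Python's global interpreter lock means the absolute numbers it prints are in no way comparable to the figures in this report.

```python
import os
import threading
import time


def run_yield_loop(nr_threads=4, duration_s=0.05):
    """Run nr_threads yield loops for duration_s seconds and count iterations.

    Hypothetical helper for illustration only; not part of will-it-scale.
    """
    counts = [0] * nr_threads            # one counter slot per thread
    stop = threading.Event()

    def worker(idx):
        ops = 0
        while not stop.is_set():
            os.sched_yield()             # give up the CPU voluntarily
            ops += 1
        counts[idx] = ops

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nr_threads)]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()

    total = sum(counts)
    return {
        "threads": nr_threads,
        "total_ops": total,                    # loosely analogous to will-it-scale.workload
        "per_thread_ops": total / nr_threads,  # loosely analogous to will-it-scale.per_thread_ops
    }


if __name__ == "__main__":
    print(run_yield_loop())
```

Because each iteration is little more than a system call plus a pass through the scheduler, a benchmark of this shape is sensitive to small changes on the scheduling path.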



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/sched_yield/will-it-scale/0x700001e

commit: 
  565790d28b ("sched: Fix balance_callback()")
  2558aacff8 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")

565790d28b1e33ee 2558aacff8586699bcd248b406f 
---------------- --------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
          1:4            0%           1:4     perf-profile.children.cycles-pp.error_entry
          0:4            0%           0:4     perf-profile.self.cycles-pp.error_entry
         %stddev     %change         %stddev
             \          |                \  
 4.011e+08            -1.6%  3.945e+08        will-it-scale.144.threads
   2785455            -1.6%    2739520        will-it-scale.per_thread_ops
 4.011e+08            -1.6%  3.945e+08        will-it-scale.workload
     12.05            +2.1       14.18        mpstat.cpu.all.usr%
   1087711 ± 75%     -79.0%     228885 ±  7%  numa-numastat.node1.local_node
   1126029 ± 74%     -74.5%     286894 ±  6%  numa-numastat.node1.numa_hit
     33836            -2.3%      33042        proc-vmstat.nr_slab_reclaimable
     74433            -1.5%      73345        proc-vmstat.nr_slab_unreclaimable
     86.25            -2.3%      84.25        vmstat.cpu.sy
     11.75 ±  3%     +17.0%      13.75 ±  3%  vmstat.cpu.us
    333551 ± 17%     -21.7%     261115 ±  5%  vmstat.system.cs
    329071 ±  3%     -15.4%     278535 ±  4%  sched_debug.cfs_rq:/.spread0.avg
    472614 ±  2%     -11.0%     420678 ±  2%  sched_debug.cfs_rq:/.spread0.max
  17597663 ± 17%     -28.5%   12582107 ± 16%  sched_debug.cpu.nr_switches.max
   1897476 ± 17%     -28.4%    1359264 ± 14%  sched_debug.cpu.nr_switches.stddev
      5628 ±  8%     -10.9%       5012 ±  3%  slabinfo.files_cache.active_objs
      5628 ±  8%     -10.9%       5012 ±  3%  slabinfo.files_cache.num_objs
      3613 ±  2%     -10.9%       3219        slabinfo.kmalloc-rcl-512.active_objs
      3644 ±  2%     -10.9%       3248        slabinfo.kmalloc-rcl-512.num_objs
      3967 ±  4%      -8.3%       3638 ±  2%  slabinfo.sock_inode_cache.active_objs
      3967 ±  4%      -8.3%       3638 ±  2%  slabinfo.sock_inode_cache.num_objs
      0.02 ±  9%     -14.5%       0.02 ±  2%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
     14.28 ± 38%     +48.9%      21.26 ± 24%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      0.02 ± 24%     +38.2%       0.03 ± 14%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      0.04 ± 13%     -22.8%       0.03 ± 14%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
     47.71 ± 30%     +41.6%      67.54 ± 11%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      3.20 ± 33%     -81.5%       0.59 ± 91%  perf-sched.wait_time.avg.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     33.43 ± 27%     +38.4%      46.27 ±  8%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      0.05 ± 43%     -68.9%       0.02 ± 63%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.path_openat
      8.23 ± 10%     -57.9%       3.47 ± 97%  perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     35211 ±169%     -99.6%     129.50 ± 96%  numa-vmstat.node1.nr_active_anon
      8060 ± 17%     -35.0%       5236 ± 31%  numa-vmstat.node1.nr_slab_reclaimable
     35211 ±169%     -99.6%     129.50 ± 96%  numa-vmstat.node1.nr_zone_active_anon
   1053733 ± 53%     -52.7%     498416 ±  7%  numa-vmstat.node1.numa_hit
    946160 ± 58%     -62.7%     352475 ± 12%  numa-vmstat.node1.numa_local
    107572 ± 23%     +35.7%     145940 ±  5%  numa-vmstat.node1.numa_other
      5914 ± 23%     -28.5%       4226 ± 10%  numa-vmstat.node2.nr_slab_reclaimable
     18204 ±  3%     -14.7%      15522 ± 12%  numa-vmstat.node2.nr_slab_unreclaimable
    629428 ±  9%     -14.2%     540085 ±  7%  numa-vmstat.node2.numa_hit
     17302 ± 10%     +26.0%      21807 ±  7%  numa-vmstat.node3.nr_slab_unreclaimable
    140785 ±169%     -99.6%     520.75 ± 95%  numa-meminfo.node1.Active
    140785 ±169%     -99.6%     520.75 ± 95%  numa-meminfo.node1.Active(anon)
     32241 ± 17%     -35.0%      20948 ± 31%  numa-meminfo.node1.KReclaimable
     32241 ± 17%     -35.0%      20948 ± 31%  numa-meminfo.node1.SReclaimable
    101007 ±  5%     -15.7%      85162 ± 13%  numa-meminfo.node1.Slab
     23657 ± 23%     -28.5%      16906 ± 10%  numa-meminfo.node2.KReclaimable
     23657 ± 23%     -28.5%      16906 ± 10%  numa-meminfo.node2.SReclaimable
     72823 ±  3%     -14.7%      62089 ± 12%  numa-meminfo.node2.SUnreclaim
     96481 ±  3%     -18.1%      78996 ± 12%  numa-meminfo.node2.Slab
     69210 ± 10%     +26.0%      87229 ±  7%  numa-meminfo.node3.SUnreclaim
    110579 ±  9%     +26.7%     140158 ± 10%  numa-meminfo.node3.Slab
    388.75 ± 74%   +1147.8%       4851 ±124%  interrupts.33:PCI-MSI.524291-edge.eth0-TxRx-2
      1540 ± 69%     -78.2%     335.75 ± 51%  interrupts.34:PCI-MSI.524292-edge.eth0-TxRx-3
    388.75 ± 74%   +1147.8%       4851 ±124%  interrupts.CPU11.33:PCI-MSI.524291-edge.eth0-TxRx-2
    307.50           +66.7%     512.50 ± 50%  interrupts.CPU111.RES:Rescheduling_interrupts
      1540 ± 69%     -78.2%     335.75 ± 51%  interrupts.CPU12.34:PCI-MSI.524292-edge.eth0-TxRx-3
    350.50 ±  8%     -10.3%     314.50 ±  2%  interrupts.CPU122.RES:Rescheduling_interrupts
    424.50 ± 24%     -18.7%     345.00 ± 14%  interrupts.CPU128.RES:Rescheduling_interrupts
      8496           -50.1%       4241        interrupts.CPU29.NMI:Non-maskable_interrupts
      8496           -50.1%       4241        interrupts.CPU29.PMI:Performance_monitoring_interrupts
    314.25            +8.7%     341.50 ±  4%  interrupts.CPU29.RES:Rescheduling_interrupts
      8496           -50.1%       4242        interrupts.CPU30.NMI:Non-maskable_interrupts
      8496           -50.1%       4242        interrupts.CPU30.PMI:Performance_monitoring_interrupts
    311.50           +13.2%     352.75 ±  8%  interrupts.CPU7.RES:Rescheduling_interrupts
     21144 ± 15%     -25.0%      15858 ± 24%  interrupts.CPU72.CAL:Function_call_interrupts
    317.75           +39.2%     442.25 ± 32%  interrupts.CPU82.RES:Rescheduling_interrupts
 8.557e+10            -1.8%  8.399e+10        perf-stat.i.branch-instructions
      0.43            +0.4        0.87        perf-stat.i.branch-miss-rate%
 3.479e+08          +106.5%  7.186e+08        perf-stat.i.branch-misses
    333383 ± 17%     -22.1%     259849 ±  6%  perf-stat.i.context-switches
      1.02            +2.4%       1.05        perf-stat.i.cpi
 1.268e+11            -1.9%  1.243e+11        perf-stat.i.dTLB-loads
 7.506e+10            -1.9%  7.363e+10        perf-stat.i.dTLB-stores
  4.26e+08 ±  2%     -31.3%  2.925e+08        perf-stat.i.iTLB-load-misses
    538538 ± 36%     -79.5%     110207 ± 16%  perf-stat.i.iTLB-loads
 3.983e+11            -1.9%  3.908e+11        perf-stat.i.instructions
    946.16 ±  3%     +43.4%       1356        perf-stat.i.instructions-per-iTLB-miss
      0.99            -2.4%       0.97        perf-stat.i.ipc
      1.22 ±  3%     +30.8%       1.60 ±  3%  perf-stat.i.metric.K/sec
      1996            -1.9%       1958        perf-stat.i.metric.M/sec
      0.41            +0.4        0.86        perf-stat.overall.branch-miss-rate%
      1.01            +2.5%       1.03        perf-stat.overall.cpi
    935.95 ±  2%     +42.8%       1336        perf-stat.overall.instructions-per-iTLB-miss
      0.99            -2.4%       0.97        perf-stat.overall.ipc
 8.527e+10            -1.8%   8.37e+10        perf-stat.ps.branch-instructions
 3.467e+08          +106.5%  7.161e+08        perf-stat.ps.branch-misses
    334637 ± 17%     -21.5%     262794 ±  5%  perf-stat.ps.context-switches
 1.264e+11            -1.9%  1.239e+11        perf-stat.ps.dTLB-loads
  7.48e+10            -1.9%  7.338e+10        perf-stat.ps.dTLB-stores
 4.244e+08 ±  2%     -31.3%  2.915e+08        perf-stat.ps.iTLB-load-misses
    539519 ± 36%     -79.5%     110644 ± 16%  perf-stat.ps.iTLB-loads
 3.969e+11            -1.9%  3.895e+11        perf-stat.ps.instructions
   1.2e+14            -2.0%  1.176e+14        perf-stat.total.instructions
      0.68 ±  2%      -0.1        0.59 ±  3%  perf-profile.calltrace.cycles-pp.orc_find.unwind_next_frame.perf_callchain_kernel.get_perf_callchain.perf_callchain
      0.93            -0.1        0.88 ±  3%  perf-profile.calltrace.cycles-pp.__perf_event_header__init_id.perf_prepare_sample.perf_event_output_forward.__perf_event_overflow.perf_swevent_overflow
      1.22            +0.1        1.30 ±  2%  perf-profile.calltrace.cycles-pp.__orc_find.unwind_next_frame.perf_callchain_kernel.get_perf_callchain.perf_callchain
      1.11            +0.1        1.21 ±  2%  perf-profile.calltrace.cycles-pp.orc_find.unwind_next_frame.__unwind_start.perf_callchain_kernel.get_perf_callchain
      1.51            -0.1        1.44 ±  3%  perf-profile.children.cycles-pp.stack_access_ok
      0.37 ±  3%      -0.1        0.30        perf-profile.children.cycles-pp.__task_pid_nr_ns
      0.47 ±  2%      -0.1        0.41 ±  2%  perf-profile.children.cycles-pp.perf_event_pid_type
      0.30 ±  5%      -0.0        0.25 ± 12%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.95            -0.0        0.90 ±  3%  perf-profile.children.cycles-pp.__perf_event_header__init_id
      0.10 ± 14%      -0.0        0.07 ± 31%  perf-profile.children.cycles-pp.sched_yield@plt
      0.42            -0.0        0.38 ±  5%  perf-profile.children.cycles-pp.ftrace_graph_ret_addr
      0.10 ±  4%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.is_module_text_address
      0.11 ±  4%      -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.is_ftrace_trampoline
      0.10 ±  4%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.ftrace_ops_trampoline
      0.06 ± 15%      +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.rcu_qs
      0.23 ±  8%      +0.1        0.31 ±  7%  perf-profile.children.cycles-pp.rcu_note_context_switch
      0.35 ±  4%      -0.1        0.29        perf-profile.self.cycles-pp.__task_pid_nr_ns
      1.24            -0.1        1.18 ±  2%  perf-profile.self.cycles-pp.stack_access_ok
      0.23 ±  5%      -0.0        0.18 ± 12%  perf-profile.self.cycles-pp.__list_del_entry_valid
      0.34 ±  4%      -0.0        0.30 ±  2%  perf-profile.self.cycles-pp.perf_tp_event
      0.12 ±  4%      -0.0        0.10 ± 10%  perf-profile.self.cycles-pp.sched_clock_cpu
      0.32            -0.0        0.30 ±  5%  perf-profile.self.cycles-pp.ftrace_graph_ret_addr
      0.08            -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.ftrace_ops_trampoline
      0.34            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.unwind_get_return_address
      0.08            +0.0        0.10 ±  7%  perf-profile.self.cycles-pp.rcu_is_watching
      0.05 ±  8%      +0.0        0.09 ±  7%  perf-profile.self.cycles-pp.rcu_qs
      0.51            +0.0        0.55 ±  3%  perf-profile.self.cycles-pp.bsearch
      0.16 ± 13%      +0.0        0.20 ±  7%  perf-profile.self.cycles-pp.rcu_note_context_switch
      1.44 ±  4%      +0.3        1.76 ± 12%  perf-profile.self.cycles-pp.__sched_yield

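The %change column in the table above can be cross-checked from the two value columns. The sketch below parses a few rows copied from the table and recomputes the relative change as (new - base) / base. The regular expression and the helper names `parse_row` and `recompute` are assumptions made for this illustration and are not part of lkp-tests.

```python
import re

# Rows copied verbatim from the comparison table: base value, change, new value, metric.
ROWS = """\
   2785455            -1.6%    2739520        will-it-scale.per_thread_ops
 4.011e+08            -1.6%  3.945e+08        will-it-scale.workload
 3.467e+08          +106.5%  7.161e+08        perf-stat.ps.branch-misses
    333551 ± 17%     -21.7%     261115 ±  5%  vmstat.system.cs
"""

ROW_RE = re.compile(
    r"^\s*(?P<base>[\d.e+]+)(?:\s*±\s*\d+%)?"   # base value, optional %stddev
    r"\s+(?P<change>[+-][\d.]+%?)"              # reported change
    r"\s+(?P<new>[\d.e+]+)(?:\s*±\s*\d+%)?"     # new value, optional %stddev
    r"\s+(?P<metric>\S+)\s*$"                   # metric name
)


def parse_row(line):
    """Split one table row into its fields and recompute the relative change."""
    m = ROW_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    base, new = float(m["base"]), float(m["new"])
    return {
        "metric": m["metric"],
        "base": base,
        "new": new,
        "reported": m["change"],
        "recomputed_pct": (new - base) / base * 100.0,
    }


def recompute(rows=ROWS):
    """Parse every non-empty row in the given block of table text."""
    return [parse_row(line) for line in rows.splitlines() if line.strip()]


if __name__ == "__main__":
    for row in recompute():
        print(f'{row["metric"]:32s} reported {row["reported"]:>8s}'
              f'  recomputed {row["recomputed_pct"]:+.1f}%')
```

Rows whose change is printed without a percent sign (for example mpstat.cpu.all.usr%, 12.05 to 14.18, shown as +2.1) appear to report the absolute difference between two percentages, so the relative formula does not apply to them. The same kind of arithmetic ties the headline metrics together: 4.011e+08 operations spread over 144 threads is about 2.785e+06 per thread.
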

                                                                                
                             will-it-scale.per_thread_ops                       
                                                                                
    3e+06 +-----------------------------------------------------------------+   
          | : O +.+.+.+.+.++.+.+.+.+.+ O O O O O O O O O O OO O O O O O O O |   
  2.5e+06 |-:   :                                                           |   
          | :   :                                                           |   
          |  :  :                                                           |   
    2e+06 |-+: :                                                            |   
          |  : :                                                            |   
  1.5e+06 |-+: :                                                            |   
          |  : :                                                            |   
    1e+06 |-+: :                                                            |   
          |  : :                                                            |   
          |   :                                                             |   
   500000 |-+ :                                                             |   
          |   :                                                             |   
        0 +-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


View attachment "config-5.10.0-rc1-00033-g2558aacff858" of type "text/plain" (171488 bytes)

View attachment "job-script" of type "text/plain" (7947 bytes)

View attachment "job.yaml" of type "text/plain" (5522 bytes)

View attachment "reproduce" of type "text/plain" (343 bytes)
