Message-ID: <20200715023212.GC3874@shao2-debian>
Date:   Wed, 15 Jul 2020 10:32:13 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Adrian Hunter <adrian.hunter@...el.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: [perf] e17d43b93e: will-it-scale.per_process_ops 5.8% improvement

Greetings,

FYI, we noticed a 5.8% improvement in will-it-scale.per_process_ops due to commit:


commit: e17d43b93e544f5016c0251d2074c15568d5d963 ("perf: Add perf text poke event")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:

	nr_task: 50%
	mode: process
	test: signal1
	cpufreq_governor: performance
	ucode: 0x5002f01

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap3/signal1/will-it-scale/0x5002f01

commit: 
  bb85429a9b ("perf/x86/intel/uncore: Add Comet Lake support")
  e17d43b93e ("perf: Add perf text poke event")

bb85429a9bf2e7d3 e17d43b93e544f5016c0251d207 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     63611            +5.8%      67274        will-it-scale.per_process_ops
   6106768            +5.8%    6458348        will-it-scale.workload
      4491 ±  7%  +14935.2%     675232 ±171%  cpuidle.POLL.usage
      1.26 ±  2%      +0.2        1.43        mpstat.cpu.all.usr%
      6376 ± 67%    +195.0%      18808        numa-numastat.node1.other_node
     21360 ±  2%      +9.6%      23410 ±  4%  slabinfo.pid.active_objs
     21361 ±  2%      +9.6%      23410 ±  4%  slabinfo.pid.num_objs
    143377 ±  2%      -3.2%     138781        proc-vmstat.nr_active_anon
    143377 ±  2%      -3.2%     138781        proc-vmstat.nr_zone_active_anon
     40253 ±  2%      -9.3%      36521 ±  6%  proc-vmstat.pgactivate
      5306 ± 13%     +34.2%       7123 ± 31%  softirqs.CPU1.RCU
     21257 ± 80%     -42.9%      12145 ±121%  softirqs.CPU153.SCHED
     29988 ± 37%     -59.0%      12303 ±122%  softirqs.CPU16.SCHED
    359438 ± 10%     -18.4%     293284 ± 14%  numa-meminfo.node1.FilePages
     39267 ± 22%     -34.9%      25548 ± 24%  numa-meminfo.node1.KReclaimable
     39267 ± 22%     -34.9%      25548 ± 24%  numa-meminfo.node1.SReclaimable
     87301 ± 11%     -11.7%      77054 ±  7%  numa-meminfo.node3.SUnreclaim
     89869 ± 10%     -18.4%      73315 ± 14%  numa-vmstat.node1.nr_file_pages
      9817 ± 22%     -34.9%       6387 ± 24%  numa-vmstat.node1.nr_slab_reclaimable
     90821 ±  4%     +14.1%     103653        numa-vmstat.node1.numa_other
     21824 ± 11%     -11.7%      19264 ±  7%  numa-vmstat.node3.nr_slab_unreclaimable
 -11370588           +33.2%  -15141792        sched_debug.cfs_rq:/.spread0.min
     45386 ± 12%     +30.9%      59397 ±  4%  sched_debug.cpu.sched_count.max
      4603 ±  6%     +17.9%       5426 ±  4%  sched_debug.cpu.sched_count.stddev
     23230 ± 12%     +63.1%      37890 ± 38%  sched_debug.cpu.sched_goidle.max
      2434 ±  5%     +35.7%       3302 ± 28%  sched_debug.cpu.sched_goidle.stddev
 6.033e+09            +5.3%  6.351e+09        perf-stat.i.branch-instructions
      0.92            +0.0        0.96        perf-stat.i.branch-miss-rate%
  55838649            +9.6%   61217237        perf-stat.i.branch-misses
      9.96            -4.9%       9.47        perf-stat.i.cpi
 9.222e+09            +5.3%  9.713e+09        perf-stat.i.dTLB-loads
     24424           +10.5%      26979 ±  5%  perf-stat.i.dTLB-store-misses
 5.837e+09            +4.9%  6.122e+09        perf-stat.i.dTLB-stores
 3.025e+10            +5.3%  3.185e+10        perf-stat.i.instructions
    570.56            +5.8%     603.38        perf-stat.i.instructions-per-iTLB-miss
      0.10            +5.2%       0.11        perf-stat.i.ipc
      0.20 ±  3%      +7.8%       0.21 ±  6%  perf-stat.i.metric.K/sec
    110.74            +5.2%     116.49        perf-stat.i.metric.M/sec
   8414841            +3.5%    8707427        perf-stat.i.node-store-misses
      4885 ± 24%     +39.9%       6833 ± 10%  perf-stat.i.node-stores
      0.93            +0.0        0.96        perf-stat.overall.branch-miss-rate%
      9.96            -5.0%       9.46        perf-stat.overall.cpi
    570.27            +5.8%     603.10        perf-stat.overall.instructions-per-iTLB-miss
      0.10            +5.2%       0.11        perf-stat.overall.ipc
 6.013e+09            +5.2%  6.329e+09        perf-stat.ps.branch-instructions
  55660104            +9.6%   61012673        perf-stat.ps.branch-misses
 9.191e+09            +5.3%  9.679e+09        perf-stat.ps.dTLB-loads
     24366           +10.4%      26910 ±  5%  perf-stat.ps.dTLB-store-misses
 5.817e+09            +4.9%  6.101e+09        perf-stat.ps.dTLB-stores
 3.015e+10            +5.3%  3.174e+10        perf-stat.ps.instructions
   8386131            +3.5%    8677133        perf-stat.ps.node-store-misses
      4897 ± 24%     +39.7%       6844 ± 10%  perf-stat.ps.node-stores
 9.097e+12            +5.1%  9.562e+12        perf-stat.total.instructions
      5349 ± 15%     -30.5%       3719 ± 13%  interrupts.CPU112.CAL:Function_call_interrupts
      7185 ± 20%     -45.0%       3951 ± 29%  interrupts.CPU146.NMI:Non-maskable_interrupts
      7185 ± 20%     -45.0%       3951 ± 29%  interrupts.CPU146.PMI:Performance_monitoring_interrupts
      5836 ±  9%     -36.7%       3691 ± 17%  interrupts.CPU172.CAL:Function_call_interrupts
      6135 ±  5%     -33.4%       4088 ± 21%  interrupts.CPU176.CAL:Function_call_interrupts
      4234 ± 17%     +25.1%       5298 ± 11%  interrupts.CPU177.CAL:Function_call_interrupts
      5368 ± 11%     +61.6%       8674        interrupts.CPU177.NMI:Non-maskable_interrupts
      5368 ± 11%     +61.6%       8674        interrupts.CPU177.PMI:Performance_monitoring_interrupts
      4833 ± 12%     -19.1%       3911 ±  9%  interrupts.CPU180.CAL:Function_call_interrupts
      7187 ± 20%     -45.1%       3948 ± 29%  interrupts.CPU182.NMI:Non-maskable_interrupts
      7187 ± 20%     -45.1%       3948 ± 29%  interrupts.CPU182.PMI:Performance_monitoring_interrupts
      7921 ± 16%     -50.2%       3948 ± 29%  interrupts.CPU188.NMI:Non-maskable_interrupts
      7921 ± 16%     -50.2%       3948 ± 29%  interrupts.CPU188.PMI:Performance_monitoring_interrupts
      4174 ± 17%     +48.7%       6206 ± 19%  interrupts.CPU191.CAL:Function_call_interrupts
      6172 ±  5%     -18.3%       5040 ±  8%  interrupts.CPU2.CAL:Function_call_interrupts
      7182 ± 19%     -45.0%       3948 ± 15%  interrupts.CPU20.NMI:Non-maskable_interrupts
      7182 ± 19%     -45.0%       3948 ± 15%  interrupts.CPU20.PMI:Performance_monitoring_interrupts
      6437 ± 19%     -48.5%       3314 ± 46%  interrupts.CPU3.NMI:Non-maskable_interrupts
      6437 ± 19%     -48.5%       3314 ± 46%  interrupts.CPU3.PMI:Performance_monitoring_interrupts
     91.50 ±107%     -98.6%       1.25 ±131%  interrupts.CPU50.RES:Rescheduling_interrupts
      6040 ±  7%     -29.7%       4244 ± 25%  interrupts.CPU53.CAL:Function_call_interrupts
      5800 ± 11%     -35.2%       3757 ± 17%  interrupts.CPU75.CAL:Function_call_interrupts
      7925 ± 16%     -59.3%       3227 ± 47%  interrupts.CPU81.NMI:Non-maskable_interrupts
      7925 ± 16%     -59.3%       3227 ± 47%  interrupts.CPU81.PMI:Performance_monitoring_interrupts
      2.25 ± 96%  +16577.8%     375.25 ±166%  interrupts.CPU81.RES:Rescheduling_interrupts
      6451 ± 19%     -44.3%       3591 ± 44%  interrupts.CPU90.NMI:Non-maskable_interrupts
      6451 ± 19%     -44.3%       3591 ± 44%  interrupts.CPU90.PMI:Performance_monitoring_interrupts
      5714           -42.7%       3273 ± 47%  interrupts.CPU98.NMI:Non-maskable_interrupts
      5714           -42.7%       3273 ± 47%  interrupts.CPU98.PMI:Performance_monitoring_interrupts
      0.56            +0.1        0.69 ±  8%  perf-profile.calltrace.cycles-pp.__fpu__restore_sig.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.58            +0.1        0.72 ±  8%  perf-profile.calltrace.cycles-pp.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
      0.69            +0.2        0.86 ±  8%  perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
      0.66            +0.2        0.83 ±  9%  perf-profile.calltrace.cycles-pp.copy_fpstate_to_sigframe.__setup_rt_frame.do_signal.__prepare_exit_to_usermode.do_syscall_64
      0.80            +0.2        1.01 ±  9%  perf-profile.calltrace.cycles-pp.__setup_rt_frame.do_signal.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.38 ± 57%      +0.2        0.62 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
      0.38 ± 57%      +0.2        0.62 ±  8%  perf-profile.calltrace.cycles-pp.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
      0.38 ± 57%      +0.2        0.63 ±  9%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.handler
      0.39 ± 57%      +0.3        0.65 ±  9%  perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
      0.13 ±173%      +0.5        0.61 ±  8%  perf-profile.calltrace.cycles-pp.do_signal.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
      0.00            +0.6        0.61 ±  9%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.raise
      2.08            +0.7        2.79 ±  8%  perf-profile.calltrace.cycles-pp.handler
      2.57 ±  2%      +1.3        3.88 ±  9%  perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal.do_send_sig_info.do_send_specific.do_tkill
      2.56 ±  3%      +1.3        3.88 ±  9%  perf-profile.calltrace.cycles-pp.__sigqueue_free.__dequeue_signal.dequeue_signal.get_signal.do_signal
      2.62 ±  3%      +1.3        3.95 ±  9%  perf-profile.calltrace.cycles-pp.__dequeue_signal.dequeue_signal.get_signal.do_signal.__prepare_exit_to_usermode
      2.62 ±  2%      +1.3        3.96 ±  9%  perf-profile.calltrace.cycles-pp.__send_signal.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill
      2.67 ±  3%      +1.3        4.01 ±  9%  perf-profile.calltrace.cycles-pp.dequeue_signal.get_signal.do_signal.__prepare_exit_to_usermode.do_syscall_64
      2.78 ±  2%      +1.4        4.14 ±  9%  perf-profile.calltrace.cycles-pp.get_signal.do_signal.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.76 ±  2%      +1.4        4.13 ±  9%  perf-profile.calltrace.cycles-pp.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64
      3.60 ±  2%      +1.6        5.17 ±  9%  perf-profile.calltrace.cycles-pp.do_signal.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
      3.67            +1.6        5.25 ±  9%  perf-profile.calltrace.cycles-pp.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
      0.06 ±  6%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.__clear_user
      0.11            +0.0        0.13 ±  6%  perf-profile.children.cycles-pp.restore_altstack
      0.08            +0.0        0.11 ±  8%  perf-profile.children.cycles-pp._copy_to_user
      0.09 ±  4%      +0.0        0.13 ±  8%  perf-profile.children.cycles-pp.___might_sleep
      0.14 ±  3%      +0.0        0.18 ±  9%  perf-profile.children.cycles-pp.__task_pid_nr_ns
      0.22 ±  3%      +0.1        0.27 ±  8%  perf-profile.children.cycles-pp.copy_user_generic_unrolled
      0.17            +0.1        0.22 ± 11%  perf-profile.children.cycles-pp.__might_fault
      0.22            +0.1        0.28 ±  7%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.find_task_by_vpid
      0.17 ±  5%      +0.1        0.23 ± 14%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.17 ±  5%      +0.1        0.23 ± 14%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.00            +0.1        0.06 ± 13%  perf-profile.children.cycles-pp.__lock_text_start
      0.21 ±  5%      +0.1        0.29 ± 13%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.26 ±  5%      +0.1        0.33 ± 12%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.36            +0.1        0.45 ±  9%  perf-profile.children.cycles-pp.__set_current_blocked
      0.38 ±  2%      +0.1        0.47 ±  8%  perf-profile.children.cycles-pp.fpu__clear
      0.40            +0.1        0.49 ±  8%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.39            +0.1        0.49 ±  8%  perf-profile.children.cycles-pp._copy_from_user
      0.56            +0.1        0.70 ±  8%  perf-profile.children.cycles-pp.__fpu__restore_sig
      0.54            +0.1        0.67 ±  9%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.51 ±  2%      +0.1        0.66 ±  9%  perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
      0.73            +0.2        0.90 ±  8%  perf-profile.children.cycles-pp.restore_sigcontext
      0.66            +0.2        0.83 ±  8%  perf-profile.children.cycles-pp.copy_fpstate_to_sigframe
      0.81            +0.2        1.01 ±  9%  perf-profile.children.cycles-pp.__setup_rt_frame
      1.08            +0.3        1.34 ±  8%  perf-profile.children.cycles-pp.__x64_sys_rt_sigreturn
      0.77            +0.3        1.07 ±  7%  perf-profile.children.cycles-pp.native_irq_return_iret
      1.36            +0.4        1.80 ±  8%  perf-profile.children.cycles-pp.handler
      2.57 ±  2%      +1.3        3.88 ±  9%  perf-profile.children.cycles-pp.__sigqueue_alloc
      2.56 ±  3%      +1.3        3.88 ±  9%  perf-profile.children.cycles-pp.__sigqueue_free
      2.62 ±  3%      +1.3        3.95 ±  9%  perf-profile.children.cycles-pp.__dequeue_signal
      2.63 ±  2%      +1.3        3.96 ±  9%  perf-profile.children.cycles-pp.__send_signal
      2.67 ±  3%      +1.3        4.01 ±  9%  perf-profile.children.cycles-pp.dequeue_signal
      2.79 ±  2%      +1.3        4.14 ±  9%  perf-profile.children.cycles-pp.get_signal
      2.76 ±  2%      +1.4        4.13 ±  9%  perf-profile.children.cycles-pp.do_send_sig_info
      4.10            +1.7        5.78 ±  9%  perf-profile.children.cycles-pp.do_signal
      4.19            +1.7        5.89 ±  9%  perf-profile.children.cycles-pp.__prepare_exit_to_usermode
      0.06            +0.0        0.08 ±  6%  perf-profile.self.cycles-pp.__set_current_blocked
      0.06            +0.0        0.08 ±  8%  perf-profile.self.cycles-pp.__clear_user
      0.09 ±  4%      +0.0        0.12 ± 10%  perf-profile.self.cycles-pp.___might_sleep
      0.14 ±  3%      +0.0        0.18 ±  9%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.13 ±  3%      +0.0        0.18 ±  9%  perf-profile.self.cycles-pp.__task_pid_nr_ns
      0.21 ±  3%      +0.0        0.26 ±  9%  perf-profile.self.cycles-pp.copy_user_generic_unrolled
      0.23            +0.1        0.28 ±  8%  perf-profile.self.cycles-pp.do_syscall_64
      0.00            +0.1        0.05 ±  9%  perf-profile.self.cycles-pp._copy_from_user
      0.22            +0.1        0.27 ±  7%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.24 ±  2%      +0.1        0.30 ±  8%  perf-profile.self.cycles-pp.fpu__clear
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.__syscall_return_slowpath
      0.29 ±  2%      +0.1        0.36 ±  9%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.40            +0.1        0.49 ±  8%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.37            +0.1        0.47 ±  9%  perf-profile.self.cycles-pp.raise
      0.43            +0.1        0.54 ±  8%  perf-profile.self.cycles-pp.__fpu__restore_sig
      0.50            +0.1        0.63 ±  8%  perf-profile.self.cycles-pp.copy_fpstate_to_sigframe
      0.77 ±  2%      +0.3        1.07 ±  8%  perf-profile.self.cycles-pp.native_irq_return_iret
      2.54 ±  2%      +1.3        3.84 ±  9%  perf-profile.self.cycles-pp.__sigqueue_alloc
      2.54 ±  3%      +1.3        3.87 ±  9%  perf-profile.self.cycles-pp.__sigqueue_free
     21.43            +4.7       26.13 ±  9%  perf-profile.self.cycles-pp.apparmor_task_kill
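
For readers checking the table by hand, the %change column for the headline rows can be recomputed from the two value columns. The following is a minimal sketch, not part of the robot's tooling: the helper name `pct_change` is hypothetical, and it assumes that rows shown with a `%` sign are relative changes against the parent commit (bb85429a9b), while rows whose metric is already a percentage (mpstat, perf-profile) appear to show the absolute difference instead.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from the parent-commit value to the new value, in percent."""
    return (new - base) / base * 100.0


# (parent commit value, tested commit value), copied from the table above
rows = {
    "will-it-scale.per_process_ops": (63611, 67274),
    "will-it-scale.workload": (6106768, 6458348),
    "perf-stat.i.cpi": (9.96, 9.47),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+.1f}%  {name}")
```

Run as-is, this prints +5.8%, +5.8% and -4.9% for the three rows, matching the reported column.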


                                                                                
                            will-it-scale.per_process_ops                       
                                                                                
  74000 +-------------------------------------------------------------------+   
        |                  O  O  O                       O   O  O           |   
  72000 |-+            O            O  O  O  O  O  O  O                     |   
  70000 |-+O  O  O  O                                                       |   
        |                                                                   |   
  68000 |-+                                                                 |   
        |                                                          O  O  O  |   
  66000 |-+                                                                 |   
        |                                                    +..            |   
  64000 |-+                                                ..   +..+..+     |   
  62000 |-+                                               .                 |   
        |                           +..+..+..+..+..+..+..+                  |   
  60000 |-+                       ..                                        |   
        | .+..+..+..+..+...+..+..+                                          |   
  58000 +-------------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


View attachment "config-5.8.0-rc1-00002-ge17d43b93e544" of type "text/plain" (158289 bytes)

View attachment "job-script" of type "text/plain" (7236 bytes)

View attachment "job.yaml" of type "text/plain" (4870 bytes)

View attachment "reproduce" of type "text/plain" (339 bytes)
