Message-ID: <20200305062138.GI5972@shao2-debian>
Date:   Thu, 5 Mar 2020 14:21:38 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Arnd Bergmann <arnd@...db.de>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Deepa Dinamani <deepa.kernel@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: [y2038] 412c53a680: will-it-scale.per_process_ops 11.7% improvement

Greetings,

FYI, we noticed an 11.7% improvement of will-it-scale.per_process_ops due to commit:


commit: 412c53a680a97cb1ae2c0ab60230e193bee86387 ("y2038: remove unused time32 interfaces")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 72 threads Intel(R) Xeon(R) Gold 6139 CPU @ 2.30GHz with 128G memory
with following parameters:

	nr_task: 100%
	mode: process
	test: mmap2
	cpufreq_governor: performance
	ucode: 0x2000065

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

In addition, the commit has a significant impact on the following tests:

+------------------+---------------------------------------------+
| testcase: change | unixbench: unixbench.score 3.8% improvement |
| test machine     | 104 threads Skylake with 192G memory        |
| test parameters  | cpufreq_governor=performance                |
|                  | nr_task=30%                                 |
|                  | runtime=300s                                |
|                  | test=context1                               |
|                  | ucode=0x2000065                             |
+------------------+---------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-skl-2sp7/mmap2/will-it-scale/0x2000065
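The two lines above pair a slash-separated list of parameter names with a slash-separated list of values. A minimal Python sketch of reading them back into named parameters, assuming no individual value contains a `/` (which holds for the lines in this report); the helper name `parse_job_header` is hypothetical and not part of lkp-tests:

```python
def parse_job_header(keys_line: str, values_line: str) -> dict:
    """Pair slash-separated parameter names with their values."""
    # The names line ends with a colon; the values line is indented.
    keys = keys_line.strip().rstrip(":").split("/")
    values = values_line.strip().split("/")
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} names but {len(values)} values")
    return dict(zip(keys, values))


keys_line = (
    "compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/"
    "tbox_group/test/testcase/ucode:"
)
values_line = (
    "  gcc-7/performance/x86_64-rhel-7.6/process/100%/"
    "debian-x86_64-20191114.cgz/lkp-skl-2sp7/mmap2/will-it-scale/0x2000065"
)

params = parse_job_header(keys_line, values_line)
for name, value in params.items():
    print(f"{name:18} {value}")
```

The length check is there because a value containing a slash would silently shift every later parameter by one position.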

commit: 
  595abbaff5 ("y2038: remove ktime to/from timespec/timeval conversion")
  412c53a680 ("y2038: remove unused time32 interfaces")

595abbaff5db1214 412c53a680a97cb1ae2c0ab6023 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     16079           +11.7%      17954        will-it-scale.per_process_ops
   1157721           +11.7%    1292722        will-it-scale.workload
    535.67 ±141%    +201.4%       1614        meminfo.Mlocked
      1.01            +0.1        1.12        mpstat.cpu.all.usr%
  61269544 ± 54%    +137.2%  1.453e+08 ± 48%  cpuidle.C1E.time
    143915 ± 36%    +133.1%     335431 ± 57%  cpuidle.C1E.usage
      2261 ±  2%     -15.1%       1920 ±  5%  slabinfo.fsnotify_mark_connector.active_objs
      2261 ±  2%     -15.1%       1920 ±  5%  slabinfo.fsnotify_mark_connector.num_objs
      1649 ±  8%     -12.5%       1443 ±  2%  slabinfo.skbuff_ext_cache.active_objs
      1649 ±  8%     -12.5%       1443 ±  2%  slabinfo.skbuff_ext_cache.num_objs
     14258 ± 12%     +25.5%      17895 ± 13%  numa-meminfo.node0.Mapped
      4132 ± 15%     +20.3%       4970        numa-meminfo.node0.PageTables
      5989 ± 10%     -10.3%       5374 ±  4%  numa-meminfo.node1.KernelStack
     17133 ± 11%     -18.9%      13895 ± 17%  numa-meminfo.node1.Mapped
      4799 ± 12%     -18.1%       3928        numa-meminfo.node1.PageTables
      7936            +1.4%       8050        proc-vmstat.nr_mapped
    133.67 ±141%    +201.5%     403.00        proc-vmstat.nr_mlock
     18121 ±  2%      +5.7%      19153 ±  3%  proc-vmstat.nr_shmem
    743293            +1.2%     752255        proc-vmstat.pgalloc_normal
    700933            +1.4%     710819        proc-vmstat.pgfree
      3601 ± 12%     +27.9%       4605 ± 12%  numa-vmstat.node0.nr_mapped
      1033 ± 15%     +20.3%       1243        numa-vmstat.node0.nr_page_table_pages
    156534 ±  6%     -33.2%     104509 ± 64%  numa-vmstat.node0.numa_other
      5989 ± 10%     -10.3%       5375 ±  4%  numa-vmstat.node1.nr_kernel_stack
     56.00 ±141%    +278.0%     211.67 ± 13%  numa-vmstat.node1.nr_mlock
      1199 ± 12%     -18.1%     982.33        numa-vmstat.node1.nr_page_table_pages
      5818 ± 35%     +43.9%       8370 ±  6%  interrupts.CPU26.NMI:Non-maskable_interrupts
      5818 ± 35%     +43.9%       8370 ±  6%  interrupts.CPU26.PMI:Performance_monitoring_interrupts
    414.67 ± 23%     -24.8%     311.67        interrupts.CPU52.RES:Rescheduling_interrupts
      5819 ± 35%     +43.8%       8366 ±  6%  interrupts.CPU56.NMI:Non-maskable_interrupts
      5819 ± 35%     +43.8%       8366 ±  6%  interrupts.CPU56.PMI:Performance_monitoring_interrupts
      5818 ± 35%     +43.8%       8364 ±  6%  interrupts.CPU59.NMI:Non-maskable_interrupts
      5818 ± 35%     +43.8%       8364 ±  6%  interrupts.CPU59.PMI:Performance_monitoring_interrupts
    471.33 ± 16%     -26.0%     349.00 ±  6%  interrupts.CPU9.RES:Rescheduling_interrupts
    127.33 ±  3%     +11.8%     142.33 ±  2%  interrupts.IWI:IRQ_work_interrupts
     10312 ± 92%     -67.7%       3333 ±  6%  sched_debug.cfs_rq:/.load.stddev
     27.16 ±  7%     -16.2%      22.76 ±  8%  sched_debug.cfs_rq:/.load_avg.avg
     47.12 ±  6%     -17.5%      38.89 ± 10%  sched_debug.cfs_rq:/.load_avg.stddev
      8.99 ± 22%     -46.6%       4.80 ± 41%  sched_debug.cfs_rq:/.removed.load_avg.avg
     37.49 ± 10%     -27.4%      27.22 ± 21%  sched_debug.cfs_rq:/.removed.load_avg.stddev
    413.16 ± 23%     -46.4%     221.38 ± 42%  sched_debug.cfs_rq:/.removed.runnable_sum.avg
      1722 ± 10%     -27.2%       1253 ± 22%  sched_debug.cfs_rq:/.removed.runnable_sum.stddev
      2.71 ± 22%     -41.0%       1.60 ± 16%  sched_debug.cfs_rq:/.removed.util_avg.avg
      1480 ±  4%     +11.4%       1648        sched_debug.cpu.curr->pid.min
      0.19 ±  2%     -13.4%       0.17 ±  7%  sched_debug.cpu.nr_running.stddev
     32567 ±  5%     +13.4%      36931 ±  4%  sched_debug.cpu.nr_switches.max
     62048 ±  5%     +21.0%      75097 ± 10%  softirqs.CPU10.RCU
     60821            +9.1%      66347        softirqs.CPU17.RCU
     59451 ±  3%     +19.2%      70891 ±  2%  softirqs.CPU22.RCU
     59548            +7.6%      64046 ±  2%  softirqs.CPU34.RCU
     59706           +16.4%      69478 ±  6%  softirqs.CPU41.RCU
     61173 ±  4%     +22.1%      74662 ± 12%  softirqs.CPU46.RCU
     59827           +21.6%      72779 ± 14%  softirqs.CPU5.RCU
     60645           +11.0%      67300 ±  5%  softirqs.CPU53.RCU
     58533 ±  2%      +9.0%      63779 ±  2%  softirqs.CPU57.RCU
     60026 ±  2%     +15.7%      69444 ±  5%  softirqs.CPU58.RCU
     61127 ±  2%     +11.4%      68125 ±  4%  softirqs.CPU63.RCU
 4.413e+09            +7.3%  4.733e+09        perf-stat.i.branch-instructions
  19144349            +5.4%   20170738        perf-stat.i.branch-misses
     41.28            -0.5       40.80        perf-stat.i.cache-miss-rate%
  22256771            +3.9%   23133318 ±  2%  perf-stat.i.cache-misses
  53886473            +5.1%   56658720 ±  2%  perf-stat.i.cache-references
     11.77            -7.4%      10.90        perf-stat.i.cpi
      9929            -4.3%       9499 ±  2%  perf-stat.i.cycles-between-cache-misses
      0.05            +0.0        0.05 ±  3%  perf-stat.i.dTLB-load-miss-rate%
   2370068           +11.2%    2635764        perf-stat.i.dTLB-load-misses
 5.229e+09            +7.8%  5.638e+09        perf-stat.i.dTLB-loads
      9837 ±  3%     +11.8%      10995 ±  6%  perf-stat.i.dTLB-store-misses
 1.718e+09           +10.8%  1.903e+09        perf-stat.i.dTLB-stores
     94.23            -9.1       85.12        perf-stat.i.iTLB-load-miss-rate%
   2419173           -33.5%    1607709        perf-stat.i.iTLB-load-misses
    146272 ±  3%     +94.5%     284553 ±  8%  perf-stat.i.iTLB-loads
  1.88e+10            +7.4%  2.019e+10        perf-stat.i.instructions
      7824           +61.4%      12627        perf-stat.i.instructions-per-iTLB-miss
      0.09            +8.0%       0.09        perf-stat.i.ipc
   5549135            +6.7%    5919869        perf-stat.i.node-load-misses
   4803604            +6.0%    5091055        perf-stat.i.node-store-misses
      0.43            -0.0        0.43        perf-stat.overall.branch-miss-rate%
     41.30            -0.5       40.82        perf-stat.overall.cache-miss-rate%
     11.76            -7.2%      10.91        perf-stat.overall.cpi
      9937            -4.1%       9530 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.05            +0.0        0.05        perf-stat.overall.dTLB-load-miss-rate%
     94.30            -9.3       84.96        perf-stat.overall.iTLB-load-miss-rate%
      7770           +61.7%      12560        perf-stat.overall.instructions-per-iTLB-miss
      0.09            +7.8%       0.09        perf-stat.overall.ipc
   4887274            -3.3%    4723803        perf-stat.overall.path-length
 4.398e+09            +7.3%  4.717e+09        perf-stat.ps.branch-instructions
  19086955            +5.4%   20108226        perf-stat.ps.branch-misses
  22182689            +3.9%   23057159 ±  2%  perf-stat.ps.cache-misses
  53712718            +5.2%   56486162 ±  2%  perf-stat.ps.cache-references
   2362866           +11.3%    2630144        perf-stat.ps.dTLB-load-misses
 5.211e+09            +7.8%   5.62e+09        perf-stat.ps.dTLB-loads
      9852 ±  3%     +12.8%      11114 ±  6%  perf-stat.ps.dTLB-store-misses
 1.712e+09           +10.8%  1.897e+09        perf-stat.ps.dTLB-stores
   2411038           -33.5%    1602293        perf-stat.ps.iTLB-load-misses
    145843 ±  3%     +94.6%     283843 ±  8%  perf-stat.ps.iTLB-loads
 1.873e+10            +7.4%  2.013e+10        perf-stat.ps.instructions
   5530540            +6.7%    5900087        perf-stat.ps.node-load-misses
   4787463            +6.0%    5073936        perf-stat.ps.node-store-misses
 5.658e+12            +7.9%  6.107e+12        perf-stat.total.instructions
     47.65            -0.8       46.85        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff
     47.66            -0.8       46.85        perf-profile.calltrace.cycles-pp.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
     49.09            -0.8       48.31        perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     48.84            -0.8       48.06        perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     47.18            -0.8       46.40        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap
     49.24            -0.8       48.47        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
     49.31            -0.8       48.55        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
     46.98            -0.8       46.23        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region
     49.62            -0.7       48.89        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
     49.63            -0.7       48.90        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mmap64
     50.01            -0.7       49.31        perf-profile.calltrace.cycles-pp.mmap64
      1.17            +0.1        1.28        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
      1.27            +0.1        1.40        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      1.69            +0.2        1.84        perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     46.89            +0.5       47.36        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     46.41            +0.5       46.89        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap
     46.21            +0.5       46.70        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.__do_munmap.__vm_munmap
      0.00            +0.5        0.53        perf-profile.calltrace.cycles-pp.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
     49.26            +0.6       49.89        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     49.29            +0.6       49.92        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     48.83            +0.6       49.48        perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.59            +0.7       50.26        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     49.60            +0.7       50.27        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.munmap
     49.94            +0.7       50.64        perf-profile.calltrace.cycles-pp.munmap
     47.66            -0.8       46.85        perf-profile.children.cycles-pp.__vm_enough_memory
     49.10            -0.8       48.32        perf-profile.children.cycles-pp.do_mmap
     48.84            -0.8       48.07        perf-profile.children.cycles-pp.mmap_region
     49.24            -0.8       48.48        perf-profile.children.cycles-pp.vm_mmap_pgoff
     49.31            -0.8       48.55        perf-profile.children.cycles-pp.ksys_mmap_pgoff
     50.03            -0.7       49.33        perf-profile.children.cycles-pp.mmap64
     94.55            -0.3       94.21        perf-profile.children.cycles-pp.percpu_counter_add_batch
     93.61            -0.3       93.31        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     93.19            -0.3       92.92        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     99.23            -0.1       99.18        perf-profile.children.cycles-pp.do_syscall_64
     99.26            -0.0       99.21        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.46 ±  2%      -0.0        0.43        perf-profile.children.cycles-pp.vm_area_alloc
      0.43 ±  3%      -0.0        0.40        perf-profile.children.cycles-pp.kmem_cache_alloc
      0.31 ±  2%      -0.0        0.29        perf-profile.children.cycles-pp.apic_timer_interrupt
      0.06            +0.0        0.07        perf-profile.children.cycles-pp.down_write_killable
      0.08 ±  6%      +0.0        0.09        perf-profile.children.cycles-pp.prepend_path
      0.06            +0.0        0.07 ±  6%  perf-profile.children.cycles-pp.shmem_mmap
      0.10 ±  4%      +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.perf_iterate_sb
      0.05            +0.0        0.07        perf-profile.children.cycles-pp.unlink_file_vma
      0.33            +0.0        0.35        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.40 ±  4%      +0.0        0.43        perf-profile.children.cycles-pp.perf_event_mmap
      0.08 ±  5%      +0.0        0.11        perf-profile.children.cycles-pp.free_pgtables
      0.34            +0.0        0.38        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.up_write
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.touch_atime
      0.53            +0.1        0.60        perf-profile.children.cycles-pp.___might_sleep
      1.25            +0.1        1.35        perf-profile.children.cycles-pp.unmap_page_range
      1.28            +0.1        1.40        perf-profile.children.cycles-pp.unmap_vmas
      1.69            +0.2        1.85        perf-profile.children.cycles-pp.unmap_region
     49.29            +0.6       49.93        perf-profile.children.cycles-pp.__x64_sys_munmap
     49.26            +0.6       49.90        perf-profile.children.cycles-pp.__vm_munmap
     48.84            +0.6       49.48        perf-profile.children.cycles-pp.__do_munmap
     49.97            +0.7       50.67        perf-profile.children.cycles-pp.munmap
     93.19            -0.3       92.92        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.42 ±  2%      -0.0        0.38        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.40 ±  3%      -0.0        0.38        perf-profile.self.cycles-pp.kmem_cache_alloc
      0.12            -0.0        0.11        perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.33            +0.0        0.35        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.30            +0.0        0.33        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.61 ±  2%      +0.0        0.66        perf-profile.self.cycles-pp.unmap_page_range
      0.58            +0.1        0.64        perf-profile.self.cycles-pp.do_syscall_64
      0.50            +0.1        0.56        perf-profile.self.cycles-pp.___might_sleep
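For readers who want to check the table above, here is a small Python sketch that recomputes the %change column for a few rows from the two mean values. It also checks an observation: the reported perf-stat.overall.path-length figures are numerically close to perf-stat.total.instructions divided by will-it-scale.workload. That this is how the robot derives path-length is an assumption made here, not something the report states. Rows for rate-style metrics (names ending in `%`) appear to show an absolute difference in percentage points rather than a relative change, so they are left out.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from the base commit to the new commit, in percent."""
    return (new - base) / base * 100.0


# (base commit 595abbaff5, new commit 412c53a680), copied from the table.
rows = {
    "will-it-scale.per_process_ops": (16079, 17954),
    "will-it-scale.workload": (1157721, 1292722),
    "perf-stat.i.iTLB-load-misses": (2419173, 1607709),
}
changes = {name: pct_change(base, new) for name, (base, new) in rows.items()}
for name, change in changes.items():
    print(f"{change:+6.1f}%  {name}")

# Assumed relation: path-length ~= total instructions / workload.
total_instructions = (5.658e12, 6.107e12)
workload = rows["will-it-scale.workload"]
reported_path_length = (4887274, 4723803)
path_length = [ins / ops for ins, ops in zip(total_instructions, workload)]
for estimate, reported in zip(path_length, reported_path_length):
    print(f"estimated {estimate:,.0f} vs reported {reported:,}")
```

The instruction totals are printed with only four significant digits in the report, so the estimated path lengths can only be expected to match the reported ones approximately.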


                                                                                
                            will-it-scale.per_process_ops                       
                                                                                
  20000 +-------------------------------------------------------------------+   
  18000 |-+O O  O O  O  O O  O  O O  O    O       O O  O    O               |   
        |                                                                   |   
  16000 |..+.+..+.+..+..+.+..+..+.+..+.+..+..+.+..+.+..+..+.+..+..+.+..+    |   
  14000 |-+                                                            :    |   
        |                                                              :   :|   
  12000 |-+                                                             :  :|   
  10000 |-+                                                             :  :|   
   8000 |-+                                                             :  :|   
        |                                                               : : |   
   6000 |-+                                                             : : |   
   4000 |-+                                                             : : |   
        |                                                                :: |   
   2000 |-+                                                              :  |   
      0 +-------------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                will-it-scale.workload                          
                                                                                
  1.4e+06 +-----------------------------------------------------------------+   
          |  O O  O O  O O  O O  O O  O    O       O O  O    O              |   
  1.2e+06 |..+.+.. .+..+.+..+.+..+.+..+.  .+..+.+..+.  .+.+..+.+..+.+..+    |   
          |       +                     +.           +.                :    |   
    1e+06 |-+                                                          :    |   
          |                                                             :  :|   
   800000 |-+                                                           :  :|   
          |                                                             :  :|   
   600000 |-+                                                           :  :|   
          |                                                             : : |   
   400000 |-+                                                           : : |   
          |                                                              :: |   
   200000 |-+                                                            :: |   
          |                                                              :  |   
        0 +-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample
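The two plotted metrics can be cross-checked against each other. The sketch below assumes that nr_task=100% on the 72-thread test machine means 72 parallel processes, and that will-it-scale.workload is the sum of operations across those processes; both are assumptions based on the reported numbers, not statements from the report.

```python
NR_PROCESSES = 72  # assumption: nr_task=100% of 72 hardware threads

# commit -> (per_process_ops, workload), copied from the comparison table.
samples = {
    "595abbaff5": (16079, 1157721),
    "412c53a680": (17954, 1292722),
}

rel_errors = {}
for commit, (per_process_ops, workload) in samples.items():
    estimate = per_process_ops * NR_PROCESSES
    rel_errors[commit] = abs(estimate - workload) / workload
    print(f"{commit}: {per_process_ops} * {NR_PROCESSES} = {estimate}, "
          f"reported workload {workload}, "
          f"relative difference {rel_errors[commit]:.1e}")
```

Under these assumptions the small remaining difference would be explained by per_process_ops being reported as a rounded average.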

***************************************************************************************************
lkp-skl-fpga01: 104 threads Skylake with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/30%/debian-x86_64-20191114.cgz/300s/lkp-skl-fpga01/context1/unixbench/0x2000065

commit: 
  595abbaff5 ("y2038: remove ktime to/from timespec/timeval conversion")
  412c53a680 ("y2038: remove unused time32 interfaces")

595abbaff5db1214 412c53a680a97cb1ae2c0ab6023 
---------------- --------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
          1:4          -25%            :4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
         %stddev     %change         %stddev
             \          |                \  
      3093            +3.8%       3209        unixbench.score
    471.47           +36.6%     644.15 ± 40%  unixbench.time.user_time
 3.659e+08            +3.9%  3.801e+08        unixbench.time.voluntary_context_switches
 4.838e+08            +3.8%   5.02e+08        unixbench.workload
   3715604            +3.9%    3859617        vmstat.system.cs
   1493573            -1.1%    1477832        proc-vmstat.numa_local
   1620946            -1.3%    1599963        proc-vmstat.pgalloc_normal
   1586478            -1.6%    1561368        proc-vmstat.pgfree
   4505528 ±  8%     -11.7%    3977138 ±  9%  sched_debug.cfs_rq:/.MIN_vruntime.max
     47683 ±  7%     -11.2%      42356 ±  7%  sched_debug.cfs_rq:/.exec_clock.min
   4505528 ±  8%     -11.7%    3977138 ±  9%  sched_debug.cfs_rq:/.max_vruntime.max
   2750853 ±  7%     -11.4%    2435892 ±  7%  sched_debug.cfs_rq:/.min_vruntime.min
      0.31 ± 13%     -33.1%       0.21 ± 25%  sched_debug.cfs_rq:/.nr_spread_over.avg
      3.18 ± 18%     -28.1%       2.29 ± 22%  sched_debug.cfs_rq:/.nr_spread_over.max
      0.67 ± 11%     -27.6%       0.48 ± 22%  sched_debug.cfs_rq:/.nr_spread_over.stddev
    133566 ± 48%     +65.9%     221588 ± 31%  sched_debug.cfs_rq:/.runnable_weight.max
    899.95 ±173%    +824.1%       8316 ±110%  sched_debug.cpu.max_idle_balance_cost.stddev
    317.01 ±  5%      -9.2%     287.99 ±  7%  sched_debug.cpu.ttwu_local.min
 6.175e+09            +3.2%  6.375e+09        perf-stat.i.branch-instructions
 6.846e+08            +2.4%  7.011e+08        perf-stat.i.cache-references
   3734905            +3.9%    3879483        perf-stat.i.context-switches
 7.582e+09            +3.4%  7.839e+09        perf-stat.i.dTLB-loads
 4.468e+09            +3.6%   4.63e+09        perf-stat.i.dTLB-stores
   4308069            +3.9%    4477501        perf-stat.i.iTLB-load-misses
  28346961            +4.7%   29683898        perf-stat.i.iTLB-loads
 2.823e+10            +3.3%  2.916e+10        perf-stat.i.instructions
   3745153           +21.4%    4545801        perf-stat.i.node-load-misses
     33138           +33.5%      44236        perf-stat.i.node-loads
   3740245            +3.3%    3862099        perf-stat.i.node-store-misses
      3.02            -2.9%       2.94        perf-stat.overall.cpi
      0.33            +3.0%       0.34        perf-stat.overall.ipc
 6.157e+09            +3.3%  6.357e+09        perf-stat.ps.branch-instructions
 6.826e+08            +2.4%  6.991e+08        perf-stat.ps.cache-references
   3723897            +3.9%    3868660        perf-stat.ps.context-switches
  7.56e+09            +3.4%  7.817e+09        perf-stat.ps.dTLB-loads
 4.454e+09            +3.7%  4.617e+09        perf-stat.ps.dTLB-stores
   4295642            +3.9%    4465151        perf-stat.ps.iTLB-load-misses
  28265235            +4.7%   29603109        perf-stat.ps.iTLB-loads
 2.814e+10            +3.3%  2.908e+10        perf-stat.ps.instructions
   3734083           +21.4%    4532943        perf-stat.ps.node-load-misses
     33087           +33.4%      44149        perf-stat.ps.node-loads
   3729103            +3.3%    3851108        perf-stat.ps.node-store-misses
 1.103e+13            +3.3%   1.14e+13        perf-stat.total.instructions
    372.25 ± 48%    +114.4%     798.00 ± 28%  interrupts.41:PCI-MSI.67633156-edge.eth0-TxRx-3
      2920 ±  6%     -12.3%       2561 ± 10%  interrupts.CPU1.NMI:Non-maskable_interrupts
      2920 ±  6%     -12.3%       2561 ± 10%  interrupts.CPU1.PMI:Performance_monitoring_interrupts
      2691 ±  6%     +18.2%       3179 ±  5%  interrupts.CPU103.NMI:Non-maskable_interrupts
      2691 ±  6%     +18.2%       3179 ±  5%  interrupts.CPU103.PMI:Performance_monitoring_interrupts
     33.25 ± 37%    +542.9%     213.75 ± 64%  interrupts.CPU14.TLB:TLB_shootdowns
     71.50 ±115%    +202.8%     216.50 ± 83%  interrupts.CPU16.TLB:TLB_shootdowns
      3048 ± 13%     -26.0%       2256 ± 14%  interrupts.CPU18.NMI:Non-maskable_interrupts
      3048 ± 13%     -26.0%       2256 ± 14%  interrupts.CPU18.PMI:Performance_monitoring_interrupts
      2743 ±  4%     -14.8%       2337 ± 18%  interrupts.CPU19.NMI:Non-maskable_interrupts
      2743 ±  4%     -14.8%       2337 ± 18%  interrupts.CPU19.PMI:Performance_monitoring_interrupts
     26.00 ± 61%   +1373.1%     383.00 ± 50%  interrupts.CPU23.TLB:TLB_shootdowns
    229.00 ± 45%     +87.3%     429.00 ± 29%  interrupts.CPU29.TLB:TLB_shootdowns
    372.25 ± 48%    +114.4%     798.00 ± 28%  interrupts.CPU33.41:PCI-MSI.67633156-edge.eth0-TxRx-3
     28.75 ± 50%    +360.9%     132.50 ± 94%  interrupts.CPU39.TLB:TLB_shootdowns
     41.00 ±101%    +600.0%     287.00 ± 37%  interrupts.CPU49.TLB:TLB_shootdowns
     39.75 ± 79%    +140.9%      95.75 ± 13%  interrupts.CPU50.TLB:TLB_shootdowns
      3103 ±  9%     -16.6%       2589 ± 10%  interrupts.CPU53.NMI:Non-maskable_interrupts
      3103 ±  9%     -16.6%       2589 ± 10%  interrupts.CPU53.PMI:Performance_monitoring_interrupts
    163.75 ± 58%     -58.2%      68.50 ±100%  interrupts.CPU59.TLB:TLB_shootdowns
      2469 ± 20%     +25.1%       3089 ±  6%  interrupts.CPU63.NMI:Non-maskable_interrupts
      2469 ± 20%     +25.1%       3089 ±  6%  interrupts.CPU63.PMI:Performance_monitoring_interrupts
     69.25 ±107%    +210.1%     214.75 ± 57%  interrupts.CPU7.TLB:TLB_shootdowns
     49.75 ± 33%    +165.3%     132.00 ± 74%  interrupts.CPU76.TLB:TLB_shootdowns
     27.75 ± 82%    +272.1%     103.25 ± 66%  interrupts.CPU78.TLB:TLB_shootdowns
      2754 ± 23%     +23.0%       3387 ±  7%  interrupts.CPU84.NMI:Non-maskable_interrupts
      2754 ± 23%     +23.0%       3387 ±  7%  interrupts.CPU84.PMI:Performance_monitoring_interrupts
     22.50 ± 42%    +687.8%     177.25 ±132%  interrupts.CPU84.TLB:TLB_shootdowns
     19.75 ± 35%    +812.7%     180.25 ± 74%  interrupts.CPU85.TLB:TLB_shootdowns
     15.75 ± 18%    +303.2%      63.50 ± 81%  interrupts.CPU99.TLB:TLB_shootdowns
     31.35            -0.6       30.72        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.activate_task
     30.60            -0.6       29.98        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
     34.93            -0.5       34.42        perf-profile.calltrace.cycles-pp.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate
     36.88            -0.5       36.42        perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
     37.21            -0.4       36.77        perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
     37.18            -0.4       36.75        perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
     37.22            -0.4       36.79        perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
     35.18            -0.4       34.79        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     35.30            -0.4       34.93        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     39.42            -0.4       39.06        perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
     39.26            -0.4       38.91        perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
     39.64            -0.3       39.30        perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write
     39.95            -0.3       39.62        perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write.ksys_write
     41.14            -0.3       40.83        perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     41.08            -0.3       40.77        perf-profile.calltrace.cycles-pp.pipe_write.new_sync_write.vfs_write.ksys_write.do_syscall_64
      0.79 ±  7%      -0.1        0.69 ±  9%  perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
      0.73 ±  8%      -0.1        0.64 ±  9%  perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
      0.69            +0.0        0.72        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
      0.70 ±  2%      +0.0        0.73        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
      0.75            +0.0        0.78        perf-profile.calltrace.cycles-pp.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry
      0.98 ±  2%      +0.0        1.02        perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__sched_text_start.schedule_idle.do_idle
      1.26            +0.0        1.31        perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule.pipe_read
      1.47            +0.1        1.52        perf-profile.calltrace.cycles-pp.dequeue_task_fair.__sched_text_start.schedule.pipe_read.new_sync_read
      2.38            +0.1        2.50        perf-profile.calltrace.cycles-pp.__sched_text_start.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      2.43            +0.1        2.56        perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
      5.47            +0.1        5.61        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.32            +0.1        3.47        perf-profile.calltrace.cycles-pp.schedule.pipe_read.new_sync_read.vfs_read.ksys_read
      5.29            +0.2        5.46        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.94 ±  2%      +0.2        6.14        perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.83 ±  2%      +0.2        6.04        perf-profile.calltrace.cycles-pp.pipe_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
      3.11 ±  8%      +0.3        3.38        perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.pipe_read.new_sync_read.vfs_read
      0.13 ±173%      +0.4        0.53        perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule
      2.47 ±  9%      +0.4        2.89        perf-profile.calltrace.cycles-pp.arch_stack_walk.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
     30.79            -0.6       30.14        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     32.21            -0.6       31.62        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     34.94            -0.5       34.43        perf-profile.children.cycles-pp.__account_scheduler_latency
     36.94            -0.5       36.48        perf-profile.children.cycles-pp.enqueue_entity
     37.19            -0.4       36.75        perf-profile.children.cycles-pp.enqueue_task_fair
     37.22            -0.4       36.78        perf-profile.children.cycles-pp.activate_task
     37.23            -0.4       36.79        perf-profile.children.cycles-pp.ttwu_do_activate
     39.27            -0.4       38.91        perf-profile.children.cycles-pp.try_to_wake_up
     39.65            -0.4       39.29        perf-profile.children.cycles-pp.__wake_up_common
     39.42            -0.4       39.06        perf-profile.children.cycles-pp.autoremove_wake_function
     39.95            -0.3       39.62        perf-profile.children.cycles-pp.__wake_up_common_lock
     41.16            -0.3       40.85        perf-profile.children.cycles-pp.new_sync_write
     41.08            -0.3       40.77        perf-profile.children.cycles-pp.pipe_write
     41.45            -0.3       41.16        perf-profile.children.cycles-pp.vfs_write
     41.59            -0.3       41.32        perf-profile.children.cycles-pp.ksys_write
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.apparmor_file_permission
      0.17 ±  4%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.tick_nohz_idle_enter
      0.33            +0.0        0.35 ±  3%  perf-profile.children.cycles-pp.select_idle_sibling
      0.41            +0.0        0.43 ±  2%  perf-profile.children.cycles-pp.__next_timer_interrupt
      0.25 ±  3%      +0.0        0.27 ±  3%  perf-profile.children.cycles-pp.rcu_idle_exit
      0.38            +0.0        0.40 ±  2%  perf-profile.children.cycles-pp.__switch_to_asm
      0.49 ±  2%      +0.0        0.51        perf-profile.children.cycles-pp.update_rq_clock
      0.23 ±  3%      +0.0        0.26 ±  3%  perf-profile.children.cycles-pp.common_file_perm
      0.57            +0.0        0.60 ±  2%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.45 ±  2%      +0.0        0.48 ±  2%  perf-profile.children.cycles-pp.stack_trace_consume_entry_nosched
      0.14 ± 13%      +0.0        0.17 ±  7%  perf-profile.children.cycles-pp.clockevents_program_event
      0.70 ±  2%      +0.0        0.73        perf-profile.children.cycles-pp.update_curr
      0.44 ±  4%      +0.0        0.48 ±  2%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.76            +0.0        0.80        perf-profile.children.cycles-pp.tick_nohz_next_event
      1.00 ±  2%      +0.0        1.04        perf-profile.children.cycles-pp.set_next_entity
      0.86            +0.0        0.90        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.94            +0.0        0.98        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.33 ±  2%      +0.0        0.38 ±  3%  perf-profile.children.cycles-pp.security_file_permission
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.tick_nohz_tick_stopped
      1.45 ±  2%      +0.1        1.51        perf-profile.children.cycles-pp.pick_next_task_fair
      0.36 ±  8%      +0.1        0.42 ±  6%  perf-profile.children.cycles-pp.ktime_get
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.__x64_sys_read
      1.25 ±  3%      +0.1        1.31        perf-profile.children.cycles-pp.update_load_avg
      1.56 ±  2%      +0.1        1.63        perf-profile.children.cycles-pp.dequeue_entity
      2.83            +0.1        2.91        perf-profile.children.cycles-pp.arch_stack_walk
      1.78 ±  2%      +0.1        1.88        perf-profile.children.cycles-pp.dequeue_task_fair
      2.45            +0.1        2.57        perf-profile.children.cycles-pp.schedule_idle
      3.33            +0.1        3.47        perf-profile.children.cycles-pp.schedule
      5.94 ±  2%      +0.2        6.14        perf-profile.children.cycles-pp.new_sync_read
      5.85            +0.2        6.06        perf-profile.children.cycles-pp.pipe_read
      6.58 ±  2%      +0.2        6.80        perf-profile.children.cycles-pp.ksys_read
      6.39 ±  2%      +0.2        6.63        perf-profile.children.cycles-pp.vfs_read
      5.70            +0.2        5.95        perf-profile.children.cycles-pp.__sched_text_start
     30.79            -0.6       30.14        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.09            -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.__fsnotify_parent
      0.09            -0.0        0.08        perf-profile.self.cycles-pp.ksys_read
      0.08 ±  5%      +0.0        0.09        perf-profile.self.cycles-pp.in_sched_functions
      0.15 ±  2%      +0.0        0.17 ±  7%  perf-profile.self.cycles-pp.pipe_write
      0.21 ±  2%      +0.0        0.23 ±  3%  perf-profile.self.cycles-pp._find_next_bit
      0.20 ±  2%      +0.0        0.22 ±  3%  perf-profile.self.cycles-pp.common_file_perm
      0.38            +0.0        0.40 ±  2%  perf-profile.self.cycles-pp.__switch_to_asm
      0.04 ± 57%      +0.0        0.06        perf-profile.self.cycles-pp.apparmor_file_permission
      0.30 ±  2%      +0.0        0.32 ±  2%  perf-profile.self.cycles-pp.stack_trace_consume_entry_nosched
      0.14 ±  3%      +0.0        0.16 ±  7%  perf-profile.self.cycles-pp.vfs_read
      0.18 ±  2%      +0.0        0.20 ±  4%  perf-profile.self.cycles-pp.__account_scheduler_latency
      1.57            +0.0        1.61        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      1.15            +0.0        1.18        perf-profile.self.cycles-pp.__sched_text_start
      0.84            +0.0        0.88        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.94            +0.0        0.98        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.ksys_write
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.rcu_idle_exit
      0.18 ± 13%      +0.1        0.23 ± 10%  perf-profile.self.cycles-pp.ktime_get
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.__x64_sys_read
      1.59 ±  2%      +0.1        1.66        perf-profile.self.cycles-pp.do_syscall_64
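
For readers who want to post-process the perf-profile rows above, here is a minimal sketch, not part of the original report. It assumes each row is laid out as: base value (optionally followed by "± N%" relative stddev), absolute change in percentage points, new value (optionally "± N%"), then the metric name. The names ROW, parse_row and delta_consistent are hypothetical helpers introduced for illustration. Because the change column appears to be computed from unrounded values, the check only requires agreement within rounding error.

```python
import re

# Assumed row layout (hypothetical helper, not part of the lkp tooling):
#   base [± stddev%]   delta   new [± stddev%]   metric-name
ROW = re.compile(
    r"^\s*(?P<base>\d+(?:\.\d+)?)(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<delta>[+-]\d+(?:\.\d+)?)"
    r"\s+(?P<new>\d+(?:\.\d+)?)(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Parse one comparison row; return a dict, or None if the line is not a row."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m.group("metric"),
        "base": float(m.group("base")),
        "delta": float(m.group("delta")),
        "new": float(m.group("new")),
        # Relative stddev in percent, present only when the row shows "± N%".
        "base_sd": int(m.group("base_sd")) if m.group("base_sd") else None,
        "new_sd": int(m.group("new_sd")) if m.group("new_sd") else None,
    }


def delta_consistent(row, tol=0.06):
    """Check that new - base matches the printed delta within rounding error.

    Values are printed to 0.01 and the delta to 0.1, so the worst-case
    disagreement is about 0.005 + 0.005 + 0.05 = 0.06.
    """
    return abs((row["new"] - row["base"]) - row["delta"]) <= tol + 1e-9


if __name__ == "__main__":
    sample = (
        "     35.18            -0.4       34.79        "
        "perf-profile.calltrace.cycles-pp.vfs_write.ksys_write"
    )
    row = parse_row(sample)
    print(row["metric"], row["base"], row["delta"], row["new"], delta_consistent(row))
```

Rows with a large stddev on either side (for example "± 173%") are probably best treated as noise rather than as a real shift; the parsed base_sd and new_sd fields make that easy to filter on.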





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


View attachment "config-5.6.0-rc2-00057-g412c53a680a97" of type "text/plain" (203572 bytes)

View attachment "job-script" of type "text/plain" (7578 bytes)

View attachment "job.yaml" of type "text/plain" (5246 bytes)

View attachment "reproduce" of type "text/plain" (309 bytes)
