Message-ID: <20201203051918.GC27350@xsang-OptiPlex-9020>
Date:   Thu, 3 Dec 2020 13:19:18 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Nadav Amit <nadav.amit@...il.com>
Cc:     0day robot <lkp@...el.com>, Jens Axboe <axboe@...nel.dk>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        ying.huang@...el.com, feng.tang@...el.com, zhengjun.xing@...el.com,
        linux-fsdevel@...r.kernel.org, Nadav Amit <namit@...are.com>,
        io-uring@...r.kernel.org, linux-mm@...ck.org
Subject: [fs/userfaultfd]  fec9227821:  will-it-scale.per_process_ops -5.5%
 regression


Greetings,

FYI, we noticed a -5.5% regression of will-it-scale.per_process_ops due to the following commit:


commit: fec92278217ba01b4a3b9f9ec0f6a392069cdbd0 ("[RFC PATCH 12/13] fs/userfaultfd: kmem-cache for wait-queue objects")
url: https://github.com/0day-ci/linux/commits/Nadav-Amit/fs-userfaultfd-support-iouring-and-polling/20201129-085119
base: https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next

in testcase: will-it-scale
on test machine: 104 threads Skylake with 192G memory
with the following parameters:

	nr_task: 50%
	mode: process
	test: brk1
	cpufreq_governor: performance
	ucode: 0x2006a08

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of the test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

In addition, the commit also has a significant impact on the following test:

+------------------+---------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -11.0% regression            |
| test machine     | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
| test parameters  | cpufreq_governor=performance                                              |
|                  | mode=process                                                              |
|                  | nr_task=16                                                                |
|                  | test=brk1                                                                 |
|                  | ucode=0x5003003                                                           |
+------------------+---------------------------------------------------------------------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-skl-fpga01/brk1/will-it-scale/0x2006a08

commit: 
  ddfa740e9c ("fs/userfaultfd: complete write asynchronously")
  fec9227821 ("fs/userfaultfd: kmem-cache for wait-queue objects")

ddfa740e9caf7642 fec92278217ba01b4a3b9f9ec0f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  65219467            -5.5%   61607693        will-it-scale.52.processes
   1254220            -5.5%    1184763        will-it-scale.per_process_ops
  65219467            -5.5%   61607693        will-it-scale.workload
     20.00            -5.0%      19.00        vmstat.cpu.us
     34.22            -4.0%      32.85 ±  2%  boot-time.boot
      3146            -4.3%       3010 ±  2%  boot-time.idle
    654.25 ± 20%     -39.0%     399.25 ± 25%  numa-vmstat.node0.nr_active_anon
    654.25 ± 20%     -39.0%     399.25 ± 25%  numa-vmstat.node0.nr_zone_active_anon
     10140 ±  9%     +27.1%      12889 ± 10%  numa-vmstat.node1.nr_slab_reclaimable
     21388 ±  3%     +13.2%      24204 ±  7%  numa-vmstat.node1.nr_slab_unreclaimable
      1096 ±  8%    +304.6%       4434        slabinfo.dmaengine-unmap-16.active_objs
      1096 ±  8%    +304.6%       4434        slabinfo.dmaengine-unmap-16.num_objs
      4838 ±  4%     -17.0%       4018 ±  3%  slabinfo.eventpoll_pwq.active_objs
      4838 ±  4%     -17.0%       4018 ±  3%  slabinfo.eventpoll_pwq.num_objs
      2689 ± 18%     -37.7%       1675 ± 22%  numa-meminfo.node0.Active
      2617 ± 20%     -38.9%       1599 ± 25%  numa-meminfo.node0.Active(anon)
     40564 ±  9%     +27.1%      51560 ± 10%  numa-meminfo.node1.KReclaimable
     40564 ±  9%     +27.1%      51560 ± 10%  numa-meminfo.node1.SReclaimable
     85552 ±  3%     +13.2%      96818 ±  7%  numa-meminfo.node1.SUnreclaim
    126118 ±  4%     +17.7%     148380 ±  8%  numa-meminfo.node1.Slab
      7.12 ± 17%     -87.9%       0.86 ±100%  sched_debug.cfs_rq:/.removed.load_avg.avg
     33.50 ±  8%     -74.6%       8.50 ±100%  sched_debug.cfs_rq:/.removed.load_avg.stddev
      2.76 ± 27%     -88.1%       0.33 ±102%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     80.58 ±  9%     -60.1%      32.12 ±102%  sched_debug.cfs_rq:/.removed.runnable_avg.max
     13.39 ± 18%     -75.9%       3.23 ±102%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      2.76 ± 28%     -88.1%       0.33 ±102%  sched_debug.cfs_rq:/.removed.util_avg.avg
     80.58 ±  9%     -60.1%      32.12 ±102%  sched_debug.cfs_rq:/.removed.util_avg.max
     13.39 ± 18%     -75.9%       3.23 ±102%  sched_debug.cfs_rq:/.removed.util_avg.stddev
      1036 ±  8%     +14.0%       1181 ±  8%  sched_debug.cpu.nr_switches.min
    -22.25           -30.3%     -15.50        sched_debug.cpu.nr_uninterruptible.min
      2.50 ± 91%   +7990.0%     202.25 ±166%  interrupts.CPU1.TLB:TLB_shootdowns
    451.00           +12.8%     508.75 ±  5%  interrupts.CPU100.CAL:Function_call_interrupts
    457.50 ±  3%     +12.3%     514.00 ±  8%  interrupts.CPU103.CAL:Function_call_interrupts
     48.75 ±130%     -89.7%       5.00 ±122%  interrupts.CPU15.RES:Rescheduling_interrupts
      3195 ± 18%    +140.3%       7678        interrupts.CPU24.NMI:Non-maskable_interrupts
      3195 ± 18%    +140.3%       7678        interrupts.CPU24.PMI:Performance_monitoring_interrupts
      8.25 ± 41%   +1009.1%      91.50 ± 49%  interrupts.CPU24.RES:Rescheduling_interrupts
    694.25 ± 28%     +89.6%       1316 ± 24%  interrupts.CPU3.CAL:Function_call_interrupts
      3946 ± 46%     +86.3%       7352 ± 12%  interrupts.CPU30.NMI:Non-maskable_interrupts
      3946 ± 46%     +86.3%       7352 ± 12%  interrupts.CPU30.PMI:Performance_monitoring_interrupts
     30.00 ±115%    +200.8%      90.25 ± 51%  interrupts.CPU36.RES:Rescheduling_interrupts
      7.50 ± 14%   +1123.3%      91.75 ± 51%  interrupts.CPU40.RES:Rescheduling_interrupts
     10.50 ± 38%    +590.5%      72.50 ± 60%  interrupts.CPU42.RES:Rescheduling_interrupts
    449.00          +214.1%       1410 ±107%  interrupts.CPU76.CAL:Function_call_interrupts
    448.75           +99.8%     896.75 ± 51%  interrupts.CPU82.CAL:Function_call_interrupts
    453.25           +78.7%     809.75 ± 50%  interrupts.CPU86.CAL:Function_call_interrupts
    456.00          +145.0%       1117 ± 93%  interrupts.CPU90.CAL:Function_call_interrupts
     72.75 ± 82%     -89.7%       7.50 ± 33%  interrupts.CPU92.RES:Rescheduling_interrupts
      2.00 ± 79%   +1737.5%      36.75 ±146%  interrupts.CPU92.TLB:TLB_shootdowns
      5545 ± 32%     +32.6%       7353 ± 12%  interrupts.CPU93.NMI:Non-maskable_interrupts
      5545 ± 32%     +32.6%       7353 ± 12%  interrupts.CPU93.PMI:Performance_monitoring_interrupts
     10.50 ± 10%    +514.3%      64.50 ± 76%  interrupts.CPU93.RES:Rescheduling_interrupts
 2.683e+10            +3.7%  2.781e+10        perf-stat.i.branch-instructions
      0.68            -0.1        0.63        perf-stat.i.branch-miss-rate%
 1.811e+08            -5.2%  1.718e+08        perf-stat.i.branch-misses
      1.12            -4.6%       1.07        perf-stat.i.cpi
      0.17            -0.0        0.15        perf-stat.i.dTLB-load-miss-rate%
  64926279            -5.5%   61335249        perf-stat.i.dTLB-load-misses
 3.779e+10            +5.6%   3.99e+10        perf-stat.i.dTLB-loads
   2.1e+10            +2.7%  2.157e+10        perf-stat.i.dTLB-stores
 1.292e+11            +4.6%  1.352e+11        perf-stat.i.instructions
      1957            +3.7%       2029        perf-stat.i.instructions-per-iTLB-miss
      0.89            +4.8%       0.94        perf-stat.i.ipc
    823.71            +4.3%     858.87        perf-stat.i.metric.M/sec
      0.67            -0.1        0.62        perf-stat.overall.branch-miss-rate%
      1.12            -4.6%       1.07        perf-stat.overall.cpi
      0.17            -0.0        0.15        perf-stat.overall.dTLB-load-miss-rate%
      1933            +3.6%       2004        perf-stat.overall.instructions-per-iTLB-miss
      0.89            +4.8%       0.94        perf-stat.overall.ipc
     82.14            +1.7       83.85        perf-stat.overall.node-store-miss-rate%
    597331           +10.8%     662119        perf-stat.overall.path-length
 2.674e+10            +3.7%  2.772e+10        perf-stat.ps.branch-instructions
 1.804e+08            -5.2%   1.71e+08        perf-stat.ps.branch-misses
  64722645            -5.5%   61153001        perf-stat.ps.dTLB-load-misses
 3.766e+10            +5.6%  3.976e+10        perf-stat.ps.dTLB-loads
 2.093e+10            +2.7%   2.15e+10        perf-stat.ps.dTLB-stores
 1.288e+11            +4.6%  1.347e+11        perf-stat.ps.instructions
 3.896e+13            +4.7%  4.079e+13        perf-stat.total.instructions
     19290 ± 14%     -31.0%      13316 ±  5%  softirqs.CPU13.RCU
     22289 ± 79%     -44.0%      12473 ±110%  softirqs.CPU18.SCHED
     19387 ± 12%     -26.7%      14206 ±  6%  softirqs.CPU21.RCU
     14997 ±  5%     +51.6%      22739 ±  2%  softirqs.CPU24.RCU
     39995 ±  3%     -88.9%       4457        softirqs.CPU24.SCHED
     22221 ± 79%     -73.2%       5963 ± 42%  softirqs.CPU28.SCHED
     18559 ± 24%     -28.7%      13237 ±  7%  softirqs.CPU33.RCU
     16004 ± 19%     +31.9%      21107 ±  4%  softirqs.CPU34.RCU
     22675 ±  7%     -31.0%      15655 ± 18%  softirqs.CPU35.RCU
      4273 ± 17%    +620.7%      30798 ± 48%  softirqs.CPU35.SCHED
     20207 ± 16%     -23.6%      15448 ± 19%  softirqs.CPU37.RCU
     15311 ± 19%     +37.4%      21044 ±  7%  softirqs.CPU4.RCU
     30669 ± 48%     -68.4%       9687 ± 89%  softirqs.CPU40.SCHED
     20195 ± 15%     -23.5%      15442 ± 20%  softirqs.CPU41.RCU
     22191 ± 25%     -37.8%      13806 ± 10%  softirqs.CPU43.RCU
     16782 ± 14%     -21.8%      13122 ±  4%  softirqs.CPU47.RCU
     22290 ±  8%     -22.0%      17381 ± 22%  softirqs.CPU49.RCU
     22338 ± 79%     -79.7%       4526        softirqs.CPU61.SCHED
     30860 ± 49%     -85.3%       4533        softirqs.CPU65.SCHED
     24975 ± 57%     -82.2%       4447        softirqs.CPU73.SCHED
     20318 ±  6%     -39.8%      12236 ±  2%  softirqs.CPU76.RCU
      4615 ±  5%    +761.7%      39773 ±  2%  softirqs.CPU76.SCHED
     21142 ±  3%     -29.2%      14979 ±  9%  softirqs.CPU82.RCU
     13144 ±113%    +199.0%      39305 ±  3%  softirqs.CPU86.SCHED
     39713 ±  4%     -67.4%      12956 ±110%  softirqs.CPU87.SCHED
     17739 ± 16%     -22.2%      13795 ±  4%  softirqs.CPU88.RCU
     18651 ± 15%     -27.5%      13514 ± 11%  softirqs.CPU92.RCU
     30590 ± 48%     -57.5%      12998 ±111%  softirqs.CPU93.SCHED
     15264 ± 17%     +26.7%      19337 ±  5%  softirqs.CPU95.RCU
      1.33 ± 10%      -0.1        1.20 ± 10%  perf-profile.calltrace.cycles-pp.find_vma.__do_munmap.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.80 ± 11%      -0.1        0.69 ± 11%  perf-profile.calltrace.cycles-pp.security_mmap_addr.get_unmapped_area.do_brk_flags.__x64_sys_brk.do_syscall_64
      0.00            +0.8        0.76 ±  9%  perf-profile.calltrace.cycles-pp.memset_erms.kmem_cache_alloc.userfaultfd_unmap_complete.__x64_sys_brk.do_syscall_64
      0.00            +0.9        0.94 ±  4%  perf-profile.calltrace.cycles-pp.kmem_cache_free.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +2.3        2.29 ± 13%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc.userfaultfd_unmap_complete.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.5        2.51 ± 12%  perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.55 ± 10%      -0.3        0.28 ± 14%  perf-profile.children.cycles-pp.vma_merge
      1.81 ± 10%      -0.2        1.59 ± 10%  perf-profile.children.cycles-pp.get_unmapped_area
      1.72 ± 10%      -0.2        1.54 ± 10%  perf-profile.children.cycles-pp.find_vma
      0.30 ±  9%      -0.1        0.15 ± 11%  perf-profile.children.cycles-pp.cap_capable
      0.82 ± 11%      -0.1        0.70 ± 11%  perf-profile.children.cycles-pp.security_mmap_addr
      0.57 ± 11%      -0.1        0.50 ±  9%  perf-profile.children.cycles-pp.obj_cgroup_charge
      0.32 ± 10%      -0.1        0.25 ± 11%  perf-profile.children.cycles-pp.__vm_enough_memory
      0.32 ± 12%      -0.1        0.26 ±  9%  perf-profile.children.cycles-pp.__x86_retpoline_rax
      0.46 ±  9%      -0.1        0.41 ± 11%  perf-profile.children.cycles-pp.vmacache_find
      0.22 ± 11%      -0.0        0.19 ± 10%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.24 ±  9%      -0.0        0.21 ± 11%  perf-profile.children.cycles-pp.free_pgd_range
      0.00            +0.1        0.08 ± 10%  perf-profile.children.cycles-pp.should_failslab
      2.83 ± 11%      +0.7        3.49 ±  7%  perf-profile.children.cycles-pp.kmem_cache_free
      0.00            +0.8        0.77 ±  9%  perf-profile.children.cycles-pp.memset_erms
      4.08 ± 11%      +1.9        6.03 ± 11%  perf-profile.children.cycles-pp.kmem_cache_alloc
      0.21 ± 10%      +2.3        2.52 ± 12%  perf-profile.children.cycles-pp.userfaultfd_unmap_complete
      0.53 ±  9%      -0.3        0.27 ± 14%  perf-profile.self.cycles-pp.vma_merge
      0.28 ± 11%      -0.1        0.14 ± 11%  perf-profile.self.cycles-pp.cap_capable
      0.99 ± 10%      -0.1        0.88 ± 11%  perf-profile.self.cycles-pp.unmap_page_range
      0.78 ± 11%      -0.1        0.69 ±  9%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.70 ± 11%      -0.1        0.62 ± 10%  perf-profile.self.cycles-pp.vm_area_alloc
      0.41 ± 11%      -0.1        0.34 ± 12%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.55 ± 12%      -0.1        0.49 ±  9%  perf-profile.self.cycles-pp.obj_cgroup_charge
      0.44 ±  9%      -0.1        0.39 ± 11%  perf-profile.self.cycles-pp.vmacache_find
      0.25 ± 12%      -0.1        0.20 ± 10%  perf-profile.self.cycles-pp.__x86_retpoline_rax
      0.36 ± 11%      -0.0        0.31 ± 10%  perf-profile.self.cycles-pp.security_mmap_addr
      0.19 ± 11%      -0.0        0.16 ± 10%  perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.10 ± 12%      -0.0        0.08 ± 13%  perf-profile.self.cycles-pp.__vm_enough_memory
      0.48 ± 10%      +0.1        0.61 ±  9%  perf-profile.self.cycles-pp.cap_vm_enough_memory
      0.00            +0.7        0.73 ± 10%  perf-profile.self.cycles-pp.memset_erms
      1.86 ± 11%      +0.8        2.62 ±  7%  perf-profile.self.cycles-pp.kmem_cache_free
      1.91 ± 11%      +0.8        2.74 ± 12%  perf-profile.self.cycles-pp.kmem_cache_alloc
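
The perf-stat rows above suggest why throughput falls on this machine even though IPC rises: each operation appears to need more instructions. The sketch below assumes that perf-stat.overall.path-length is perf-stat.total.instructions divided by will-it-scale.workload. That definition is inferred from the numbers in this report, not taken from the lkp-tests source, and the function and variable names are illustrative only.

```python
# Assumption: path-length = total instructions / workload (inferred from this
# report's numbers, not confirmed against the lkp-tests source).

def path_length(total_instructions: float, workload: float) -> float:
    """Instructions retired per benchmark operation."""
    return total_instructions / workload


# Values copied from the lkp-skl-fpga01 table above.
# ddfa740e9c is the parent commit, fec9227821 the commit under test.
BASE = {"total_instructions": 3.896e13, "workload": 65219467}
NEW = {"total_instructions": 4.079e13, "workload": 61607693}

base_path = path_length(**BASE)
new_path = path_length(**NEW)
growth_pct = (new_path - base_path) / base_path * 100.0

print(f"path length, parent commit: {base_path:.0f} instructions/op")
print(f"path length, tested commit: {new_path:.0f} instructions/op")
print(f"growth: {growth_pct:+.1f}%")
```

Under that assumption the computed values land within about 0.01% of the reported path-length rows (597331 and 662119), and the growth matches the reported +10.8%. That is consistent with the new kmem_cache_alloc, memset_erms and kmem_cache_free entries under __x64_sys_brk in the perf-profile rows.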


                                                                                
                              will-it-scale.52.processes                        
                                                                                
   6.6e+07 +----------------------------------------------------------------+   
  6.55e+07 |.+..+.+.+..                          .+..+.+.+..+.+.            |   
           |                            .+..+.+.+               +..+.+.+    |   
   6.5e+07 |-+         +.+.  .+.+.+..+.+                                    |   
  6.45e+07 |-+             +.                                               |   
           |                                                                |   
   6.4e+07 |-+                                                              |   
  6.35e+07 |-+                                                              |   
   6.3e+07 |-+                                                              |   
           |                                                                |   
  6.25e+07 |-+                                                              |   
   6.2e+07 |-+                                                              |   
           |           O   O  O O O           O   O  O O O  O O O  O O O  O |   
  6.15e+07 |-O  O O O    O           O O O  O                               |   
   6.1e+07 +----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                             will-it-scale.per_process_ops                      
                                                                                
  1.27e+06 +----------------------------------------------------------------+   
  1.26e+06 |.+..+.+.+..                          .+..+.+.+..+.+.            |   
           |                            .+..+.+.+               +..+.+.+    |   
  1.25e+06 |-+         +.+.+..+.+.+..+.+                                    |   
  1.24e+06 |-+                                                              |   
           |                                                                |   
  1.23e+06 |-+                                                              |   
  1.22e+06 |-+                                                              |   
  1.21e+06 |-+                                                              |   
           |                                                                |   
   1.2e+06 |-+                                                              |   
  1.19e+06 |-+                                         O        O           |   
           |    O O O  O O O  O O O  O O    O O   O  O   O  O O    O O O  O |   
  1.18e+06 |-O                           O      O                           |   
  1.17e+06 +----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                will-it-scale.workload                          
                                                                                
   6.6e+07 +----------------------------------------------------------------+   
  6.55e+07 |.+..+.+.+..                          .+..+.+.+..+.+.            |   
           |                            .+..+.+.+               +..+.+.+    |   
   6.5e+07 |-+         +.+.  .+.+.+..+.+                                    |   
  6.45e+07 |-+             +.                                               |   
           |                                                                |   
   6.4e+07 |-+                                                              |   
  6.35e+07 |-+                                                              |   
   6.3e+07 |-+                                                              |   
           |                                                                |   
  6.25e+07 |-+                                                              |   
   6.2e+07 |-+                                                              |   
           |           O   O  O O O           O   O  O O O  O O O  O O O  O |   
  6.15e+07 |-O  O O O    O           O O O  O                               |   
   6.1e+07 +----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample
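
The %change column and the per_process_ops values plotted above can be checked by hand. This is a minimal sketch, assuming %change is (new - base) / base and that per_process_ops is the workload divided by the number of processes (52, which is 50% of the 104 hardware threads). Both assumptions are inferred from the numbers in this report, and the names are illustrative only.

```python
# Assumptions: %change = (new - base) / base, and
# per_process_ops = workload / number of processes. Both are inferred
# from the report's numbers rather than from the lkp-tests source.

def percent_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# will-it-scale.workload values from the lkp-skl-fpga01 table.
BASE_WORKLOAD = 65219467  # ddfa740e9c (bisect-good)
NEW_WORKLOAD = 61607693   # fec9227821 (bisect-bad)
NR_PROCESSES = 52         # nr_task=50% on a 104-thread machine

change = percent_change(BASE_WORKLOAD, NEW_WORKLOAD)
base_per_process = BASE_WORKLOAD / NR_PROCESSES
new_per_process = NEW_WORKLOAD / NR_PROCESSES

print(f"workload change: {change:.1f}%")
print(f"per-process ops, parent commit: {base_per_process:.1f}")
print(f"per-process ops, tested commit: {new_per_process:.1f}")
```

The per-process figures agree with the reported 1254220 and 1184763 to within one operation; the small difference presumably comes from how the robot averages and rounds across runs.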

***************************************************************************************************
lkp-csl-2ap2: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/brk1/will-it-scale/0x5003003

commit: 
  ddfa740e9c ("fs/userfaultfd: complete write asynchronously")
  fec9227821 ("fs/userfaultfd: kmem-cache for wait-queue objects")

ddfa740e9caf7642 fec92278217ba01b4a3b9f9ec0f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  46606610           -11.0%   41486565        will-it-scale.16.processes
   2912912           -11.0%    2592909        will-it-scale.per_process_ops
  46606610           -11.0%   41486565        will-it-scale.workload
      0.72            -0.1        0.65        mpstat.cpu.all.usr%
     17741            -4.4%      16964        proc-vmstat.nr_shmem
   -116535          -515.3%     484006 ± 50%  sched_debug.cfs_rq:/.spread0.avg
      1380 ±  6%    +495.6%       8222        slabinfo.dmaengine-unmap-16.active_objs
     32.50 ±  7%    +500.0%     195.00        slabinfo.dmaengine-unmap-16.active_slabs
      1380 ±  6%    +495.6%       8222        slabinfo.dmaengine-unmap-16.num_objs
     32.50 ±  7%    +500.0%     195.00        slabinfo.dmaengine-unmap-16.num_slabs
     11962 ±  7%     -17.3%       9891 ± 12%  softirqs.CPU10.RCU
     10075 ± 23%     +28.9%      12985 ±  4%  softirqs.CPU110.RCU
     42801 ±  4%      -5.8%      40327 ±  2%  softirqs.CPU136.SCHED
     42633 ±  4%     -15.2%      36169 ± 18%  softirqs.CPU137.SCHED
     42786 ±  4%      -6.8%      39864        softirqs.CPU156.SCHED
     11795 ±  8%     -16.6%       9835 ± 11%  softirqs.CPU2.RCU
     42004 ±  4%      -5.9%      39537 ±  3%  softirqs.CPU25.SCHED
     39956 ±  4%     -65.4%      13836 ±110%  softirqs.CPU5.SCHED
      9734 ±  8%     -13.2%       8450 ±  8%  softirqs.CPU68.RCU
     41424 ±  4%     -14.7%      35347 ± 19%  softirqs.CPU87.SCHED
 1.935e+10            -2.0%  1.895e+10        perf-stat.i.branch-instructions
      0.61            +2.3%       0.62        perf-stat.i.cpi
 1.494e+10            -2.9%  1.451e+10        perf-stat.i.dTLB-stores
 9.271e+10            -1.1%   9.17e+10        perf-stat.i.instructions
      1.64            -2.2%       1.61        perf-stat.i.ipc
    320.23            -1.4%     315.65        perf-stat.i.metric.M/sec
      0.61            +2.3%       0.62        perf-stat.overall.cpi
      1.65            -2.2%       1.61        perf-stat.overall.ipc
    601140           +10.9%     666775        perf-stat.overall.path-length
 1.928e+10            -2.0%  1.889e+10        perf-stat.ps.branch-instructions
 1.489e+10            -2.9%  1.446e+10        perf-stat.ps.dTLB-stores
  9.24e+10            -1.1%  9.139e+10        perf-stat.ps.instructions
 2.802e+13            -1.3%  2.766e+13        perf-stat.total.instructions
      0.01 ± 25%    +188.2%       0.02 ± 57%  perf-sched.sch_delay.avg.ms.do_syslog.part.0.kmsg_read.vfs_read
      0.01 ± 15%     -46.6%       0.01 ± 42%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_for_completion.__flush_work.lru_add_drain_all
      0.01 ± 22%    +324.4%       0.05 ± 67%  perf-sched.sch_delay.max.ms.do_syslog.part.0.kmsg_read.vfs_read
      0.01 ± 15%     -43.1%       0.01 ± 41%  perf-sched.sch_delay.max.ms.schedule_timeout.wait_for_completion.__flush_work.lru_add_drain_all
      0.03 ± 23%     -78.0%       0.01 ±173%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
    605.16 ±  7%     +13.3%     685.54 ±  5%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      4.35 ± 10%     +19.2%       5.19 ±  4%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
     54.50 ±  9%     -18.8%      44.25 ±  5%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      2295 ± 10%     -17.2%       1900 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      0.43 ±143%     -92.6%       0.03 ±173%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
     85.77 ± 63%    +111.9%     181.78 ± 16%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      0.03 ± 23%     -25.0%       0.02 ± 11%  perf-sched.wait_time.avg.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
    605.15 ±  7%     +13.3%     685.54 ±  5%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      4.34 ± 10%     +19.1%       5.17 ±  4%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      4.24 ± 10%     +64.4%       6.97 ± 49%  perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork
     85.73 ± 63%    +112.0%     181.73 ± 16%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      8753           -54.6%       3974 ± 70%  interrupts.CPU101.NMI:Non-maskable_interrupts
      8753           -54.6%       3974 ± 70%  interrupts.CPU101.PMI:Performance_monitoring_interrupts
      1.75 ± 47%   +8342.9%     147.75 ±168%  interrupts.CPU137.RES:Rescheduling_interrupts
    112.75 ±  8%     +40.1%     158.00 ± 19%  interrupts.CPU145.NMI:Non-maskable_interrupts
    112.75 ±  8%     +40.1%     158.00 ± 19%  interrupts.CPU145.PMI:Performance_monitoring_interrupts
      1251 ± 31%    +151.4%       3145 ± 43%  interrupts.CPU149.CAL:Function_call_interrupts
    117.50 ±  7%     +27.7%     150.00 ±  9%  interrupts.CPU159.NMI:Non-maskable_interrupts
    117.50 ±  7%     +27.7%     150.00 ±  9%  interrupts.CPU159.PMI:Performance_monitoring_interrupts
    115.25 ±  9%     -26.7%      84.50 ± 20%  interrupts.CPU161.NMI:Non-maskable_interrupts
    115.25 ±  9%     -26.7%      84.50 ± 20%  interrupts.CPU161.PMI:Performance_monitoring_interrupts
      8756           -50.5%       4334 ± 58%  interrupts.CPU2.NMI:Non-maskable_interrupts
      8756           -50.5%       4334 ± 58%  interrupts.CPU2.PMI:Performance_monitoring_interrupts
    113.75 ±  8%     +26.6%     144.00 ±  8%  interrupts.CPU49.NMI:Non-maskable_interrupts
    113.75 ±  8%     +26.6%     144.00 ±  8%  interrupts.CPU49.PMI:Performance_monitoring_interrupts
     98.75 ± 22%     +44.3%     142.50 ± 19%  interrupts.CPU66.NMI:Non-maskable_interrupts
     98.75 ± 22%     +44.3%     142.50 ± 19%  interrupts.CPU66.PMI:Performance_monitoring_interrupts
      1.50 ±110%   +4266.7%      65.50 ±129%  interrupts.CPU98.RES:Rescheduling_interrupts
    228023 ±  7%     -16.3%     190922 ±  7%  interrupts.NMI:Non-maskable_interrupts
    228023 ±  7%     -16.3%     190922 ±  7%  interrupts.PMI:Performance_monitoring_interrupts
      0.66 ± 31%      +0.2        0.90 ± 30%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.asm_call_sysvec_on_stack.sysvec_apic_timer_interrupt
      0.87 ±  9%      +0.3        1.18 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.brk
      1.06 ± 16%      +0.4        1.42 ± 21%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.asm_call_sysvec_on_stack.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      1.08 ± 16%      +0.4        1.46 ± 22%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.asm_call_sysvec_on_stack.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      1.09 ± 16%      +0.4        1.47 ± 23%  perf-profile.calltrace.cycles-pp.asm_call_sysvec_on_stack.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      0.00            +0.6        0.58 ±  3%  perf-profile.calltrace.cycles-pp.___might_sleep.kmem_cache_alloc.userfaultfd_unmap_complete.__x64_sys_brk.do_syscall_64
      0.00            +1.7        1.67 ±  3%  perf-profile.calltrace.cycles-pp.memset_erms.kmem_cache_alloc.userfaultfd_unmap_complete.__x64_sys_brk.do_syscall_64
      0.00            +1.8        1.79 ±  4%  perf-profile.calltrace.cycles-pp.kmem_cache_free.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +4.5        4.46 ±  5%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc.userfaultfd_unmap_complete.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +5.0        4.96 ±  4%  perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     47.85 ±  9%      +7.2       55.00 ±  3%  perf-profile.calltrace.cycles-pp.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     49.25 ±  9%      +7.4       56.63 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     51.01 ±  9%      +7.5       58.48 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
      0.44 ± 11%      -0.2        0.27 ±  4%  perf-profile.children.cycles-pp.cap_capable
      0.05 ±  8%      +0.0        0.07 ±  5%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.08            +0.0        0.10 ± 10%  perf-profile.children.cycles-pp.sched_clock
      0.08 ±  6%      +0.0        0.10 ± 10%  perf-profile.children.cycles-pp.native_sched_clock
      0.09 ±  4%      +0.0        0.11 ± 17%  perf-profile.children.cycles-pp.read_tsc
      0.10 ± 14%      +0.0        0.13 ±  8%  perf-profile.children.cycles-pp.lapic_next_deadline
      0.09            +0.0        0.12 ± 10%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.04 ± 57%      +0.0        0.07 ± 17%  perf-profile.children.cycles-pp.get_next_timer_interrupt
      0.00            +0.1        0.05 ±  9%  perf-profile.children.cycles-pp.memset
      0.04 ±115%      +0.1        0.10 ± 31%  perf-profile.children.cycles-pp.tick_nohz_irq_exit
      0.26 ± 18%      +0.1        0.33 ± 11%  perf-profile.children.cycles-pp.clockevents_program_event
      0.04 ± 58%      +0.1        0.16 ±  2%  perf-profile.children.cycles-pp.should_failslab
      0.14 ± 42%      +0.1        0.28 ± 19%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.21 ± 31%      +0.2        0.36 ±  9%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.54 ± 21%      +0.2        0.72 ± 17%  perf-profile.children.cycles-pp.update_process_times
      0.54 ± 10%      +0.2        0.72 ±  4%  perf-profile.children.cycles-pp.rcu_all_qs
      0.65 ± 20%      +0.2        0.84 ± 17%  perf-profile.children.cycles-pp.tick_sched_timer
      0.56 ± 24%      +0.2        0.76 ± 20%  perf-profile.children.cycles-pp.tick_sched_handle
      0.93 ± 23%      +0.3        1.20 ± 22%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.87 ± 10%      +0.3        1.15 ±  2%  perf-profile.children.cycles-pp.__might_sleep
      1.09 ± 11%      +0.4        1.46 ±  5%  perf-profile.children.cycles-pp._cond_resched
      1.39 ± 13%      +0.4        1.79 ± 16%  perf-profile.children.cycles-pp.hrtimer_interrupt
      1.43 ± 13%      +0.4        1.83 ± 17%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      1.68 ± 13%      +0.5        2.15 ± 19%  perf-profile.children.cycles-pp.asm_call_sysvec_on_stack
      1.94 ± 10%      +0.6        2.54 ±  3%  perf-profile.children.cycles-pp.___might_sleep
      0.00            +1.7        1.67 ±  3%  perf-profile.children.cycles-pp.memset_erms
      4.88 ±  8%      +1.7        6.63 ±  4%  perf-profile.children.cycles-pp.kmem_cache_free
      6.56 ± 10%      +4.6       11.13 ±  3%  perf-profile.children.cycles-pp.kmem_cache_alloc
      0.37 ±  9%      +4.6        4.99 ±  4%  perf-profile.children.cycles-pp.userfaultfd_unmap_complete
     48.02 ±  9%      +7.1       55.14 ±  3%  perf-profile.children.cycles-pp.__x64_sys_brk
     49.49 ±  9%      +7.3       56.81 ±  3%  perf-profile.children.cycles-pp.do_syscall_64
     51.22 ±  9%      +7.4       58.66 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.42 ± 11%      -0.2        0.24 ±  6%  perf-profile.self.cycles-pp.cap_capable
      0.07 ±  5%      +0.0        0.09 ± 14%  perf-profile.self.cycles-pp.native_sched_clock
      0.04 ± 57%      +0.0        0.07 ±  7%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.10 ± 14%      +0.0        0.13 ±  8%  perf-profile.self.cycles-pp.lapic_next_deadline
      0.00            +0.1        0.05 ±  9%  perf-profile.self.cycles-pp.memset
      0.01 ±173%      +0.1        0.08 ± 23%  perf-profile.self.cycles-pp.tick_nohz_next_event
      0.34 ± 10%      +0.1        0.41 ±  6%  perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      0.00            +0.1        0.08 ±  5%  perf-profile.self.cycles-pp.should_failslab
      0.40 ±  9%      +0.1        0.48 ±  8%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.37 ± 12%      +0.1        0.49 ±  5%  perf-profile.self.cycles-pp.rcu_all_qs
      0.53 ± 11%      +0.2        0.70 ±  3%  perf-profile.self.cycles-pp._cond_resched
      0.21 ±  8%      +0.2        0.44 ±  6%  perf-profile.self.cycles-pp.do_syscall_64
      0.46 ±  8%      +0.2        0.70 ±  6%  perf-profile.self.cycles-pp.cap_vm_enough_memory
      0.81 ± 10%      +0.3        1.09 ±  2%  perf-profile.self.cycles-pp.__might_sleep
      1.88 ± 10%      +0.6        2.46 ±  3%  perf-profile.self.cycles-pp.___might_sleep
      0.00            +1.6        1.61 ±  3%  perf-profile.self.cycles-pp.memset_erms
      3.19 ± 10%      +1.7        4.86 ±  6%  perf-profile.self.cycles-pp.kmem_cache_alloc
      3.31 ±  7%      +1.7        5.05 ±  4%  perf-profile.self.cycles-pp.kmem_cache_free
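
For readers who want to sanity-check the rows above, here is a minimal sketch. It assumes (inferred from the numbers, not stated in this report) that the middle column is the absolute change in percentage points between the base mean on the left and the patched mean on the right, rounded to one decimal. The names `parse_row` and `delta_matches` are hypothetical helpers and not part of any LKP tooling.

```python
import re

# One comparison row: base mean, optional "± N%" stddev, signed delta,
# patched mean, optional "± N%" stddev, then the metric name.
ROW = re.compile(
    r"^\s*(?P<base>\d+\.\d+)(?:\s*±\s*\d+%)?"
    r"\s+(?P<delta>[+-]\d+\.\d+)"
    r"\s+(?P<new>\d+\.\d+)(?:\s*±\s*\d+%)?"
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Return (base, delta, new, metric) for one perf-profile row."""
    m = ROW.match(line)
    if m is None:
        raise ValueError("not a perf-profile comparison row: %r" % line)
    return float(m["base"]), float(m["delta"]), float(m["new"]), m["metric"]


def delta_matches(line, tolerance=0.06):
    """Check that new - base agrees with the printed delta.

    The tolerance covers rounding of the delta to one decimal (0.05)
    plus rounding of both means to two decimals (0.005 each).
    """
    base, delta, new, _ = parse_row(line)
    return abs((new - base) - delta) <= tolerance


sample = (
    "      0.37 ±  9%      +4.6        4.99 ±  4%  "
    "perf-profile.children.cycles-pp.userfaultfd_unmap_complete"
)
print(parse_row(sample), delta_matches(sample))
```

Running the same check over every row in the table is a quick way to spot lines damaged by mail or archive reflow.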





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


View attachment "config-5.10.0-rc1-00026-gfec92278217b" of type "text/plain" (170398 bytes)

View attachment "job-script" of type "text/plain" (7702 bytes)

View attachment "job.yaml" of type "text/plain" (5169 bytes)

View attachment "reproduce" of type "text/plain" (336 bytes)
