Message-ID: <8f10e63f-c95a-4771-b215-12e2b263d083@default>
Date:   Wed, 25 Nov 2020 10:56:01 -0800 (PST)
From:   Alex Kogan <alex.kogan@...cle.com>
To:     <oliver.sang@...el.com>
Cc:     <tglx@...utronix.de>, <lkp@...ts.01.org>, <ying.huang@...el.com>,
        <lkp@...el.com>, <linux@...linux.org.uk>, <feng.tang@...el.com>,
        <hpa@...or.com>, <dave.dice@...cle.com>, <mingo@...hat.com>,
        <will.deacon@....com>, <arnd@...db.de>, <jglauber@...vell.com>,
        <guohanjun@...wei.com>, <x86@...nel.org>,
        <zhengjun.xing@...el.com>, <daniel.m.jordan@...cle.com>,
        <steven.sistare@...cle.com>, <bp@...en8.de>,
        <linux-arm-kernel@...ts.infradead.org>, <longman@...hat.com>,
        <linux-kernel@...r.kernel.org>, <peterz@...radead.org>,
        <linux-arch@...r.kernel.org>
Subject: Re: [locking/qspinlock]  6f9a39a437:  unixbench.score -17.3%
 regression

Oliver, thank you for this report.

All, with nr_task=30%, the benchmark hits the sweet spot on the contention curve,
amplifying the overhead of shuffling threads between waiting queues without
reaping the locality benefit. I was able to reproduce the regression on our
machine, though to a lesser extent: a performance drop of about 10% for
the given test.

Luckily, we have a solution for this exact scenario, which we call the 
shuffle reduction optimization, or SRO. It was a part of the series until v9, 
but since it did not provide much benefit in my benchmarks in v10, it was 
dropped. Now, with SRO, the regression on unixbench shrinks to about 1%, 
while other performance numbers do not change much.

I attach the SRO patch here. IMHO, it is pretty straightforward.
It uses randomization, but only to throttle the creation of a secondary queue.
In particular, it does not introduce any extra delays for threads waiting
in that queue once it is created.
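
To illustrate the idea only (this is not the code from the attached patch), here is a
minimal Python sketch of a randomized throttle of this kind. The function name
`may_shuffle`, the `skip_bits` parameter and its default of 7 are assumptions made
for the example. The point it shows: randomization applies only while no secondary
queue exists, so waiters in an existing secondary queue see no extra delay.

```python
import random


def may_shuffle(secondary_queue_exists, rng, skip_bits=7):
    """Decide whether the lock holder may shuffle waiters (hypothetical helper).

    If a secondary queue already exists, always allow shuffling, so threads
    parked there are not delayed by the throttle. Otherwise allow creating
    the secondary queue only with probability 1 / 2**skip_bits.
    """
    if secondary_queue_exists:
        return True
    # All skip_bits random bits must be zero: probability 1 / 2**skip_bits.
    return rng.getrandbits(skip_bits) == 0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, illustrative only
    trials = 100_000
    created = sum(may_shuffle(False, rng) for _ in range(trials))
    print(f"secondary queue created in {created} of {trials} attempts")
```

With a larger `skip_bits`, the secondary queue is created less often, which is what
reduces shuffling under light contention in this sketch.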

Anyway, any feedback is welcome!
Unless I hear any objections, I plan to post another version of the series
with SRO included.

Thanks,
-- Alex

----- Original Message -----
From: oliver.sang@...el.com
To: alex.kogan@...cle.com
Cc: linux@...linux.org.uk, peterz@...radead.org, mingo@...hat.com, will.deacon@....com, arnd@...db.de, longman@...hat.com, linux-arch@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org, tglx@...utronix.de, bp@...en8.de, hpa@...or.com, x86@...nel.org, guohanjun@...wei.com, jglauber@...vell.com, steven.sistare@...cle.com, daniel.m.jordan@...cle.com, alex.kogan@...cle.com, dave.dice@...cle.com, lkp@...el.com, lkp@...ts.01.org, ying.huang@...el.com, feng.tang@...el.com, zhengjun.xing@...el.com
Sent: Sunday, November 22, 2020 4:33:52 AM GMT -05:00 US/Canada Eastern
Subject: [locking/qspinlock]  6f9a39a437:  unixbench.score -17.3% regression


Greeting,

FYI, we noticed a -17.3% regression of unixbench.score due to commit:


commit: 6f9a39a4372e37907ac1fc7ede6c90932a88d174 ("[PATCH v12 5/5] locking/qspinlock: Avoid moving certain threads between waiting queues in CNA")
url: https://github.com/0day-ci/linux/commits/Alex-Kogan/Add-NUMA-awareness-to-qspinlock/20201118-072506
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 932f8c64d38bb08f69c8c26a2216ba0c36c6daa8

in testcase: unixbench
on test machine: 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory
with following parameters:

	runtime: 300s
	nr_task: 30%
	test: context1
	cpufreq_governor: performance
	ucode: 0x4003003

test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@...el.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/30%/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2sp4/context1/unixbench/0x4003003

commit: 
  eaf522d564 ("locking/qspinlock: Introduce starvation avoidance into CNA")
  6f9a39a437 ("locking/qspinlock: Avoid moving certain threads between waiting queues in CNA")
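
As a reading aid for the comparison table that follows: the %change column appears to
be the relative change from the left-hand value (eaf522d564) to the right-hand value
(6f9a39a437), and the overall CPI rows are consistent with cycles divided by
instructions. The short Python sketch below recomputes a few rows from the rounded
values printed in the table. The helper name `pct_change` is an assumption, and
because the inputs are rounded, the results match the table only approximately.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Rounded values copied from the table (base commit, then patched commit).
score = pct_change(3715, 3070)            # unixbench.score
vol_cs = pct_change(4.338e08, 3.627e08)   # voluntary_context_switches

# Cycles per instruction from perf-stat.ps.cpu-cycles / perf-stat.ps.instructions.
cpi_before = 8.77e10 / 3.32e10
cpi_after = 8.271e10 / 2.834e10

if __name__ == "__main__":
    print(f"unixbench.score change: {score:.2f}%")
    print(f"voluntary context switches change: {vol_cs:.2f}%")
    print(f"CPI before: {cpi_before:.2f}, after: {cpi_after:.2f}")
```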

eaf522d56432e0e5 6f9a39a4372e37907ac1fc7ede6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      3715           -17.3%       3070        unixbench.score
     11584           +13.2%      13118        unixbench.time.involuntary_context_switches
      1830            +4.7%       1916        unixbench.time.percent_of_cpu_this_job_got
      7012            +5.1%       7373        unixbench.time.system_time
    141.44           -15.6%     119.37        unixbench.time.user_time
 4.338e+08           -16.4%  3.627e+08        unixbench.time.voluntary_context_switches
 5.807e+08           -17.5%  4.793e+08        unixbench.workload
    139.00 ± 67%     -71.0%      40.25        numa-vmstat.node1.nr_mlock
      1.08            -0.1        0.94        mpstat.cpu.all.irq%
      0.48 ±  2%      -0.1        0.40        mpstat.cpu.all.usr%
    956143 ±  7%     +11.0%    1060959 ±  3%  numa-meminfo.node0.MemUsed
   1185909 ±  5%      -8.8%    1081277 ±  3%  numa-meminfo.node1.MemUsed
   4402315           -16.3%    3682692        vmstat.system.cs
    235535            -4.6%     224625        vmstat.system.in
  6.42e+09           +16.4%  7.471e+09        cpuidle.C1.time
 1.941e+10 ±  7%     -20.0%  1.553e+10 ± 21%  cpuidle.C1E.time
  94497227 ±  5%     -63.8%   34185071 ± 15%  cpuidle.C1E.usage
  2.62e+08 ±  8%     -90.1%   26020649        cpuidle.POLL.time
  81581001 ±  9%     -96.1%    3221876        cpuidle.POLL.usage
     84602 ±  3%     +12.7%      95329 ±  5%  softirqs.CPU65.SCHED
     86631 ±  5%     +10.9%      96057 ±  6%  softirqs.CPU67.SCHED
     81448 ±  3%     +12.6%      91708        softirqs.CPU70.SCHED
     99715            +8.1%     107808 ±  2%  softirqs.CPU75.SCHED
     91997 ±  4%     +15.5%     106236 ±  2%  softirqs.CPU81.SCHED
    417904 ±  6%     +43.6%     600289 ± 16%  sched_debug.cfs_rq:/.MIN_vruntime.avg
   3142033            +9.7%    3446986 ±  4%  sched_debug.cfs_rq:/.MIN_vruntime.max
    969106           +20.4%    1166681 ±  8%  sched_debug.cfs_rq:/.MIN_vruntime.stddev
     44659 ± 12%     +21.1%      54091 ±  3%  sched_debug.cfs_rq:/.exec_clock.min
     12198 ± 12%     +24.5%      15181 ±  9%  sched_debug.cfs_rq:/.load.avg
    417904 ±  6%     +43.6%     600289 ± 16%  sched_debug.cfs_rq:/.max_vruntime.avg
   3142033            +9.7%    3446986 ±  4%  sched_debug.cfs_rq:/.max_vruntime.max
    969106           +20.4%    1166681 ±  8%  sched_debug.cfs_rq:/.max_vruntime.stddev
   1926443 ± 12%     +25.6%    2419565 ±  3%  sched_debug.cfs_rq:/.min_vruntime.min
      0.41 ±  2%     +16.3%       0.47 ±  3%  sched_debug.cfs_rq:/.nr_running.avg
    322.15 ±  2%     +13.5%     365.49 ±  4%  sched_debug.cfs_rq:/.util_est_enqueued.avg
     58399 ± 49%     -62.5%      21882 ± 74%  sched_debug.cpu.avg_idle.min
      3.74 ± 14%     -20.1%       2.99 ±  3%  sched_debug.cpu.clock.stddev
     20770 ± 50%     -65.0%       7271 ± 39%  sched_debug.cpu.max_idle_balance_cost.stddev
   8250432           -16.5%    6887763        sched_debug.cpu.nr_switches.avg
  11243220 ±  4%     -21.5%    8826971        sched_debug.cpu.nr_switches.max
   1603956 ± 26%     -52.5%     761566 ±  4%  sched_debug.cpu.nr_switches.stddev
   8248654           -16.5%    6885987        sched_debug.cpu.sched_count.avg
  11240496 ±  4%     -21.5%    8823964        sched_debug.cpu.sched_count.max
   1603802 ± 26%     -52.5%     761522 ±  4%  sched_debug.cpu.sched_count.stddev
   4123397           -16.5%    3441927        sched_debug.cpu.sched_goidle.avg
   5619132 ±  4%     -21.5%    4410755        sched_debug.cpu.sched_goidle.max
    801761 ± 26%     -52.5%     380727 ±  4%  sched_debug.cpu.sched_goidle.stddev
   4124921           -16.5%    3443709        sched_debug.cpu.ttwu_count.avg
   5620396 ±  4%     -21.5%    4412427        sched_debug.cpu.ttwu_count.max
    801796 ± 26%     -52.5%     380615 ±  4%  sched_debug.cpu.ttwu_count.stddev
  7.45e+09           -14.3%  6.382e+09        perf-stat.i.branch-instructions
      1.33            -0.1        1.24        perf-stat.i.branch-miss-rate%
  91615750           -22.0%   71469356        perf-stat.i.branch-misses
      3.80            +2.5        6.31 ± 13%  perf-stat.i.cache-miss-rate%
   8753636 ±  4%    +109.7%   18358392        perf-stat.i.cache-misses
 7.691e+08           -14.2%  6.597e+08        perf-stat.i.cache-references
   4428060           -16.4%    3704052        perf-stat.i.context-switches
      2.87           +11.2%       3.20        perf-stat.i.cpi
 8.789e+10            -5.6%  8.294e+10        perf-stat.i.cpu-cycles
     16303 ±  7%     -74.2%       4204 ±  2%  perf-stat.i.cycles-between-cache-misses
  8.94e+09           -14.0%  7.685e+09        perf-stat.i.dTLB-loads
 4.951e+09           -16.2%  4.149e+09        perf-stat.i.dTLB-stores
  57458394           -17.3%   47543962        perf-stat.i.iTLB-load-misses
  30827890           -15.9%   25930501        perf-stat.i.iTLB-loads
 3.327e+10           -14.6%  2.842e+10        perf-stat.i.instructions
    581.15            +3.3%     600.28        perf-stat.i.instructions-per-iTLB-miss
      0.36            -9.4%       0.33        perf-stat.i.ipc
      0.92            -5.6%       0.86        perf-stat.i.metric.GHz
      1.01 ±  4%     +17.6%       1.18 ±  4%  perf-stat.i.metric.K/sec
    230.75           -14.6%     197.02        perf-stat.i.metric.M/sec
     87.41            +8.0       95.42        perf-stat.i.node-load-miss-rate%
   1718045 ±  3%    +125.3%    3871440        perf-stat.i.node-load-misses
    227252 ±  3%     -71.5%      64814 ± 10%  perf-stat.i.node-loads
   1686277 ±  4%    +120.6%    3720452        perf-stat.i.node-store-misses
      1.23            -0.1        1.12        perf-stat.overall.branch-miss-rate%
      1.14 ±  5%      +1.6        2.78        perf-stat.overall.cache-miss-rate%
      2.64           +10.5%       2.92        perf-stat.overall.cpi
     10070 ±  4%     -55.1%       4519        perf-stat.overall.cycles-between-cache-misses
    579.14            +3.2%     597.84        perf-stat.overall.instructions-per-iTLB-miss
      0.38            -9.5%       0.34        perf-stat.overall.ipc
     88.31           +10.0       98.35        perf-stat.overall.node-load-miss-rate%
     97.96            +1.3       99.24        perf-stat.overall.node-store-miss-rate%
     22430            +3.3%      23175        perf-stat.overall.path-length
 7.434e+09           -14.4%  6.365e+09        perf-stat.ps.branch-instructions
  91428244           -22.0%   71275228        perf-stat.ps.branch-misses
   8723893 ±  4%    +109.8%   18304568        perf-stat.ps.cache-misses
 7.674e+08           -14.3%  6.578e+08        perf-stat.ps.cache-references
   4418679           -16.4%    3693530        perf-stat.ps.context-switches
  8.77e+10            -5.7%  8.271e+10        perf-stat.ps.cpu-cycles
 8.921e+09           -14.1%  7.664e+09        perf-stat.ps.dTLB-loads
  4.94e+09           -16.3%  4.137e+09        perf-stat.ps.dTLB-stores
  57330404           -17.3%   47408036        perf-stat.ps.iTLB-load-misses
  30765981           -15.9%   25859786        perf-stat.ps.iTLB-loads
  3.32e+10           -14.6%  2.834e+10        perf-stat.ps.instructions
   1712299 ±  3%    +125.4%    3860240        perf-stat.ps.node-load-misses
    226568 ±  3%     -71.4%      64722 ± 10%  perf-stat.ps.node-loads
   1680387 ±  4%    +120.8%    3709583        perf-stat.ps.node-store-misses
 1.302e+13           -14.7%  1.111e+13        perf-stat.total.instructions
   3591158 ±  5%     -25.1%    2688593        interrupts.CAL:Function_call_interrupts
      2328 ± 19%     +42.8%       3323 ±  3%  interrupts.CPU0.NMI:Non-maskable_interrupts
      2328 ± 19%     +42.8%       3323 ±  3%  interrupts.CPU0.PMI:Performance_monitoring_interrupts
    110354 ±  9%     -20.0%      88244 ±  4%  interrupts.CPU0.RES:Rescheduling_interrupts
    128508 ± 14%     -27.1%      93721 ±  3%  interrupts.CPU1.RES:Rescheduling_interrupts
      2180 ± 30%     +47.0%       3205 ± 15%  interrupts.CPU10.NMI:Non-maskable_interrupts
      2180 ± 30%     +47.0%       3205 ± 15%  interrupts.CPU10.PMI:Performance_monitoring_interrupts
    133107 ±  8%     -25.7%      98924 ±  2%  interrupts.CPU10.RES:Rescheduling_interrupts
    133955 ± 13%     -28.9%      95305 ±  6%  interrupts.CPU11.RES:Rescheduling_interrupts
    129709 ± 10%     -24.9%      97452 ±  8%  interrupts.CPU12.RES:Rescheduling_interrupts
    130073 ± 10%     -21.2%     102507 ±  2%  interrupts.CPU13.RES:Rescheduling_interrupts
    136313 ± 10%     -27.4%      99010 ±  3%  interrupts.CPU14.RES:Rescheduling_interrupts
    139937 ±  7%     -29.9%      98077 ±  7%  interrupts.CPU15.RES:Rescheduling_interrupts
    143424 ± 11%     -28.4%     102678 ±  7%  interrupts.CPU16.RES:Rescheduling_interrupts
    138084 ± 10%     -25.7%     102625 ±  5%  interrupts.CPU17.RES:Rescheduling_interrupts
    136238 ±  6%     -26.3%     100366 ±  7%  interrupts.CPU18.RES:Rescheduling_interrupts
    140011 ± 10%     -28.4%     100232 ±  4%  interrupts.CPU19.RES:Rescheduling_interrupts
    129720 ±  7%     -28.8%      92405 ±  7%  interrupts.CPU2.RES:Rescheduling_interrupts
     43177 ± 33%     -34.6%      28234 ±  5%  interrupts.CPU20.CAL:Function_call_interrupts
    143060 ±  6%     -28.5%     102289 ±  7%  interrupts.CPU20.RES:Rescheduling_interrupts
     39911 ± 20%     -30.4%      27788 ±  4%  interrupts.CPU21.CAL:Function_call_interrupts
    144644 ±  9%     -27.6%     104676 ±  6%  interrupts.CPU21.RES:Rescheduling_interrupts
     38543 ± 21%     -35.1%      25019 ± 14%  interrupts.CPU22.CAL:Function_call_interrupts
    144984 ±  7%     -29.9%     101700 ±  2%  interrupts.CPU22.RES:Rescheduling_interrupts
     37835 ± 15%     -22.9%      29155 ±  5%  interrupts.CPU23.CAL:Function_call_interrupts
      2089 ± 19%     +70.6%       3565 ± 20%  interrupts.CPU23.NMI:Non-maskable_interrupts
      2089 ± 19%     +70.6%       3565 ± 20%  interrupts.CPU23.PMI:Performance_monitoring_interrupts
    130192 ±  7%     -22.1%     101416 ±  5%  interrupts.CPU23.RES:Rescheduling_interrupts
     37142 ±  6%     -32.8%      24974 ±  6%  interrupts.CPU24.CAL:Function_call_interrupts
    142384 ±  5%     -31.7%      97277 ±  6%  interrupts.CPU24.RES:Rescheduling_interrupts
     32664 ±  9%     -22.2%      25422 ±  6%  interrupts.CPU25.CAL:Function_call_interrupts
    141175 ±  5%     -31.2%      97084 ±  2%  interrupts.CPU25.RES:Rescheduling_interrupts
     31023 ± 21%     -24.8%      23330 ±  7%  interrupts.CPU26.CAL:Function_call_interrupts
    131921 ±  4%     -28.9%      93831 ±  3%  interrupts.CPU26.RES:Rescheduling_interrupts
     32946 ± 19%     -26.2%      24303 ±  5%  interrupts.CPU27.CAL:Function_call_interrupts
    144853 ±  4%     -35.7%      93190 ±  2%  interrupts.CPU27.RES:Rescheduling_interrupts
    136419 ±  4%     -31.3%      93690        interrupts.CPU28.RES:Rescheduling_interrupts
     36609 ± 20%     -35.3%      23696 ±  5%  interrupts.CPU29.CAL:Function_call_interrupts
    145284 ± 10%     -36.1%      92871        interrupts.CPU29.RES:Rescheduling_interrupts
    122699 ±  7%     -23.8%      93459 ± 10%  interrupts.CPU3.RES:Rescheduling_interrupts
    250.50 ± 40%     -79.9%      50.25 ± 99%  interrupts.CPU3.TLB:TLB_shootdowns
     35689 ± 19%     -36.1%      22793 ± 11%  interrupts.CPU30.CAL:Function_call_interrupts
    152345 ±  4%     -40.3%      90991 ±  3%  interrupts.CPU30.RES:Rescheduling_interrupts
     33895 ± 10%     -15.1%      28774 ±  8%  interrupts.CPU31.CAL:Function_call_interrupts
    150590 ±  5%     -35.5%      97092 ±  7%  interrupts.CPU31.RES:Rescheduling_interrupts
     50156 ± 28%     -45.8%      27170 ±  7%  interrupts.CPU32.CAL:Function_call_interrupts
      3757 ±  7%     -43.6%       2120 ± 32%  interrupts.CPU32.NMI:Non-maskable_interrupts
      3757 ±  7%     -43.6%       2120 ± 32%  interrupts.CPU32.PMI:Performance_monitoring_interrupts
    150142 ±  3%     -36.3%      95673        interrupts.CPU32.RES:Rescheduling_interrupts
     39957 ± 25%     -34.5%      26158 ±  4%  interrupts.CPU33.CAL:Function_call_interrupts
    147066 ±  8%     -34.4%      96521 ±  2%  interrupts.CPU33.RES:Rescheduling_interrupts
    168.25 ±137%     -86.9%      22.00 ± 59%  interrupts.CPU33.TLB:TLB_shootdowns
     38357 ± 13%     -29.9%      26881 ±  5%  interrupts.CPU34.CAL:Function_call_interrupts
      3757 ±  5%     -28.5%       2686 ± 19%  interrupts.CPU34.NMI:Non-maskable_interrupts
      3757 ±  5%     -28.5%       2686 ± 19%  interrupts.CPU34.PMI:Performance_monitoring_interrupts
    140734 ±  2%     -33.3%      93841 ±  3%  interrupts.CPU34.RES:Rescheduling_interrupts
     37965 ± 17%     -25.8%      28175 ±  4%  interrupts.CPU35.CAL:Function_call_interrupts
      3934 ±  8%     -39.3%       2389 ± 13%  interrupts.CPU35.NMI:Non-maskable_interrupts
      3934 ±  8%     -39.3%       2389 ± 13%  interrupts.CPU35.PMI:Performance_monitoring_interrupts
    146074 ± 10%     -33.2%      97630 ±  2%  interrupts.CPU35.RES:Rescheduling_interrupts
     34131 ±  8%     -18.8%      27704 ±  9%  interrupts.CPU36.CAL:Function_call_interrupts
    149093 ±  3%     -35.0%      96945 ±  4%  interrupts.CPU36.RES:Rescheduling_interrupts
     44333 ± 47%     -39.7%      26745 ±  7%  interrupts.CPU37.CAL:Function_call_interrupts
    149936 ±  4%     -34.3%      98542 ±  3%  interrupts.CPU37.RES:Rescheduling_interrupts
     41199 ± 28%     -30.2%      28741 ±  6%  interrupts.CPU38.CAL:Function_call_interrupts
    154224 ±  3%     -31.6%     105443 ±  7%  interrupts.CPU38.RES:Rescheduling_interrupts
     36925 ±  8%     -24.3%      27942 ±  5%  interrupts.CPU39.CAL:Function_call_interrupts
    150490 ±  2%     -32.5%     101625 ±  4%  interrupts.CPU39.RES:Rescheduling_interrupts
    122742 ± 15%     -25.4%      91596 ±  5%  interrupts.CPU4.RES:Rescheduling_interrupts
    143639 ±  9%     -29.4%     101407 ±  2%  interrupts.CPU40.RES:Rescheduling_interrupts
     43235 ± 10%     -30.9%      29877 ±  4%  interrupts.CPU41.CAL:Function_call_interrupts
    158981 ±  5%     -32.8%     106760 ±  4%  interrupts.CPU41.RES:Rescheduling_interrupts
     47792 ± 33%     -37.7%      29769 ±  5%  interrupts.CPU42.CAL:Function_call_interrupts
      3455 ± 11%     -32.2%       2343 ± 36%  interrupts.CPU42.NMI:Non-maskable_interrupts
      3455 ± 11%     -32.2%       2343 ± 36%  interrupts.CPU42.PMI:Performance_monitoring_interrupts
    160241 ±  5%     -34.0%     105793 ±  4%  interrupts.CPU42.RES:Rescheduling_interrupts
     54419 ± 52%     -44.1%      30408 ±  2%  interrupts.CPU43.CAL:Function_call_interrupts
      3726 ± 11%     -38.7%       2285 ± 39%  interrupts.CPU43.NMI:Non-maskable_interrupts
      3726 ± 11%     -38.7%       2285 ± 39%  interrupts.CPU43.PMI:Performance_monitoring_interrupts
    156010           -32.4%     105516 ±  2%  interrupts.CPU43.RES:Rescheduling_interrupts
     69033 ± 79%     -56.0%      30393 ±  7%  interrupts.CPU44.CAL:Function_call_interrupts
    152478 ±  6%     -30.4%     106187 ±  4%  interrupts.CPU44.RES:Rescheduling_interrupts
     49434 ± 49%     -38.5%      30404 ±  9%  interrupts.CPU45.CAL:Function_call_interrupts
    153770 ±  7%     -32.2%     104200 ±  3%  interrupts.CPU45.RES:Rescheduling_interrupts
     56303 ± 52%     -50.4%      27914 ±  4%  interrupts.CPU46.CAL:Function_call_interrupts
      3924 ± 20%     -48.7%       2012 ± 50%  interrupts.CPU46.NMI:Non-maskable_interrupts
      3924 ± 20%     -48.7%       2012 ± 50%  interrupts.CPU46.PMI:Performance_monitoring_interrupts
    152891 ± 11%     -31.7%     104494 ±  5%  interrupts.CPU46.RES:Rescheduling_interrupts
     42970 ± 30%     -29.9%      30107 ±  9%  interrupts.CPU47.CAL:Function_call_interrupts
      3940 ±  8%     -40.8%       2332 ± 38%  interrupts.CPU47.NMI:Non-maskable_interrupts
      3940 ±  8%     -40.8%       2332 ± 38%  interrupts.CPU47.PMI:Performance_monitoring_interrupts
    146615 ±  5%     -27.7%     106013 ±  4%  interrupts.CPU47.RES:Rescheduling_interrupts
    146863 ±  5%     -18.4%     119774 ±  3%  interrupts.CPU48.RES:Rescheduling_interrupts
    136692 ±  8%     -16.3%     114405 ±  7%  interrupts.CPU49.RES:Rescheduling_interrupts
     29311 ±  6%     -12.4%      25673 ±  4%  interrupts.CPU5.CAL:Function_call_interrupts
    129497 ±  7%     -27.1%      94375 ±  6%  interrupts.CPU5.RES:Rescheduling_interrupts
    143797 ± 11%     -21.0%     113564 ±  4%  interrupts.CPU50.RES:Rescheduling_interrupts
      2891 ± 16%     +31.3%       3797 ± 12%  interrupts.CPU51.NMI:Non-maskable_interrupts
      2891 ± 16%     +31.3%       3797 ± 12%  interrupts.CPU51.PMI:Performance_monitoring_interrupts
    139766 ±  2%     -19.6%     112352 ±  8%  interrupts.CPU51.RES:Rescheduling_interrupts
    137319 ±  4%     -20.3%     109422 ±  5%  interrupts.CPU52.RES:Rescheduling_interrupts
    138705 ±  5%     -21.3%     109158 ±  8%  interrupts.CPU53.RES:Rescheduling_interrupts
      2426 ± 28%     +42.8%       3464 ± 19%  interrupts.CPU54.NMI:Non-maskable_interrupts
      2426 ± 28%     +42.8%       3464 ± 19%  interrupts.CPU54.PMI:Performance_monitoring_interrupts
    140683 ± 11%     -24.0%     106919 ±  4%  interrupts.CPU54.RES:Rescheduling_interrupts
     38238 ± 13%     -22.9%      29493 ±  6%  interrupts.CPU55.CAL:Function_call_interrupts
      3043 ±  8%     +18.7%       3612 ±  7%  interrupts.CPU55.NMI:Non-maskable_interrupts
      3043 ±  8%     +18.7%       3612 ±  7%  interrupts.CPU55.PMI:Performance_monitoring_interrupts
    143657 ± 10%     -25.0%     107806 ±  6%  interrupts.CPU55.RES:Rescheduling_interrupts
    131036 ±  8%     -21.3%     103177 ±  4%  interrupts.CPU56.RES:Rescheduling_interrupts
    131204 ± 12%     -21.2%     103444 ± 10%  interrupts.CPU57.RES:Rescheduling_interrupts
    122041 ± 12%     -15.9%     102674 ±  7%  interrupts.CPU58.RES:Rescheduling_interrupts
    167.25 ± 65%     -64.7%      59.00 ±157%  interrupts.CPU58.TLB:TLB_shootdowns
      1883 ± 33%     +61.6%       3042 ±  3%  interrupts.CPU6.NMI:Non-maskable_interrupts
      1883 ± 33%     +61.6%       3042 ±  3%  interrupts.CPU6.PMI:Performance_monitoring_interrupts
    132101 ± 12%     -27.0%      96457 ±  8%  interrupts.CPU6.RES:Rescheduling_interrupts
      1832 ± 24%     +69.3%       3102 ± 32%  interrupts.CPU64.NMI:Non-maskable_interrupts
      1832 ± 24%     +69.3%       3102 ± 32%  interrupts.CPU64.PMI:Performance_monitoring_interrupts
    107979 ±  8%     -11.6%      95452        interrupts.CPU66.RES:Rescheduling_interrupts
     97965 ±  3%     -15.1%      83199 ±  2%  interrupts.CPU69.RES:Rescheduling_interrupts
    126380 ± 11%     -24.6%      95257 ±  5%  interrupts.CPU7.RES:Rescheduling_interrupts
      1820 ± 40%     +60.9%       2929 ± 35%  interrupts.CPU70.NMI:Non-maskable_interrupts
      1820 ± 40%     +60.9%       2929 ± 35%  interrupts.CPU70.PMI:Performance_monitoring_interrupts
    171279 ±  5%     -29.4%     120994 ±  5%  interrupts.CPU72.RES:Rescheduling_interrupts
     50761 ± 40%     -35.0%      32979 ±  7%  interrupts.CPU73.CAL:Function_call_interrupts
    173132 ±  7%     -31.5%     118555 ±  5%  interrupts.CPU73.RES:Rescheduling_interrupts
     43479 ± 17%     -25.8%      32276 ±  3%  interrupts.CPU74.CAL:Function_call_interrupts
      3755 ±  9%     -31.7%       2564 ± 31%  interrupts.CPU74.NMI:Non-maskable_interrupts
      3755 ±  9%     -31.7%       2564 ± 31%  interrupts.CPU74.PMI:Performance_monitoring_interrupts
    167124 ±  7%     -28.8%     119063 ±  4%  interrupts.CPU74.RES:Rescheduling_interrupts
    164069 ±  7%     -26.6%     120499 ±  4%  interrupts.CPU75.RES:Rescheduling_interrupts
    166858 ±  6%     -28.4%     119453 ±  4%  interrupts.CPU76.RES:Rescheduling_interrupts
    157535 ±  6%     -25.5%     117419 ±  4%  interrupts.CPU77.RES:Rescheduling_interrupts
    165642 ±  8%     -25.9%     122719 ±  8%  interrupts.CPU78.RES:Rescheduling_interrupts
    162781 ±  5%     -29.0%     115600 ±  3%  interrupts.CPU79.RES:Rescheduling_interrupts
    132224 ± 11%     -26.6%      97010        interrupts.CPU8.RES:Rescheduling_interrupts
    167082 ±  9%     -30.7%     115794 ±  4%  interrupts.CPU80.RES:Rescheduling_interrupts
     49639 ± 37%     -35.1%      32228 ±  2%  interrupts.CPU81.CAL:Function_call_interrupts
    144305 ±  5%     -18.3%     117926 ±  4%  interrupts.CPU81.RES:Rescheduling_interrupts
    151333 ±  7%     -23.2%     116159 ±  3%  interrupts.CPU82.RES:Rescheduling_interrupts
    142398 ±  8%     -21.1%     112399 ±  7%  interrupts.CPU83.RES:Rescheduling_interrupts
    144455 ±  2%     -20.5%     114911        interrupts.CPU84.RES:Rescheduling_interrupts
    149850 ±  9%     -24.3%     113396 ±  5%  interrupts.CPU85.RES:Rescheduling_interrupts
     34458 ±  4%     -14.4%      29487 ±  8%  interrupts.CPU86.CAL:Function_call_interrupts
    138603 ±  6%     -22.7%     107133 ±  2%  interrupts.CPU86.RES:Rescheduling_interrupts
     39228 ±  7%     -25.5%      29231 ±  4%  interrupts.CPU87.CAL:Function_call_interrupts
    151814 ±  8%     -31.1%     104629 ±  5%  interrupts.CPU87.RES:Rescheduling_interrupts
    137356 ±  8%     -20.2%     109634 ±  3%  interrupts.CPU88.RES:Rescheduling_interrupts
    143613 ± 10%     -28.9%     102166 ± 10%  interrupts.CPU89.RES:Rescheduling_interrupts
    122375 ±  8%     -19.2%      98901 ±  3%  interrupts.CPU9.RES:Rescheduling_interrupts
    140781 ±  6%     -25.0%     105531 ±  3%  interrupts.CPU90.RES:Rescheduling_interrupts
    138917 ± 12%     -24.9%     104264 ±  5%  interrupts.CPU91.RES:Rescheduling_interrupts
    146814 ± 14%     -29.2%     103902 ±  4%  interrupts.CPU92.RES:Rescheduling_interrupts
    132220 ± 15%     -21.3%     104095 ±  2%  interrupts.CPU93.RES:Rescheduling_interrupts
    133.00 ± 88%     -87.6%      16.50 ± 72%  interrupts.CPU93.TLB:TLB_shootdowns
    125991 ±  5%     -19.0%     101995 ±  2%  interrupts.CPU94.RES:Rescheduling_interrupts
    115838 ±  9%     -17.2%      95959 ±  3%  interrupts.CPU95.RES:Rescheduling_interrupts
  13255498 ±  2%     -25.6%    9859155        interrupts.RES:Rescheduling_interrupts
      7.59 ±  2%      -1.5        6.04        perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.43 ±  2%      -1.5        5.91        perf-profile.calltrace.cycles-pp.pipe_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
      6.03 ±  4%      -1.0        5.06        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.90 ±  4%      -1.0        4.95        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.44 ±  3%      -0.9        3.51        perf-profile.calltrace.cycles-pp.schedule.pipe_read.new_sync_read.vfs_read.ksys_read
      2.29 ±  4%      -0.9        1.38 ±  2%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.new_sync_read
      4.07 ±  3%      -0.9        3.21        perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.new_sync_read.vfs_read
      2.62 ±  3%      -0.9        1.76 ±  4%  perf-profile.calltrace.cycles-pp.read
      3.68 ±  2%      -0.8        2.83        perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      2.06 ±  4%      -0.8        1.22        perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.pipe_read
      3.58 ±  2%      -0.8        2.76        perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      2.37 ±  3%      -0.8        1.58 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
      2.29 ±  3%      -0.8        1.53 ±  4%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      2.26 ±  3%      -0.8        1.50 ±  4%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      2.21 ±  3%      -0.7        1.47 ±  4%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      4.25 ±  3%      -0.7        3.51        perf-profile.calltrace.cycles-pp.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate
      2.14 ±  4%      -0.6        1.52        perf-profile.calltrace.cycles-pp.unwind_next_frame.arch_stack_walk.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity
      3.48 ±  4%      -0.6        2.90 ±  2%  perf-profile.calltrace.cycles-pp.arch_stack_walk.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
      1.93 ±  3%      -0.5        1.48        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule_idle.do_idle.cpu_startup_entry
      1.54 ±  4%      -0.4        1.18        perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      1.38 ±  3%      -0.3        1.04 ±  2%  perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule_idle.do_idle
      0.72 ±  4%      -0.1        0.58 ±  3%  perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry.start_secondary
      0.66 ±  4%      -0.1        0.54 ±  2%  perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.new_sync_read.vfs_read.ksys_read
     46.28            +0.5       46.74        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
      0.14 ±173%      +0.5        0.66 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.14 ±173%      +0.5        0.66 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64_no_verify
      7.85 ±  2%      +0.8        8.64 ±  3%  perf-profile.calltrace.cycles-pp.write
      7.77 ±  2%      +0.8        8.58 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
      7.73 ±  2%      +0.8        8.55 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.69 ±  3%      +0.8        8.53 ±  3%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.64 ±  3%      +0.9        8.49 ±  3%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     35.29            +0.9       36.15        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     35.15            +0.9       36.02        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     42.35            +1.8       44.15        perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     42.22            +1.8       44.06        perf-profile.calltrace.cycles-pp.pipe_write.new_sync_write.vfs_write.ksys_write.do_syscall_64
     38.77            +1.9       40.67        perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     38.65            +1.9       40.56        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     40.84            +2.1       42.96        perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write.ksys_write
     40.50            +2.1       42.65        perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write
     40.15            +2.2       42.36        perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
     40.07            +2.2       42.29        perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
     37.50            +2.7       40.20        perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
     37.47            +2.7       40.18        perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
     36.96            +2.9       39.84        perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
     36.62 ±  2%      +3.2       39.86        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     34.50            +3.3       37.80        perf-profile.calltrace.cycles-pp.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
     29.96 ±  2%      +4.1       34.04        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate
     29.13 ±  2%      +4.1       33.22        perf-profile.calltrace.cycles-pp.__cna_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
      8.30 ±  2%      -1.7        6.58        perf-profile.children.cycles-pp.ksys_read
      8.12 ±  2%      -1.7        6.42        perf-profile.children.cycles-pp.vfs_read
      7.75 ±  2%      -1.7        6.06        perf-profile.children.cycles-pp.__schedule
      7.59 ±  2%      -1.5        6.05        perf-profile.children.cycles-pp.new_sync_read
      7.45 ±  2%      -1.5        5.94        perf-profile.children.cycles-pp.pipe_read
      4.44 ±  3%      -0.9        3.52        perf-profile.children.cycles-pp.schedule
      2.65 ±  3%      -0.9        1.78 ±  4%  perf-profile.children.cycles-pp.read
      3.70 ±  2%      -0.8        2.87        perf-profile.children.cycles-pp.schedule_idle
      4.28 ±  3%      -0.7        3.54        perf-profile.children.cycles-pp.stack_trace_save_tsk
      0.80 ± 35%      -0.7        0.13 ±  5%  perf-profile.children.cycles-pp.poll_idle
      3.54 ±  3%      -0.6        2.94 ±  2%  perf-profile.children.cycles-pp.arch_stack_walk
      2.02 ±  3%      -0.6        1.43 ±  2%  perf-profile.children.cycles-pp.update_load_avg
      2.15 ±  3%      -0.5        1.67        perf-profile.children.cycles-pp.pick_next_task_fair
      2.30 ±  4%      -0.5        1.82        perf-profile.children.cycles-pp.dequeue_task_fair
      2.10 ±  4%      -0.5        1.63 ±  2%  perf-profile.children.cycles-pp.dequeue_entity
      1.56 ±  4%      -0.4        1.20        perf-profile.children.cycles-pp.menu_select
      1.39 ±  3%      -0.3        1.06 ±  2%  perf-profile.children.cycles-pp.set_next_entity
      0.46 ± 13%      -0.3        0.15 ±  3%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.92 ±  3%      -0.2        0.70 ±  2%  perf-profile.children.cycles-pp.prepare_to_wait_event
      1.13            -0.2        0.92 ±  3%  perf-profile.children.cycles-pp.asm_call_sysvec_on_stack
      0.33 ±  9%      -0.2        0.12 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.32 ± 10%      -0.2        0.11 ±  3%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.61 ±  3%      -0.2        0.41 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.32 ± 10%      -0.2        0.11 ±  4%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.47 ±  6%      -0.2        0.28        perf-profile.children.cycles-pp.finish_task_switch
      0.56 ±  5%      -0.2        0.36 ±  3%  perf-profile.children.cycles-pp.unwind_get_return_address
      0.50 ±  6%      -0.2        0.32 ±  4%  perf-profile.children.cycles-pp.__kernel_text_address
      0.96 ±  5%      -0.2        0.78        perf-profile.children.cycles-pp.update_curr
      0.44 ±  6%      -0.2        0.27 ±  4%  perf-profile.children.cycles-pp.kernel_text_address
      2.17 ±  4%      -0.2        2.00        perf-profile.children.cycles-pp.unwind_next_frame
      0.73 ±  3%      -0.2        0.56 ±  4%  perf-profile.children.cycles-pp.select_task_rq_fair
      0.95            -0.2        0.79 ±  2%  perf-profile.children.cycles-pp.update_rq_clock
      0.74 ±  4%      -0.1        0.59 ±  4%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.53 ±  3%      -0.1        0.40 ±  5%  perf-profile.children.cycles-pp.ktime_get
      0.41 ±  4%      -0.1        0.28 ±  3%  perf-profile.children.cycles-pp.stack_trace_consume_entry_nosched
      0.71            -0.1        0.59 ±  3%  perf-profile.children.cycles-pp.mutex_lock
      0.50 ±  2%      -0.1        0.38 ±  3%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
      0.44            -0.1        0.33        perf-profile.children.cycles-pp.__orc_find
      0.52 ±  2%      -0.1        0.41 ±  3%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.15 ± 19%      -0.1        0.05 ±  8%  perf-profile.children.cycles-pp.flush_smp_call_function_from_idle
      0.44 ±  4%      -0.1        0.34 ±  2%  perf-profile.children.cycles-pp.security_file_permission
      0.53 ±  2%      -0.1        0.43        perf-profile.children.cycles-pp.__switch_to
      0.48 ±  3%      -0.1        0.38 ±  3%  perf-profile.children.cycles-pp.__switch_to_asm
      0.37 ±  3%      -0.1        0.27 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.67 ±  2%      -0.1        0.57 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
      0.32 ±  4%      -0.1        0.22 ±  4%  perf-profile.children.cycles-pp.copy_page_from_iter
      0.38 ±  4%      -0.1        0.29 ±  5%  perf-profile.children.cycles-pp.select_idle_sibling
      0.45 ±  5%      -0.1        0.37 ±  4%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.29 ±  4%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.tick_nohz_idle_enter
      0.64 ±  2%      -0.1        0.57 ±  3%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.38 ±  3%      -0.1        0.31 ±  4%  perf-profile.children.cycles-pp.copyout
      0.27 ±  6%      -0.1        0.19 ±  6%  perf-profile.children.cycles-pp.orc_find
      0.40 ±  2%      -0.1        0.33 ±  5%  perf-profile.children.cycles-pp.copy_user_generic_unrolled
      0.35 ±  4%      -0.1        0.28        perf-profile.children.cycles-pp.pick_next_entity
      0.38 ±  4%      -0.1        0.31        perf-profile.children.cycles-pp.update_cfs_group
      0.22 ±  4%      -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.___perf_sw_event
      0.30 ±  5%      -0.1        0.23 ±  3%  perf-profile.children.cycles-pp.__unwind_start
      0.32 ±  4%      -0.1        0.26        perf-profile.children.cycles-pp.ttwu_do_wakeup
      0.20 ±  4%      -0.1        0.14 ±  9%  perf-profile.children.cycles-pp.__might_sleep
      0.28 ±  6%      -0.1        0.22 ±  5%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.27 ±  4%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.common_file_perm
      0.18 ±  3%      -0.1        0.12 ±  3%  perf-profile.children.cycles-pp.in_sched_functions
      0.30 ±  4%      -0.1        0.24        perf-profile.children.cycles-pp.check_preempt_curr
      0.22 ±  4%      -0.1        0.17 ±  4%  perf-profile.children.cycles-pp.rcu_idle_exit
      0.34 ±  3%      -0.1        0.28 ±  2%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.30 ±  4%      -0.1        0.24 ±  4%  perf-profile.children.cycles-pp.update_ts_time_stats
      0.31 ±  5%      -0.1        0.25 ±  4%  perf-profile.children.cycles-pp.nr_iowait_cpu
      0.31 ±  3%      -0.1        0.26 ±  3%  perf-profile.children.cycles-pp.sched_clock
      0.21 ±  5%      -0.1        0.16 ±  7%  perf-profile.children.cycles-pp.cpus_share_cache
      0.17 ± 10%      -0.1        0.11 ±  7%  perf-profile.children.cycles-pp.place_entity
      0.28 ±  3%      -0.1        0.23 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.18 ±  4%      -0.1        0.13 ±  3%  perf-profile.children.cycles-pp.resched_curr
      0.33 ±  2%      -0.0        0.28 ±  2%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.29 ±  3%      -0.0        0.24 ±  2%  perf-profile.children.cycles-pp.mutex_unlock
      0.23 ±  3%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.26 ±  3%      -0.0        0.21        perf-profile.children.cycles-pp.___might_sleep
      0.20 ±  6%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.29 ±  5%      -0.0        0.25 ±  3%  perf-profile.children.cycles-pp.native_sched_clock
      0.24 ±  5%      -0.0        0.19 ±  5%  perf-profile.children.cycles-pp.get_next_timer_interrupt
      0.12 ±  5%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.cpuidle_governor_latency_req
      0.23 ±  8%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.hrtimer_next_event_without
      0.21 ±  3%      -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.read_tsc
      0.14 ±  3%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.rcu_eqs_exit
      0.12 ±  4%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.__entry_text_start
      0.19 ±  2%      -0.0        0.15 ±  5%  perf-profile.children.cycles-pp.__fdget_pos
      0.08 ±  6%      -0.0        0.04 ± 58%  perf-profile.children.cycles-pp.rcu_dynticks_eqs_exit
      0.07 ± 10%      -0.0        0.04 ± 57%  perf-profile.children.cycles-pp.put_prev_entity
      0.11 ± 13%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.put_prev_task_fair
      0.17 ±  4%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.15 ±  7%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.hrtimer_get_next_event
      0.16 ±  2%      -0.0        0.14 ±  6%  perf-profile.children.cycles-pp.__fget_light
      0.13 ± 10%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.is_bpf_text_address
      0.11 ±  6%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.file_update_time
      0.14 ±  6%      -0.0        0.11 ± 11%  perf-profile.children.cycles-pp.__wrgsbase_inactive
      0.14 ±  8%      -0.0        0.11 ±  7%  perf-profile.children.cycles-pp.available_idle_cpu
      0.09 ±  4%      -0.0        0.06 ± 13%  perf-profile.children.cycles-pp.menu_reflect
      0.13 ±  9%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.stack_access_ok
      0.14 ±  5%      -0.0        0.12 ±  7%  perf-profile.children.cycles-pp.switch_fpu_return
      0.10 ±  8%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.current_time
      0.09 ±  9%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.__rdgsbase_inactive
      0.10            -0.0        0.08        perf-profile.children.cycles-pp.__calc_delta
      0.09 ± 10%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.bpf_ksym_find
      0.07 ± 10%      -0.0        0.05        perf-profile.children.cycles-pp.pick_next_task_idle
      0.18 ±  3%      -0.0        0.16 ±  2%  perf-profile.children.cycles-pp.fsnotify
      0.17 ±  5%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.copy_fpregs_to_fpstate
      0.07 ±  6%      -0.0        0.05        perf-profile.children.cycles-pp.put_task_stack
      0.07 ±  6%      -0.0        0.05        perf-profile.children.cycles-pp.apparmor_file_permission
      0.07 ± 12%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.07 ±  6%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.update_min_vruntime
      0.17 ±  2%      -0.0        0.16        perf-profile.children.cycles-pp.anon_pipe_buf_release
      0.07 ±  5%      -0.0        0.06        perf-profile.children.cycles-pp.atime_needs_update
      0.08 ±  5%      -0.0        0.07        perf-profile.children.cycles-pp.finish_wait
      0.48 ± 14%      +0.2        0.71 ±  8%  perf-profile.children.cycles-pp.start_kernel
     46.28            +0.5       46.74        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     46.28            +0.5       46.74        perf-profile.children.cycles-pp.cpu_startup_entry
     46.25            +0.5       46.71        perf-profile.children.cycles-pp.do_idle
      7.88 ±  2%      +0.8        8.65 ±  3%  perf-profile.children.cycles-pp.write
     42.99            +1.7       44.69        perf-profile.children.cycles-pp.ksys_write
     42.80            +1.7       44.53        perf-profile.children.cycles-pp.vfs_write
     42.37            +1.8       44.16        perf-profile.children.cycles-pp.new_sync_write
     42.23            +1.8       44.06        perf-profile.children.cycles-pp.pipe_write
     39.21            +2.1       41.33        perf-profile.children.cycles-pp.cpuidle_enter
     40.84            +2.1       42.96        perf-profile.children.cycles-pp.__wake_up_common_lock
     39.20            +2.1       41.32        perf-profile.children.cycles-pp.cpuidle_enter_state
     40.50            +2.2       42.65        perf-profile.children.cycles-pp.__wake_up_common
     40.15            +2.2       42.36        perf-profile.children.cycles-pp.autoremove_wake_function
     40.09            +2.2       42.30        perf-profile.children.cycles-pp.try_to_wake_up
     37.97            +2.4       40.36        perf-profile.children.cycles-pp.ttwu_do_activate
     37.94            +2.4       40.33        perf-profile.children.cycles-pp.enqueue_task_fair
     37.50            +2.5       40.05        perf-profile.children.cycles-pp.enqueue_entity
     36.91            +2.9       39.86        perf-profile.children.cycles-pp.intel_idle
     34.95            +3.0       37.95        perf-profile.children.cycles-pp.__account_scheduler_latency
     31.46            +3.5       35.00        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     29.71 ±  2%      +3.8       33.52        perf-profile.children.cycles-pp.__cna_queued_spin_lock_slowpath
      0.71 ± 39%      -0.7        0.05 ±  8%  perf-profile.self.cycles-pp.poll_idle
      1.08 ±  3%      -0.3        0.78        perf-profile.self.cycles-pp.update_load_avg
      1.24 ±  2%      -0.2        1.02 ±  2%  perf-profile.self.cycles-pp.__schedule
      1.86            -0.2        1.65        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.59 ±  3%      -0.2        0.40 ±  4%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.95 ±  3%      -0.2        0.75 ±  2%  perf-profile.self.cycles-pp.set_next_entity
      0.66 ±  4%      -0.2        0.51 ±  6%  perf-profile.self.cycles-pp.menu_select
      0.43 ±  5%      -0.1        0.28 ±  3%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.53 ±  3%      -0.1        0.40 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock
      0.67 ±  2%      -0.1        0.54 ±  2%  perf-profile.self.cycles-pp.stack_trace_save_tsk
      0.77 ±  2%      -0.1        0.64 ±  2%  perf-profile.self.cycles-pp.update_rq_clock
      0.72 ±  8%      -0.1        0.60        perf-profile.self.cycles-pp.update_curr
      0.44            -0.1        0.33        perf-profile.self.cycles-pp.__orc_find
      0.56 ±  2%      -0.1        0.45 ±  3%  perf-profile.self.cycles-pp.pipe_read
      0.33 ±  4%      -0.1        0.22        perf-profile.self.cycles-pp.prepare_to_wait_event
      0.48 ±  3%      -0.1        0.38 ±  3%  perf-profile.self.cycles-pp.__switch_to_asm
      0.32 ±  2%      -0.1        0.22 ±  7%  perf-profile.self.cycles-pp.ktime_get
      0.47            -0.1        0.38 ±  2%  perf-profile.self.cycles-pp.__switch_to
      0.35 ±  2%      -0.1        0.26 ±  5%  perf-profile.self.cycles-pp.select_task_rq_fair
      0.28 ±  5%      -0.1        0.20 ±  3%  perf-profile.self.cycles-pp.dequeue_entity
      0.23 ±  6%      -0.1        0.15 ±  3%  perf-profile.self.cycles-pp.stack_trace_consume_entry_nosched
      0.46 ±  3%      -0.1        0.39 ±  5%  perf-profile.self.cycles-pp.mutex_lock
      0.32 ±  4%      -0.1        0.25 ±  4%  perf-profile.self.cycles-pp.__update_load_avg_se
      0.39 ±  3%      -0.1        0.32 ±  6%  perf-profile.self.cycles-pp.copy_user_generic_unrolled
      0.45 ±  3%      -0.1        0.38        perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.19 ±  6%      -0.1        0.12 ±  8%  perf-profile.self.cycles-pp.vfs_read
      0.34 ±  4%      -0.1        0.27 ±  2%  perf-profile.self.cycles-pp.pick_next_entity
      0.84 ±  2%      -0.1        0.77        perf-profile.self.cycles-pp.enqueue_entity
      0.28 ±  5%      -0.1        0.21 ±  5%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.19 ±  4%      -0.1        0.12 ± 10%  perf-profile.self.cycles-pp.__might_sleep
      0.35 ±  3%      -0.1        0.29        perf-profile.self.cycles-pp.__wake_up_common
      0.19 ±  4%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.___perf_sw_event
      0.47 ±  2%      -0.1        0.41 ±  2%  perf-profile.self.cycles-pp.do_idle
      0.27 ±  4%      -0.1        0.21 ±  3%  perf-profile.self.cycles-pp.__unwind_start
      0.22 ±  6%      -0.1        0.16 ±  2%  perf-profile.self.cycles-pp.finish_task_switch
      0.34 ±  3%      -0.1        0.29        perf-profile.self.cycles-pp.schedule
      0.35 ±  6%      -0.1        0.29 ±  2%  perf-profile.self.cycles-pp.update_cfs_group
      0.24 ±  6%      -0.1        0.19 ±  4%  perf-profile.self.cycles-pp.orc_find
      0.21 ±  5%      -0.1        0.16 ±  7%  perf-profile.self.cycles-pp.cpus_share_cache
      0.30 ±  7%      -0.1        0.25 ±  5%  perf-profile.self.cycles-pp.nr_iowait_cpu
      0.18 ±  4%      -0.1        0.13        perf-profile.self.cycles-pp.resched_curr
      0.29 ±  3%      -0.1        0.24 ±  2%  perf-profile.self.cycles-pp.mutex_unlock
      0.16 ±  9%      -0.1        0.11 ±  6%  perf-profile.self.cycles-pp.place_entity
      0.32 ±  5%      -0.0        0.27        perf-profile.self.cycles-pp.cpuidle_enter_state
      0.23 ±  3%      -0.0        0.18 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.22 ±  4%      -0.0        0.18 ±  4%  perf-profile.self.cycles-pp.common_file_perm
      0.12 ±  3%      -0.0        0.08 ± 11%  perf-profile.self.cycles-pp.in_sched_functions
      0.28 ±  3%      -0.0        0.24 ±  3%  perf-profile.self.cycles-pp.native_sched_clock
      0.20 ±  4%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.__list_del_entry_valid
      0.12 ±  5%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.new_sync_write
      0.25            -0.0        0.21        perf-profile.self.cycles-pp.___might_sleep
      0.20 ±  4%      -0.0        0.16 ±  5%  perf-profile.self.cycles-pp.vfs_write
      0.07 ±  7%      -0.0        0.03 ±100%  perf-profile.self.cycles-pp.main
      0.29 ±  2%      -0.0        0.25 ±  4%  perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.21 ±  2%      -0.0        0.17 ±  4%  perf-profile.self.cycles-pp.read_tsc
      0.07 ±  5%      -0.0        0.04 ± 58%  perf-profile.self.cycles-pp.rcu_dynticks_eqs_exit
      0.12 ±  6%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.new_sync_read
      0.21 ±  2%      -0.0        0.18 ±  6%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12 ±  6%      -0.0        0.10 ±  9%  perf-profile.self.cycles-pp.arch_stack_walk
      0.07 ±  6%      -0.0        0.04 ± 57%  perf-profile.self.cycles-pp.update_min_vruntime
      0.11 ±  4%      -0.0        0.08 ± 10%  perf-profile.self.cycles-pp.kernel_text_address
      0.23 ±  7%      -0.0        0.21 ±  5%  perf-profile.self.cycles-pp.__account_scheduler_latency
      0.14 ±  6%      -0.0        0.11 ± 11%  perf-profile.self.cycles-pp.__wrgsbase_inactive
      0.09 ±  9%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.__entry_text_start
      0.08 ±  5%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.copy_page_to_iter
      0.19 ±  6%      -0.0        0.17 ±  5%  perf-profile.self.cycles-pp.pipe_write
      0.15 ±  3%      -0.0        0.13 ±  5%  perf-profile.self.cycles-pp.__fget_light
      0.06 ±  6%      -0.0        0.04 ± 57%  perf-profile.self.cycles-pp.unwind_get_return_address
      0.14 ±  7%      -0.0        0.12 ±  7%  perf-profile.self.cycles-pp.switch_fpu_return
      0.09 ±  9%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.tick_nohz_next_event
      0.08 ± 11%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.__hrtimer_next_event_base
      0.16            -0.0        0.14 ±  6%  perf-profile.self.cycles-pp.pick_next_task_fair
      0.09 ±  9%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.__rdgsbase_inactive
      0.06            -0.0        0.04 ± 57%  perf-profile.self.cycles-pp.copy_page_from_iter
      0.14 ±  6%      -0.0        0.11 ±  7%  perf-profile.self.cycles-pp.available_idle_cpu
      0.08 ± 16%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.call_cpuidle
      0.10 ±  8%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.09 ±  5%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.rcu_idle_exit
      0.19 ±  3%      -0.0        0.17 ±  4%  perf-profile.self.cycles-pp.dequeue_task_fair
      0.10 ±  4%      -0.0        0.08        perf-profile.self.cycles-pp.__calc_delta
      0.17 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.anon_pipe_buf_release
      0.17 ±  4%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.copy_fpregs_to_fpstate
      0.06 ±  6%      -0.0        0.05        perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.06 ±  6%      -0.0        0.05        perf-profile.self.cycles-pp.put_task_stack
     36.91            +2.9       39.86        perf-profile.self.cycles-pp.intel_idle
     29.30 ±  2%      +3.9       33.15        perf-profile.self.cycles-pp.__cna_queued_spin_lock_slowpath


                                                                                
                       unixbench.time.voluntary_context_switches                
                                                                                
  4.4e+08 +-----------------------------------------------------------------+   
          |                                                    +..  +..+. ..|   
  4.3e+08 |-+                                                  :   +     +  |   
  4.2e+08 |-+                                                 :   +         |   
          |                                                   :             |   
  4.1e+08 |-+                                                 :             |   
    4e+08 |-+               +.            .+.. .+..+.+..+.   :              |   
          |               ..  +..+.+..+.+.    +           +..+              |   
  3.9e+08 |..+.+..+.+..+.+                                                  |   
  3.8e+08 |-+                                                               |   
          |                                                                 |   
  3.7e+08 |-+                      O    O  O    O  O                        |   
  3.6e+08 |-+                         O       O      O  O O  O O  O O  O O  |   
          |  O      O  O O  O O                                             |   
  3.5e+08 +-----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                  unixbench.score                               
                                                                                
  3800 +--------------------------------------------------------------------+   
       |                                                              .+.  .|   
  3700 |-+                                                     +.  .+.   +. |   
  3600 |-+                                                    :  +.         |   
       |                                                      :             |   
  3500 |-+                                                   :              |   
  3400 |-+                                    .+.+..+..+.    :              |   
       |                 .+.+..+..+.+..+..+.+.           +..+               |   
  3300 |..+.+..+..+.+..+.                                                   |   
  3200 |-+                                                                  |   
       |                                                                    |   
  3100 |-+                        O O  O  O O  O O  O  O    O  O O     O O  |   
  3000 |-+O         O  O    O                            O          O       |   
       |    O  O  O       O    O                                            |   
  2900 +--------------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                  unixbench.workload                            
                                                                                
    6e+08 +-----------------------------------------------------------------+   
          |                                                                 |   
  5.8e+08 |-+                                                  +..  +..+. ..|   
          |                                                    :   +     +  |   
  5.6e+08 |-+                                                 :   +         |   
          |                                                   :             |   
  5.4e+08 |-+                                                 :             |   
          |                .+.    .+..    .+..+.+..+.+..+.   :              |   
  5.2e+08 |.. .+..    .+.+.   +..+    +.+.                +..+              |   
          |  +    +.+.                                                      |   
    5e+08 |-+                                                               |   
          |                        O    O  O    O  O                        |   
  4.8e+08 |-+                         O       O      O  O O  O O  O O  O O  |   
          |  O O    O  O O  O O  O                                          |   
  4.6e+08 +-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


View attachment "0006-locking-qspinlock-Introduce-the-shuffle-reduction-op.patch" of type "text/x-patch" (3049 bytes)
