Message-ID: <20200205123216.GO12867@shao2-debian>
Date:   Wed, 5 Feb 2020 20:32:16 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Vince Weaver <vincent.weaver@...ne.edu>,
        Jiri Olsa <jolsa@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
        Ravi Bangoria <ravi.bangoria@...ux.ibm.com>,
        Stephane Eranian <eranian@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: [perf/x86] 81ec3f3c4c: will-it-scale.per_process_ops -5.5% regression

Greetings,

FYI, we noticed a -5.5% regression of will-it-scale.per_process_ops due to commit:


commit: 81ec3f3c4c4d78f2d3b6689c9816bfbdf7417dbb ("perf/x86: Add check_period PMU callback")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:

	nr_task: 100%
	mode: process
	test: signal1
	cpufreq_governor: performance
	ucode: 0x500002c

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <rong.a.chen@...el.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-csl-2ap4/signal1/will-it-scale/0x500002c

commit: 
  v5.0-rc6
  81ec3f3c4c ("perf/x86: Add check_period PMU callback")

        v5.0-rc6 81ec3f3c4c4d78f2d3b6689c981 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     17987            -5.5%      16997        will-it-scale.per_process_ops
   3453717            -5.5%    3263556        will-it-scale.workload
 3.032e+08 ± 22%     -56.2%  1.329e+08 ± 71%  cpuidle.C6.time
    435366 ± 25%     -51.4%     211628 ± 35%  cpuidle.C6.usage
      5620 ± 50%     -33.6%       3731 ±  4%  softirqs.CPU187.RCU
      7972 ± 42%     -44.7%       4407 ±  3%  softirqs.CPU51.RCU
    381824 ± 27%     -66.5%     128076 ± 91%  turbostat.C6
      0.48 ± 24%      -0.3        0.18 ± 93%  turbostat.C6%
    640.83 ±  6%     +14.9%     736.62 ±  6%  sched_debug.cfs_rq:/.util_avg.min
     56437 ±  9%     -17.3%      46662 ±  7%  sched_debug.cpu.nr_switches.max
      5224 ±  5%      -7.1%       4852 ±  6%  sched_debug.cpu.nr_switches.stddev
     54643 ±  9%     -17.9%      44853 ±  7%  sched_debug.cpu.sched_count.max
     26160 ± 10%     -17.6%      21557 ±  7%  sched_debug.cpu.ttwu_count.max
     25875 ±  9%     -17.3%      21398 ±  7%  sched_debug.cpu.ttwu_local.max
    745.75 ± 16%     -46.8%     396.75 ± 38%  interrupts.33:PCI-MSI.524291-edge.eth0-TxRx-2
    952.50 ±108%     -93.2%      65.00 ± 50%  interrupts.CPU1.RES:Rescheduling_interrupts
    745.75 ± 16%     -46.8%     396.75 ± 38%  interrupts.CPU11.33:PCI-MSI.524291-edge.eth0-TxRx-2
    740.50 ±171%    -100.0%       0.25 ±173%  interrupts.CPU166.RES:Rescheduling_interrupts
      3207 ±  6%     +27.7%       4095 ±  5%  interrupts.CPU185.CAL:Function_call_interrupts
    152.75 ±168%     -99.2%       1.25 ± 34%  interrupts.CPU50.RES:Rescheduling_interrupts
    698.25 ±166%     -98.7%       9.25 ±135%  interrupts.CPU70.RES:Rescheduling_interrupts
      3367 ±  2%     +13.5%       3821 ±  6%  interrupts.CPU89.CAL:Function_call_interrupts
    202.75 ±117%     -85.1%      30.25 ± 83%  interrupts.CPU96.RES:Rescheduling_interrupts
     71307 ±  3%     -11.0%      63441        numa-vmstat.node2.nr_file_pages
      7081 ±  3%     -19.6%       5696        numa-vmstat.node2.nr_kernel_stack
      1854 ±  7%     -40.2%       1109        numa-vmstat.node2.nr_mapped
      2524 ±  6%     -18.1%       2068 ±  2%  numa-vmstat.node2.nr_page_table_pages
      5132 ± 15%     -38.5%       3159 ± 10%  numa-vmstat.node2.nr_slab_reclaimable
     15668 ±  9%     -22.2%      12192 ±  5%  numa-vmstat.node2.nr_slab_unreclaimable
     70254 ±  2%      -9.9%      63317        numa-vmstat.node2.nr_unevictable
     70254 ±  2%      -9.9%      63317        numa-vmstat.node2.nr_zone_unevictable
    361503 ± 20%     -23.9%     274980 ±  6%  numa-vmstat.node2.numa_hit
    275672 ± 26%     -31.3%     189397 ± 10%  numa-vmstat.node2.numa_local
    285230 ±  3%     -11.0%     253766        numa-meminfo.node2.FilePages
      1707 ± 11%     -73.2%     457.50 ±118%  numa-meminfo.node2.Inactive
     20532 ± 15%     -38.4%      12638 ± 10%  numa-meminfo.node2.KReclaimable
      7082 ±  3%     -19.6%       5696        numa-meminfo.node2.KernelStack
      7112 ±  5%     -37.6%       4436        numa-meminfo.node2.Mapped
    590073 ±  6%     -15.9%     496353 ±  9%  numa-meminfo.node2.MemUsed
     10101 ±  6%     -18.0%       8282 ±  2%  numa-meminfo.node2.PageTables
     20532 ± 15%     -38.4%      12638 ± 10%  numa-meminfo.node2.SReclaimable
     62680 ±  9%     -22.2%      48775 ±  5%  numa-meminfo.node2.SUnreclaim
     83213 ±  8%     -26.2%      61414 ±  6%  numa-meminfo.node2.Slab
    281021 ±  2%      -9.9%     253271        numa-meminfo.node2.Unevictable
 3.322e+09            -5.1%  3.152e+09        perf-stat.i.branch-instructions
      1.17            +0.0        1.18        perf-stat.i.branch-miss-rate%
  38492834            -4.2%   36888923        perf-stat.i.branch-misses
     42.83            -0.7       42.11        perf-stat.i.cache-miss-rate%
  43547238            -6.0%   40916641 ±  2%  perf-stat.i.cache-misses
 1.014e+08            -4.5%   96860566        perf-stat.i.cache-references
     34.91            +5.3%      36.76        perf-stat.i.cpi
     13610            +6.2%      14457 ±  2%  perf-stat.i.cycles-between-cache-misses
 5.049e+09            -5.4%  4.777e+09        perf-stat.i.dTLB-loads
      0.00 ±  8%      +0.0        0.00 ±  2%  perf-stat.i.dTLB-store-miss-rate%
     17697 ±  5%     +90.1%      33643 ±  3%  perf-stat.i.dTLB-store-misses
 3.162e+09            -5.3%  2.994e+09        perf-stat.i.dTLB-stores
  26478258            -8.6%   24200892        perf-stat.i.iTLB-load-misses
 1.682e+10            -5.1%  1.596e+10        perf-stat.i.instructions
    640.74            +3.6%     664.01 ±  2%  perf-stat.i.instructions-per-iTLB-miss
   3611851            -4.9%    3435033        perf-stat.i.node-load-misses
   7022617            -3.0%    6811102        perf-stat.i.node-store-misses
      1.16            +0.0        1.17        perf-stat.overall.branch-miss-rate%
     42.96            -0.7       42.24        perf-stat.overall.cache-miss-rate%
     34.96            +5.3%      36.81        perf-stat.overall.cpi
     13501            +6.4%      14364 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.00 ±  5%      +0.0        0.00 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
    635.29            +3.8%     659.66        perf-stat.overall.instructions-per-iTLB-miss
      0.03            -5.0%       0.03        perf-stat.overall.ipc
 3.308e+09            -5.1%  3.139e+09        perf-stat.ps.branch-instructions
  38324306            -4.1%   36739072        perf-stat.ps.branch-misses
  43365638            -6.0%   40745255 ±  2%  perf-stat.ps.cache-misses
  1.01e+08            -4.5%   96451003        perf-stat.ps.cache-references
 5.028e+09            -5.4%  4.757e+09        perf-stat.ps.dTLB-loads
     17634 ±  5%     +90.1%      33526 ±  3%  perf-stat.ps.dTLB-store-misses
 3.149e+09            -5.3%  2.982e+09        perf-stat.ps.dTLB-stores
  26369184            -8.6%   24103019        perf-stat.ps.iTLB-load-misses
 1.675e+10            -5.1%   1.59e+10        perf-stat.ps.instructions
   3597149            -4.9%    3420963        perf-stat.ps.node-load-misses
   6994250            -3.0%    6784176        perf-stat.ps.node-store-misses
 5.199e+12            -5.2%  4.931e+12        perf-stat.total.instructions
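
For readers who want to check the %change column, the sketch below recomputes a few rows from the values printed above. It is only an illustration, not part of the LKP tooling: the helper name pct_change is made up here, and the reading that rows whose metric name ends in "%" report an absolute point difference is inferred from the table (those rows carry no "%" in the change column). It also shows why perf-stat.overall.ipc prints as 0.03 on both sides yet shows -5.0%: ipc is the reciprocal of cpi, and the rounding to two decimals hides the difference.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


# (v5.0-rc6, 81ec3f3c4c) pairs copied from the table above.
rows = {
    "will-it-scale.per_process_ops": (17987, 16997),
    "will-it-scale.workload": (3453717, 3263556),
    "perf-stat.overall.cpi": (34.96, 36.81),
    "perf-stat.ps.dTLB-store-misses": (17634, 33526),
}

for name, (base, new) in rows.items():
    print(f"{name:32s} {pct_change(base, new):+6.1f}%")

# ipc is 1/cpi, so its relative change follows from the cpi row even though
# both ipc values round to 0.03 in the table.
ipc_base, ipc_new = 1 / 34.96, 1 / 36.81
print(f"{'ipc derived from cpi':32s} {pct_change(ipc_base, ipc_new):+6.1f}%")

# Assumption: rate rows (names ending in "%") show a point difference.
print(f"{'cache-miss-rate% (points)':32s} {42.24 - 42.96:+6.1f}")

# Observation: workload divided by per_process_ops is close to 192, the
# number of hardware threads on the test machine (nr_task: 100%).
print(round(3453717 / 17987), round(3263556 / 16997))
```

The last line suggests that per_process_ops is roughly the total workload divided by the number of running tasks, though the report itself does not state how the two figures are derived.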


                                                                                
                            will-it-scale.per_process_ops                       
                                                                                
  19500 +-+-----------------------------------------------------------------+   
        |..+.+..+.                                                          |   
  19000 +-+       +..                                                       |   
        |                                                                   |   
  18500 +-+          +..+.     .+.+..+.+..+..+                              |   
        |                 +..+.               +                             |   
  18000 +-+                                    +..+.+..+..+.+..+..+         |   
        |                                                                   |   
  17500 +-+          O  O                                                   |   
        |                                 O                                 |   
  17000 +-+                                  O            O O  O  O O  O O  O   
        |                       O O                                         |   
  16500 O-+O O  O O                  O O       O  O    O                    |   
        |                 O  O                      O                       |   
  16000 +-+-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


View attachment "config-5.0.0-rc6-00001-g81ec3f3c4c4d7" of type "text/plain" (187481 bytes)

View attachment "job-script" of type "text/plain" (7614 bytes)

View attachment "job.yaml" of type "text/plain" (5198 bytes)

View attachment "reproduce" of type "text/plain" (312 bytes)