linux-kernel - [lkp-robot] [sched/fair] 4c77b18cf8: hackbench.throughput -14.4% regression

Message-ID: <87lgsh3hbn.fsf@yhuang-dev.intel.com>
Date:   Tue, 07 Mar 2017 11:18:36 +0800
From:   kernel test robot <ying.huang@...ux.intel.com>
TO:     Peter Zijlstra <peterz@...radead.org>
CC:     Ingo Molnar <mingo@...nel.org>, kitsunyan <kitsunyan@...ox.ru>,
        Chris Mason <clm@...com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mike Galbraith <efault@....de>,
        Mike Galbraith <umgwanakikbuti@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>, lkp@...org
Subject: [lkp-robot] [sched/fair] 4c77b18cf8: hackbench.throughput -14.4% regression

Greetings,

FYI, we noticed a 14.4% regression in hackbench.throughput caused by the following commit:


commit: 4c77b18cf8b7ab37c7d5737b4609010d2ceec5f0 ("sched/fair: Make select_idle_cpu() more aggressive")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: hackbench
on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with the following parameters:

	nr_threads: 50%
	mode: process
	ipc: pipe
	cpufreq_governor: performance

test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c

In addition, the commit has a significant impact on the following tests:

+------------------+-----------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_tps -33.8% regression                     |
| test machine     | 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory |
| test parameters  | cluster=cs-localhost                                                  |
|                  | cpufreq_governor=performance                                          |
|                  | ip=ipv4                                                               |
|                  | nr_threads=200%                                                       |
|                  | runtime=300s                                                          |
|                  | test=SCTP_RR                                                          |
+------------------+-----------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_tps -50.8% regression                     |
| test machine     | 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory |
| test parameters  | cluster=cs-localhost                                                  |
|                  | cpufreq_governor=performance                                          |
|                  | ip=ipv4                                                               |
|                  | nr_threads=200%                                                       |
|                  | runtime=300s                                                          |
|                  | test=TCP_RR                                                           |
+------------------+-----------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_Mbps -8.7% regression                     |
| test machine     | 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory       |
| test parameters  | cluster=cs-localhost                                                  |
|                  | cpufreq_governor=performance                                          |
|                  | ip=ipv4                                                               |
|                  | nr_threads=200%                                                       |
|                  | runtime=300s                                                          |
|                  | send_size=10K                                                         |
|                  | test=SCTP_STREAM_MANY                                                 |
+------------------+-----------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput 12.1% improvement                     |
| test machine     | 8 threads Ivy Bridge with 16G memory                                  |
| test parameters  | cpufreq_governor=performance                                          |
|                  | ipc=pipe                                                              |
|                  | mode=process                                                          |
|                  | nr_threads=50%                                                        |
+------------------+-----------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_Mbps -2.5% regression                     |
| test machine     | 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory |
| test parameters  | cluster=cs-localhost                                                  |
|                  | cpufreq_governor=performance                                          |
|                  | ip=ipv4                                                               |
|                  | nr_threads=200%                                                       |
|                  | runtime=300s                                                          |
|                  | send_size=10K                                                         |
|                  | test=SCTP_STREAM_MANY                                                 |
+------------------+-----------------------------------------------------------------------+


Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/01org/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

testcase/path_params/tbox_group/run: hackbench/50%-process-pipe-performance/ivb42

4977ab6e92e267af  4c77b18cf8b7ab37c7d5737b46  
----------------  --------------------------  
    179106             -14%     153395        hackbench.throughput
 5.036e+08              21%  6.113e+08        hackbench.time.involuntary_context_switches
      4523               3%       4675        hackbench.time.percent_of_cpu_this_job_got
     27089               3%      27956        hackbench.time.system_time
      1394             -10%       1252        hackbench.time.user_time
 2.501e+09             -11%  2.223e+09        hackbench.time.voluntary_context_switches
    779669             -14%     667478        hackbench.time.minor_page_faults
    319399               3%     329894        interrupts.CAL:Function_call_interrupts
    884938             -22%     692644        vmstat.system.in
   5224554              -9%    4736985        vmstat.system.cs
      2880                        2955        turbostat.Avg_MHz
     96.25                       98.77        turbostat.%Busy
      6.59             -14%       5.63        turbostat.RAMWatt
 2.009e+08              98%  3.986e+08        perf-stat.cpu-migrations
      0.67               8%       0.73        perf-stat.branch-miss-rate%
 5.046e+11              13%  5.722e+11        perf-stat.cache-references
 5.897e+10               5%   6.22e+10        perf-stat.branch-misses
      3851              16%       4471        perf-stat.instructions-per-iTLB-miss
     38.80             -11%      34.53        perf-stat.node-store-miss-rate%
 8.697e+13                   8.833e+13        perf-stat.cpu-cycles
   1928944              -8%    1777815        perf-stat.page-faults
   1928944              -8%    1777789        perf-stat.minor-faults
 1.332e+10 ±  3%       -18%  1.098e+10 ± 16%  perf-stat.dTLB-store-misses
      1.87 ±  4%       -20%       1.50 ± 19%  perf-stat.dTLB-load-miss-rate%
 2.654e+11 ±  4%       -25%  1.988e+11 ± 20%  perf-stat.dTLB-load-misses
      0.53              -6%       0.50        perf-stat.ipc
 8.738e+12                   8.565e+12        perf-stat.branch-instructions
 3.299e+09             -10%  2.968e+09        perf-stat.context-switches
 4.586e+13              -4%  4.398e+13        perf-stat.instructions
     64.05              31%      84.13        perf-stat.iTLB-load-miss-rate%
 1.392e+13              -6%  1.306e+13        perf-stat.dTLB-loads
 8.613e+12             -10%  7.773e+12        perf-stat.dTLB-stores
 1.135e+10 ±  4%       -45%  6.254e+09 ±  4%  perf-stat.node-loads
 1.878e+10 ±  3%       -46%  1.016e+10 ±  4%  perf-stat.cache-misses
   1.1e+10 ±  4%       -46%  5.949e+09 ±  4%  perf-stat.node-load-misses
 1.191e+10             -17%  9.836e+09        perf-stat.iTLB-load-misses
 7.431e+09 ±  4%       -48%  3.875e+09 ±  4%  perf-stat.node-stores
      3.72 ±  4%       -52%       1.78 ±  4%  perf-stat.cache-miss-rate%
 4.711e+09 ±  4%       -57%  2.044e+09 ±  3%  perf-stat.node-store-misses
 6.682e+09             -72%  1.856e+09 ±  3%  perf-stat.iTLB-loads
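
The change column and the derived rates in the table above can be cross-checked from the raw values. The following is a minimal sketch, not part of the lkp tooling: the helper names pct_change and rate are made up for illustration, and it assumes the change column is computed relative to the parent commit (4977ab6e92e267af), that cache-miss-rate% is cache-misses divided by cache-references, and that ipc is instructions divided by cpu-cycles.

```python
def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def rate(numerator, denominator):
    """Plain ratio, e.g. misses per reference or instructions per cycle."""
    return numerator / denominator


# (parent commit value, tested commit value), copied from the table above.
throughput = (179106, 153395)            # hackbench.throughput
migrations = (2.009e8, 3.986e8)          # perf-stat.cpu-migrations
cache_misses = (1.878e10, 1.016e10)      # perf-stat.cache-misses
cache_refs = (5.046e11, 5.722e11)        # perf-stat.cache-references
instructions = (4.586e13, 4.398e13)      # perf-stat.instructions
cycles = (8.697e13, 8.833e13)            # perf-stat.cpu-cycles

print(f"throughput change:     {pct_change(*throughput):+.1f}%")
print(f"cpu-migrations change: {pct_change(*migrations):+.0f}%")
for label, i in (("parent", 0), ("commit", 1)):
    miss_rate = rate(cache_misses[i], cache_refs[i]) * 100.0
    ipc = rate(instructions[i], cycles[i])
    print(f"{label}: cache-miss-rate {miss_rate:.2f}%, ipc {ipc:.2f}")
```

Under these assumptions the computed figures agree with the reported ones: throughput drops by 14.4%, cpu-migrations rise by 98%, cache-miss-rate% falls from 3.72 to 1.78, and ipc falls from 0.53 to 0.50.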


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Ying Huang

View attachment "config-4.10.0-11074-g4c77b18" of type "text/plain" (157299 bytes)

View attachment "job-script" of type "text/plain" (6579 bytes)

View attachment "job.yaml" of type "text/plain" (4205 bytes)

View attachment "reproduce" of type "text/plain" (970 bytes)