linux-kernel - Re: [LKP] [PM] 8234f6734c: will-it-scale.per_process

Message-ID: <CAKfTPtB4cHpF2JcUiOLbmczDVSLEmBCpNDcYqCdAqwYZ2LAsRg@mail.gmail.com>
Date:   Tue, 15 Jan 2019 14:13:47 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     kernel test robot <rong.a.chen@...el.com>
Cc:     "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        LKP <lkp@...org>, Ladislav Michl <ladis@...ux-mips.org>
Subject: Re: [LKP] [PM] 8234f6734c: will-it-scale.per_process_ops -3.6% regression

Hi Rong,

On Tue, 15 Jan 2019 at 04:24, kernel test robot <rong.a.chen@...el.com> wrote:
>
> Greetings,
>
> FYI, we noticed a -3.6% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: 8234f6734c5d74ac794e5517437f51c57d65f865 ("PM-runtime: Switch autosuspend over to using hrtimers")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>

Could you rerun the test with this patch?
https://lore.kernel.org/patchwork/patch/1030857/
It optimizes autosuspend by reducing the number of calls to ktime_get().
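To illustrate why fewer clock reads help, here is a hypothetical user-space model. It is not the content of the linked patch. clock_gettime(CLOCK_MONOTONIC) stands in for ktime_get(), and the names read_clock_ns, expiration_ns, remaining_two_reads and remaining_one_read are invented for illustration. The model compares a path where each step reads the clock itself with one that reads "now" once and passes it down.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* How many times the clock has been read (the cost being minimized). */
static unsigned long clock_reads;

/* Hypothetical stand-in for ktime_get(): monotonic time in nanoseconds. */
static uint64_t read_clock_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    clock_reads++;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Expiration time if it is still in the future, otherwise 0 ("expired"). */
static uint64_t expiration_ns(uint64_t last_busy, uint64_t delay, uint64_t now)
{
    uint64_t expires = last_busy + delay;

    return expires > now ? expires : 0;
}

/* Variant A: each step reads the clock on its own (up to two reads). */
static uint64_t remaining_two_reads(uint64_t last_busy, uint64_t delay)
{
    uint64_t expires = expiration_ns(last_busy, delay, read_clock_ns());
    uint64_t now;

    if (expires == 0)
        return 0;
    now = read_clock_ns(); /* second read for the remaining-time math */
    return expires > now ? expires - now : 0;
}

/* Variant B: read the clock once and pass the value down (one read). */
static uint64_t remaining_one_read(uint64_t last_busy, uint64_t delay)
{
    uint64_t now = read_clock_ns();
    uint64_t expires = expiration_ns(last_busy, delay, now);

    return expires ? expires - now : 0;
}
```

Both variants give the same answer for a pending timer, but variant B touches the clock half as often. On a hot path that runs on every CPU, that saving can add up.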

Regards,
Vincent

> in testcase: will-it-scale
> on test machine: 104-thread Skylake with 192G memory
> with following parameters:
>
>         nr_task: 100%
>         mode: process
>         test: poll2
>         cpufreq_governor: performance
>
> test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
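The process mode described above can be modeled roughly as follows. This is a simplified sketch, not the will-it-scale source. NR_FDS, poll_testcase and run_copies are hypothetical names. The poll2 testcase is approximated by polling idle pipe read ends with a zero timeout. Per-copy iteration counters live in shared anonymous memory, and each copy runs a fixed iteration count rather than a fixed time so the result is deterministic.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NR_FDS 16 /* hypothetical: descriptors polled per iteration */

/* One copy: poll NR_FDS idle pipe read ends n times, counting each call. */
static int poll_testcase(unsigned long long *iterations, unsigned long long n)
{
    struct pollfd pfds[NR_FDS];
    int pipes[NR_FDS][2];
    unsigned long long it;
    int i, ret = 0;

    for (i = 0; i < NR_FDS; i++) {
        if (pipe(pipes[i]) != 0)
            return -1; /* the child exits, which releases any open fds */
        pfds[i].fd = pipes[i][0];
        pfds[i].events = POLLIN;
    }
    for (it = 0; it < n; it++) {
        /* Nothing is ever written, so poll() returns 0 without blocking. */
        if (poll(pfds, NR_FDS, 0) < 0) {
            ret = -1;
            break;
        }
        (*iterations)++;
    }
    for (i = 0; i < NR_FDS; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    return ret;
}

/* Process mode: fork nr copies and sum their shared-memory counters. */
static unsigned long long run_copies(int nr, unsigned long long n)
{
    size_t size = (size_t)nr * sizeof(unsigned long long);
    unsigned long long *counts, total = 0;
    int i;

    counts = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counts == MAP_FAILED)
        return 0;
    for (i = 0; i < nr; i++) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(poll_testcase(&counts[i], n) == 0 ? 0 : 1);
        if (pid < 0)
            break; /* unstarted copies keep a 0 count (mmap zero-fills) */
    }
    while (wait(NULL) > 0)
        ; /* reap every child before reading the counters */
    for (i = 0; i < nr; i++)
        total += counts[i];
    munmap(counts, size);
    return total;
}
```

Each poll() call looks up every descriptor in the kernel. That is consistent with do_sys_poll and __fget_light dominating the perf profile in the table below. A threads mode would run the same testcase in pthreads instead of forked processes.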
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install job.yaml  # job file is attached in this email
>         bin/lkp run     job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-7/performance/x86_64-rhel-7.2/process/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-fpga01/poll2/will-it-scale
>
> commit:
>   v4.20-rc7
>   8234f6734c ("PM-runtime: Switch autosuspend over to using hrtimers")
>
>        v4.20-rc7 8234f6734c5d74ac794e551743
> ---------------- --------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :2           50%           1:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
>          %stddev     %change         %stddev
>              \          |                \
>     240408            -3.6%     231711        will-it-scale.per_process_ops
>   25002520            -3.6%   24097991        will-it-scale.workload
>     351914            -1.7%     345882        interrupts.CAL:Function_call_interrupts
>       1.77 ± 45%      -1.1        0.64        mpstat.cpu.idle%
>     106164 ± 24%     -23.2%      81494 ± 28%  numa-meminfo.node0.AnonHugePages
>     326430 ±  8%     -11.3%     289513        softirqs.SCHED
>       1294            -2.0%       1268        vmstat.system.cs
>       3178           +48.4%       4716 ± 16%  slabinfo.eventpoll_pwq.active_objs
>       3178           +48.4%       4716 ± 16%  slabinfo.eventpoll_pwq.num_objs
>     336.32          -100.0%       0.00        uptime.boot
>       3192          -100.0%       0.00        uptime.idle
>  3.456e+08 ± 76%     -89.9%   34913819 ± 62%  cpuidle.C1E.time
>     747832 ± 72%     -87.5%      93171 ± 45%  cpuidle.C1E.usage
>      16209 ± 26%     -38.2%      10021 ± 44%  cpuidle.POLL.time
>       6352 ± 32%     -39.5%       3843 ± 48%  cpuidle.POLL.usage
>     885259 ±  2%     -13.8%     763434 ±  7%  numa-vmstat.node0.numa_hit
>     865117 ±  2%     -13.9%     744992 ±  7%  numa-vmstat.node0.numa_local
>     405085 ±  7%     +38.0%     558905 ±  9%  numa-vmstat.node1.numa_hit
>     254056 ± 11%     +59.7%     405824 ± 13%  numa-vmstat.node1.numa_local
>     738158 ± 73%     -88.5%      85078 ± 47%  turbostat.C1E
>       1.07 ± 76%      -1.0        0.11 ± 62%  turbostat.C1E%
>       1.58 ± 49%     -65.4%       0.55 ±  6%  turbostat.CPU%c1
>       0.15 ± 13%     -35.0%       0.10 ± 38%  turbostat.CPU%c6
>     153.97 ± 16%     -54.7       99.31        turbostat.PKG_%
>      64141            +1.5%      65072        proc-vmstat.nr_anon_pages
>      19541            -7.0%      18178 ±  8%  proc-vmstat.nr_shmem
>      18296            +1.1%      18506        proc-vmstat.nr_slab_reclaimable
>     713938            -2.3%     697489        proc-vmstat.numa_hit
>     693688            -2.4%     677228        proc-vmstat.numa_local
>     772220            -1.9%     757334        proc-vmstat.pgalloc_normal
>     798565            -1.8%     784042        proc-vmstat.pgfault
>     732336            -2.7%     712661        proc-vmstat.pgfree
>      20.33 ±  4%      -7.0%      18.92        sched_debug.cfs_rq:/.runnable_load_avg.max
>     160603           -44.5%      89108 ± 38%  sched_debug.cfs_rq:/.spread0.avg
>     250694           -29.3%     177358 ± 18%  sched_debug.cfs_rq:/.spread0.max
>       1109 ±  4%      -7.0%       1031        sched_debug.cfs_rq:/.util_avg.max
>      20.33 ±  4%      -7.2%      18.88        sched_debug.cpu.cpu_load[0].max
>     -10.00           +35.0%     -13.50        sched_debug.cpu.nr_uninterruptible.min
>       3.56 ± 10%     +44.2%       5.14 ± 18%  sched_debug.cpu.nr_uninterruptible.stddev
>      87.10 ± 24%     -34.0%      57.44 ± 37%  sched_debug.cpu.sched_goidle.avg
>     239.48           -25.6%     178.07 ± 18%  sched_debug.cpu.sched_goidle.stddev
>     332.67 ±  7%     -25.5%     247.83 ± 13%  sched_debug.cpu.ttwu_count.min
>     231.67 ±  8%     -15.4%     195.96 ± 12%  sched_debug.cpu.ttwu_local.min
>      95.47           -95.5        0.00        perf-profile.calltrace.cycles-pp.poll
>      90.26           -90.3        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.poll
>      90.08           -90.1        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
>      89.84           -89.8        0.00        perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
>      88.04           -88.0        0.00        perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
>       2.66            -0.1        2.54        perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.90            -0.1        1.81        perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
>       2.56            +0.1        2.64        perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +2.3        2.29        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
>       0.00            +2.3        2.34        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
>      17.45            +3.8       21.24        perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00           +92.7       92.66        perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00           +94.5       94.51        perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00           +94.8       94.75        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00           +94.9       94.92        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>      96.03           -96.0        0.00        perf-profile.children.cycles-pp.poll
>      90.29           -90.3        0.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      90.11           -90.1        0.00        perf-profile.children.cycles-pp.do_syscall_64
>      89.87           -89.9        0.00        perf-profile.children.cycles-pp.__x64_sys_poll
>      89.39           -89.4        0.00        perf-profile.children.cycles-pp.do_sys_poll
>      16.19           -16.2        0.00        perf-profile.children.cycles-pp.__fget_light
>      68.59           -68.6        0.00        perf-profile.self.cycles-pp.do_sys_poll
>      14.84           -14.8        0.00        perf-profile.self.cycles-pp.__fget_light
>  1.759e+13          -100.0%       0.00        perf-stat.branch-instructions
>       0.28            -0.3        0.00        perf-stat.branch-miss-rate%
>  4.904e+10          -100.0%       0.00        perf-stat.branch-misses
>       6.79 ±  3%      -6.8        0.00        perf-stat.cache-miss-rate%
>  1.071e+08 ±  4%    -100.0%       0.00        perf-stat.cache-misses
>  1.578e+09          -100.0%       0.00        perf-stat.cache-references
>     385311 ±  2%    -100.0%       0.00        perf-stat.context-switches
>       1.04          -100.0%       0.00        perf-stat.cpi
>  8.643e+13          -100.0%       0.00        perf-stat.cpu-cycles
>      13787          -100.0%       0.00        perf-stat.cpu-migrations
>       0.00 ±  4%      -0.0        0.00        perf-stat.dTLB-load-miss-rate%
>   23324811 ±  5%    -100.0%       0.00        perf-stat.dTLB-load-misses
>  1.811e+13          -100.0%       0.00        perf-stat.dTLB-loads
>       0.00            -0.0        0.00        perf-stat.dTLB-store-miss-rate%
>    2478029          -100.0%       0.00        perf-stat.dTLB-store-misses
>  8.775e+12          -100.0%       0.00        perf-stat.dTLB-stores
>      99.66           -99.7        0.00        perf-stat.iTLB-load-miss-rate%
>  7.527e+09          -100.0%       0.00        perf-stat.iTLB-load-misses
>   25540468 ± 39%    -100.0%       0.00        perf-stat.iTLB-loads
>   8.33e+13          -100.0%       0.00        perf-stat.instructions
>      11066          -100.0%       0.00        perf-stat.instructions-per-iTLB-miss
>       0.96          -100.0%       0.00        perf-stat.ipc
>     777357          -100.0%       0.00        perf-stat.minor-faults
>      81.69           -81.7        0.00        perf-stat.node-load-miss-rate%
>   20040093          -100.0%       0.00        perf-stat.node-load-misses
>    4491667 ±  7%    -100.0%       0.00        perf-stat.node-loads
>      75.23 ± 10%     -75.2        0.00        perf-stat.node-store-miss-rate%
>    3418662 ± 30%    -100.0%       0.00        perf-stat.node-store-misses
>    1027183 ± 11%    -100.0%       0.00        perf-stat.node-stores
>     777373          -100.0%       0.00        perf-stat.page-faults
>    3331644          -100.0%       0.00        perf-stat.path-length
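The %change column can be checked by hand. For ordinary counters it is the relative change, (new - old) / old * 100. For example, (231711 - 240408) / 240408 is about -3.6%. For metrics that are already percentages, the rows show the absolute difference in percentage points instead. This covers names ending in "%" (mpstat.cpu.idle%, turbostat.C1E%) and the perf-profile rows. For example, 0.64 - 1.77 is about -1.1. The helper below encodes that reading. The function names and the suffix/prefix rule are assumptions inferred from the rows above, not taken from the lkp-tests source.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed rule: metrics already expressed in percent are compared in points. */
static bool is_percent_metric(const char *name)
{
    size_t len = strlen(name);

    return (len > 0 && name[len - 1] == '%') ||
           strncmp(name, "perf-profile.", strlen("perf-profile.")) == 0;
}

/* Value shown in the %change column for one metric. */
static double change_column(const char *name, double old_val, double new_val)
{
    if (is_percent_metric(name))
        return new_val - old_val;                  /* percentage points */
    assert(old_val != 0.0);
    return (new_val - old_val) / old_val * 100.0;  /* relative percent */
}
```

Read this way, the -100.0% rows whose new value is 0.00 (uptime.*, perf-stat.*) only mean that no value was recorded for the second commit. They do not reflect a real drop.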
>
>
>
>                             will-it-scale.per_process_ops
>
>   242000 +-+----------------------------------------------------------------+
>          |                      +.+..   .+..+.      .+.+..+.+.+.    .+.+..  |
>   240000 +-+                   +     +.+      +.+..+            +..+      +.|
>   238000 +-+..+.+.  .+.   .+..+                                             |
>          |        +.   +.+                                                  |
>   236000 +-+                                                                |
>          |                                                                  |
>   234000 +-+                                                                |
>          |                                  O O O  O                        |
>   232000 +-+             O O  O O                      O  O O O O  O O O  O |
>   230000 +-+           O          O  O O O           O                      |
>          |           O                                                      |
>   228000 O-+    O O                                                         |
>          | O  O                                                             |
>   226000 +-+----------------------------------------------------------------+
>
>
>                                 will-it-scale.workload
>
>   2.52e+07 +-+--------------------------------------------------------------+
>            |                     +..+.   .+..+.      .+. .+.+..+.   .+..+.  |
>    2.5e+07 +-+                  +     +.+      +.+.+.   +        +.+      +.|
>   2.48e+07 +-+.+..+. .+.    .+.+                                            |
>            |        +   +..+                                                |
>   2.46e+07 +-+                                                              |
>   2.44e+07 +-+                                                              |
>            |                                                                |
>   2.42e+07 +-+               O   O           O O O O        O        O      |
>    2.4e+07 +-+          O  O   O                        O O    O O O    O O |
>            |          O             O O O O           O                     |
>   2.38e+07 O-+    O                                                         |
>   2.36e+07 +-O O    O                                                       |
>            |                                                                |
>   2.34e+07 +-+--------------------------------------------------------------+
>
>
> [+] bisect-good sample
> [O] bisect-bad  sample
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen