Date:   Tue, 3 Jan 2017 10:01:00 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Ye Xiaolong <xiaolong.ye@...el.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, LKP <lkp@...org>
Subject: Re: [lkp-developer] [sched/core] 6b94780e45: unixbench.score -4.5% regression

Hi Xiaolong,

Thanks for testing; I'm going to look for another root cause.
The report also mentioned a -2.9% regression on an 8-thread Intel(R)
Core(TM) i7 CPU 870 @ 2.93GHz with 6G memory. Have you checked that
platform too?

Regards,
Vincent

On 3 January 2017 at 08:13, Ye Xiaolong <xiaolong.ye@...el.com> wrote:
> On 01/02, Vincent Guittot wrote:
>>Hi Xiaolong,
>>
>>On Monday 19 Dec 2016 at 08:14:53 (+0800), kernel test robot wrote:
>>>
>>> Greeting,
>>>
>>> FYI, we noticed a -4.5% regression of unixbench.score due to commit:
>>
>>I have been able to restore performance on my platform with the patch below.
>>Could you test it ?
>>
>>---
>> kernel/sched/core.c | 1
>> 1 file changed, 1 insertion(+)
>>
>>diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>index 393759b..6e7d45c 100644
>>--- a/kernel/sched/core.c
>>+++ b/kernel/sched/core.c
>>@@ -2578,6 +2578,7 @@ void wake_up_new_task(struct task_struct *p)
>>       __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
>> #endif
>>       rq = __task_rq_lock(p, &rf);
>>+      update_rq_clock(rq);
>>       post_init_entity_util_avg(&p->se);
>>
>>       activate_task(rq, p, 0);
>>--
>>2.7.4
>>
>>Vincent
>
> Hi, Vincent,
>
> I applied your fix patch on top of 6b94780 ("sched/core: Use load_avg for selecting idlest group"),
> and here is the comparison. (60df283834fd4def3c11ad2de3 is the fix commit id).
> It seems the performance hasn't been restored.

Thanks for testing.
>
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-6/performance/x86_64-rhel-7.2/100%/debian-x86_64-2016-08-31.cgz/300s/lkp-wsm-ep1/shell1/unixbench
>
> commit:
>   f519a3f1c6b7a990e5aed37a8f853c6ecfdee945
>   6b94780e45c17b83e3e75f8aaca5a328db583c74
>   60df283834fd4def3c11ad2de3e6fc9e81b7dff1
>
> f519a3f1c6b7a990 6b94780e45c17b83e3e75f8aac 60df283834fd4def3c11ad2de3
> ---------------- -------------------------- --------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      25565 ±  0%      -4.5%      24414 ±  0%      -4.5%      24421 ±  0%  unixbench.score
>   13223805 ±  2%     -19.6%   10628072 ±  0%     -21.3%   10412818 ±  1%  unixbench.time.involuntary_context_switches
>  9.232e+08 ±  0%      -4.3%  8.831e+08 ±  0%      -4.3%  8.838e+08 ±  0%  unixbench.time.minor_page_faults
>       1807 ±  0%      -5.4%       1709 ±  0%      -5.6%       1705 ±  0%  unixbench.time.percent_of_cpu_this_job_got
>       5656 ±  0%      -6.8%       5271 ±  0%      -7.3%       5243 ±  0%  unixbench.time.system_time
>       5743 ±  0%      -4.0%       5514 ±  0%      -3.9%       5516 ±  0%  unixbench.time.user_time
>   29557557 ±  0%      -2.6%   28781098 ±  0%      -2.2%   28919280 ±  0%  unixbench.time.voluntary_context_switches
>     741766 ±  2%     -62.4%     279054 ±  1%     -61.8%     283034 ±  1%  interrupts.CAL:Function_call_interrupts
>    2912823 ±  0%      -9.7%    2630010 ±  0%      -8.7%    2660077 ±  0%  softirqs.RCU
>   13223805 ±  2%     -19.6%   10628072 ±  0%     -21.3%   10412818 ±  1%  time.involuntary_context_switches
>     126250 ±  0%     -12.2%     110890 ±  0%     -11.5%     111739 ±  0%  vmstat.system.cs
>      31060 ±  1%      -9.2%      28214 ±  0%      -9.6%      28078 ±  0%  vmstat.system.in
>     454.50 ±150%    +164.7%       1203 ±166%    +792.3%       4055 ± 18%  numa-numastat.node0.numa_foreign
>     454.50 ±150%    +164.7%       1203 ±166%    +792.3%       4055 ± 18%  numa-numastat.node0.numa_miss
>       4297 ± 15%     -18.1%       3520 ± 57%     -84.5%     666.40 ±113%  numa-numastat.node1.numa_foreign
>       4297 ± 15%     -18.1%       3520 ± 57%     -84.5%     666.40 ±113%  numa-numastat.node1.numa_miss
>      78.58 ±  0%      -5.6%      74.20 ±  0%      -6.0%      73.90 ±  0%  turbostat.%Busy
>       2507 ±  0%      -5.6%       2366 ±  0%      -6.0%       2356 ±  0%  turbostat.Avg_MHz
>       3.01 ±  2%    +100.4%       6.03 ±  2%    +100.1%       6.02 ±  0%  turbostat.CPU%c3
>       2.35 ±  1%      +6.8%       2.51 ±  4%     +12.1%       2.64 ±  1%  turbostat.CPU%c6
>       1.25 ±  5%     -17.1%       1.04 ± 22%     -32.3%       0.85 ±  5%  perf-profile.children.cycles-pp.__irqentry_text_start
>
> Thanks,
> Xiaolong
>
>>
>>>
>>>
>>> commit: 6b94780e45c17b83e3e75f8aaca5a328db583c74 ("sched/core: Use load_avg for selecting idlest group")
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: unixbench
>>> on test machine: 24 threads Nehalem-EP with 24G memory
>>> with following parameters:
>>>
>>>      runtime: 300s
>>>      nr_task: 100%
>>>      test: shell1
>>>      cpufreq_governor: performance
>>>
>>> test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
>>> test-url: https://github.com/kdlucas/byte-unixbench
>>>
>>> In addition to that, the commit also has significant impact on the following tests:
>>>
>>> +------------------+-----------------------------------------------------------------------+
>>> | testcase: change | unixbench: unixbench.score -2.9% regression                           |
>>> | test machine     | 8 threads Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz with 6G memory       |
>>> | test parameters  | nr_task=1                                                             |
>>> |                  | runtime=300s                                                          |
>>> |                  | test=shell8                                                           |
>>> +------------------+-----------------------------------------------------------------------+
>>>
>>>
>>> Details are as below:
>>> -------------------------------------------------------------------------------------------------->
>>>
>>>
>>> To reproduce:
>>>
>>>         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>>>         cd lkp-tests
>>>         bin/lkp install job.yaml  # job file is attached in this email
>>>         bin/lkp run     job.yaml
>>>
>>> testcase/path_params/tbox_group/run: unixbench/300s-100%-shell1-performance/lkp-wsm-ep1
>>>
>>> f519a3f1c6b7a990  6b94780e45c17b83e3e75f8aac
>>> ----------------  --------------------------
>>>      25565              -5%      24414        unixbench.score
>>>   29557557                    28781098        unixbench.time.voluntary_context_switches
>>>       5743              -4%       5514        unixbench.time.user_time
>>>  9.232e+08              -4%  8.831e+08        unixbench.time.minor_page_faults
>>>       1807              -5%       1709        unixbench.time.percent_of_cpu_this_job_got
>>>       5656              -7%       5271        unixbench.time.system_time
>>>   13223805             -20%   10628072        unixbench.time.involuntary_context_switches
>>>     741766             -62%     279054        interrupts.CAL:Function_call_interrupts
>>>      31060              -9%      28214        vmstat.system.in
>>>     126250             -12%     110890        vmstat.system.cs
>>>      78.58              -6%      74.20        turbostat.%Busy
>>>       2507              -6%       2366        turbostat.Avg_MHz
>>>       9134 ± 47%     -6e+03       2973 ± 36%  latency_stats.max.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
>>>     380879 ± 10%      5e+05     887692 ± 49%  latency_stats.sum.wait_on_page_bit_killable.__lock_page_or_retry.filemap_fault.__do_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
>>>      31710 ± 15%     -2e+04      10583 ± 14%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.do_munmap.vm_munmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
>>>      51796 ±  4%     -4e+04      15457 ± 10%  latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.unmap_region.do_munmap.vm_munmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64
>>>     111998 ± 18%     -7e+04      37074 ± 14%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.do_munmap.mmap_region.do_mmap.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath
>>>     275087 ± 15%     -2e+05      81973 ±  3%  latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.unmap_region.do_munmap.mmap_region.do_mmap.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath
>>>     930993 ± 12%     -6e+05     320520 ±  4%  latency_stats.sum.call_rwsem_down_write_failed.vma_link.mmap_region.do_mmap.vm_mmap_pgoff.vm_mmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64
>>>    4755783 ±  9%     -3e+06    1619348 ±  4%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.split_vma.mprotect_fixup.do_mprotect_pkey.SyS_mprotect.entry_SYSCALL_64_fastpath
>>>    5536067 ± 10%     -4e+06    1929338 ±  3%  latency_stats.sum.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
>>>  9.032e+08              -4%   8.64e+08        perf-stat.page-faults
>>>  9.032e+08              -4%   8.64e+08        perf-stat.minor-faults
>>>  2.329e+09                   2.269e+09        perf-stat.node-load-misses
>>>    2.2e+09              -9%  2.011e+09 ±  5%  perf-stat.dTLB-store-misses
>>>  3.278e+10              -9%  2.987e+10 ±  6%  perf-stat.dTLB-load-misses
>>>   19484819              13%   21974129        perf-stat.cpu-migrations
>>>  3.755e+13              -6%   3.54e+13        perf-stat.cpu-cycles
>>>       3244               4%       3379        perf-stat.instructions-per-iTLB-miss
>>>  4.536e+12              -4%  4.332e+12        perf-stat.branch-instructions
>>>  2.303e+13              -4%  2.208e+13        perf-stat.instructions
>>>  5.768e+12              -4%  5.517e+12        perf-stat.dTLB-loads
>>>  3.567e+11              -4%  3.414e+11        perf-stat.cache-references
>>>       2.97                        2.93        perf-stat.branch-miss-rate%
>>>  2.768e+10                   2.699e+10        perf-stat.node-stores
>>>  5.446e+10              -3%  5.275e+10        perf-stat.cache-misses
>>>       0.03              -4%       0.03        perf-stat.iTLB-load-miss-rate%
>>>  9.673e+09              -4%  9.294e+09        perf-stat.node-loads
>>>  3.596e+12              -4%  3.442e+12        perf-stat.dTLB-stores
>>>       0.61                        0.62        perf-stat.ipc
>>>  1.347e+11              -6%   1.27e+11        perf-stat.branch-misses
>>>  7.098e+09              -8%  6.533e+09        perf-stat.iTLB-load-misses
>>>  2.309e+13              -4%  2.206e+13        perf-stat.iTLB-loads
>>>   79911173             -12%   70187035        perf-stat.context-switches
>>>
>>>
>>>
>>>                                  turbostat._Busy
>>>
>>>   90 ++-------------------------------------*---*---------------------------+
>>>      |                                    ..       *...*..                  |
>>>   80 *+..*..*...*..*...*..*...*..*...O...*  O   O  O   O  O...O..O...O  O   O
>>>   70 O+  O  O   O  O   O  O   O  O                                          |
>>>      |                                                                      |
>>>   60 ++                                                                     |
>>>   50 ++                                                                     |
>>>      |                                                                      |
>>>   40 ++                                                                     |
>>>   30 ++                                                                     |
>>>      |                                                                      |
>>>   20 ++                                                                     |
>>>   10 ++                                                                     |
>>>      |                                                                      |
>>>    0 ++----------------------------------O----------------------------------+
>>>
>>>
>>>
>>>
>>>
>>>                     unixbench.time.percent_of_cpu_this_job_got
>>>
>>>   2500 ++-------------------------------------------------------------------+
>>>        |                                                                    |
>>>        |                                       .*...                        |
>>>   2000 ++                                   .*.     *..*...                 |
>>>        *..*...*..*...*..*...*..*...*..O...*. O  O   O  O   O..O...O..O   O  O
>>>        O  O   O  O   O  O   O  O   O                                        |
>>>   1500 ++                                                                   |
>>>        |                                                                    |
>>>   1000 ++                                                                   |
>>>        |                                                                    |
>>>        |                                                                    |
>>>    500 ++                                                                   |
>>>        |                                                                    |
>>>        |                                                                    |
>>>      0 ++---------------------------------O---------------------------------+
>>>
>>>
>>>                                   vmstat.system.in
>>>
>>>   40000 ++------------------------------------------------------------------+
>>>         |                                          .*...*..                 |
>>>   35000 ++                                  .*...*.                         |
>>>   30000 *+.*...*..*...*..*..*...*..*...*..*.               *..*...*..*      |
>>>         O  O   O  O   O  O  O   O  O   O     O   O  O   O  O  O   O  O   O  O
>>>   25000 ++                                                                  |
>>>         |                                                                   |
>>>   20000 ++                                                                  |
>>>         |                                                                   |
>>>   15000 ++                                                                  |
>>>   10000 ++                                                                  |
>>>         |                                                                   |
>>>    5000 ++                                                                  |
>>>         |                                                                   |
>>>       0 ++--------------------------------O---------------------------------+
>>>
>>>      [*] bisect-good sample
>>>      [O] bisect-bad  sample
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>>>
>>> Thanks,
>>> Xiaolong
>>
