Date:   Mon, 7 Nov 2016 10:50:38 +0800
From:   kernel test robot <xiaolong.ye@...el.com>
To:     Borislav Petkov <bp@...e.de>
Cc:     Ingo Molnar <mingo@...nel.org>, Andy Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, tipbuild@...or.com,
        lkp@...org
Subject: [lkp] [x86/copy_user]  adb402cd14:  will-it-scale.per_process_ops
 -12.7% regression


Greetings,

FYI, we noticed a -12.7% regression of will-it-scale.per_process_ops due to commit:


commit adb402cd1461eef6e1a21db4532a3b9e6a6be853 ("x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants")
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm

in testcase: will-it-scale
on test machine: 8 threads Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz with 4G memory
with the following parameters:

	test: poll1
	cpufreq_governor: performance

Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of the test in order to expose any differences between the two.
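
The "1 through n parallel copies" idea can be illustrated with a rough sketch. It is not the will-it-scale harness or the real poll1 testcase (neither is shown in this report): `poll_loop` and `scale_run` are hypothetical names, the workload is an assumed stand-in that polls an idle pipe with a zero timeout, and only a thread-based variant is shown. It assumes a platform where `select.poll` is available, such as Linux.

```python
import os
import select
import time
from concurrent.futures import ThreadPoolExecutor


def poll_loop(iterations: int) -> int:
    """Hypothetical stand-in testcase: poll an idle pipe with a zero timeout."""
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    ops = 0
    try:
        for _ in range(iterations):
            poller.poll(0)  # returns immediately; nothing is ever written
            ops += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return ops


def scale_run(max_copies: int, iterations: int) -> dict:
    """Run 1..max_copies parallel copies; map copy count to total ops/sec."""
    results = {}
    for copies in range(1, max_copies + 1):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=copies) as pool:
            total_ops = sum(pool.map(poll_loop, [iterations] * copies))
        elapsed = time.perf_counter() - start
        results[copies] = total_ops / elapsed
    return results


if __name__ == "__main__":
    for copies, rate in scale_run(4, 100_000).items():
        print(f"{copies} copies: {rate:,.0f} ops/sec")
```

Because of the interpreter lock, Python threads will not show real scaling here; the sketch only shows the shape of the measurement loop.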

In addition, the commit also has a significant impact on the following tests:

+------------------+---------------------------------------------------------------------+
| testcase: change | aim7: aim7.jobs-per-min -11.6% regression                           |
| test machine     | qemu-system-x86_64 -enable-kvm -cpu host -smp 4 -m 5G               |
| test parameters  | load=200                                                            |
|                  | test=dir_rtns_1                                                     |
+------------------+---------------------------------------------------------------------+


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
  gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/xps/poll1/will-it-scale

commit: 
  05b93c19d5 ("Merge branch 'linus' into x86/asm, to pick up fixes")
  adb402cd14 ("x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants")

05b93c19d50af2bd adb402cd1461eef6e1a21db453 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   6754528 ±  0%     -12.7%    5899566 ±  2%  will-it-scale.per_process_ops
   5574363 ±  0%      -8.1%    5121156 ±  0%  will-it-scale.per_thread_ops
    577.71 ±  0%      +1.9%     588.63 ±  0%  will-it-scale.time.system_time
    143.21 ±  0%      -7.5%     132.53 ±  0%  will-it-scale.time.user_time
      2984 ±  3%     +12.5%       3357 ±  7%  cpuidle.C1E-NHM.usage
     11.55 ±  4%      +6.8%      12.34 ±  7%  turbostat.CPU%c3
 7.704e+11 ±  0%      +3.2%   7.95e+11 ±  0%  perf-stat.branch-instructions
      0.20 ±  3%     +16.3%       0.23 ±  3%  perf-stat.branch-miss-rate%
 1.527e+09 ±  2%     +20.0%  1.833e+09 ±  3%  perf-stat.branch-misses
 4.543e+08 ±  3%      +7.0%  4.862e+08 ±  3%  perf-stat.cache-references
 1.377e+12 ±  0%      +5.6%  1.454e+12 ±  0%  perf-stat.dTLB-loads
      0.00 ±  2%      -3.9%       0.00 ±  1%  perf-stat.dTLB-store-miss-rate%
 8.151e+11 ±  0%      +5.4%  8.591e+11 ±  0%  perf-stat.dTLB-stores
      0.00 ± 11%     -15.1%       0.00 ±  4%  perf-stat.iTLB-load-miss-rate%
 3.852e+12 ±  0%      +5.6%  4.067e+12 ±  0%  perf-stat.iTLB-loads
 3.373e+12 ±  1%      +5.0%  3.542e+12 ±  1%  perf-stat.instructions
      0.80 ±  0%      +5.4%       0.84 ±  0%  perf-stat.ipc
    297784 ±  0%      -1.1%     294651 ±  0%  perf-stat.minor-faults
    297784 ±  0%      -1.1%     294651 ±  0%  perf-stat.page-faults
    230146 ±  5%     +33.3%     306713 ± 21%  sched_debug.cfs_rq:/.load.max
     23374 ± 13%     -22.2%      18188 ±  8%  sched_debug.cpu.nr_load_updates.stddev
   5268307 ± 41%     -52.4%    2508626 ± 40%  sched_debug.cpu.nr_switches.max
   1749053 ± 40%     -45.8%     947909 ± 29%  sched_debug.cpu.nr_switches.stddev
     14.50 ± 42%     -47.4%       7.62 ± 30%  sched_debug.cpu.nr_uninterruptible.max
     10.42 ± 24%     -42.1%       6.03 ± 15%  sched_debug.cpu.nr_uninterruptible.stddev
   5265639 ± 41%     -52.4%    2505468 ± 40%  sched_debug.cpu.sched_count.max
   1748858 ± 40%     -45.8%     947432 ± 29%  sched_debug.cpu.sched_count.stddev
   2627322 ± 41%     -52.5%    1247788 ± 40%  sched_debug.cpu.sched_goidle.max
   2636214 ± 41%     -52.5%    1251571 ± 40%  sched_debug.cpu.ttwu_count.max
    875433 ± 40%     -45.8%     474432 ± 29%  sched_debug.cpu.ttwu_count.stddev
   2629058 ± 41%     -52.6%    1245111 ± 40%  sched_debug.cpu.ttwu_local.max
    876396 ± 40%     -45.8%     474742 ± 29%  sched_debug.cpu.ttwu_local.stddev
      0.00 ± -1%      +Inf%       2.95 ± 11%  perf-profile.calltrace.cycles-pp.___might_sleep.__might_sleep.__might_fault._copy_from_user.do_sys_poll
      0.00 ± -1%      +Inf%       4.78 ± 12%  perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
      0.00 ± -1%      +Inf%       4.04 ± 11%  perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_from_user.do_sys_poll.sys_poll
      0.00 ± -1%      +Inf%       8.23 ± 14%  perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.calltrace.cycles-pp.call_cpuidle.cpu_startup_entry.rest_init.start_kernel.x86_64_start_reservations
      0.00 ± -1%      +Inf%      15.23 ± 13%  perf-profile.calltrace.cycles-pp.copy_user_generic_string.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
     20.07 ± 14%    -100.0%       0.00 ± -1%  perf-profile.calltrace.cycles-pp.copy_user_generic_string.sys_poll.entry_SYSCALL_64_fastpath
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.rest_init.start_kernel.x86_64_start_reservations.x86_64_start_kernel
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_enter.call_cpuidle.cpu_startup_entry.rest_init.start_kernel
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry.rest_init
     22.59 ± 13%     +88.1%      42.49 ± 11%  perf-profile.calltrace.cycles-pp.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
      7.62 ±  9%     -18.9%       6.18 ± 10%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_swapgs
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.calltrace.cycles-pp.rest_init.start_kernel.x86_64_start_reservations.x86_64_start_kernel.start_cpu
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.calltrace.cycles-pp.start_kernel.x86_64_start_reservations.x86_64_start_kernel.start_cpu
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.calltrace.cycles-pp.x86_64_start_kernel.start_cpu
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.calltrace.cycles-pp.x86_64_start_reservations.x86_64_start_kernel.start_cpu
      1.81 ± 18%    +174.0%       4.96 ± 12%  perf-profile.children.cycles-pp.___might_sleep
      3.96 ± 17%    +139.4%       9.49 ± 13%  perf-profile.children.cycles-pp.__might_fault
      2.90 ± 15%    +146.7%       7.16 ± 12%  perf-profile.children.cycles-pp.__might_sleep
      0.42 ± 26%   +1964.5%       8.57 ± 13%  perf-profile.children.cycles-pp._copy_from_user
     22.75 ± 13%     +87.6%      42.66 ± 11%  perf-profile.children.cycles-pp.do_sys_poll
      7.62 ±  9%     -18.9%       6.18 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_swapgs
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.children.cycles-pp.rest_init
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.children.cycles-pp.start_kernel
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.children.cycles-pp.x86_64_start_kernel
      5.91 ± 21%     +42.3%       8.41 ± 18%  perf-profile.children.cycles-pp.x86_64_start_reservations
      1.80 ± 18%    +174.2%       4.95 ± 12%  perf-profile.self.cycles-pp.___might_sleep
      1.08 ± 19%    +129.3%       2.48 ± 15%  perf-profile.self.cycles-pp.__might_fault
      1.54 ± 12%     +91.9%       2.96 ± 13%  perf-profile.self.cycles-pp.__might_sleep
      0.42 ± 26%    +681.3%       3.24 ± 15%  perf-profile.self.cycles-pp._copy_from_user
     14.39 ± 14%     -26.1%      10.64 ±  9%  perf-profile.self.cycles-pp.do_sys_poll
      7.62 ±  9%     -18.9%       6.18 ± 10%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_swapgs
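
For readers checking the %change column, the figures appear consistent with a plain relative difference between the two commits' mean values. The sketch below recomputes a few rows copied from this report under that assumption. The helper name `pct_change` is hypothetical, and treating a zero base as an infinite change is an assumption made to mirror the `+Inf%` rows.

```python
import math


def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    if base == 0:
        # Assumed convention: a metric absent on the base commit shows as +Inf%.
        if new == 0:
            return 0.0
        return math.inf if new > 0 else -math.inf
    return (new - base) / base * 100.0


# (base commit 05b93c19d5, tested commit adb402cd14), values from this report
rows = {
    "will-it-scale.per_process_ops": (6754528, 5899566),
    "will-it-scale.per_thread_ops": (5574363, 5121156),
    "will-it-scale.time.system_time": (577.71, 588.63),
    "aim7.jobs-per-min": (25422, 22469),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+6.1f}%  {name}")
```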



                           will-it-scale.time.user_time

  148 ++--------------------------------------------------------------------+
  146 ++ .*.*.*..*                                    .*. .*.*.*.*..        |
      *.*         :       .*.        .*.            .*   *          *       |
  144 ++          :   .*.*   *.*..*.*   *.*.*.*.*..*                 + .*.*.|
  142 ++           *.*                                                *     *
      |                                                                     |
  140 ++                                                                    |
  138 ++                                                                    |
  136 ++                                                                    |
      |                                                                     |
  134 ++         O O O O O   O                                              |
  132 O+                   O   O  O O O O                                   |
      |                                                                     |
  130 ++O   O O                                                             |
  128 ++--O-----------------------------------------------------------------+


                          will-it-scale.time.system_time

  594 ++--------------------------------------------------------------------+
  592 ++O   O O                                                             |
      |                                                                     |
  590 O+  O                       O O O                                     |
  588 ++         O O O O O   O          O                                   |
  586 ++                   O   O                                            |
  584 ++                                                                    |
      |                                                                     |
  582 ++                                                                    |
  580 ++           *.                                                       |
  578 ++           : *.                 *.*.                         .*. .*.*
  576 *+          :    *.*.*.*.*..*.*. +    *.*.*..*.   .*.         *   *   |
      | *.*.*.*.. :                   *              *.*   *.*.*. ..        |
  574 ++         *                                               *          |
  572 ++--------------------------------------------------------------------+


                             will-it-scale.per_process_ops

  7.2e+06 ++----------------------------------------------------------------+
          |         *                                                       |
    7e+06 ++       : :                                                      |
  6.8e+06 ++*.*.*. : :     .*.*.*.*.*.*.*.*.*.*.*.*.*.*. .*.*. .*. .*.*. .*.*
          |+      *   *.*.*                             *     *   *     *   |
  6.6e+06 *+                                                                |
  6.4e+06 ++                                                                |
          |                                                                 |
  6.2e+06 O+        O O O   O   O                                           |
    6e+06 ++      O       O   O   O   O O                                   |
          |     O                         O                                 |
  5.8e+06 ++                                                                |
  5.6e+06 ++O                       O                                       |
          |   O                                                             |
  5.4e+06 ++----------------------------------------------------------------+


                             will-it-scale.per_thread_ops

  5.9e+06 ++----------------------------------------------------------------+
  5.8e+06 ++        *                                                       |
          |        + :                                                      |
  5.7e+06 *+*.*.*.*  :     .*.*.*.*.   .*.*.*.*.*.*.*.*.*.*.*.*.*. .*.      |
  5.6e+06 ++          *.*.*         *.*                           *   *   *.*
          |                                                            + +  |
  5.5e+06 ++                                                            *   |
  5.4e+06 ++                                                                |
  5.3e+06 ++                                                                |
          |                                                                 |
  5.2e+06 O+        O O O O O   O                                           |
  5.1e+06 ++O     O           O   O O O O O                                 |
          |                                                                 |
    5e+06 ++    O                                                           |
  4.9e+06 ++--O-------------------------------------------------------------+

	[*] bisect-good sample
	[O] bisect-bad  sample


***************************************************************************************************
vm-lkp-a03: qemu-system-x86_64 -enable-kvm -cpu host -smp 4 -m 5G
=========================================================================================
compiler/kconfig/load/rootfs/tbox_group/test/testcase:
  gcc-6/x86_64-rhel-7.2/200/debian-x86_64-2016-08-31.cgz/vm-lkp-a03/dir_rtns_1/aim7

commit: 
  05b93c19d5 ("Merge branch 'linus' into x86/asm, to pick up fixes")
  adb402cd14 ("x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants")

05b93c19d50af2bd adb402cd1461eef6e1a21db453 
---------------- -------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
         %stddev     %change         %stddev
             \          |                \  
     25422 ±  0%     -11.6%      22469 ±  0%  aim7.jobs-per-min
     49.16 ±  1%     +13.7%      55.90 ±  1%  aim7.time.elapsed_time
     49.16 ±  1%     +13.7%      55.90 ±  1%  aim7.time.elapsed_time.max
     56074 ±  1%     +18.2%      66257 ±  0%  aim7.time.involuntary_context_switches
      2544 ±  0%      +0.9%       2566 ±  0%  aim7.time.maximum_resident_set_size
    102.58 ±  0%     +22.4%     125.56 ±  0%  aim7.time.system_time
    109578 ± 12%     -38.9%      66934 ± 46%  sched_debug.cfs_rq:/.load.stddev
     29520 ±  4%     +15.4%      34074 ±  5%  proc-vmstat.numa_hit
     29495 ±  4%     +15.4%      34047 ±  5%  proc-vmstat.numa_local
     43435 ±  4%     +12.3%      48756 ±  5%  proc-vmstat.pgalloc_normal
     31786 ±  4%     +15.3%      36656 ±  5%  proc-vmstat.pgfault
     35307 ± 11%     +26.0%      44486 ± 10%  proc-vmstat.pgfree
      0.00 ± -1%      +Inf%     168236 ±138%  latency_stats.avg.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
      0.00 ± -1%      +Inf%      19256 ±109%  latency_stats.avg.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.exit_mmap.mmput.flush_old_exec.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
     91562 ± 74%    +638.3%     675991 ±107%  latency_stats.avg.max
     69313 ±121%    +661.5%     527827 ±150%  latency_stats.avg.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
      0.00 ± -1%      +Inf%     175494 ±131%  latency_stats.max.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
      0.00 ± -1%      +Inf%      19825 ±107%  latency_stats.max.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.exit_mmap.mmput.flush_old_exec.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
    240734 ± 29%    +217.3%     763970 ± 86%  latency_stats.max.max
     69313 ±121%    +661.5%     527827 ±150%  latency_stats.max.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
      0.00 ± -1%      +Inf%     178824 ±127%  latency_stats.sum.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
      0.00 ± -1%      +Inf%      25864 ±100%  latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.exit_mmap.mmput.flush_old_exec.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
     69313 ±121%    +661.5%     527827 ±150%  latency_stats.sum.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath







Thanks,
Xiaolong

View attachment "config-4.9.0-rc3-00275-gadb402c" of type "text/plain" (153603 bytes)

View attachment "job-script" of type "text/plain" (6535 bytes)

View attachment "job.yaml" of type "text/plain" (4212 bytes)

View attachment "reproduce" of type "text/plain" (138 bytes)
