Date:   Mon, 10 Oct 2016 14:26:27 +0800
From:   Ye Xiaolong <xiaolong.ye@...el.com>
To:     Manfred Spraul <manfred@...orfullife.com>
Cc:     Stephen Rothwell <sfr@...b.auug.org.au>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...e.hu>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: Re: [lkp] [ipc/sem.c] 0882cba0a0: aim9.shared_memory.ops_per_sec
 -8.8% regression

On 10/09, Manfred Spraul wrote:
>Hi,
>
>On 10/09/2016 09:05 AM, kernel test robot wrote:
>>FYI, we noticed a -8.8% regression of aim9.shared_memory.ops_per_sec due to commit:
>>
>>commit 0882cba0a03bca73acd8fab8fb50db04691908e9 ("ipc/sem.c: fix complex_count vs. simple op race")
>>https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>>
>>in testcase: aim9
>>on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
>>with following parameters:
>>
>>	testtime: 300s
>>	test: shared_memory
>>	cpufreq_governor: performance
>>
>>Suite IX is the "AIM Independent Resource Benchmark": the famous synthetic benchmark.
>>
>>
>It is probably caused by this change:
>>--- ipc/sem-fast-but-wrong.c    2016-10-09 19:24:47.914825410 +0200
>>+++ ipc/sem.c    2016-10-09 19:24:57.960841540 +0200
>>@@ -363,6 +363,14 @@
>>          */
>>         spin_lock(&sem->lock);
>>
>>+        /*
>>+         * See 51d7d5205d33
>>+         * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
>>+         * A full barrier is required: the write of sem->lock
>>+         * must be visible before the read is executed
>>+         */
>>+        smp_mb();
>>+
>>         if (!smp_load_acquire(&sma->complex_mode)) {
>>             /* fast path successful! */
>>             return sops->sem_num;
>Unfortunately, we need it, at least for powerpc.
>And I do not want to add a CONFIG_PPC #ifdef to ipc/sem.c.
>
>Is it possible to test what happens with the patches that avoid
>spin_unlock_wait()?
>
>https://patchwork.kernel.org/patch/9359365/
>https://patchwork.kernel.org/patch/9359371/
>
>(if possible, with both patches together)
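
To make the ordering requirement in the quoted diff concrete, here is a
store-buffering litmus sketch in userspace C11 atomics. This is
illustrative only, not the kernel code: sem_locked stands in for
"sem->lock is held", complex_mode for sma->complex_mode, and the
seq_cst fences play the role of smp_mb()/smp_store_mb().

	/*
	 * Store-buffering litmus sketch (userspace C11; names are
	 * illustrative, not the kernel's).
	 */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_int sem_locked, complex_mode;
	static int r_simple, r_complex;

	static void *simple_op(void *arg)
	{
		(void)arg;
		/* take the per-semaphore lock (simplified to a plain store) */
		atomic_store_explicit(&sem_locked, 1, memory_order_relaxed);
		/* the smp_mb() from the diff: order the store before the load */
		atomic_thread_fence(memory_order_seq_cst);
		r_simple = atomic_load_explicit(&complex_mode, memory_order_relaxed);
		return NULL;
	}

	static void *complex_op(void *arg)
	{
		(void)arg;
		/* announce complex mode */
		atomic_store_explicit(&complex_mode, 1, memory_order_relaxed);
		/* the barrier the complex side already has (smp_store_mb()) */
		atomic_thread_fence(memory_order_seq_cst);
		r_complex = atomic_load_explicit(&sem_locked, memory_order_relaxed);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		pthread_create(&a, NULL, simple_op, NULL);
		pthread_create(&b, NULL, complex_op, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		/*
		 * With both fences, r_simple == 0 && r_complex == 0 is
		 * forbidden: at least one side observes the other's store,
		 * so the fast path and a complex op can never both proceed.
		 * Remove either fence and 0/0 becomes a legal outcome - the
		 * race the smp_mb() closes.
		 */
		printf("r_simple=%d r_complex=%d\n", r_simple, r_complex);
		return 0;
	}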

I applied these two patches on top of 0882cba ("ipc/sem.c: fix
complex_count vs. simple op race"); they do seem to recover the
performance. Here is the comparison:


=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-ivb-d03/shared_memory/aim9/300s

commit:
  bfe738a6c274df47650d09360babc8c1b71ff2a8
  0882cba0a03bca73acd8fab8fb50db04691908e9
  cc5f1c2b5da85fdf3de370189765af88c8df7b55 ("ipc/sem: Add hysteresis")

bfe738a6c274df47 0882cba0a03bca73acd8fab8fb cc5f1c2b5da85fdf3de3701897
---------------- -------------------------- --------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   1377511 ±  0%      -8.8%    1256420 ±  0%      -0.5%    1370192 ±  0%  aim9.shared_memory.ops_per_sec
   4133531 ±  0%      -8.8%    3770259 ±  0%      -0.5%    4111578 ±  0%  aim9.time.minor_page_faults
      8106 ±  6%      -7.7%       7483 ±  8%     +24.5%      10092 ±  7%  cpuidle.POLL.usage
      9.66 ±  0%      -2.0%       9.46 ±  0%      +0.5%       9.71 ±  0%  turbostat.CorWatt
      2374 ±  5%     -12.4%       2078 ±  1%      -5.5%       2243 ±  5%  slabinfo.kmalloc-256.active_objs
    697.75 ±  4%     +33.5%     931.25 ±  5%     +43.5%       1001 ±  4%  slabinfo.kmalloc-512.active_objs
    719.50 ±  3%     +32.3%     951.75 ±  5%     +41.8%       1020 ±  4%  slabinfo.kmalloc-512.num_objs
      5184 ±  2%     -12.3%       4544 ±  8%      -5.9%       4877 ±  8%  slabinfo.kmalloc-64.active_objs
      5184 ±  2%     -12.3%       4544 ±  8%      -5.9%       4877 ±  8%  slabinfo.kmalloc-64.num_objs
 2.124e+09 ±  3%      +0.9%  2.144e+09 ±  9%     +14.8%  2.439e+09 ±  9%  perf-stat.cache-references
     10295 ±  6%      -8.9%       9376 ±  1%      +6.0%      10912 ±  7%  perf-stat.cpu-migrations
      0.04 ± 24%     +59.5%       0.06 ± 51%    +508.5%       0.24 ±123%  perf-stat.dTLB-load-miss-rate%
 5.346e+11 ±  3%     -13.1%  4.647e+11 ±  7%      +2.9%  5.501e+11 ±  8%  perf-stat.dTLB-loads
      1.61 ±  1%      -9.2%       1.46 ±  1%      -0.6%       1.60 ±  1%  perf-stat.ipc
   4417053 ±  0%      -8.2%    4053763 ±  0%      -0.5%    4394973 ±  0%  perf-stat.minor-faults
   4417054 ±  0%      -8.2%    4053763 ±  0%      -0.5%    4394973 ±  0%  perf-stat.page-faults
   4478439 ±144%     +83.7%    8227939 ± 90%     -87.4%     564387 ± 21%  sched_debug.cfs_rq:/.load.max
   1912666 ±145%     +84.7%    3531796 ± 91%     -87.7%     235296 ± 19%  sched_debug.cfs_rq:/.load.stddev
    755.19 ± 79%      +4.3%     787.48 ± 63%     -55.2%     338.70 ± 15%  sched_debug.cfs_rq:/.load_avg.avg
    186.08 ±  9%     -10.1%     167.38 ± 15%     -39.9%     111.77 ± 35%  sched_debug.cfs_rq:/.load_avg.min
     86.83 ± 22%      -8.6%      79.38 ± 31%     -38.0%      53.83 ± 27%  sched_debug.cfs_rq:/.util_avg.min
    222488 ±  2%      +0.2%     222984 ±  6%     -22.8%     171691 ± 15%  sched_debug.cpu.avg_idle.stddev
     62.58 ± 56%     -18.7%      50.90 ± 41%     -93.8%       3.86 ±168%  sched_debug.cpu.clock.stddev
     62.58 ± 56%     -18.7%      50.90 ± 41%     -93.8%       3.86 ±168%  sched_debug.cpu.clock_task.stddev
   4500146 ±144%     +82.7%    8220704 ± 92%     -87.5%     564387 ± 21%  sched_debug.cpu.load.max
   1920553 ±145%     +83.6%    3525404 ± 92%     -87.7%     235296 ± 19%  sched_debug.cpu.load.stddev
      1.52 ± 11%     -26.7%       1.11 ± 33%     -24.9%       1.14 ± 18%  sched_debug.cpu.nr_running.avg
     18404 ± 42%     +39.9%      25755 ± 11%     +66.9%      30725 ±  7%  sched_debug.cpu.ttwu_count.min
      7.90 ± 26%     -30.4%       5.50 ± 37%     -13.0%       6.88 ± 45%  sched_debug.rt_rq:/.rt_time.avg
     31.58 ± 26%     -30.4%      21.97 ± 37%     -13.2%      27.42 ± 45%  sched_debug.rt_rq:/.rt_time.max
     13.67 ± 26%     -30.5%       9.51 ± 37%     -13.2%      11.86 ± 45%  sched_debug.rt_rq:/.rt_time.stddev
      3.17 ±  9%     -26.7%       2.32 ± 13%      +2.0%       3.23 ± 12%  perf-profile.calltrace.cycles-pp._raw_spin_lock.sys_semop.entry_SYSCALL_64_fastpath
      1.16 ±  4%     -15.9%       0.98 ± 14%      +1.0%       1.17 ± 11%  perf-profile.calltrace.cycles-pp.do_munmap.sys_shmdt.entry_SYSCALL_64_fastpath
      5.25 ±  8%     -15.7%       4.43 ±  4%      -2.3%       5.13 ± 15%  perf-profile.calltrace.cycles-pp.do_smart_update.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath
      1.50 ±  5%     +68.6%       2.53 ± 10%     -36.2%       0.96 ± 28%  perf-profile.calltrace.cycles-pp.pid_vnr.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath
      1.24 ±  4%     -15.3%       1.05 ± 12%      +0.6%       1.25 ± 11%  perf-profile.calltrace.cycles-pp.sys_shmdt.entry_SYSCALL_64_fastpath
      0.99 ± 12%     -20.5%       0.79 ±  6%      -5.7%       0.93 ± 17%  perf-profile.calltrace.cycles-pp.wake_const_ops.do_smart_wakeup_zero.do_smart_update.SYSC_semtimedop.sys_semop
      3.45 ±  9%     -24.8%       2.60 ± 12%      +1.7%       3.51 ± 13%  perf-profile.children.cycles-pp._raw_spin_lock
      1.18 ±  4%     -15.7%       0.99 ± 14%      +0.9%       1.19 ± 12%  perf-profile.children.cycles-pp.do_munmap
      5.61 ±  8%     -15.4%       4.75 ±  3%      -3.7%       5.40 ± 15%  perf-profile.children.cycles-pp.do_smart_update
      1.78 ±  5%     +56.7%       2.80 ± 11%     -29.1%       1.27 ± 22%  perf-profile.children.cycles-pp.pid_vnr
      1.24 ±  4%     -15.3%       1.05 ± 12%      +0.8%       1.25 ± 11%  perf-profile.children.cycles-pp.sys_shmdt
      1.24 ± 13%     -24.7%       0.93 ±  8%     -12.7%       1.08 ± 16%  perf-profile.children.cycles-pp.wake_const_ops
      9.86 ±  8%     +47.0%      14.49 ±  9%      -8.5%       9.02 ± 13%  perf-profile.self.cycles-pp.SYSC_semtimedop
      3.45 ±  9%     -25.2%       2.58 ± 13%      +1.9%       3.51 ± 13%  perf-profile.self.cycles-pp._raw_spin_lock
      2.26 ±  9%     -26.8%       1.65 ±  7%     -11.0%       2.01 ± 18%  perf-profile.self.cycles-pp.do_smart_update
      1.78 ±  5%     +56.7%       2.80 ± 11%     -29.1%       1.27 ± 22%  perf-profile.self.cycles-pp.pid_vnr
      1.03 ±  2%     -15.3%       0.87 ±  8%      -8.7%       0.94 ± 25%  perf-profile.self.cycles-pp.security_ipc_permission
      1.32 ±  5%      -4.0%       1.27 ± 14%     +14.8%       1.51 ±  8%  perf-profile.self.cycles-pp.security_sem_semop
      1.24 ± 13%     -24.7%       0.93 ±  8%     -12.7%       1.08 ± 16%  perf-profile.self.cycles-pp.wake_const_ops
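
The profile deltas above center on the semop() fast path
(_raw_spin_lock, SYSC_semtimedop), which is where the new smp_mb()
sits. A minimal single-semaphore loop like the following (a rough
sketch, not the AIM9 source) exercises exactly that path:

	/* Hedged sketch: hammer the sem_lock() fast path with simple ops.
	 * Not the AIM9 benchmark itself; illustrative only. */
	#include <stdio.h>
	#include <sys/ipc.h>
	#include <sys/sem.h>

	int main(void)
	{
		/* one private semaphore -> every semop() is a "simple op" */
		int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
		struct sembuf up   = { .sem_num = 0, .sem_op =  1, .sem_flg = 0 };
		struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };

		if (id < 0) {
			perror("semget");
			return 1;
		}
		for (long i = 0; i < 10000000; i++) {
			semop(id, &up, 1);	/* V: takes the per-sem lock fast path */
			semop(id, &down, 1);	/* P: likewise */
		}
		semctl(id, 0, IPC_RMID);	/* clean up the semaphore set */
		return 0;
	}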

Thanks,
Xiaolong
>
>Thanks,
>    Manfred
