Date:	Wed, 02 Sep 2015 10:40:31 +0800
From:	Huang Ying <ying.huang@...el.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Andy Lutomirski <luto@...nel.org>, lkp@...org,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>,
	Denys Vlasenko <dvlasenk@...hat.com>
Subject: Re: [lkp] [x86/build] b2c51106c75: -18.1%
 will-it-scale.per_process_ops

On Wed, 2015-08-05 at 10:38 +0200, Ingo Molnar wrote:
> * kernel test robot <ying.huang@...el.com> wrote:
> 
> > FYI, we noticed the below changes on
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm
> > commit b2c51106c7581866c37ffc77c5d739f3d4b7cbc9 ("x86/build: Fix detection of GCC -mpreferred-stack-boundary support")
> 
> Does the performance regression go away reproducibly if you do:
> 
>    git revert b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
> 
> ?

Sorry for replying so late!

Reverting the commit restores part of the performance, as shown below.
parent commit: f2a50f8b7da45ff2de93a71393e715a2ab9f3b68
the commit:    b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
revert commit: 987d12601a4a82cc2f2151b1be704723eb84cb9d
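
For reference, a rough sketch of how the three kernels could be rebuilt
and compared by hand (the numbers below come from the lkp/0-day test
framework, so the commands and the benchmark invocation are only
illustrative, following the job config shown in the table header):

    # repeat for each of the three commits: parent, the commit, the revert
    git checkout f2a50f8b7da45ff2de93a71393e715a2ab9f3b68
    make -j"$(nproc)" bzImage modules
    make modules_install install
    # reboot into the freshly built kernel, then run the will-it-scale
    # readseek2 process-mode case and record per_process_ops

The %change columns below are computed against the first (parent
commit) column, e.g. (720270 - 879002) / 879002 ≈ -18.1% for
will-it-scale.per_process_ops.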

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/cpufreq_governor/test:
  wsm/will-it-scale/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/performance/readseek2

commit: 
  f2a50f8b7da45ff2de93a71393e715a2ab9f3b68
  b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
  987d12601a4a82cc2f2151b1be704723eb84cb9d

f2a50f8b7da45ff2 b2c51106c7581866c37ffc77c5 987d12601a4a82cc2f2151b1be 
---------------- -------------------------- -------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    879002 ±  0%     -18.1%     720270 ±  7%      -3.6%     847011 ±  2%  will-it-scale.per_process_ops
      0.02 ±  0%     +34.5%       0.02 ±  7%      +5.6%       0.02 ±  2%  will-it-scale.scalability
     11144 ±  0%      +0.1%      11156 ±  0%     +10.6%      12320 ±  0%  will-it-scale.time.minor_page_faults
    769.30 ±  0%      -0.9%     762.15 ±  0%      +1.1%     777.42 ±  0%  will-it-scale.time.system_time
  26153173 ±  0%      +7.0%   27977076 ±  0%      +3.5%   27078124 ±  0%  will-it-scale.time.voluntary_context_switches
      2964 ±  2%      +1.4%       3004 ±  1%     -51.9%       1426 ±  2%  proc-vmstat.pgactivate
      0.06 ± 27%    +154.5%       0.14 ± 44%    +122.7%       0.12 ± 24%  turbostat.CPU%c3
    370683 ±  0%      +6.2%     393491 ±  0%      +2.4%     379575 ±  0%  vmstat.system.cs
     11144 ±  0%      +0.1%      11156 ±  0%     +10.6%      12320 ±  0%  time.minor_page_faults
     15.70 ±  2%     +14.5%      17.98 ±  0%      +1.5%      15.94 ±  1%  time.user_time
    830343 ± 56%     -54.0%     382128 ± 39%     -22.3%     645308 ± 65%  cpuidle.C1E-NHM.time
    788.25 ± 14%     -21.7%     617.25 ± 16%     -12.3%     691.00 ±  3%  cpuidle.C1E-NHM.usage
   2489132 ± 20%     +79.3%    4464147 ± 33%     +78.4%    4440574 ± 21%  cpuidle.C3-NHM.time
   1082762 ±162%    -100.0%       0.00 ± -1%    +189.3%    3132030 ±110%  latency_stats.avg.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
    102189 ±  2%      -2.1%     100087 ±  5%     -32.9%      68568 ±  2%  latency_stats.hits.pipe_wait.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
   1082762 ±162%    -100.0%       0.00 ± -1%    +289.6%    4217977 ±109%  latency_stats.max.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
   1082762 ±162%    -100.0%       0.00 ± -1%    +478.5%    6264061 ±110%  latency_stats.sum.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
      5.10 ±  2%      -8.0%       4.69 ±  1%     +13.0%       5.76 ±  1%  perf-profile.cpu-cycles.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency
      2.58 ±  8%     +19.5%       3.09 ±  3%      -1.8%       2.54 ± 11%  perf-profile.cpu-cycles._raw_spin_lock_irqsave.finish_wait.__wait_on_bit_lock.__lock_page.find_lock_entry
      7.02 ±  3%      +9.2%       7.67 ±  2%      +7.1%       7.52 ±  3%  perf-profile.cpu-cycles._raw_spin_lock_irqsave.prepare_to_wait_exclusive.__wait_on_bit_lock.__lock_page.find_lock_entry
      3.07 ±  2%     +14.8%       3.53 ±  3%      -1.4%       3.03 ±  5%  perf-profile.cpu-cycles.finish_wait.__wait_on_bit_lock.__lock_page.find_lock_entry.shmem_getpage_gfp
      3.05 ±  5%      -8.4%       2.79 ±  4%      -5.2%       2.90 ±  5%  perf-profile.cpu-cycles.hrtimer_start_range_ns.tick_nohz_stop_sched_tick.__tick_nohz_idle_enter.tick_nohz_idle_enter.cpu_startup_entry
      0.89 ±  5%      -7.6%       0.82 ±  3%     +16.3%       1.03 ±  5%  perf-profile.cpu-cycles.is_ftrace_trampoline.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk
      0.98 ±  3%     -25.1%       0.74 ±  7%     -16.8%       0.82 ±  2%  perf-profile.cpu-cycles.is_ftrace_trampoline.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency
      1.58 ±  3%      -5.2%       1.50 ±  2%     +44.2%       2.28 ±  1%  perf-profile.cpu-cycles.is_module_text_address.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk
      1.82 ± 18%     +46.6%       2.67 ±  3%     -32.6%       1.23 ± 56%  perf-profile.cpu-cycles.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.finish_wait.__wait_on_bit_lock.__lock_page
      8.05 ±  3%      +9.5%       8.82 ±  3%      +5.4%       8.49 ±  2%  perf-profile.cpu-cycles.prepare_to_wait_exclusive.__wait_on_bit_lock.__lock_page.find_lock_entry.shmem_getpage_gfp
      1.16 ±  2%      +6.9%       1.25 ±  5%     +11.4%       1.30 ±  5%  perf-profile.cpu-cycles.put_page.shmem_file_read_iter.__vfs_read.vfs_read.sys_read
     11102 ±  1%      +0.0%      11102 ±  1%     -95.8%     468.00 ±  0%  slabinfo.Acpi-ParseExt.active_objs
    198.25 ±  1%      +0.0%     198.25 ±  1%     -93.9%      12.00 ±  0%  slabinfo.Acpi-ParseExt.active_slabs
     11102 ±  1%      +0.0%      11102 ±  1%     -95.8%     468.00 ±  0%  slabinfo.Acpi-ParseExt.num_objs
    198.25 ±  1%      +0.0%     198.25 ±  1%     -93.9%      12.00 ±  0%  slabinfo.Acpi-ParseExt.num_slabs
    341.25 ± 14%      +2.9%     351.00 ± 11%    -100.0%       0.00 ± -1%  slabinfo.blkdev_ioc.active_objs
    341.25 ± 14%      +2.9%     351.00 ± 11%    -100.0%       0.00 ± -1%  slabinfo.blkdev_ioc.num_objs
    438.00 ± 16%      -8.3%     401.50 ± 20%    -100.0%       0.00 ± -1%  slabinfo.file_lock_ctx.active_objs
    438.00 ± 16%      -8.3%     401.50 ± 20%    -100.0%       0.00 ± -1%  slabinfo.file_lock_ctx.num_objs
      4398 ±  1%      +1.4%       4462 ±  0%     -14.5%       3761 ±  2%  slabinfo.ftrace_event_field.active_objs
      4398 ±  1%      +1.4%       4462 ±  0%     -14.5%       3761 ±  2%  slabinfo.ftrace_event_field.num_objs
      3947 ±  2%     +10.6%       4363 ±  3%    +107.1%       8175 ±  2%  slabinfo.kmalloc-192.active_objs
     93.00 ±  2%     +10.8%     103.00 ±  3%    +120.2%     204.75 ±  2%  slabinfo.kmalloc-192.active_slabs
      3947 ±  2%     +10.6%       4363 ±  3%    +118.4%       8620 ±  2%  slabinfo.kmalloc-192.num_objs
     93.00 ±  2%     +10.8%     103.00 ±  3%    +120.2%     204.75 ±  2%  slabinfo.kmalloc-192.num_slabs
      1794 ±  0%      +3.2%       1851 ±  2%     +12.2%       2012 ±  3%  slabinfo.trace_event_file.active_objs
      1794 ±  0%      +3.2%       1851 ±  2%     +12.2%       2012 ±  3%  slabinfo.trace_event_file.num_objs
      7065 ±  7%      -5.4%       6684 ±  8%    -100.0%       0.00 ± -1%  slabinfo.vm_area_struct.active_objs
    160.50 ±  7%      -5.5%     151.75 ±  8%    -100.0%       0.00 ± -1%  slabinfo.vm_area_struct.active_slabs
      7091 ±  7%      -5.6%       6694 ±  8%    -100.0%       0.00 ± -1%  slabinfo.vm_area_struct.num_objs
    160.50 ±  7%      -5.5%     151.75 ±  8%    -100.0%       0.00 ± -1%  slabinfo.vm_area_struct.num_slabs
    857.50 ± 29%     +75.7%       1506 ± 78%    +157.6%       2209 ± 33%  sched_debug.cfs_rq[11]:/.blocked_load_avg
     52.75 ± 29%     -29.4%      37.25 ± 60%    +103.3%     107.25 ± 43%  sched_debug.cfs_rq[11]:/.load
    914.50 ± 29%     +69.9%       1553 ± 77%    +155.6%       2337 ± 32%  sched_debug.cfs_rq[11]:/.tg_load_contrib
      7.75 ± 34%     -64.5%       2.75 ± 64%     -12.9%       6.75 ±115%  sched_debug.cfs_rq[2]:/.nr_spread_over
      1135 ± 20%     -43.6%     640.75 ± 49%     -18.8%     922.50 ± 51%  sched_debug.cfs_rq[3]:/.blocked_load_avg
      1215 ± 21%     -43.1%     691.50 ± 46%     -21.3%     956.25 ± 50%  sched_debug.cfs_rq[3]:/.tg_load_contrib
     38.50 ± 21%    +129.9%      88.50 ± 36%     +96.1%      75.50 ± 56%  sched_debug.cfs_rq[4]:/.load
     26.00 ± 20%     +98.1%      51.50 ± 46%    +142.3%      63.00 ± 53%  sched_debug.cfs_rq[4]:/.runnable_load_avg
    128.25 ± 18%    +227.5%     420.00 ± 43%    +152.4%     323.75 ± 68%  sched_debug.cfs_rq[4]:/.utilization_load_avg
     28320 ± 12%      -6.3%      26545 ± 11%     -19.4%      22813 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
      1015 ± 78%    +101.1%       2042 ± 25%     +64.4%       1669 ± 73%  sched_debug.cfs_rq[6]:/.blocked_load_avg
      1069 ± 72%    +100.2%       2140 ± 23%     +61.2%       1722 ± 70%  sched_debug.cfs_rq[6]:/.tg_load_contrib
    619.25 ± 12%      -6.3%     580.25 ± 11%     -19.2%     500.25 ± 13%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
     88.75 ± 14%     -47.3%      46.75 ± 36%     -24.5%      67.00 ± 11%  sched_debug.cfs_rq[9]:/.load
     59.25 ± 23%     -41.4%      34.75 ± 34%      -6.3%      55.50 ± 12%  sched_debug.cfs_rq[9]:/.runnable_load_avg
    315.50 ± 45%     -64.6%     111.67 ±  1%     -12.1%     277.25 ±  3%  sched_debug.cfs_rq[9]:/.utilization_load_avg
   2246758 ±  7%     +87.6%    4213925 ± 65%      -2.2%    2197475 ±  4%  sched_debug.cpu#0.nr_switches
   2249376 ±  7%     +87.4%    4215969 ± 65%      -2.2%    2199216 ±  4%  sched_debug.cpu#0.sched_count
   1121438 ±  7%     +81.0%    2030313 ± 61%      -2.2%    1096479 ±  4%  sched_debug.cpu#0.sched_goidle
   1151160 ±  7%     +86.5%    2146608 ± 64%      -1.9%    1129264 ±  3%  sched_debug.cpu#0.ttwu_count
     33.75 ± 15%     -22.2%      26.25 ±  6%      -8.9%      30.75 ± 10%  sched_debug.cpu#1.cpu_load[3]
     33.25 ± 10%     -18.0%      27.25 ±  7%      -3.8%      32.00 ± 11%  sched_debug.cpu#1.cpu_load[4]
     41.75 ± 29%     +23.4%      51.50 ± 33%     +53.9%      64.25 ± 16%  sched_debug.cpu#10.cpu_load[1]
     40.00 ± 18%     +24.4%      49.75 ± 18%     +49.4%      59.75 ±  8%  sched_debug.cpu#10.cpu_load[2]
     39.25 ± 14%     +22.3%      48.00 ± 10%     +38.9%      54.50 ±  7%  sched_debug.cpu#10.cpu_load[3]
     39.50 ± 15%     +20.3%      47.50 ±  6%     +30.4%      51.50 ±  7%  sched_debug.cpu#10.cpu_load[4]
   5269004 ±  1%     +27.8%    6731790 ± 30%      +1.4%    5342560 ±  2%  sched_debug.cpu#10.nr_switches
   5273193 ±  1%     +27.8%    6736526 ± 30%      +1.4%    5345791 ±  2%  sched_debug.cpu#10.sched_count
   2633974 ±  1%     +27.8%    3365271 ± 30%      +1.4%    2670901 ±  2%  sched_debug.cpu#10.sched_goidle
   2644149 ±  1%     +26.9%    3356318 ± 30%      +1.9%    2693295 ±  1%  sched_debug.cpu#10.ttwu_count
     26.50 ± 37%    +116.0%      57.25 ± 48%    +109.4%      55.50 ± 29%  sched_debug.cpu#11.cpu_load[0]
     30.75 ± 15%     +66.7%      51.25 ± 31%     +65.9%      51.00 ± 21%  sched_debug.cpu#11.cpu_load[1]
     33.50 ± 10%     +37.3%      46.00 ± 22%     +39.6%      46.75 ± 17%  sched_debug.cpu#11.cpu_load[2]
     37.00 ± 11%     +15.5%      42.75 ± 19%     +29.7%      48.00 ± 11%  sched_debug.cpu#11.cpu_load[4]
    508300 ± 11%      -0.6%     505024 ±  1%     +18.1%     600291 ±  7%  sched_debug.cpu#4.avg_idle
    454696 ±  9%      -5.9%     427894 ± 25%     +21.8%     553608 ±  4%  sched_debug.cpu#5.avg_idle
     66.00 ± 27%     +11.0%      73.25 ± 37%     -46.6%      35.25 ± 22%  sched_debug.cpu#6.cpu_load[0]
     62.00 ± 36%     +12.5%      69.75 ± 45%     -41.5%      36.25 ± 11%  sched_debug.cpu#6.cpu_load[1]
    247681 ± 19%     +21.0%     299747 ± 10%     +28.7%     318764 ± 17%  sched_debug.cpu#8.avg_idle
   5116609 ±  4%     +34.5%    6884238 ± 33%     +55.2%    7942254 ± 34%  sched_debug.cpu#9.nr_switches
   5120531 ±  4%     +34.5%    6889156 ± 33%     +55.2%    7945270 ± 34%  sched_debug.cpu#9.sched_count
   2557822 ±  4%     +34.5%    3441428 ± 33%     +55.2%    3970337 ± 34%  sched_debug.cpu#9.sched_goidle
   2565307 ±  4%     +32.9%    3410042 ± 33%     +54.0%    3949696 ± 34%  sched_debug.cpu#9.ttwu_count
      0.00 ±141%  +4.2e+05%       4.76 ±173%     +47.7%       0.00 ±-59671%  sched_debug.rt_rq[10]:/.rt_time
    155259 ±  0%      +0.0%     155259 ±  0%     -42.2%      89723 ±  0%  sched_debug.sysctl_sched.sysctl_sched_features

Best Regards,
Huang, Ying


