Message-ID: <20181015031457.GC28215@shao2-debian>
Date:   Mon, 15 Oct 2018 11:14:57 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Rik van Riel <riel@...riel.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Song Liu <songliubraving@...com>,
        Dave Hansen <dave.hansen@...el.com>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "H. Peter Anvin" <hpa@...or.com>, tipbuild@...or.com, lkp@...org
Subject: [LKP] [x86/mm/tlb]  5462bc3a9a:  unixbench.score 7.0% improvement

Greetings,

FYI, we noticed a 7.0% improvement in unixbench.score due to commit:


commit: 5462bc3a9a3c38328bbbd276d51164c7cf21d6a8 ("x86/mm/tlb: Always use lazy TLB mode")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm

in testcase: unixbench
on test machine: 8 threads Ivy Bridge with 16G memory
with the following parameters:

	runtime: 300s
	nr_task: 1
	test: context1
	ucode: 0x20
	cpufreq_governor: performance

test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench



Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.2/1/debian-x86_64-2018-04-03.cgz/300s/lkp-ivb-d01/context1/unixbench/0x20

commit: 
  a31acd3ee8 ("x86/mm: Page size aware flush_tlb_mm_range()")
  5462bc3a9a ("x86/mm/tlb: Always use lazy TLB mode")

a31acd3ee8f7dbc0 5462bc3a9a3c38328bbbd276d5 
---------------- -------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
          1:4          -25%            :4     dmesg.RIP:copy_page_to_iter
           :4          100%           4:4     dmesg.RIP:cpuidle_enter_state
           :4           25%           1:4     kmsg.ba52ac8>]usb_hcd_irq
          1:4          -25%            :4     kmsg.e4afb4>]usb_hcd_irq
           :4           25%           1:4     kmsg.e5d84e9>]usb_hcd_irq
          1:4          -25%            :4     kmsg.eaf7194>]usb_hcd_irq
          1:4          -25%            :4     kmsg.f4ac>]usb_hcd_irq
           :4           25%           1:4     kmsg.usb_hcd_irq
         %stddev     %change         %stddev
             \          |                \  
    386.50            +7.0%     413.58        unixbench.score
    410.13            -0.9%     406.25        unixbench.time.elapsed_time
    410.13            -0.9%     406.25        unixbench.time.elapsed_time.max
     55.00            +3.2%      56.75        unixbench.time.percent_of_cpu_this_job_got
    207.51            +2.2%     212.17        unixbench.time.system_time
  46114091            +7.0%   49358271        unixbench.time.voluntary_context_switches
  62958045            +6.2%   66876363        unixbench.workload
     22199            +1.9%      22621        interrupts.CAL:Function_call_interrupts
      0.22 ± 43%     -40.2%       0.13 ± 12%  turbostat.CPU%c6
    451150            +7.5%     484981        vmstat.system.cs
   3399624 ± 12%     -17.0%    2823082 ±  5%  cpuidle.POLL.time
   1497754 ±  2%     +18.2%    1770379        cpuidle.POLL.usage
      3826 ±  7%     +18.1%       4518 ±  7%  slabinfo.anon_vma.active_objs
      3875 ±  6%     +16.6%       4520 ±  7%  slabinfo.anon_vma.num_objs
      1280 ± 11%     -14.6%       1093 ±  9%  slabinfo.skbuff_head_cache.active_objs
    147541 ±  7%     +17.0%     172576 ±  9%  sched_debug.cfs_rq:/.load.avg
     89.04 ± 11%     +24.6%     110.95 ±  7%  sched_debug.cfs_rq:/.runnable_load_avg.avg
    155.33 ± 15%     +20.5%     187.16 ± 12%  sched_debug.cfs_rq:/.runnable_load_avg.stddev
    136488 ± 10%     +19.9%     163599 ±  7%  sched_debug.cfs_rq:/.runnable_weight.avg
    139887 ±  7%     +18.0%     165093 ±  3%  sched_debug.cpu.load.avg
   3820087 ±  9%     +20.8%    4613034 ±  8%  sched_debug.cpu.nr_switches.stddev
   3818882 ±  9%     +20.8%    4611438 ±  8%  sched_debug.cpu.sched_count.stddev
   1909422 ±  9%     +20.8%    2305711 ±  8%  sched_debug.cpu.sched_goidle.stddev
   1909878 ±  9%     +20.8%    2306344 ±  8%  sched_debug.cpu.ttwu_count.stddev
 2.633e+11 ± 25%     +29.7%  3.415e+11 ± 12%  perf-stat.branch-instructions
 1.865e+08            +7.0%  1.995e+08        perf-stat.context-switches
      1.41            -5.9%       1.32        perf-stat.cpi
      1.46 ±  3%      -0.5        1.00 ±  3%  perf-stat.dTLB-load-miss-rate%
 3.225e+11 ± 25%     +30.0%  4.192e+11 ± 12%  perf-stat.dTLB-loads
      0.15 ±  7%      -0.1        0.08 ±  3%  perf-stat.dTLB-store-miss-rate%
 2.001e+11 ± 25%     +30.1%  2.604e+11 ± 12%  perf-stat.dTLB-stores
     77.66           -15.9       61.81        perf-stat.iTLB-load-miss-rate%
 2.038e+09 ± 25%     -45.2%  1.118e+09 ± 14%  perf-stat.iTLB-load-misses
 1.213e+12 ± 25%     +29.4%  1.569e+12 ± 12%  perf-stat.instructions
    595.17          +136.3%       1406        perf-stat.instructions-per-iTLB-miss
      0.71            +6.3%       0.76        perf-stat.ipc
     10.90 ± 10%      -4.5        6.41 ± 45%  perf-profile.calltrace.cycles-pp.pipe_read.__vfs_read.vfs_read.ksys_read.do_syscall_64
     11.03 ± 10%      -4.4        6.60 ± 45%  perf-profile.calltrace.cycles-pp.__vfs_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.76 ±  9%      -3.2        4.54 ± 43%  perf-profile.calltrace.cycles-pp.pipe_wait.pipe_read.__vfs_read.vfs_read.ksys_read
      6.68 ±  9%      -3.1        3.60 ± 44%  perf-profile.calltrace.cycles-pp.schedule.pipe_wait.pipe_read.__vfs_read.vfs_read
      6.49 ±  9%      -3.0        3.48 ± 44%  perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.pipe_wait.pipe_read.__vfs_read
      4.80 ±  5%      -2.5        2.29 ± 48%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
      4.68 ±  4%      -2.5        2.20 ± 49%  perf-profile.calltrace.cycles-pp.__sched_text_start.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      1.01 ± 22%      -0.6        0.36 ±102%  perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.__vfs_read.vfs_read.ksys_read
     11.81 ± 11%      -5.5        6.27 ± 45%  perf-profile.children.cycles-pp.__sched_text_start
     12.77 ± 10%      -4.6        8.13 ± 44%  perf-profile.children.cycles-pp.ksys_read
     12.33 ± 10%      -4.6        7.74 ± 44%  perf-profile.children.cycles-pp.vfs_read
     10.92 ± 10%      -4.5        6.45 ± 44%  perf-profile.children.cycles-pp.pipe_read
     11.05 ± 10%      -4.4        6.66 ± 44%  perf-profile.children.cycles-pp.__vfs_read
      7.79 ±  9%      -3.1        4.68 ± 44%  perf-profile.children.cycles-pp.pipe_wait
      6.69 ±  9%      -3.0        3.69 ± 44%  perf-profile.children.cycles-pp.schedule
      2.68 ±  6%      -2.6        0.08 ± 66%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      5.12 ± 14%      -2.5        2.64 ± 46%  perf-profile.children.cycles-pp.schedule_idle
      2.25 ±  8%      -0.6        1.60 ± 34%  perf-profile.children.cycles-pp.tick_nohz_next_event
      1.08 ± 10%      -0.6        0.51 ± 49%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.48 ± 32%      -0.3        0.18 ± 71%  perf-profile.children.cycles-pp.touch_atime
      1.07 ±  7%      -0.3        0.79 ± 35%  perf-profile.children.cycles-pp.__next_timer_interrupt
      0.42 ± 28%      -0.2        0.25 ± 32%  perf-profile.children.cycles-pp.___perf_sw_event
      0.68 ±  7%      -0.2        0.50 ± 33%  perf-profile.children.cycles-pp.find_next_bit
      0.28 ± 17%      -0.2        0.12 ± 57%  perf-profile.children.cycles-pp.account_entity_enqueue
      0.49 ±  9%      -0.1        0.35 ± 32%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.32 ± 12%      -0.1        0.19 ± 39%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.20 ± 19%      -0.1        0.06 ± 70%  perf-profile.children.cycles-pp.pm_qos_request
      0.18 ± 20%      -0.1        0.05 ±124%  perf-profile.children.cycles-pp.anon_pipe_buf_release
      0.21 ± 19%      -0.1        0.11 ± 70%  perf-profile.children.cycles-pp.rcu_needs_cpu
      0.14 ± 20%      -0.1        0.04 ±104%  perf-profile.children.cycles-pp.tick_check_broadcast_expired
      0.18 ± 57%      -0.1        0.10 ± 17%  perf-profile.children.cycles-pp.clockevents_program_event
      0.12 ± 39%      -0.1        0.03 ±102%  perf-profile.children.cycles-pp.irq_work_needs_cpu
      0.15 ± 18%      -0.1        0.09 ± 40%  perf-profile.children.cycles-pp.put_prev_entity
      0.08 ± 40%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.run_timer_softirq
      1.28 ±  5%      -1.2        0.07 ± 62%  perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.53 ± 17%      -0.4        0.17 ± 39%  perf-profile.self.cycles-pp.copy_page_to_iter
      0.24 ± 42%      -0.2        0.04 ±101%  perf-profile.self.cycles-pp.atime_needs_update
      0.47 ±  8%      -0.1        0.34 ± 32%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.32 ± 13%      -0.1        0.18 ± 38%  perf-profile.self.cycles-pp.__update_load_avg_se
      0.18 ± 20%      -0.1        0.05 ±124%  perf-profile.self.cycles-pp.anon_pipe_buf_release
      0.18 ± 22%      -0.1        0.06 ± 70%  perf-profile.self.cycles-pp.pm_qos_request
      0.24 ± 12%      -0.1        0.12 ± 78%  perf-profile.self.cycles-pp.__calc_delta
      0.20 ± 17%      -0.1        0.11 ± 68%  perf-profile.self.cycles-pp.rcu_needs_cpu
      0.14 ± 19%      -0.1        0.04 ±104%  perf-profile.self.cycles-pp.tick_check_broadcast_expired
      0.11 ± 41%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.irq_work_needs_cpu
      0.09 ± 20%      -0.1        0.04 ±103%  perf-profile.self.cycles-pp.current_time
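
As a reading aid for the comparison table above, the following minimal sketch recomputes a few entries from the printed values. It assumes that rows shown with a "%" in the change column are relative changes against the parent commit, and that rows shown without one (the miss-rate and perf-profile rows) are absolute differences in percentage points; that reading is inferred from the numbers, not stated in the report. The helper names pct_change and point_change are hypothetical and are not part of lkp-tests.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, e.g. between two percentages (percentage points)."""
    return new - base


# Values copied from the table (parent a31acd3ee8 vs. 5462bc3a9a).
score_change = pct_change(386.50, 413.58)        # unixbench.score
switch_change = pct_change(46114091, 49358271)   # voluntary_context_switches
itlb_rate_delta = point_change(77.66, 61.81)     # iTLB-load-miss-rate%

# Derived perf-stat metrics: ipc is the reciprocal of cpi, and
# instructions-per-iTLB-miss is instructions / iTLB-load-misses.
ipc_base = 1 / 1.41
ipc_new = 1 / 1.32
ipm_base = 1.213e12 / 2.038e9
ipm_new = 1.569e12 / 1.118e9

print(f"unixbench.score change:     {score_change:+.1f}%")
print(f"context switches change:    {switch_change:+.1f}%")
print(f"iTLB miss rate delta:       {itlb_rate_delta:+.2f} points")
print(f"ipc (from cpi):             {ipc_base:.2f} -> {ipc_new:.2f}")
print(f"instructions per iTLB miss: {ipm_base:.0f} -> {ipm_new:.0f}")
```

Because the table prints rounded values, results recomputed this way can differ slightly from the reported columns, most visibly for instructions-per-iTLB-miss.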


                                                                                
                      unixbench.time.voluntary_context_switches                 
                                                                                
  6e+07 +-+-----------------------------------------------------------------+   
        |                                                                   |   
  5e+07 O-OO O O OO O OO O OO O O OO O OO O OO O O O  O  O O OO O    O      |   
        |.++.+.+  +.+.++.+.++.+.+.++.+.++.+.++.+.+.+O.+.O  +.+  +.O.O+.+.++.|   
        |      :  :                                     :  : :  :           |   
  4e+07 +-+    :  :                                     :  : :  :           |   
        |       : :                                     : :  : :            |   
  3e+07 +-+     : :                                     : :  : :            |   
        |       : :                                     : :  : :            |   
  2e+07 +-+     ::                                       ::   ::            |   
        |       ::                                       ::   ::            |   
        |       ::                                       ::   ::            |   
  1e+07 +-+      :                                       :    :             |   
        |        :                                       :    :             |   
      0 +-+-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen

View attachment "config-4.19.0-rc5-00036-g5462bc3" of type "text/plain" (167748 bytes)

View attachment "job-script" of type "text/plain" (6920 bytes)

View attachment "job.yaml" of type "text/plain" (4540 bytes)

View attachment "reproduce" of type "text/plain" (293 bytes)
