Message-ID: <20210214141833.GE6321@xsang-OptiPlex-9020>
Date:   Sun, 14 Feb 2021 22:18:33 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Juergen Gross <jgross@...e.com>
Cc:     Borislav Petkov <bp@...e.de>, Andy Lutomirski <luto@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
        feng.tang@...el.com, zhengjun.xing@...el.com
Subject: [x86/pv]  ab234a260b:  stress-ng.timerfd.ops_per_sec 6.6% improvement


Greetings,

FYI, we noticed a 6.6% improvement in stress-ng.timerfd.ops_per_sec due to commit:


commit: ab234a260b1f625b26cbefa93ca365b0ae66df33 ("x86/pv: Rework arch_local_irq_restore() to not use popf")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/paravirt


in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with the following parameters:

	nr_threads: 10%
	disk: 1HDD
	testtime: 60s
	fs: ext4
	class: os
	test: timerfd
	cpufreq_governor: performance
	ucode: 0x5003003

Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached to this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run                    compatible-job.yaml

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
  os/gcc-9/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/timerfd/stress-ng/60s/0x5003003

commit: 
  afd30525a6 ("x86/xen: Drop USERGS_SYSRET64 paravirt call")
  ab234a260b ("x86/pv: Rework arch_local_irq_restore() to not use popf")
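
As a reading aid for the comparison that follows: the %change column appears to be the relative difference between the two per-commit means, taking the first commit as the baseline. The sketch below recomputes the headline figure from the stress-ng.timerfd.ops_per_sec row. The function and variable names are illustrative and are not part of lkp-tests.

```python
# Mean ops_per_sec per commit, copied from the
# stress-ng.timerfd.ops_per_sec row of the comparison table.
base_ops_per_sec = 8847658   # afd30525a6 (baseline column)
head_ops_per_sec = 9432727   # ab234a260b (commit under test)


def percent_change(base: float, head: float) -> float:
    """Relative change of head against base, in percent."""
    return (head - base) / base * 100.0


change = percent_change(base_ops_per_sec, head_ops_per_sec)
print(f"{change:+.1f}%")  # prints "+6.6%"
```

The same formula reproduces the +6.6% shown for stress-ng.timerfd.ops, since both rows come from the same runs over a fixed 60s test time.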

afd30525a659ac0a ab234a260b1f625b26cbefa93ca 
---------------- --------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
          0:4           34%           1:4     perf-profile.calltrace.cycles-pp.error_entry
          3:4           12%           3:4     perf-profile.children.cycles-pp.error_entry
          1:4           -1%           1:4     perf-profile.self.cycles-pp.error_entry
         %stddev     %change         %stddev
             \          |                \  
    675.25            +1.6%     686.00        stress-ng.time.percent_of_cpu_this_job_got
    376.77            -1.4%     371.41        stress-ng.time.system_time
     42.56 ±  2%     +28.5%      54.70        stress-ng.time.user_time
 5.309e+08            +6.6%   5.66e+08        stress-ng.timerfd.ops
   8847658            +6.6%    9432727        stress-ng.timerfd.ops_per_sec
      8.81            -1.9%       8.64        iostat.cpu.system
      0.73 ±  2%      +0.2        0.93        mpstat.cpu.all.usr%
    291454            -0.9%     288975        proc-vmstat.numa_local
    293563 ±  2%     +15.2%     338198        softirqs.RCU
   4506538            +4.4%    4706667        vmstat.system.in
      5.75 ± 23%     -26.3%       4.24 ± 10%  perf-sched.wait_and_delay.max.ms.__sched_text_start.__sched_text_start.schedule_timeout.rcu_gp_kthread.kthread
      4.95           -14.6%       4.23 ± 10%  perf-sched.wait_time.max.ms.__sched_text_start.__sched_text_start.schedule_timeout.rcu_gp_kthread.kthread
      1797 ±  7%     -13.0%       1563 ±  8%  slabinfo.khugepaged_mm_slot.active_objs
      1797 ±  7%     -13.0%       1563 ±  8%  slabinfo.khugepaged_mm_slot.num_objs
      9508 ±  3%      -8.8%       8672 ±  5%  numa-vmstat.node0.nr_kernel_stack
    655.75 ±  7%     -25.6%     488.00 ± 12%  numa-vmstat.node0.nr_page_table_pages
      9875 ±  5%      -7.8%       9105 ±  4%  numa-vmstat.node0.nr_slab_reclaimable
    565.75 ±  8%     +29.9%     734.75 ±  8%  numa-vmstat.node1.nr_page_table_pages
     39502 ±  5%      -7.8%      36424 ±  4%  numa-meminfo.node0.KReclaimable
      9508 ±  3%      -8.8%       8672 ±  5%  numa-meminfo.node0.KernelStack
      2623 ±  7%     -25.4%       1956 ± 12%  numa-meminfo.node0.PageTables
     39502 ±  5%      -7.8%      36424 ±  4%  numa-meminfo.node0.SReclaimable
      2264 ±  8%     +30.1%       2946 ±  8%  numa-meminfo.node1.PageTables
      0.14 ±  8%     +25.0%       0.18 ±  5%  sched_debug.cfs_rq:/.nr_running.avg
      0.35 ±  3%      +9.6%       0.38 ±  2%  sched_debug.cfs_rq:/.nr_running.stddev
   1047995 ±  7%     +47.2%    1542703 ± 13%  sched_debug.cpu.avg_idle.max
    262.12 ±  4%     +18.3%     310.09 ±  7%  sched_debug.cpu.curr->pid.avg
      0.12 ±  4%     +21.3%       0.14 ±  3%  sched_debug.cpu.nr_running.avg
      0.32 ±  2%     +10.9%       0.35 ±  2%  sched_debug.cpu.nr_running.stddev
    582.50 ± 25%    +337.1%       2546 ±115%  interrupts.CPU1.CAL:Function_call_interrupts
    436.25 ±124%    +221.3%       1401 ± 31%  interrupts.CPU1.NMI:Non-maskable_interrupts
    436.25 ±124%    +221.3%       1401 ± 31%  interrupts.CPU1.PMI:Performance_monitoring_interrupts
    606.25 ± 51%    +262.5%       2197 ±105%  interrupts.CPU11.CAL:Function_call_interrupts
    627.50 ± 20%     -21.0%     495.50        interrupts.CPU18.CAL:Function_call_interrupts
      1327 ± 65%     -90.8%     122.50 ± 23%  interrupts.CPU28.NMI:Non-maskable_interrupts
      1327 ± 65%     -90.8%     122.50 ± 23%  interrupts.CPU28.PMI:Performance_monitoring_interrupts
     96.75 ± 32%    +248.6%     337.25 ± 59%  interrupts.CPU47.NMI:Non-maskable_interrupts
     96.75 ± 32%    +248.6%     337.25 ± 59%  interrupts.CPU47.PMI:Performance_monitoring_interrupts
    318.50 ±128%    +753.6%       2718 ± 58%  interrupts.CPU49.NMI:Non-maskable_interrupts
    318.50 ±128%    +753.6%       2718 ± 58%  interrupts.CPU49.PMI:Performance_monitoring_interrupts
      2698 ± 31%     -59.1%       1104 ± 52%  interrupts.CPU5.NMI:Non-maskable_interrupts
      2698 ± 31%     -59.1%       1104 ± 52%  interrupts.CPU5.PMI:Performance_monitoring_interrupts
   2386946 ± 46%    +184.0%    6779268 ± 30%  interrupts.CPU64.LOC:Local_timer_interrupts
    533.00 ±  5%      -7.1%     495.00        interrupts.CPU68.CAL:Function_call_interrupts
    689256 ± 57%    +222.6%    2223739 ± 33%  interrupts.CPU7.LOC:Local_timer_interrupts
      2.00 ± 93%   +2175.0%      45.50 ±133%  interrupts.CPU7.RES:Rescheduling_interrupts
    431.25 ±132%    +471.4%       2464 ±129%  interrupts.CPU74.NMI:Non-maskable_interrupts
    431.25 ±132%    +471.4%       2464 ±129%  interrupts.CPU74.PMI:Performance_monitoring_interrupts
      2349 ±124%     -93.8%     146.25 ±  6%  interrupts.CPU76.NMI:Non-maskable_interrupts
      2349 ±124%     -93.8%     146.25 ±  6%  interrupts.CPU76.PMI:Performance_monitoring_interrupts
   1890196 ± 62%    +190.4%    5490038 ± 34%  interrupts.CPU79.LOC:Local_timer_interrupts
    107.25 ± 21%    +149.7%     267.75 ± 86%  interrupts.CPU93.NMI:Non-maskable_interrupts
    107.25 ± 21%    +149.7%     267.75 ± 86%  interrupts.CPU93.PMI:Performance_monitoring_interrupts
    124.00 ± 25%    +111.3%     262.00 ± 44%  interrupts.CPU95.NMI:Non-maskable_interrupts
    124.00 ± 25%    +111.3%     262.00 ± 44%  interrupts.CPU95.PMI:Performance_monitoring_interrupts
    994.25 ± 15%     +34.4%       1336 ± 12%  interrupts.RES:Rescheduling_interrupts
 4.801e+09            +6.6%  5.119e+09        perf-stat.i.branch-instructions
  99909476            +5.4%  1.053e+08        perf-stat.i.branch-misses
     17.72 ±  3%      +0.6       18.35 ±  2%  perf-stat.i.cache-miss-rate%
   1664858 ±  7%     +10.4%    1837658        perf-stat.i.cache-misses
      1.17            -4.1%       1.12        perf-stat.i.cpi
 2.758e+10            +1.1%  2.789e+10        perf-stat.i.cpu-cycles
 6.845e+09            +6.2%  7.269e+09        perf-stat.i.dTLB-loads
      0.02 ±  3%      +0.0        0.03 ±  3%  perf-stat.i.dTLB-store-miss-rate%
    998610 ±  4%     +32.4%    1321727 ±  2%  perf-stat.i.dTLB-store-misses
 4.522e+09            +4.6%  4.731e+09        perf-stat.i.dTLB-stores
 2.408e+10            +5.7%  2.545e+10        perf-stat.i.instructions
      0.86            +4.3%       0.90        perf-stat.i.ipc
      0.29            +1.1%       0.29        perf-stat.i.metric.GHz
    168.55            +5.9%     178.46        perf-stat.i.metric.M/sec
      2.08            -0.0        2.06        perf-stat.overall.branch-miss-rate%
      1.15            -4.3%       1.10        perf-stat.overall.cpi
      0.02 ±  4%      +0.0        0.03 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
      0.87            +4.5%       0.91        perf-stat.overall.ipc
 4.723e+09            +6.6%  5.036e+09        perf-stat.ps.branch-instructions
  98286780            +5.4%  1.036e+08        perf-stat.ps.branch-misses
   1638114 ±  7%     +10.4%    1808597        perf-stat.ps.cache-misses
 2.714e+10            +1.1%  2.744e+10        perf-stat.ps.cpu-cycles
 6.734e+09            +6.2%  7.151e+09        perf-stat.ps.dTLB-loads
    982410 ±  4%     +32.4%    1300313 ±  2%  perf-stat.ps.dTLB-store-misses
 4.449e+09            +4.6%  4.654e+09        perf-stat.ps.dTLB-stores
 2.369e+10            +5.7%  2.504e+10        perf-stat.ps.instructions
 1.489e+12            +6.0%  1.578e+12        perf-stat.total.instructions
      9.09 ±  9%      -1.5        7.54 ±  9%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.timerfd_read.vfs_read.ksys_read.do_syscall_64
      3.00 ±  8%      -1.1        1.90 ±  9%  perf-profile.calltrace.cycles-pp.__x64_sys_select.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.00 ±  8%      -1.1        1.90 ±  9%  perf-profile.calltrace.cycles-pp.kern_select.__x64_sys_select.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.96 ±  8%      -1.1        1.86 ±  9%  perf-profile.calltrace.cycles-pp.core_sys_select.kern_select.__x64_sys_select.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.89 ±  8%      -1.1        1.80 ±  9%  perf-profile.calltrace.cycles-pp.do_select.core_sys_select.kern_select.__x64_sys_select.do_syscall_64
      5.18 ±  9%      -1.0        4.17 ±  9%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.timerfd_read.vfs_read.ksys_read
      4.68 ±  9%      -1.0        3.72 ±  9%  perf-profile.calltrace.cycles-pp.asm_call_sysvec_on_stack.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.timerfd_read.vfs_read
      4.64 ±  9%      -1.0        3.69 ±  9%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.asm_call_sysvec_on_stack.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.timerfd_read
      1.68 ±  9%      -0.5        1.14 ± 10%  perf-profile.calltrace.cycles-pp.timerfd_poll.do_select.core_sys_select.kern_select.__x64_sys_select
      0.77 ± 11%      -0.4        0.40 ± 57%  perf-profile.calltrace.cycles-pp.timerfd_tmrproc.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.asm_call_sysvec_on_stack
      1.41 ±  5%      +0.3        1.70 ±  8%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      1.46 ±  5%      +0.3        1.79 ±  9%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      1.63 ±  5%      +0.4        2.02 ±  9%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      1.71 ±  4%      +0.4        2.14 ± 10%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.26 ±100%      +0.4        0.70 ± 10%  perf-profile.calltrace.cycles-pp.clockevents_program_event.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.27 ±100%      +0.6        0.84 ±  8%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.do_timerfd_gettime.__x64_sys_timerfd_gettime.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.22 ±  7%      -2.6        0.62 ±  7%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      2.97 ±  8%      -1.1        1.86 ±  9%  perf-profile.children.cycles-pp.core_sys_select
      2.94 ±  8%      -1.1        1.83 ±  9%  perf-profile.children.cycles-pp.do_select
      3.00 ±  8%      -1.1        1.90 ±  9%  perf-profile.children.cycles-pp.kern_select
      3.00 ±  8%      -1.1        1.90 ±  9%  perf-profile.children.cycles-pp.__x64_sys_select
      1.70 ±  9%      -0.5        1.17 ± 11%  perf-profile.children.cycles-pp.timerfd_poll
      2.65 ± 10%      -0.4        2.20 ± 10%  perf-profile.children.cycles-pp.__fget_light
      2.25 ± 10%      -0.4        1.83 ±  9%  perf-profile.children.cycles-pp.timerfd_tmrproc
      0.30 ±  5%      +0.1        0.41 ± 11%  perf-profile.children.cycles-pp.sync_regs
      3.19 ±  8%      -2.6        0.58 ±  8%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      2.04 ± 10%      -0.5        1.58 ±  9%  perf-profile.self.cycles-pp.__fget_light
      0.18 ±  6%      -0.0        0.13 ± 18%  perf-profile.self.cycles-pp.tick_program_event
      0.30 ±  4%      +0.1        0.40 ± 11%  perf-profile.self.cycles-pp.sync_regs
      0.80 ±  8%      +0.2        1.00 ±  7%  perf-profile.self.cycles-pp.do_timerfd_gettime
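
The perf-stat rows above are mutually consistent: instructions rise by about 5.7% while cpu-cycles rise by only about 1.1%, so more work is retired per cycle. A minimal sketch, assuming that perf-stat.overall.ipc and perf-stat.overall.cpi are simply the ratio of instructions to cycles and its inverse, recomputes both from the rounded perf-stat.ps values. The dictionary layout and helper names are illustrative only.

```python
# Per-second rates copied from the perf-stat.ps.instructions and
# perf-stat.ps.cpu-cycles rows (already rounded in the report).
samples = {
    "afd30525a6": {"instructions": 2.369e10, "cpu_cycles": 2.714e10},
    "ab234a260b": {"instructions": 2.504e10, "cpu_cycles": 2.744e10},
}


def ipc(instructions: float, cpu_cycles: float) -> float:
    """Instructions retired per CPU cycle."""
    return instructions / cpu_cycles


def cpi(instructions: float, cpu_cycles: float) -> float:
    """CPU cycles per retired instruction (inverse of IPC)."""
    return cpu_cycles / instructions


for commit, s in samples.items():
    print(
        f"{commit}: "
        f"ipc={ipc(s['instructions'], s['cpu_cycles']):.2f} "
        f"cpi={cpi(s['instructions'], s['cpu_cycles']):.2f}"
    )
# prints:
# afd30525a6: ipc=0.87 cpi=1.15
# ab234a260b: ipc=0.91 cpi=1.10
```

These match the perf-stat.overall.ipc (0.87 and 0.91) and perf-stat.overall.cpi (1.15 and 1.10) rows to the two decimals reported.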


                                                                                
                             stress-ng.timerfd.ops_per_sec                      
                                                                                
    1e+07 +-----------------------------------------------------------------+   
  9.5e+06 |-+      O O        OO       OO O O  O O O O OO        O      O   |   
          | O OO O    O O O O    O O O        O             O   O  O O O  O |   
    9e+06 |.+.++.+.+.++.+.+.+.++.+.+.+.+  +.+.++.+.+.+.++.+.+.+.++.+        |   
  8.5e+06 |-+                          :  :                                 |   
          |                            :  :                                 |   
    8e+06 |-+                          : :                                  |   
  7.5e+06 |-+                          : :                                  |   
    7e+06 |-+                           ::                                  |   
          |                             ::                                  |   
  6.5e+06 |-+                           ::                                  |   
    6e+06 |-+                           :                                   |   
          |                             :                                   |   
  5.5e+06 |-+                           +                 O   O             |   
    5e+06 +-----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                             stress-ng.time.user_time                           
                                                                                
  300 +---------------------------------------------------------------------+   
      |                                                                     |   
  250 |-+                                                O   O              |   
      |                               +                                     |   
      |                               :                                     |   
  200 |-+                             :                                     |   
      |                              : :                                    |   
  150 |-+                            : :                                    |   
      |                              : :                                    |   
  100 |-+                            : :                                    |   
      |                             :   :                                   |   
      | O O O OO O O O O   O O OO O : O : O O O O OO O O   O   O O OO O O O |   
   50 |.+.+.+.++.+.+.+.+.+.+.+.++.+.+   +.+.+.+.+.++.+.+.+.+.+.+.+.+        |   
      |                                                                     |   
    0 +---------------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


View attachment "config-5.11.0-rc7-00005-gab234a260b1f" of type "text/plain" (174007 bytes)

View attachment "job-script" of type "text/plain" (8123 bytes)

View attachment "job.yaml" of type "text/plain" (5625 bytes)

View attachment "reproduce" of type "text/plain" (535 bytes)
