Date:   Fri, 18 Dec 2020 14:38:03 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...el.com
Subject: [perf/x86] e506d1dac0: stress-ng.sigsuspend.ops_per_sec 58.5% improvement


Greetings,

FYI, we noticed a 58.5% improvement in stress-ng.sigsuspend.ops_per_sec due to commit:


commit: e506d1dac0edb2df82f2aa0582e814f9cd9aa07d ("perf/x86: Make dummy_iregs static")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
with the following parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 30s
	class: interrupt
	cpufreq_governor: performance
	ucode: 0x5002f01






Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached to this email
        bin/lkp run     job.yaml

=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
  interrupt/gcc-9/performance/1HDD/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/stress-ng/30s/0x5002f01

commit: 
  76a4efa809 ("perf/arch: Remove perf_sample_data::regs_user_copy")
  e506d1dac0 ("perf/x86: Make dummy_iregs static")

76a4efa80900fc40 e506d1dac0edb2df82f2aa0582e 
---------------- --------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
           :6           17%           1:6     kmsg.BTRFS_error(device_sda1):bdev/dev/sda1_errs:wr#,rd#,flush#,corrupt#,gen
         %stddev     %change         %stddev
             \          |                \  
  19676113 ±  2%     +38.4%   27232673        stress-ng.sigrt.ops
    655554 ±  2%     +38.2%     906196        stress-ng.sigrt.ops_per_sec
  19953963 ±  4%     +58.5%   31619237        stress-ng.sigsuspend.ops
    665085 ±  4%     +58.5%    1053922        stress-ng.sigsuspend.ops_per_sec
 1.199e+08 ±  3%     +40.0%  1.677e+08 ±  6%  stress-ng.time.involuntary_context_switches
 3.251e+08 ±  6%     +21.7%  3.958e+08 ±  5%  stress-ng.time.voluntary_context_switches
  92940673            -2.8%   90309178        interrupts.CAL:Function_call_interrupts
    672419 ±  4%     +23.8%     832364 ±  5%  vmstat.system.cs
     46309 ±  4%      -9.3%      41992 ±  4%  slabinfo.Acpi-State.active_objs
    912.33 ±  4%      -9.4%     826.17 ±  4%  slabinfo.Acpi-State.active_slabs
     46559 ±  4%      -9.5%      42156 ±  4%  slabinfo.Acpi-State.num_objs
    912.33 ±  4%      -9.4%     826.17 ±  4%  slabinfo.Acpi-State.num_slabs
    476933 ±  4%      +8.1%     515358 ±  3%  sched_debug.cpu.avg_idle.avg
   2404226 ±  5%     +17.0%    2813799 ±  6%  sched_debug.cpu.nr_switches.avg
   2707460 ±  3%     +21.0%    3276452 ±  8%  sched_debug.cpu.nr_switches.max
   2046442 ±  5%     +12.4%    2300906 ±  7%  sched_debug.cpu.nr_switches.min
    150123 ± 14%     +47.4%     221244 ± 20%  sched_debug.cpu.nr_switches.stddev
   2494371 ±  5%     +16.3%    2900267 ±  5%  sched_debug.cpu.sched_count.avg
   3030416 ±  5%     +15.6%    3504469 ±  9%  sched_debug.cpu.sched_count.max
   1526367 ±  8%     +17.8%    1797379 ±  7%  sched_debug.cpu.ttwu_count.avg
   1737242 ±  7%     +18.8%    2064291 ±  7%  sched_debug.cpu.ttwu_count.max
   1321299 ±  8%     +15.6%    1527713 ± 10%  sched_debug.cpu.ttwu_count.min
   1053766 ±  2%     +16.9%    1231415 ±  5%  sched_debug.cpu.ttwu_local.avg
   1254194           +19.1%    1493169 ±  7%  sched_debug.cpu.ttwu_local.max
     23.91            -2.4       21.54 ±  6%  perf-stat.i.cache-miss-rate%
  59213457 ±  4%     -17.8%   48681553 ±  9%  perf-stat.i.cache-misses
    604543 ±  4%     +27.6%     771225 ±  4%  perf-stat.i.context-switches
     22934 ±  5%     +30.1%      29829 ± 11%  perf-stat.i.cycles-between-cache-misses
   4814712 ± 12%     +21.2%    5834775 ± 12%  perf-stat.i.iTLB-loads
  14029258 ±  2%     -27.2%   10214682 ± 20%  perf-stat.i.node-load-misses
      3338 ±  5%     +16.6%       3892 ±  9%  perf-stat.overall.cycles-between-cache-misses
     69.59            -5.9       63.73 ±  3%  perf-stat.overall.node-load-miss-rate%
  66908064 ±  4%     -14.0%   57544495 ±  8%  perf-stat.ps.cache-misses
    673703 ±  4%     +23.6%     832871 ±  5%  perf-stat.ps.context-switches
     61553 ± 13%     +12.4%      69199 ± 12%  perf-stat.ps.cpu-migrations
   5187429 ± 10%     +19.1%    6177199 ± 11%  perf-stat.ps.iTLB-loads
  13948921 ±  2%     -25.7%   10365005 ± 18%  perf-stat.ps.node-load-misses
     28.53 ± 70%     -14.1       14.39 ±141%  perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.new_sync_write.vfs_write.ksys_pwrite64.do_syscall_64
     28.41 ± 70%     -14.1       14.34 ±141%  perf-profile.calltrace.cycles-pp.btrfs_buffered_write.btrfs_file_write_iter.new_sync_write.vfs_write.ksys_pwrite64
     19.52 ± 70%      -9.4       10.10 ±141%  perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.btrfs_buffered_write.btrfs_file_write_iter.new_sync_write.vfs_write
     13.61 ± 70%      -6.7        6.89 ±141%  perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.btrfs_dirty_pages.btrfs_buffered_write.btrfs_file_write_iter
     13.61 ± 70%      -6.7        6.89 ±141%  perf-profile.calltrace.cycles-pp.clear_extent_bit.btrfs_dirty_pages.btrfs_buffered_write.btrfs_file_write_iter.new_sync_write
     13.52 ± 70%      -6.7        6.84 ±141%  perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.btrfs_dirty_pages.btrfs_buffered_write
     13.51 ± 70%      -6.7        6.83 ±141%  perf-profile.calltrace.cycles-pp.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit.clear_extent_bit.btrfs_dirty_pages
      7.95 ± 71%      -4.2        3.79 ±142%  perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_file_write_iter.new_sync_write.vfs_write
      7.68 ± 71%      -4.0        3.66 ±142%  perf-profile.calltrace.cycles-pp.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_file_write_iter.new_sync_write
      7.67 ± 71%      -4.0        3.66 ±142%  perf-profile.calltrace.cycles-pp.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_file_write_iter
      7.45 ± 71%      -3.9        3.57 ±142%  perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit.clear_extent_bit
      7.44 ± 71%      -3.9        3.56 ±142%  perf-profile.calltrace.cycles-pp.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit
      7.39 ± 71%      -3.9        3.54 ±142%  perf-profile.calltrace.cycles-pp._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit
      7.34 ± 71%      -3.8        3.50 ±143%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write
      7.27 ± 71%      -3.8        3.48 ±142%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent
      7.22 ± 71%      -3.8        3.44 ±143%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata
      5.75 ± 72%      -2.6        3.13 ±142%  perf-profile.calltrace.cycles-pp.__set_extent_bit.set_extent_bit.btrfs_set_extent_delalloc.btrfs_dirty_pages.btrfs_buffered_write
      5.75 ± 72%      -2.6        3.13 ±142%  perf-profile.calltrace.cycles-pp.set_extent_bit.btrfs_set_extent_delalloc.btrfs_dirty_pages.btrfs_buffered_write.btrfs_file_write_iter
      5.75 ± 72%      -2.6        3.13 ±142%  perf-profile.calltrace.cycles-pp.btrfs_set_extent_delalloc.btrfs_dirty_pages.btrfs_buffered_write.btrfs_file_write_iter.new_sync_write
      5.67 ± 72%      -2.6        3.09 ±142%  perf-profile.calltrace.cycles-pp.set_state_bits.__set_extent_bit.set_extent_bit.btrfs_set_extent_delalloc.btrfs_dirty_pages
      5.67 ± 72%      -2.6        3.09 ±142%  perf-profile.calltrace.cycles-pp.btrfs_set_delalloc_extent.set_state_bits.__set_extent_bit.set_extent_bit.btrfs_set_extent_delalloc
      5.47 ± 72%      -2.5        2.98 ±142%  perf-profile.calltrace.cycles-pp._raw_spin_lock.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit.clear_extent_bit
      5.43 ± 72%      -2.5        2.96 ±142%  perf-profile.calltrace.cycles-pp._raw_spin_lock.btrfs_set_delalloc_extent.set_state_bits.__set_extent_bit.set_extent_bit
      5.32 ± 73%      -2.4        2.91 ±142%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.btrfs_set_delalloc_extent.set_state_bits.__set_extent_bit
      5.32 ± 73%      -2.4        2.92 ±142%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit
     28.53 ± 70%     -14.1       14.39 ±141%  perf-profile.children.cycles-pp.btrfs_file_write_iter
     28.42 ± 70%     -14.1       14.34 ±141%  perf-profile.children.cycles-pp.btrfs_buffered_write
     19.52 ± 70%      -9.4       10.10 ±141%  perf-profile.children.cycles-pp.btrfs_dirty_pages
     13.72 ± 70%      -6.8        6.94 ±141%  perf-profile.children.cycles-pp.__clear_extent_bit
     13.61 ± 70%      -6.7        6.89 ±141%  perf-profile.children.cycles-pp.clear_extent_bit
     13.57 ± 70%      -6.7        6.87 ±141%  perf-profile.children.cycles-pp.clear_state_bit
     13.52 ± 70%      -6.7        6.84 ±141%  perf-profile.children.cycles-pp.btrfs_clear_delalloc_extent
      7.95 ± 71%      -4.2        3.79 ±142%  perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
      7.93 ± 71%      -4.2        3.77 ±142%  perf-profile.children.cycles-pp.__reserve_bytes
      7.68 ± 71%      -4.0        3.66 ±142%  perf-profile.children.cycles-pp.btrfs_reserve_metadata_bytes
      7.48 ± 71%      -3.9        3.58 ±142%  perf-profile.children.cycles-pp.btrfs_inode_rsv_release
      7.46 ± 71%      -3.9        3.57 ±142%  perf-profile.children.cycles-pp.btrfs_block_rsv_release
      5.91 ± 72%      -2.7        3.21 ±142%  perf-profile.children.cycles-pp.__set_extent_bit
      5.81 ± 72%      -2.7        3.16 ±142%  perf-profile.children.cycles-pp.set_extent_bit
      5.75 ± 72%      -2.6        3.13 ±142%  perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
      5.69 ± 72%      -2.6        3.10 ±142%  perf-profile.children.cycles-pp.set_state_bits
      5.68 ± 72%      -2.6        3.09 ±142%  perf-profile.children.cycles-pp.btrfs_set_delalloc_extent
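
A note for readers checking the numbers: judging from the rows above, the %change column appears to be the relative difference between the two commits' mean values, while rows whose metric is already a percentage (the *-rate% rows and the perf-profile rows) appear to show an absolute difference in percentage points. The sketch below recomputes a few rows under that assumption. The function names are hypothetical and are not part of lkp-tests.

```python
def relative_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (assumed meaning of %change)."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, assumed for metrics that are already percentages."""
    return new - base


# Values copied from the comparison table (76a4efa809 vs. e506d1dac0).
rows = {
    "stress-ng.sigsuspend.ops_per_sec": (665085, 1053922),
    "stress-ng.sigrt.ops_per_sec": (655554, 906196),
}
for name, (base, new) in rows.items():
    print(f"{relative_change(base, new):+6.1f}%  {name}")

# perf-stat.i.cache-miss-rate% is itself a percentage, so compare in points.
print(f"{point_change(23.91, 21.54):+6.1f}   perf-stat.i.cache-miss-rate%")
```

Under these assumptions the recomputed values agree with the table to one decimal place (+58.5%, +38.2%, and -2.4).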


                                                                                
                           stress-ng.sigsuspend.ops_per_sec                     
                                                                                
   1.1e+06 +----------------------------------------------------------------+   
  1.05e+06 |-+                      O O   O                                 |   
           |     O                      O                                   |   
     1e+06 |-+            O   O   O                                         |   
    950000 |-O       O      O   O                                           |   
           |   O                                                            |   
    900000 |-+                                                              |   
    850000 |-+     O                                                        |   
    800000 |-+                                                              |   
           |                                                                |   
    750000 |-+              +       +.                                      |   
    700000 |-+       +..O  + :     +  +.+     .+. .+.         .+..          |   
           |+ .+.+. +   +.+  :   .+      +  .+   +   +.+    .+    +. .+.+. +|   
    650000 |-+     +          +.+         +.            + .+        +     + |   
    600000 +----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


View attachment "config-5.10.0-rc2-00372-ge506d1dac0ed" of type "text/plain" (171265 bytes)

View attachment "job-script" of type "text/plain" (8183 bytes)

View attachment "job.yaml" of type "text/plain" (5697 bytes)

View attachment "reproduce" of type "text/plain" (464 bytes)
