Date:   Mon, 7 Sep 2020 16:37:09 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Josh Don <joshdon@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Venkatesh Pallipadi <venki@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
        feng.tang@...el.com, zhengjun.xing@...el.com,
        aubrey.li@...ux.intel.com, yu.c.chen@...el.com
Subject: [sched/fair] ec73240b16: aim7.jobs-per-min 2.3% improvement

Greetings,

FYI, we noticed a 2.3% improvement in aim7.jobs-per-min due to commit:


commit: ec73240b1627cddfd7cef018c7fa1c32e64a721e ("sched/fair: Ignore cache hotness for SMT migration")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core


in testcase: aim7
on test machine: 192-thread Cooper Lake with 128G memory
with the following parameters:

	disk: 4BRD_12G
	md: RAID1
	fs: xfs
	test: sync_disk_rw
	load: 300
	cpufreq_governor: performance
	ucode: 0x86000017

test-description: AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of a multiuser system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/





Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/4BRD_12G/xfs/x86_64-rhel-8.3/300/RAID1/debian-10.4-x86_64-20200603.cgz/lkp-cpx-4s1/sync_disk_rw/aim7/0x86000017
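
The two lines above pair up by position: the first is a slash-separated list of parameter names, the second holds the matching values. A minimal Python sketch of that reading follows; the helper name parse_params is hypothetical and not part of lkp-tests, and the sketch assumes no value contains a slash.

```python
def parse_params(keys_line: str, values_line: str) -> dict:
    """Pair slash-separated parameter names with their values by position."""
    # The names line ends with a colon; drop it before splitting.
    keys = keys_line.strip().rstrip(":").split("/")
    values = values_line.strip().split("/")
    if len(keys) != len(values):
        raise ValueError("parameter name/value count mismatch")
    return dict(zip(keys, values))


keys_line = ("compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/"
             "tbox_group/test/testcase/ucode:")
values_line = ("  gcc-9/performance/4BRD_12G/xfs/x86_64-rhel-8.3/300/RAID1/"
               "debian-10.4-x86_64-20200603.cgz/lkp-cpx-4s1/sync_disk_rw/"
               "aim7/0x86000017")

params = parse_params(keys_line, values_line)
for name, value in params.items():
    print(f"{name:17} {value}")
```

Read this way, the run used kconfig x86_64-rhel-8.3 on tbox_group lkp-cpx-4s1, which matches the parameter list earlier in this report.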

commit: 
  5f4a1c4ea4 ("sched/topology: Mark SD_NUMA as SDF_NEEDS_GROUPS")
  ec73240b16 ("sched/fair: Ignore cache hotness for SMT migration")

5f4a1c4ea44728aa ec73240b1627cddfd7cef018c7f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      5365            +2.3%       5488        aim7.jobs-per-min
    335.52            -2.2%     327.98        aim7.time.elapsed_time
    335.52            -2.2%     327.98        aim7.time.elapsed_time.max
      5014 ±  2%      -7.5%       4638 ±  2%  aim7.time.system_time
  51059388            -1.5%   50309088        aim7.time.voluntary_context_switches
     49.81 ±  2%      +4.0%      51.78        iostat.cpu.iowait
      9.24            -4.3%       8.84 ±  2%  iostat.cpu.system
      0.07 ± 24%    +138.0%       0.16 ± 31%  sched_debug.cfs_rq:/.nr_spread_over.avg
      0.20 ± 28%     +77.0%       0.36 ± 22%  sched_debug.cfs_rq:/.nr_spread_over.stddev
     49.38 ±  2%      +4.1%      51.38        vmstat.cpu.wa
    223673            +2.3%     228851        vmstat.io.bo
    298749            +2.1%     304962        proc-vmstat.nr_file_pages
    241703            +2.8%     248552        proc-vmstat.nr_unevictable
    241703            +2.8%     248552        proc-vmstat.nr_zone_unevictable
      5330            -9.3%       4835 ±  2%  proc-vmstat.nr_zone_write_pending
   1330084            -2.3%    1298931        proc-vmstat.pgfault
      1.63            -0.0        1.59        perf-stat.i.branch-miss-rate%
     27.64            +0.9       28.52        perf-stat.i.cache-miss-rate%
      1197            -4.3%       1146        perf-stat.i.cycles-between-cache-misses
  34968537            +4.0%   36353165        perf-stat.i.node-load-misses
   3229831            +4.0%    3357767        perf-stat.i.node-loads
      1.63            -0.0        1.58        perf-stat.overall.branch-miss-rate%
     27.72            +0.9       28.63        perf-stat.overall.cache-miss-rate%
      1161            -4.8%       1105        perf-stat.overall.cycles-between-cache-misses
  34868866            +4.0%   36247556        perf-stat.ps.node-load-misses
   3220778            +4.0%    3348190        perf-stat.ps.node-loads
     15.56 ±  3%      -2.6       12.92 ±  4%  perf-profile.calltrace.cycles-pp.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write.vfs_write
      9.81 ±  4%      -1.8        7.97 ±  5%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write
      9.78 ±  4%      -1.8        7.95 ±  5%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
      5.69 ±  2%      -0.8        4.91 ±  2%  perf-profile.calltrace.cycles-pp.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write
      4.01 ±  2%      -0.5        3.49 ±  2%  perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
      3.99 ±  2%      -0.5        3.47 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync
      3.68 ±  2%      -0.4        3.29 ±  4%  perf-profile.calltrace.cycles-pp.remove_wait_queue.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
      3.67 ±  2%      -0.4        3.28 ±  4%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync
      3.65 ±  2%      -0.4        3.26 ±  4%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.__xfs_log_force_lsn.xfs_log_force_lsn
      5.69            -0.4        5.34        perf-profile.calltrace.cycles-pp.load_balance.newidle_balance.pick_next_task_fair.__sched_text_start.schedule
      1.63 ±  3%      -0.3        1.37 ±  2%  perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync
      1.64 ±  3%      -0.3        1.39 ±  2%  perf-profile.calltrace.cycles-pp.schedule.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
      1.52 ±  3%      -0.2        1.27 ±  2%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.xlog_wait_on_iclog.__xfs_log_force_lsn
      1.51 ±  3%      -0.2        1.27 ±  2%  perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__sched_text_start.schedule.xlog_wait_on_iclog
      2.89 ±  2%      -0.2        2.69 ±  2%  perf-profile.calltrace.cycles-pp.schedule.io_schedule.wait_on_page_bit.__filemap_fdatawait_range.file_write_and_wait_range
      2.90 ±  2%      -0.2        2.70 ±  2%  perf-profile.calltrace.cycles-pp.io_schedule.wait_on_page_bit.__filemap_fdatawait_range.file_write_and_wait_range.xfs_file_fsync
      2.89 ±  2%      -0.2        2.69 ±  2%  perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.io_schedule.wait_on_page_bit.__filemap_fdatawait_range
      2.64 ±  2%      -0.2        2.44 ±  2%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.io_schedule.wait_on_page_bit
      2.64 ±  2%      -0.2        2.44 ±  2%  perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__sched_text_start.schedule.io_schedule
      0.68 ±  2%      +0.0        0.73 ±  2%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.wait_for_completion.__flush_work.xlog_cil_force_lsn
      0.68 ±  2%      +0.0        0.73 ±  2%  perf-profile.calltrace.cycles-pp.schedule_timeout.wait_for_completion.__flush_work.xlog_cil_force_lsn.xfs_log_force_lsn
      0.68 ±  2%      +0.0        0.73 ±  2%  perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.schedule_timeout.wait_for_completion.__flush_work
      0.69 ±  2%      +0.0        0.74 ±  2%  perf-profile.calltrace.cycles-pp.wait_for_completion.__flush_work.xlog_cil_force_lsn.xfs_log_force_lsn.xfs_file_fsync
      0.79            +0.0        0.84 ±  2%  perf-profile.calltrace.cycles-pp.__flush_work.xlog_cil_force_lsn.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
      1.79 ±  2%      +0.1        1.88 ±  2%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
      2.04            +0.1        2.14 ±  2%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
      3.02 ±  4%      +0.3        3.33 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.md_flush_request.raid1_make_request.md_handle_request
      3.08 ±  4%      +0.3        3.39 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.md_flush_request.raid1_make_request.md_handle_request.md_submit_bio
      4.47 ±  3%      +0.4        4.84 ±  2%  perf-profile.calltrace.cycles-pp.md_flush_request.raid1_make_request.md_handle_request.md_submit_bio.submit_bio_noacct
      4.62 ±  3%      +0.4        4.99 ±  3%  perf-profile.calltrace.cycles-pp.md_submit_bio.submit_bio_noacct.submit_bio.submit_bio_wait.blkdev_issue_flush
      4.64 ±  3%      +0.4        5.02 ±  3%  perf-profile.calltrace.cycles-pp.submit_bio_noacct.submit_bio.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync
      4.55 ±  3%      +0.4        4.93 ±  3%  perf-profile.calltrace.cycles-pp.md_handle_request.md_submit_bio.submit_bio_noacct.submit_bio.submit_bio_wait
      4.64 ±  3%      +0.4        5.02 ±  3%  perf-profile.calltrace.cycles-pp.submit_bio.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_aio_write
      4.66 ±  3%      +0.4        5.04 ±  3%  perf-profile.calltrace.cycles-pp.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write
      4.71 ±  3%      +0.4        5.10 ±  3%  perf-profile.calltrace.cycles-pp.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write.vfs_write
      8.29            +0.6        8.86 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn
      9.03            +0.9        9.92        perf-profile.calltrace.cycles-pp.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write
     10.21            +0.9       11.14        perf-profile.calltrace.cycles-pp.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write.vfs_write
      4.33 ±  3%      +1.1        5.42 ±  4%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_log_force_lsn
      4.34 ±  3%      +1.1        5.44 ±  4%  perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync
      4.73 ±  2%      +1.1        5.88 ±  3%  perf-profile.calltrace.cycles-pp.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
     24.59 ±  2%      -1.7       22.84 ±  2%  perf-profile.children.cycles-pp.__xfs_log_force_lsn
     11.23 ±  4%      -1.6        9.59 ±  4%  perf-profile.children.cycles-pp._raw_spin_lock
      7.45            -0.3        7.16        perf-profile.children.cycles-pp.schedule
      6.76            -0.3        6.47        perf-profile.children.cycles-pp.newidle_balance
      6.85            -0.3        6.57        perf-profile.children.cycles-pp.pick_next_task_fair
      6.62            -0.3        6.34        perf-profile.children.cycles-pp.load_balance
      7.92            -0.3        7.64        perf-profile.children.cycles-pp.__sched_text_start
      2.90 ±  2%      -0.2        2.70 ±  2%  perf-profile.children.cycles-pp.io_schedule
      0.70 ±  2%      +0.0        0.74 ±  2%  perf-profile.children.cycles-pp.wait_for_completion
      0.79            +0.0        0.84 ±  2%  perf-profile.children.cycles-pp.__flush_work
      0.70 ±  2%      +0.1        0.75 ±  2%  perf-profile.children.cycles-pp.schedule_timeout
      0.15 ±  5%      +0.1        0.22 ±  4%  perf-profile.children.cycles-pp.xlog_write
      0.26 ±  2%      +0.1        0.32 ±  3%  perf-profile.children.cycles-pp.xlog_cil_push_work
      0.00 ±387%      +0.1        0.08 ±  9%  perf-profile.children.cycles-pp.xlog_state_get_iclog_space
      1.79 ±  2%      +0.1        1.88 ±  2%  perf-profile.children.cycles-pp.process_one_work
      2.04            +0.1        2.14 ±  2%  perf-profile.children.cycles-pp.worker_thread
      3.22 ±  4%      +0.3        3.53 ±  3%  perf-profile.children.cycles-pp._raw_spin_lock_irq
     10.42            +0.4       10.79 ±  2%  perf-profile.children.cycles-pp.xlog_wait_on_iclog
      4.48 ±  3%      +0.4        4.86 ±  3%  perf-profile.children.cycles-pp.md_flush_request
      4.66 ±  3%      +0.4        5.04 ±  3%  perf-profile.children.cycles-pp.submit_bio_wait
      4.71 ±  3%      +0.4        5.10 ±  3%  perf-profile.children.cycles-pp.blkdev_issue_flush
     10.21            +0.9       11.14        perf-profile.children.cycles-pp.xfs_log_force_lsn
      0.40 ±  3%      -0.0        0.36 ±  3%  perf-profile.self.cycles-pp.load_balance
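
As a reading aid for the table above: the %change column appears to be the relative difference between the two per-commit means, while rows whose metric is already a percentage (for example cache-miss-rate%) appear to show the absolute difference instead. The Python sketch below checks that interpretation against a few rows. The function names are hypothetical, and the inputs are the rounded values printed in the table, so rows with very small means may not reproduce exactly.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from the parent commit to the tested commit, in percent."""
    return (new - base) / base * 100.0


def abs_change(base: float, new: float) -> float:
    """Absolute difference, used here for metrics that are already percentages."""
    return new - base


# (metric, parent 5f4a1c4ea4 mean, ec73240b16 mean), copied from the table
relative_rows = [
    ("aim7.jobs-per-min", 5365, 5488),
    ("aim7.time.elapsed_time", 335.52, 327.98),
    ("aim7.time.system_time", 5014, 4638),
    ("vmstat.io.bo", 223673, 228851),
]
for name, base, new in relative_rows:
    print(f"{name:28} {pct_change(base, new):+.1f}%")

# Percentage-valued metric: compare as an absolute difference.
delta = abs_change(27.64, 28.52)
print(f"{'perf-stat.i.cache-miss-rate%':28} {delta:+.1f}")
```

Under these assumptions the computed values round to the figures in the table: +2.3%, -2.2%, -7.5%, +2.3%, and +0.9.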


                                                                                
                                 aim7.jobs-per-min                              
                                                                                
  5600 +--------------------------------------------------------------------+   
       |OO                                                                  |   
  5550 |-+                                                                  |   
       |       OO O   OO   O       O       O    O                           |   
  5500 |-+O      O OOO  OOO  OO OO OOO  O O OO O OO  OO O                   |   
       |   OOO    O  O O  O O  O OO   OO O          O  O O                  |   
  5450 |-+  O OO                       O OO  OOO  OO OO                     |   
       |        +                         + O   O                           |   
  5400 |-+  +   :+ ++                     :   +   ++++    O          +      |   
       |+++++++: ++::   + +     +++++ ++++: ++ :++++ :: +  ++   ++  +::+++ +|   
  5350 |++ +  ++  + ++ ++ ++++++++ +++++ :: :  ++    +++++++++++ ++ +:+   +:|   
       |              +  + + ++          ++ +                     +: +    + |   
  5300 |-+                                 +                       +        |   
       |                                                                    |   
  5250 +--------------------------------------------------------------------+   
                                                                                
                                                                                
[+] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


View attachment "config-5.9.0-rc1-00115-gec73240b1627c" of type "text/plain" (170215 bytes)

View attachment "job-script" of type "text/plain" (8038 bytes)

View attachment "job.yaml" of type "text/plain" (5405 bytes)

View attachment "reproduce" of type "text/plain" (1020 bytes)
