Message-ID: <20200907083709.GH31308@shao2-debian>
Date: Mon, 7 Sep 2020 16:37:09 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Josh Don <joshdon@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Venkatesh Pallipadi <venki@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
feng.tang@...el.com, zhengjun.xing@...el.com,
aubrey.li@...ux.intel.com, yu.c.chen@...el.com
Subject: [sched/fair] ec73240b16: aim7.jobs-per-min 2.3% improvement
Greetings,

FYI, we noticed a 2.3% improvement in aim7.jobs-per-min due to commit:
commit: ec73240b1627cddfd7cef018c7fa1c32e64a721e ("sched/fair: Ignore cache hotness for SMT migration")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
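For context, the patch is a one-hunk change to task_hot() in kernel/sched/fair.c: a task is no longer treated as cache-hot when the load balancer moves it between SMT siblings, since hyperthreads share every cache level and there is no cache affinity to preserve. The excerpt below is a sketch reconstructed from the commit title and the surrounding mainline task_hot() logic (SD_SHARE_CPUCAPACITY is the sched-domain flag set at the SMT level); see the commit itself for the authoritative diff.

    --- a/kernel/sched/fair.c
    +++ b/kernel/sched/fair.c
    @@ static int task_hot(struct task_struct *p, struct lb_env *env)
     	if (sysctl_sched_migration_cost == -1)
     		return 1;
    +
    +	/* SMT siblings share cache, so ignore hotness for intra-core moves */
    +	if (env->sd->flags & SD_SHARE_CPUCAPACITY)
    +		return 0;
    +
     	if (sysctl_sched_migration_cost == 0)
     		return 0;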
in testcase: aim7
on test machine: 192-thread Cooper Lake with 128G memory
with the following parameters:
disk: 4BRD_12G
md: RAID1
fs: xfs
test: sync_disk_rw
load: 300
cpufreq_governor: performance
ucode: 0x86000017
test-description: AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/4BRD_12G/xfs/x86_64-rhel-8.3/300/RAID1/debian-10.4-x86_64-20200603.cgz/lkp-cpx-4s1/sync_disk_rw/aim7/0x86000017
commit:
5f4a1c4ea4 ("sched/topology: Mark SD_NUMA as SDF_NEEDS_GROUPS")
ec73240b16 ("sched/fair: Ignore cache hotness for SMT migration")
5f4a1c4ea44728aa ec73240b1627cddfd7cef018c7f
---------------- ---------------------------
      parent (±%stddev)    %change    patched (±%stddev)    metric
5365 +2.3% 5488 aim7.jobs-per-min
335.52 -2.2% 327.98 aim7.time.elapsed_time
335.52 -2.2% 327.98 aim7.time.elapsed_time.max
5014 ± 2% -7.5% 4638 ± 2% aim7.time.system_time
51059388 -1.5% 50309088 aim7.time.voluntary_context_switches
49.81 ± 2% +4.0% 51.78 iostat.cpu.iowait
9.24 -4.3% 8.84 ± 2% iostat.cpu.system
0.07 ± 24% +138.0% 0.16 ± 31% sched_debug.cfs_rq:/.nr_spread_over.avg
0.20 ± 28% +77.0% 0.36 ± 22% sched_debug.cfs_rq:/.nr_spread_over.stddev
49.38 ± 2% +4.1% 51.38 vmstat.cpu.wa
223673 +2.3% 228851 vmstat.io.bo
298749 +2.1% 304962 proc-vmstat.nr_file_pages
241703 +2.8% 248552 proc-vmstat.nr_unevictable
241703 +2.8% 248552 proc-vmstat.nr_zone_unevictable
5330 -9.3% 4835 ± 2% proc-vmstat.nr_zone_write_pending
1330084 -2.3% 1298931 proc-vmstat.pgfault
1.63 -0.0 1.59 perf-stat.i.branch-miss-rate%
27.64 +0.9 28.52 perf-stat.i.cache-miss-rate%
1197 -4.3% 1146 perf-stat.i.cycles-between-cache-misses
34968537 +4.0% 36353165 perf-stat.i.node-load-misses
3229831 +4.0% 3357767 perf-stat.i.node-loads
1.63 -0.0 1.58 perf-stat.overall.branch-miss-rate%
27.72 +0.9 28.63 perf-stat.overall.cache-miss-rate%
1161 -4.8% 1105 perf-stat.overall.cycles-between-cache-misses
34868866 +4.0% 36247556 perf-stat.ps.node-load-misses
3220778 +4.0% 3348190 perf-stat.ps.node-loads
15.56 ± 3% -2.6 12.92 ± 4% perf-profile.calltrace.cycles-pp.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write.vfs_write
9.81 ± 4% -1.8 7.97 ± 5% perf-profile.calltrace.cycles-pp._raw_spin_lock.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write
9.78 ± 4% -1.8 7.95 ± 5% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
5.69 ± 2% -0.8 4.91 ± 2% perf-profile.calltrace.cycles-pp.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write
4.01 ± 2% -0.5 3.49 ± 2% perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
3.99 ± 2% -0.5 3.47 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync
3.68 ± 2% -0.4 3.29 ± 4% perf-profile.calltrace.cycles-pp.remove_wait_queue.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
3.67 ± 2% -0.4 3.28 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync
3.65 ± 2% -0.4 3.26 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.__xfs_log_force_lsn.xfs_log_force_lsn
5.69 -0.4 5.34 perf-profile.calltrace.cycles-pp.load_balance.newidle_balance.pick_next_task_fair.__sched_text_start.schedule
1.63 ± 3% -0.3 1.37 ± 2% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync
1.64 ± 3% -0.3 1.39 ± 2% perf-profile.calltrace.cycles-pp.schedule.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
1.52 ± 3% -0.2 1.27 ± 2% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.xlog_wait_on_iclog.__xfs_log_force_lsn
1.51 ± 3% -0.2 1.27 ± 2% perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__sched_text_start.schedule.xlog_wait_on_iclog
2.89 ± 2% -0.2 2.69 ± 2% perf-profile.calltrace.cycles-pp.schedule.io_schedule.wait_on_page_bit.__filemap_fdatawait_range.file_write_and_wait_range
2.90 ± 2% -0.2 2.70 ± 2% perf-profile.calltrace.cycles-pp.io_schedule.wait_on_page_bit.__filemap_fdatawait_range.file_write_and_wait_range.xfs_file_fsync
2.89 ± 2% -0.2 2.69 ± 2% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.io_schedule.wait_on_page_bit.__filemap_fdatawait_range
2.64 ± 2% -0.2 2.44 ± 2% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.io_schedule.wait_on_page_bit
2.64 ± 2% -0.2 2.44 ± 2% perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__sched_text_start.schedule.io_schedule
0.68 ± 2% +0.0 0.73 ± 2% perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.wait_for_completion.__flush_work.xlog_cil_force_lsn
0.68 ± 2% +0.0 0.73 ± 2% perf-profile.calltrace.cycles-pp.schedule_timeout.wait_for_completion.__flush_work.xlog_cil_force_lsn.xfs_log_force_lsn
0.68 ± 2% +0.0 0.73 ± 2% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.schedule_timeout.wait_for_completion.__flush_work
0.69 ± 2% +0.0 0.74 ± 2% perf-profile.calltrace.cycles-pp.wait_for_completion.__flush_work.xlog_cil_force_lsn.xfs_log_force_lsn.xfs_file_fsync
0.79 +0.0 0.84 ± 2% perf-profile.calltrace.cycles-pp.__flush_work.xlog_cil_force_lsn.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
1.79 ± 2% +0.1 1.88 ± 2% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
2.04 +0.1 2.14 ± 2% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
3.02 ± 4% +0.3 3.33 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.md_flush_request.raid1_make_request.md_handle_request
3.08 ± 4% +0.3 3.39 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.md_flush_request.raid1_make_request.md_handle_request.md_submit_bio
4.47 ± 3% +0.4 4.84 ± 2% perf-profile.calltrace.cycles-pp.md_flush_request.raid1_make_request.md_handle_request.md_submit_bio.submit_bio_noacct
4.62 ± 3% +0.4 4.99 ± 3% perf-profile.calltrace.cycles-pp.md_submit_bio.submit_bio_noacct.submit_bio.submit_bio_wait.blkdev_issue_flush
4.64 ± 3% +0.4 5.02 ± 3% perf-profile.calltrace.cycles-pp.submit_bio_noacct.submit_bio.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync
4.55 ± 3% +0.4 4.93 ± 3% perf-profile.calltrace.cycles-pp.md_handle_request.md_submit_bio.submit_bio_noacct.submit_bio.submit_bio_wait
4.64 ± 3% +0.4 5.02 ± 3% perf-profile.calltrace.cycles-pp.submit_bio.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_aio_write
4.66 ± 3% +0.4 5.04 ± 3% perf-profile.calltrace.cycles-pp.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write
4.71 ± 3% +0.4 5.10 ± 3% perf-profile.calltrace.cycles-pp.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write.vfs_write
8.29 +0.6 8.86 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn
9.03 +0.9 9.92 perf-profile.calltrace.cycles-pp.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write
10.21 +0.9 11.14 perf-profile.calltrace.cycles-pp.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write.new_sync_write.vfs_write
4.33 ± 3% +1.1 5.42 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_log_force_lsn
4.34 ± 3% +1.1 5.44 ± 4% perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync
4.73 ± 2% +1.1 5.88 ± 3% perf-profile.calltrace.cycles-pp.xlog_wait_on_iclog.__xfs_log_force_lsn.xfs_log_force_lsn.xfs_file_fsync.xfs_file_buffered_aio_write
24.59 ± 2% -1.7 22.84 ± 2% perf-profile.children.cycles-pp.__xfs_log_force_lsn
11.23 ± 4% -1.6 9.59 ± 4% perf-profile.children.cycles-pp._raw_spin_lock
7.45 -0.3 7.16 perf-profile.children.cycles-pp.schedule
6.76 -0.3 6.47 perf-profile.children.cycles-pp.newidle_balance
6.85 -0.3 6.57 perf-profile.children.cycles-pp.pick_next_task_fair
6.62 -0.3 6.34 perf-profile.children.cycles-pp.load_balance
7.92 -0.3 7.64 perf-profile.children.cycles-pp.__sched_text_start
2.90 ± 2% -0.2 2.70 ± 2% perf-profile.children.cycles-pp.io_schedule
0.70 ± 2% +0.0 0.74 ± 2% perf-profile.children.cycles-pp.wait_for_completion
0.79 +0.0 0.84 ± 2% perf-profile.children.cycles-pp.__flush_work
0.70 ± 2% +0.1 0.75 ± 2% perf-profile.children.cycles-pp.schedule_timeout
0.15 ± 5% +0.1 0.22 ± 4% perf-profile.children.cycles-pp.xlog_write
0.26 ± 2% +0.1 0.32 ± 3% perf-profile.children.cycles-pp.xlog_cil_push_work
0.00 ±387% +0.1 0.08 ± 9% perf-profile.children.cycles-pp.xlog_state_get_iclog_space
1.79 ± 2% +0.1 1.88 ± 2% perf-profile.children.cycles-pp.process_one_work
2.04 +0.1 2.14 ± 2% perf-profile.children.cycles-pp.worker_thread
3.22 ± 4% +0.3 3.53 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irq
10.42 +0.4 10.79 ± 2% perf-profile.children.cycles-pp.xlog_wait_on_iclog
4.48 ± 3% +0.4 4.86 ± 3% perf-profile.children.cycles-pp.md_flush_request
4.66 ± 3% +0.4 5.04 ± 3% perf-profile.children.cycles-pp.submit_bio_wait
4.71 ± 3% +0.4 5.10 ± 3% perf-profile.children.cycles-pp.blkdev_issue_flush
10.21 +0.9 11.14 perf-profile.children.cycles-pp.xfs_log_force_lsn
0.40 ± 3% -0.0 0.36 ± 3% perf-profile.self.cycles-pp.load_balance
aim7.jobs-per-min
[ASCII trend plot, y-axis 5250-5600 jobs/min: samples marked "O" (bisect-bad) cluster around 5450-5550; samples marked "+" (bisect-good) cluster around 5300-5400]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.9.0-rc1-00115-gec73240b1627c" of type "text/plain" (170215 bytes)
View attachment "job-script" of type "text/plain" (8038 bytes)
View attachment "job.yaml" of type "text/plain" (5405 bytes)
View attachment "reproduce" of type "text/plain" (1020 bytes)