Message-ID: <202211202030.b9842879-oliver.sang@intel.com>
Date: Sun, 20 Nov 2022 21:01:20 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Jens Axboe <axboe@...nel.dk>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
Christiano Haesbaert <haesbaert@...sbaert.org>,
<linux-kernel@...r.kernel.org>, <io-uring@...r.kernel.org>,
<ying.huang@...el.com>, <feng.tang@...el.com>,
<zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: [linus:master] [io_uring] 46a525e199: fio.read_iops 22.4%
improvement
Greetings,
FYI, we noticed a 22.4% improvement of fio.read_iops due to commit:
commit: 46a525e199e4037516f7e498c18f065b09df32ac ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: fio-basic
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
with following parameters:
disk: 2pmem
fs: ext2
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: read
bs: 2M
ioengine: io_uring
test_size: 200G
cpufreq_governor: performance
test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
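The parameters above correspond roughly to a fio job file like the sketch below. This is an illustrative reconstruction only: the section name, numjobs value, and filename path are assumptions, and the authoritative job definition is the job.yaml generated by lkp (attached to this email).

```ini
; Illustrative fio job reconstructed from the parameters above.
; [read-test], numjobs, and filename are assumptions; the real job
; is generated by lkp from the attached job.yaml.
[global]
ioengine=io_uring
rw=read
bs=2M
size=200G
runtime=200s
time_based

[read-test]
; nr_task=50% of 96 CPU threads -> roughly 48 fio jobs (assumed mapping)
numjobs=48
; dax-mounted ext2 on pmem; exact mount point is an assumption
filename=/fs/pmem0/fio-testfile
```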
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
2M/gcc-11/performance/2pmem/ext2/io_uring/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/read/lkp-csl-2sp7/200G/fio-basic/tb
commit:
b000145e99 ("io_uring/rw: defer fsnotify calls to task context")
46a525e199 ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")
b000145e99078094 46a525e199e4037516f7e498c18
---------------- ---------------------------
%stddev %change %stddev
\ | \
21.62 ± 28% +13.5 35.14 ± 17% fio.latency_100ms%
0.21 ± 7% -0.0 0.17 ± 6% fio.latency_20ms%
11.42 ± 20% -7.1 4.34 ± 21% fio.latency_500ms%
0.68 ± 62% -0.6 0.06 ± 11% fio.latency_750ms%
20312 ± 5% +22.4% 24856 fio.read_bw_MBps
2.66e+08 ± 7% -34.2% 1.751e+08 ± 5% fio.read_clat_90%_us
3.305e+08 ± 6% -28.9% 2.349e+08 ± 9% fio.read_clat_95%_us
4.549e+08 ± 14% -33.7% 3.016e+08 ± 2% fio.read_clat_99%_us
1.515e+08 ± 5% -18.5% 1.235e+08 fio.read_clat_mean_us
84751148 ± 10% -38.1% 52462994 ± 8% fio.read_clat_stddev
10156 ± 5% +22.4% 12428 fio.read_iops
34028 ± 4% -17.5% 28081 ± 4% fio.read_slat_mean_us
207091 ± 7% +23.0% 254784 ± 5% fio.read_slat_stddev
1322240 ± 6% +39.5% 1844804 ± 2% fio.time.involuntary_context_switches
152665 ± 3% +415.2% 786457 fio.time.minor_page_faults
8350 ± 2% +6.6% 8904 fio.time.percent_of_cpu_this_job_got
16651 ± 2% +6.8% 17775 fio.time.system_time
96.17 ± 8% -14.3% 82.42 ± 5% fio.time.user_time
2032503 ± 5% +22.4% 2487213 fio.workload
715266 ± 13% +22.3% 874621 ± 6% numa-numastat.node1.numa_hit
715333 ± 13% +22.2% 874330 ± 6% numa-vmstat.node1.numa_hit
1.966e+09 ± 17% -52.7% 9.303e+08 ± 12% cpuidle..time
4301667 ± 16% -53.0% 2021131 ± 20% cpuidle..usage
10.98 ± 15% -49.4% 5.56 ± 11% iostat.cpu.idle
88.19 +6.4% 93.80 iostat.cpu.system
9.76 ± 17% -53.3% 4.56 ± 15% turbostat.CPU%c1
57.07 +4.1% 59.40 turbostat.RAMWatt
10.12 ± 17% -5.5 4.66 ± 13% mpstat.cpu.all.idle%
0.25 ± 12% -0.1 0.12 ± 6% mpstat.cpu.all.soft%
0.83 ± 10% -0.2 0.65 ± 6% mpstat.cpu.all.usr%
10.60 ± 15% -50.9% 5.20 ± 16% vmstat.cpu.id
159.40 ± 4% +27.7% 203.50 ± 2% vmstat.procs.r
15208 ± 4% +15.1% 17506 vmstat.system.cs
20749 +2.9% 21355 proc-vmstat.nr_kernel_stack
225429 ± 7% +282.2% 861493 proc-vmstat.numa_hint_faults
136434 ± 8% +456.6% 759435 proc-vmstat.numa_hint_faults_local
1090679 +2.6% 1119565 proc-vmstat.numa_hit
6836 ± 3% +520.3% 42407 proc-vmstat.numa_huge_pte_updates
1005727 +2.6% 1031985 proc-vmstat.numa_local
775153 ± 8% +31.7% 1020704 ± 8% proc-vmstat.numa_pages_migrated
3833507 ± 3% +490.7% 22643711 proc-vmstat.numa_pte_updates
1090769 +2.5% 1118337 proc-vmstat.pgalloc_normal
979930 +65.5% 1621767 proc-vmstat.pgfault
895767 ± 15% +7.3% 961415 proc-vmstat.pgfree
775153 ± 8% +31.7% 1020704 ± 8% proc-vmstat.pgmigrate_success
1462 ± 8% +31.9% 1927 ± 8% proc-vmstat.thp_migration_success
98.27 +0.4 98.64 perf-profile.calltrace.cycles-pp.ret_from_fork
97.37 +0.4 97.73 perf-profile.calltrace.cycles-pp.io_read.io_issue_sqe.io_wq_submit_work.io_worker_handle_work.io_wqe_worker
98.23 +0.4 98.60 perf-profile.calltrace.cycles-pp.io_worker_handle_work.io_wqe_worker.ret_from_fork
98.23 +0.4 98.61 perf-profile.calltrace.cycles-pp.io_wqe_worker.ret_from_fork
1.13 ± 12% -0.3 0.86 ± 4% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.27 ± 6% -0.0 0.23 ± 4% perf-profile.children.cycles-pp.try_to_wake_up
0.28 ± 6% -0.0 0.24 ± 3% perf-profile.children.cycles-pp.__wake_up_common
0.28 ± 6% -0.0 0.24 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
0.26 ± 6% -0.0 0.22 ± 5% perf-profile.children.cycles-pp.autoremove_wake_function
0.08 ± 6% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.select_task_rq
0.00 +0.3 0.33 ± 4% perf-profile.children.cycles-pp.io_assign_current_work
98.27 +0.4 98.64 perf-profile.children.cycles-pp.ret_from_fork
98.23 +0.4 98.60 perf-profile.children.cycles-pp.io_worker_handle_work
98.23 +0.4 98.61 perf-profile.children.cycles-pp.io_wqe_worker
131360 ± 8% +37.7% 180837 ± 3% sched_debug.cfs_rq:/.MIN_vruntime.avg
1.49 ± 5% +26.4% 1.88 sched_debug.cfs_rq:/.h_nr_running.avg
195967 ± 10% -22.4% 152129 ± 10% sched_debug.cfs_rq:/.load.stddev
144.62 ± 20% +44.6% 209.08 ± 9% sched_debug.cfs_rq:/.load_avg.min
131362 ± 8% +37.7% 180842 ± 3% sched_debug.cfs_rq:/.max_vruntime.avg
201691 ± 7% +13.2% 228371 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
1.32 ± 5% +25.7% 1.66 sched_debug.cfs_rq:/.nr_running.avg
0.54 ± 5% +14.1% 0.62 ± 3% sched_debug.cfs_rq:/.nr_running.stddev
1584 ± 5% +25.2% 1983 sched_debug.cfs_rq:/.runnable_avg.avg
2.63 ± 13% +77.8% 4.68 ± 13% sched_debug.cfs_rq:/.spread.avg
5.35 ± 15% +36.7% 7.31 ± 9% sched_debug.cfs_rq:/.spread.stddev
1549299 ± 6% -31.1% 1067585 ± 7% sched_debug.cpu.avg_idle.avg
3658796 ± 10% -30.2% 2554830 ± 15% sched_debug.cpu.avg_idle.max
276671 ± 18% -41.5% 161849 ± 20% sched_debug.cpu.avg_idle.min
718901 ± 13% -31.3% 494161 ± 11% sched_debug.cpu.avg_idle.stddev
872915 ± 9% -36.4% 555389 ± 3% sched_debug.cpu.max_idle_balance_cost.avg
1335161 ± 12% -42.1% 773347 ± 15% sched_debug.cpu.max_idle_balance_cost.max
602677 ± 9% -17.0% 500000 sched_debug.cpu.max_idle_balance_cost.min
166368 ± 11% -64.5% 59141 ± 46% sched_debug.cpu.max_idle_balance_cost.stddev
1.49 ± 5% +26.0% 1.88 ± 2% sched_debug.cpu.nr_running.avg
16805 ± 4% +16.8% 19629 sched_debug.cpu.nr_switches.avg
9904 ± 6% +17.8% 11670 ± 2% sched_debug.cpu.nr_switches.min
4092 ± 9% +25.8% 5147 ± 9% sched_debug.cpu.nr_switches.stddev
40.89 +1.2% 41.37 perf-stat.i.MPKI
2.899e+09 ± 5% +20.7% 3.498e+09 perf-stat.i.branch-instructions
0.20 ± 4% -0.0 0.16 ± 4% perf-stat.i.branch-miss-rate%
6.679e+08 ± 5% +22.1% 8.153e+08 perf-stat.i.cache-misses
7.021e+08 ± 5% +21.9% 8.556e+08 perf-stat.i.cache-references
15269 ± 5% +15.1% 17569 perf-stat.i.context-switches
14.35 ± 5% -14.9% 12.21 perf-stat.i.cpi
2.381e+11 +5.8% 2.517e+11 perf-stat.i.cpu-cycles
372.98 ± 5% -16.3% 312.29 perf-stat.i.cycles-between-cache-misses
0.00 ± 9% -0.0 0.00 ± 6% perf-stat.i.dTLB-load-miss-rate%
2.976e+09 ± 5% +20.7% 3.59e+09 perf-stat.i.dTLB-loads
0.00 ± 2% +0.0 0.01 perf-stat.i.dTLB-store-miss-rate%
123809 ± 5% +48.9% 184353 perf-stat.i.dTLB-store-misses
2.799e+09 ± 5% +21.3% 3.394e+09 perf-stat.i.dTLB-stores
74.58 ± 3% +6.6 81.23 perf-stat.i.iTLB-load-miss-rate%
383587 ± 13% -37.2% 240807 ± 14% perf-stat.i.iTLB-loads
1.707e+10 ± 5% +20.9% 2.063e+10 perf-stat.i.instructions
0.07 ± 5% +15.1% 0.08 perf-stat.i.ipc
2.48 +5.8% 2.62 perf-stat.i.metric.GHz
1290 ± 9% -38.8% 790.15 ± 2% perf-stat.i.metric.K/sec
98.96 ± 5% +21.4% 120.15 perf-stat.i.metric.M/sec
4017 ± 2% +80.4% 7248 perf-stat.i.minor-faults
46.99 ± 4% -10.2 36.84 ± 3% perf-stat.i.node-load-miss-rate%
36469552 ± 11% +20.0% 43763092 perf-stat.i.node-loads
25.92 ± 9% -22.8 3.08 ± 14% perf-stat.i.node-store-miss-rate%
41252120 ± 10% -89.2% 4473071 ± 20% perf-stat.i.node-store-misses
1.465e+08 ± 9% +38.1% 2.022e+08 ± 2% perf-stat.i.node-stores
4017 ± 2% +80.4% 7248 perf-stat.i.page-faults
0.16 ± 5% -0.0 0.14 ± 4% perf-stat.overall.branch-miss-rate%
14.08 ± 6% -13.1% 12.24 perf-stat.overall.cpi
359.74 ± 6% -14.0% 309.45 perf-stat.overall.cycles-between-cache-misses
0.00 ± 10% -0.0 0.00 ± 6% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 2% +0.0 0.01 perf-stat.overall.dTLB-store-miss-rate%
64.21 ± 4% +11.1 75.32 ± 2% perf-stat.overall.iTLB-load-miss-rate%
26608 ± 5% +14.3% 30425 ± 6% perf-stat.overall.instructions-per-iTLB-miss
0.07 ± 5% +14.7% 0.08 perf-stat.overall.ipc
44.39 ± 2% -7.9 36.50 ± 3% perf-stat.overall.node-load-miss-rate%
22.12 ± 8% -20.1 2.06 ± 19% perf-stat.overall.node-store-miss-rate%
2.881e+09 ± 5% +20.9% 3.484e+09 perf-stat.ps.branch-instructions
6.644e+08 ± 5% +22.3% 8.128e+08 perf-stat.ps.cache-misses
6.983e+08 ± 5% +22.1% 8.529e+08 perf-stat.ps.cache-references
15070 ± 5% +15.5% 17412 perf-stat.ps.context-switches
2.383e+11 +5.6% 2.515e+11 perf-stat.ps.cpu-cycles
2.957e+09 ± 5% +20.9% 3.576e+09 perf-stat.ps.dTLB-loads
123050 ± 5% +49.6% 184070 perf-stat.ps.dTLB-store-misses
2.783e+09 ± 5% +21.5% 3.382e+09 perf-stat.ps.dTLB-stores
358319 ± 14% -37.8% 222740 ± 12% perf-stat.ps.iTLB-loads
1.697e+10 ± 5% +21.1% 2.055e+10 perf-stat.ps.instructions
4008 ± 2% +79.2% 7183 perf-stat.ps.minor-faults
36583387 ± 11% +19.6% 43742068 perf-stat.ps.node-loads
41398082 ± 11% -89.7% 4252522 ± 20% perf-stat.ps.node-store-misses
1.459e+08 ± 9% +38.5% 2.021e+08 ± 2% perf-stat.ps.node-stores
4008 ± 2% +79.2% 7183 perf-stat.ps.page-faults
3.428e+12 ± 5% +21.2% 4.155e+12 perf-stat.total.instructions
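The %change column in the tables above is the relative change of the patched mean (46a525e199) against the base mean (b000145e99). A minimal sketch checking the headline figure from the fio.read_iops and fio.read_bw_MBps rows:

```python
# Sketch: reproduce the %change column from the two sample means
# in the table above (base commit b000145e99 vs. patched 46a525e199).
def pct_change(base, patched):
    """Percent change of the patched mean relative to the base mean."""
    return (patched - base) / base * 100

base_iops, patched_iops = 10156, 12428   # fio.read_iops row
base_bw, patched_bw = 20312, 24856       # fio.read_bw_MBps row

print(f"iops: {pct_change(base_iops, patched_iops):+.1f}%")  # +22.4%
print(f"bw:   {pct_change(base_bw, patched_bw):+.1f}%")      # +22.4%
```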
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
View attachment "config-6.0.0-rc6-00054-g46a525e199e4" of type "text/plain" (164404 bytes)
View attachment "job-script" of type "text/plain" (8608 bytes)
View attachment "job.yaml" of type "text/plain" (5877 bytes)
View attachment "reproduce" of type "text/plain" (942 bytes)