[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20210414141846.GC28254@xsang-OptiPlex-9020>
Date: Wed, 14 Apr 2021 22:18:46 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Josef Bacik <josef@...icpanda.com>
Cc: David Sterba <dsterba@...e.com>,
Nikolay Borisov <nborisov@...e.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
zhengjun.xing@...el.com, linux-btrfs@...r.kernel.org
Subject: [btrfs] 5deb17e18e: fio.write_iops 1.2% improvement
Greeting,
FYI, we noticed a 1.2% improvement of fio.write_iops due to commit:
commit: 5deb17e18e27a3502f21581ba4d086e762b86b31 ("btrfs: track ordered bytes instead of just dio ordered bytes")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: fio-basic
on test machine: 16 threads Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz with 32G memory
with following parameters:
runtime: 300s
disk: 1HDD
fs: btrfs
nr_task: 1
test_size: 128G
rw: randwrite
bs: 4k
ioengine: sync
cpufreq_governor: performance
ucode: 0xde
test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/ucode:
4k/gcc-9/performance/1HDD/btrfs/sync/x86_64-rhel-8.3/1/debian-10.4-x86_64-20200603.cgz/300s/randwrite/lkp-cfl-e1/128G/fio-basic/0xde
commit:
ac1ea10e75 ("btrfs: add a trace point for reserve tickets")
5deb17e18e ("btrfs: track ordered bytes instead of just dio ordered bytes")
ac1ea10e757a57fb 5deb17e18e27a3502f21581ba4d
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.10 ± 6% -0.0 0.05 ± 6% fio.latency_1000us%
0.01 ± 11% +0.0 0.05 ± 5% fio.latency_100ms%
7.51 ± 3% -3.8 3.74 ± 4% fio.latency_100us%
5.26 -0.7 4.59 fio.latency_10ms%
14.97 ± 3% +1.3 16.22 fio.latency_10us%
0.02 ± 4% +0.0 0.03 fio.latency_2000ms%
0.80 -0.2 0.62 fio.latency_20ms%
4.41 -0.9 3.50 ± 4% fio.latency_20us%
1.00 ± 5% -0.2 0.84 fio.latency_250us%
8.89 ± 2% +4.2 13.07 fio.latency_2ms%
7.38 ± 3% +3.0 10.41 fio.latency_4ms%
0.06 ± 11% -0.0 0.01 ± 11% fio.latency_500ms%
0.17 ± 6% -0.0 0.12 ± 4% fio.latency_50ms%
4.73 ± 3% -2.2 2.57 ± 4% fio.latency_50us%
1693210 +1.3% 1715361 fio.time.file_system_outputs
211651 +1.3% 214420 fio.workload
2.75 +1.2% 2.78 fio.write_bw_MBps
2916352 -5.7% 2749235 fio.write_clat_90%_us
4642133 -10.1% 4171366 fio.write_clat_95%_us
11315882 ± 2% -26.3% 8336179 fio.write_clat_99%_us
1419243 -1.2% 1402015 fio.write_clat_mean_us
22694034 +4.5% 23720487 fio.write_clat_stddev
704.13 +1.2% 712.73 fio.write_iops
9.06 -6.9% 8.44 iostat.cpu.iowait
52914 ± 4% -23.5% 40465 ± 4% softirqs.TIMER
5529 +1.6% 5617 vmstat.io.bo
6263 +4.3% 6530 vmstat.system.cs
108039 +15.0% 124211 meminfo.Active
94360 +16.5% 109929 meminfo.Active(file)
2835 ± 3% +50.0% 4253 ± 3% meminfo.Dirty
2475 ± 2% +11.3% 2755 ± 2% slabinfo.btrfs_extent_buffer.active_objs
2478 ± 2% +11.4% 2761 ± 2% slabinfo.btrfs_extent_buffer.num_objs
2724 ± 4% +8.9% 2966 ± 6% slabinfo.fsnotify_mark_connector.num_objs
2824 ± 74% -67.1% 928.20 ± 63% interrupts.133:IR-PCI-MSI.2097154-edge.eth1-TxRx-1
236.67 ± 6% -22.3% 184.00 ± 26% interrupts.CPU13.NMI:Non-maskable_interrupts
236.67 ± 6% -22.3% 184.00 ± 26% interrupts.CPU13.PMI:Performance_monitoring_interrupts
2824 ± 74% -67.1% 928.20 ± 63% interrupts.CPU2.133:IR-PCI-MSI.2097154-edge.eth1-TxRx-1
23440 +17.5% 27544 proc-vmstat.nr_active_file
401066 -1.8% 393830 proc-vmstat.nr_dirtied
694.67 ± 3% +51.0% 1048 ± 3% proc-vmstat.nr_dirty
173383 +1.3% 175566 proc-vmstat.nr_inactive_file
301375 -2.7% 293337 proc-vmstat.nr_written
23440 +17.5% 27544 proc-vmstat.nr_zone_active_file
173383 +1.3% 175566 proc-vmstat.nr_zone_inactive_file
1105434 +1.2% 1118728 proc-vmstat.numa_hit
1105434 +1.2% 1118728 proc-vmstat.numa_local
1680626 +1.8% 1710624 proc-vmstat.pgpgout
79275921 +3.7% 82180925 perf-stat.i.branch-instructions
54448091 +4.4% 56830814 perf-stat.i.cache-references
6302 +4.1% 6560 perf-stat.i.context-switches
46476020 +1.5% 47154582 perf-stat.i.dTLB-stores
3.912e+08 +3.9% 4.063e+08 perf-stat.i.instructions
0.53 +3.0% 0.54 perf-stat.i.metric.K/sec
17.37 +2.8% 17.85 perf-stat.i.metric.M/sec
557137 +2.8% 572549 perf-stat.overall.path-length
79095771 +3.7% 81994540 perf-stat.ps.branch-instructions
54269739 +4.4% 56645144 perf-stat.ps.cache-references
6280 +4.1% 6539 perf-stat.ps.context-switches
46349077 +1.5% 47028173 perf-stat.ps.dTLB-stores
3.903e+08 +3.9% 4.053e+08 perf-stat.ps.instructions
1.179e+11 +4.1% 1.228e+11 perf-stat.total.instructions
99.55 ± 5% +27.1% 126.57 ± 14% sched_debug.cfs_rq:/.load_avg.avg
272.17 ± 2% +12.1% 305.09 ± 5% sched_debug.cfs_rq:/.load_avg.stddev
2.13 ±223% +940.0% 22.16 ± 52% sched_debug.cfs_rq:/.removed.load_avg.avg
34.10 ±223% +702.2% 273.53 ± 49% sched_debug.cfs_rq:/.removed.load_avg.max
8.25 ±223% +794.7% 73.85 ± 50% sched_debug.cfs_rq:/.removed.load_avg.stddev
0.67 ±223% +1186.0% 8.57 ± 52% sched_debug.cfs_rq:/.removed.runnable_avg.avg
10.67 ±223% +963.4% 113.43 ± 54% sched_debug.cfs_rq:/.removed.runnable_avg.max
2.58 ±223% +1054.3% 29.80 ± 52% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
0.48 ±223% +1655.9% 8.49 ± 52% sched_debug.cfs_rq:/.removed.util_avg.avg
7.73 ±223% +1349.1% 112.07 ± 55% sched_debug.cfs_rq:/.removed.util_avg.max
1.87 ±223% +1474.3% 29.47 ± 53% sched_debug.cfs_rq:/.removed.util_avg.stddev
29.13 ± 17% -28.8% 20.75 ± 28% sched_debug.cfs_rq:/.util_est_enqueued.avg
117919 ± 8% +21.3% 142999 ± 8% sched_debug.cpu.nr_switches.max
19720 ± 11% +34.5% 26518 ± 8% sched_debug.cpu.nr_switches.stddev
32.58 ± 44% -51.4% 15.83 ± 14% sched_debug.cpu.nr_uninterruptible.max
-37.74 -38.3% -23.30 sched_debug.cpu.nr_uninterruptible.min
14.75 ± 24% -32.9% 9.89 ± 13% sched_debug.cpu.nr_uninterruptible.stddev
0.05 ± 3% -17.7% 0.04 ± 7% perf-sched.sch_delay.avg.ms.btrfs_start_ordered_extent.btrfs_run_ordered_extent_work.btrfs_work_helper.process_one_work
0.01 ± 9% -100.0% 0.00 perf-sched.sch_delay.avg.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
0.08 ± 9% +52.3% 0.12 ± 29% perf-sched.sch_delay.max.ms.rwsem_down_read_slowpath.__btrfs_tree_read_lock.btrfs_read_lock_root_node.btrfs_search_slot
0.08 ± 13% -100.0% 0.00 perf-sched.sch_delay.max.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
1.28 ± 3% -10.7% 1.14 ± 2% perf-sched.wait_and_delay.avg.ms.btrfs_start_ordered_extent.btrfs_run_ordered_extent_work.btrfs_work_helper.process_one_work
0.28 ± 24% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
3526 ± 24% +58.6% 5592 ± 9% perf-sched.wait_and_delay.count.btrfs_start_ordered_extent.btrfs_run_ordered_extent_work.btrfs_work_helper.process_one_work
650.33 ± 23% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
1862 ± 24% +59.3% 2966 ± 11% perf-sched.wait_and_delay.count.schedule_timeout.wait_for_completion.btrfs_wait_ordered_extents.btrfs_wait_ordered_roots
3.43 ± 15% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
13.69 ± 5% +28.7% 17.62 ± 9% perf-sched.wait_and_delay.max.ms.schedule_timeout.wait_for_completion.btrfs_wait_ordered_extents.btrfs_wait_ordered_roots
1739 ± 15% +70.3% 2961 ± 24% perf-sched.wait_and_delay.max.ms.wait_current_trans.start_transaction.btrfs_finish_ordered_io.btrfs_work_helper
1.23 ± 3% -10.5% 1.10 ± 3% perf-sched.wait_time.avg.ms.btrfs_start_ordered_extent.btrfs_run_ordered_extent_work.btrfs_work_helper.process_one_work
0.27 ± 25% -100.0% 0.00 perf-sched.wait_time.avg.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
0.01 ± 24% -37.4% 0.00 ± 24% perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.wait_for_completion_io.write_all_supers
3.40 ± 15% -100.0% 0.00 perf-sched.wait_time.max.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
0.03 ± 56% -70.6% 0.01 ± 63% perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.wait_for_completion_io.write_all_supers
13.66 ± 5% +28.8% 17.59 ± 9% perf-sched.wait_time.max.ms.schedule_timeout.wait_for_completion.btrfs_wait_ordered_extents.btrfs_wait_ordered_roots
1738 ± 15% +70.3% 2961 ± 24% perf-sched.wait_time.max.ms.wait_current_trans.start_transaction.btrfs_finish_ordered_io.btrfs_work_helper
5.53 ± 4% +0.5 6.04 ± 4% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
5.65 ± 4% +0.5 6.19 ± 4% perf-profile.calltrace.cycles-pp.ret_from_fork
5.65 ± 4% +0.5 6.19 ± 4% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
1.15 ± 7% -0.3 0.88 ± 21% perf-profile.children.cycles-pp.sched_clock_cpu
0.14 ± 34% -0.1 0.05 ± 99% perf-profile.children.cycles-pp.__switch_to_asm
0.06 ± 51% +0.1 0.12 ± 52% perf-profile.children.cycles-pp.filename_lookup
0.01 ±223% +0.1 0.12 ± 29% perf-profile.children.cycles-pp.___perf_sw_event
0.21 ± 30% +0.2 0.38 ± 22% perf-profile.children.cycles-pp.dequeue_task_fair
0.16 ± 26% +0.2 0.33 ± 20% perf-profile.children.cycles-pp.dequeue_entity
0.67 ± 12% +0.2 0.91 ± 15% perf-profile.children.cycles-pp.schedule
1.88 ± 5% +0.3 2.19 ± 11% perf-profile.children.cycles-pp.tick_irq_enter
1.10 ± 9% +0.3 1.41 ± 7% perf-profile.children.cycles-pp.do_writepages
5.53 ± 4% +0.5 6.04 ± 4% perf-profile.children.cycles-pp.worker_thread
5.67 ± 4% +0.5 6.20 ± 4% perf-profile.children.cycles-pp.ret_from_fork
5.65 ± 4% +0.5 6.19 ± 4% perf-profile.children.cycles-pp.kthread
3.12 ± 13% -0.6 2.56 ± 11% perf-profile.self.cycles-pp.cpuidle_enter_state
0.14 ± 34% -0.1 0.05 ± 99% perf-profile.self.cycles-pp.__switch_to_asm
0.22 ± 24% -0.1 0.15 ± 33% perf-profile.self.cycles-pp.__softirqentry_text_start
0.01 ±223% +0.1 0.09 ± 46% perf-profile.self.cycles-pp.___perf_sw_event
0.16 ± 45% +0.1 0.25 ± 18% perf-profile.self.cycles-pp.update_rq_clock
0.15 ± 24% +0.1 0.27 ± 30% perf-profile.self.cycles-pp.update_blocked_averages
fio.write_iops
800 +---------------------------------------------------------------------+
| O O O O O O O O O O O O |
700 |.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+ OO |
600 |-+ |
| |
500 |-+ |
| |
400 |-+ |
| |
300 |-+ |
200 |-+ |
| |
100 |-+ |
| |
0 +---------------------------------------------------------------------+
fio.latency_100us_
9 +-----------------------------------------------------------------------+
| + .+. .+. |
8 |.+. +.+.+.+.+ + .+. .+.+ ++ +.+.++.+.+.+.++.+.+.+. +.+.+.+.+ |
7 |-+ + + +.++ + |
| |
6 |-+ |
5 |-+ |
| |
4 |-+ O O O O O O O O O O O O |
3 |-O OO O O O O O OO O O O O O |
| OO O O O OO O O O O |
2 |-+ |
1 |-+ |
| |
0 +-----------------------------------------------------------------------+
fio.latency_2ms_
14 +----------------------------------------------------------------------+
| O O O O O O O OO O O O O O OO O OO O |
12 |-O OO O O O OO O O O O O OO O |
| |
10 |-+ |
|.+.++.+.+. +.+.+. .++.+.+. .+. .++. .++. .+ .+.+.+. +.+.+.+ |
8 |-+ + + ++ +.+ +.+ +.+ + + |
| |
6 |-+ |
| |
4 |-+ |
| |
2 |-+ |
| |
0 +----------------------------------------------------------------------+
fio.latency_4ms_
14 +----------------------------------------------------------------------+
| O O O OO O O OO O O |
12 |-O O O O OO O O O OO O O O O O O O O |
| O O |
10 |-+ O O O OO O |
| |
8 |-+.+ .+. .++. .+. |
|.+ + +.++.+.+.+ +.+.++.+.+.+.++.+.+.++.+.+.++.+.+ ++.+.+.+ |
6 |-+ |
| |
4 |-+ |
| |
2 |-+ |
| |
0 +----------------------------------------------------------------------+
fio.latency_500ms_
0.07 +--------------------------------------------------------------------+
| +. :+ + |
0.06 |-+ +.+.+ + + .+.+. +. .+ : + + |
|.+ +.+. : + +. .++. .+.+ .+.+.++ + + + +. : +. |
0.05 |-+:+ + + + + + + + |
| + |
0.04 |-+ |
| |
0.03 |-+ |
| |
0.02 |-+ O |
| O O O OO |
0.01 |-OO O O O O O O O O O O O O O |
| |
0 +--------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation
Thanks,
Oliver Sang
View attachment "config-5.11.0-rc7-00063-g5deb17e18e27" of type "text/plain" (172403 bytes)
View attachment "job-script" of type "text/plain" (8022 bytes)
View attachment "job.yaml" of type "text/plain" (5489 bytes)
View attachment "reproduce" of type "text/plain" (710 bytes)
Powered by blists - more mailing lists