Message-ID: <202407142212.5595ea54-oliver.sang@intel.com>
Date: Sun, 14 Jul 2024 22:49:55 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Zhang Yi <yi.zhang@...wei.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-ext4@...r.kernel.org>,
Theodore Ts'o <tytso@....edu>, Dave Chinner <david@...morbit.com>, Jan Kara
<jack@...e.cz>, Ritesh Harjani <ritesh.list@...il.com>,
<ying.huang@...el.com>, <feng.tang@...el.com>, <fengwei.yin@...el.com>,
<oliver.sang@...el.com>
Subject: [tytso-ext4:dev] [jbd2] 7c73ddb758: stress-ng.fiemap.ops_per_sec
565.3% improvement
Hello,
kernel test robot noticed a 565.3% improvement of stress-ng.fiemap.ops_per_sec on:
commit: 7c73ddb7589fb8ddb1136b6306dfb72089c81511 ("jbd2: speed up jbd2_transaction_committed()")
https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev
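For reference, the headline figure is the relative change between the mean ops_per_sec of the two commits: (845744 - 127116) / 127116 ≈ 5.653, i.e. +565.3%. The %change column in the comparison table below is computed the same way for most rows. For rows whose values are themselves percentages (names ending in %, such as mpstat.cpu.all.usr%, and all perf-profile.* rows), the listed numbers instead match a plain difference in percentage points. For example, mpstat.cpu.all.usr% going from 1.17 to 1.63 is listed as +0.5, whereas iostat.cpu.user going from 1.15 to 1.61 is listed as +39.7%. A minimal C sketch of both calculations follows; the helper names are illustrative only and are not taken from lkp-tests.

```c
#include <assert.h>

/* Relative change in percent: matches most rows of the comparison table. */
static double pct_change(double base, double head)
{
	assert(base != 0.0);	/* relative change is undefined for a zero baseline */
	return (head - base) / base * 100.0;
}

/*
 * Difference in percentage points: matches rows whose values are already
 * percentages (names ending in %, and the perf-profile.* rows).
 */
static double pp_change(double base_pct, double head_pct)
{
	return head_pct - base_pct;
}
```

For example, pct_change(127116, 845744) evaluates to about 565.33, and pp_change(85.99, 0.00) to -85.99, which the report rounds to -86.0.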
testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
disk: 1HDD
testtime: 60s
fs: ext4
test: fiemap
cpufreq_governor: performance
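The fiemap stressor exercises the FS_IOC_FIEMAP ioctl. The profile below shows that path as __x64_sys_ioctl -> do_vfs_ioctl -> iomap_fiemap -> iomap_iter -> ext4_iomap_begin_report. The sketch below issues a single FIEMAP query and assumes a Linux system with the kernel UAPI headers installed. It only counts extents: with fm_extent_count set to 0, the kernel's FIEMAP interface reports the number of mapped extents in fm_mapped_extents without copying any extent records out. This is not the stress-ng source, and count_extents() and count_extents_path() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Return the number of extents mapping the file open on fd, or -errno.
 * With fm_extent_count == 0 the kernel fills in fm_mapped_extents only
 * and copies no struct fiemap_extent records back to user space.
 */
static long count_extents(int fd)
{
	struct fiemap fm;

	memset(&fm, 0, sizeof(fm));
	fm.fm_start = 0;
	fm.fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm.fm_flags = 0;			/* no FIEMAP_FLAG_SYNC */
	fm.fm_extent_count = 0;			/* count extents only */

	if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
		return -errno;
	return (long)fm.fm_mapped_extents;
}

/* Convenience wrapper: open path read-only, run one query, close. */
static long count_extents_path(const char *path)
{
	assert(path != NULL);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	long n = count_extents(fd);
	close(fd);	/* may change errno, but n already holds the result */
	return n;
}
```

On a filesystem without FIEMAP support (tmpfs, for instance), the query is expected to fail with -EOPNOTSUPP rather than return a count.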
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240714/202407142212.5595ea54-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/1HDD/ext4/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fiemap/stress-ng/60s
commit:
8262fe9a90 ("ext4: make ext4_da_map_blocks() buffer_head unaware")
7c73ddb758 ("jbd2: speed up jbd2_transaction_committed()")
8262fe9a902c8a7b 7c73ddb7589fb8ddb1136b6306d
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
1.651e+08 ± 27% -31.9% 1.125e+08 ± 6% cpuidle..time
1.15 ± 10% +39.7% 1.61 ± 2% iostat.cpu.user
364499 ± 18% +93.2% 704285 ± 21% numa-numastat.node1.local_node
391444 ± 13% +87.0% 731983 ± 19% numa-numastat.node1.numa_hit
0.01 ± 93% +0.0 0.04 ± 42% mpstat.cpu.all.iowait%
0.02 ± 11% -0.0 0.01 ± 10% mpstat.cpu.all.soft%
1.17 ± 10% +0.5 1.63 ± 2% mpstat.cpu.all.usr%
7.49 ± 26% +150.8% 18.78 ± 5% vmstat.procs.b
206521 ± 17% +533.8% 1309000 ± 2% vmstat.system.cs
161690 ± 3% +9.7% 177423 vmstat.system.in
295.83 ± 41% +89.5% 560.50 ± 13% perf-c2c.DRAM.local
2738 ± 54% +302.1% 11011 ± 22% perf-c2c.DRAM.remote
19455 ± 26% +159.8% 50553 ± 3% perf-c2c.HITM.local
2088 ± 64% +341.3% 9214 ± 25% perf-c2c.HITM.remote
21543 ± 28% +177.4% 59768 ± 4% perf-c2c.HITM.total
7686439 ± 19% +567.4% 51297323 ± 2% stress-ng.fiemap.ops
127116 ± 19% +565.3% 845744 ± 2% stress-ng.fiemap.ops_per_sec
13124477 ± 16% +538.2% 83760706 ± 2% stress-ng.time.involuntary_context_switches
16.98 ± 4% +123.7% 37.98 ± 2% stress-ng.time.user_time
68650 ± 2% +21.2% 83171 ± 6% stress-ng.time.voluntary_context_switches
3772338 +32.7% 5006703 meminfo.Cached
3979639 +32.0% 5253874 meminfo.Committed_AS
1184714 ± 10% +70.1% 2014850 ± 15% meminfo.Inactive
1149048 ± 10% +72.2% 1978594 ± 15% meminfo.Inactive(anon)
376153 ± 24% +148.7% 935654 ± 15% meminfo.Mapped
5932787 +22.2% 7248581 meminfo.Memused
564523 ± 7% +218.2% 1796085 meminfo.Shmem
5998794 +21.5% 7289549 meminfo.max_used_kB
816342 ±130% +203.1% 2474499 ± 40% numa-meminfo.node0.FilePages
1933517 ± 56% +86.2% 3599441 ± 31% numa-meminfo.node0.MemUsed
205709 ± 33% +120.9% 454343 ± 43% numa-meminfo.node1.Active
196984 ± 34% +126.3% 445786 ± 45% numa-meminfo.node1.Active(anon)
647051 ± 24% +120.3% 1425192 ± 29% numa-meminfo.node1.Inactive
632883 ± 25% +123.0% 1411032 ± 29% numa-meminfo.node1.Inactive(anon)
249614 ± 23% +145.7% 613193 ± 31% numa-meminfo.node1.Mapped
468647 ± 20% +206.9% 1438272 ± 28% numa-meminfo.node1.Shmem
204074 ±130% +203.1% 618583 ± 40% numa-vmstat.node0.nr_file_pages
48403 ± 36% +128.8% 110725 ± 43% numa-vmstat.node1.nr_active_anon
158779 ± 25% +122.3% 352949 ± 29% numa-vmstat.node1.nr_inactive_anon
62898 ± 23% +143.8% 153346 ± 31% numa-vmstat.node1.nr_mapped
116833 ± 19% +207.3% 359079 ± 28% numa-vmstat.node1.nr_shmem
48401 ± 36% +128.8% 110724 ± 43% numa-vmstat.node1.nr_zone_active_anon
158780 ± 25% +122.3% 352949 ± 29% numa-vmstat.node1.nr_zone_inactive_anon
389858 ± 13% +87.4% 730598 ± 19% numa-vmstat.node1.numa_hit
362913 ± 18% +93.7% 702900 ± 21% numa-vmstat.node1.numa_local
2712171 ± 7% -11.3% 2404936 ± 2% sched_debug.cfs_rq:/.avg_vruntime.avg
1407145 ± 17% -30.4% 979280 ± 19% sched_debug.cfs_rq:/.load.max
2712177 ± 7% -11.3% 2404936 ± 2% sched_debug.cfs_rq:/.min_vruntime.avg
547.78 ± 36% +106.3% 1130 ± 7% sched_debug.cfs_rq:/.util_est.avg
1863 ± 12% +54.1% 2871 ± 17% sched_debug.cfs_rq:/.util_est.max
59.08 ±100% +241.7% 201.92 ± 36% sched_debug.cfs_rq:/.util_est.min
392.55 ± 11% +38.5% 543.51 ± 11% sched_debug.cfs_rq:/.util_est.stddev
104511 ± 16% +518.1% 645974 ± 2% sched_debug.cpu.nr_switches.avg
204555 ± 32% +290.2% 798142 ± 6% sched_debug.cpu.nr_switches.max
12171 ± 65% +844.5% 114956 ± 47% sched_debug.cpu.nr_switches.min
945799 +32.6% 1254064 proc-vmstat.nr_file_pages
287060 ± 10% +72.7% 495669 ± 15% proc-vmstat.nr_inactive_anon
93971 ± 24% +150.2% 235154 ± 15% proc-vmstat.nr_mapped
141268 ± 8% +217.7% 448745 proc-vmstat.nr_shmem
25272 +2.4% 25873 proc-vmstat.nr_slab_reclaimable
287060 ± 10% +72.7% 495669 ± 15% proc-vmstat.nr_zone_inactive_anon
24933 ± 50% +200.6% 74949 ± 6% proc-vmstat.numa_hint_faults
9891 ± 50% +317.8% 41324 ± 8% proc-vmstat.numa_hint_faults_local
614783 ± 2% +72.8% 1062609 proc-vmstat.numa_hit
548520 ± 2% +81.6% 996296 proc-vmstat.numa_local
549876 ± 4% +14.4% 628860 proc-vmstat.numa_pte_updates
734634 ± 2% +60.7% 1180339 proc-vmstat.pgalloc_normal
388924 ± 3% +20.6% 468855 ± 2% proc-vmstat.pgfault
478242 ± 5% -19.5% 385183 ± 14% proc-vmstat.pgfree
3.169e+09 ± 8% +506.1% 1.921e+10 perf-stat.i.branch-instructions
0.65 ± 3% -0.1 0.53 ± 5% perf-stat.i.branch-miss-rate%
20491366 ± 5% +378.8% 98121036 ± 5% perf-stat.i.branch-misses
7452019 ± 37% +441.8% 40374999 ± 10% perf-stat.i.cache-misses
71298660 ± 3% +361.7% 3.292e+08 perf-stat.i.cache-references
227657 ± 19% +498.6% 1362709 ± 2% perf-stat.i.context-switches
14.22 ± 8% -83.9% 2.29 perf-stat.i.cpi
37069 ± 36% -84.2% 5866 ± 13% perf-stat.i.cycles-between-cache-misses
1.6e+10 ± 7% +516.8% 9.867e+10 perf-stat.i.instructions
0.08 ± 10% +479.2% 0.44 perf-stat.i.ipc
3.56 ± 19% +502.1% 21.45 ± 2% perf-stat.i.metric.K/sec
5090 ± 4% +25.5% 6387 ± 3% perf-stat.i.minor-faults
5090 ± 4% +25.5% 6387 ± 3% perf-stat.i.page-faults
0.06 ± 45% +637.9% 0.44 perf-stat.overall.ipc
2.598e+09 ± 45% +627.1% 1.889e+10 perf-stat.ps.branch-instructions
17015300 ± 45% +468.0% 96650879 ± 5% perf-stat.ps.branch-misses
5919193 ± 63% +570.7% 39699549 ± 10% perf-stat.ps.cache-misses
58527096 ± 44% +453.7% 3.241e+08 perf-stat.ps.cache-references
181655 ± 48% +640.9% 1345912 ± 2% perf-stat.ps.context-switches
1.311e+10 ± 45% +640.3% 9.704e+10 perf-stat.ps.instructions
4075 ± 44% +53.6% 6260 ± 3% perf-stat.ps.minor-faults
4075 ± 44% +53.6% 6260 ± 3% perf-stat.ps.page-faults
8.153e+11 ± 45% +637.0% 6.009e+12 perf-stat.total.instructions
85.99 -86.0 0.00 perf-profile.calltrace.cycles-pp.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
86.52 -84.1 2.43 perf-profile.calltrace.cycles-pp.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
46.15 ± 12% -46.2 0.00 perf-profile.calltrace.cycles-pp._raw_read_lock.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter
96.11 -10.9 85.20 perf-profile.calltrace.cycles-pp.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
4.58 ± 89% -4.6 0.00 perf-profile.calltrace.cycles-pp.queued_read_lock_slowpath.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter
97.15 -4.5 92.65 perf-profile.calltrace.cycles-pp.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
4.34 ± 89% -4.3 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_read_lock_slowpath.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report
97.78 -0.7 97.12 perf-profile.calltrace.cycles-pp.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.17 ±141% +0.6 0.72 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.18 ±141% +0.6 0.75 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.64 ± 16% +0.6 1.26 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.64 ± 16% +0.6 1.26 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
0.29 ±100% +0.6 0.94 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.70 ± 14% +0.9 1.60 perf-profile.calltrace.cycles-pp.__sched_yield
0.00 +1.0 0.95 perf-profile.calltrace.cycles-pp.ext4_sb_block_valid.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
0.00 +1.0 1.05 perf-profile.calltrace.cycles-pp._copy_to_user.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
0.00 +1.4 1.35 ± 11% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks
0.00 +1.4 1.41 perf-profile.calltrace.cycles-pp.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
0.00 +1.6 1.58 ± 10% perf-profile.calltrace.cycles-pp._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report
0.00 +3.1 3.08 perf-profile.calltrace.cycles-pp.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
0.82 ± 6% +5.1 5.90 perf-profile.calltrace.cycles-pp.iomap_iter_advance.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
4.18 ± 17% +5.2 9.35 ± 3% perf-profile.calltrace.cycles-pp._raw_read_lock.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
0.62 ± 7% +52.1 52.70 perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
8.69 ± 13% +68.0 76.73 perf-profile.calltrace.cycles-pp.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
9.30 ± 11% +71.3 80.61 perf-profile.calltrace.cycles-pp.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
86.24 -85.8 0.42 perf-profile.children.cycles-pp.jbd2_transaction_committed
86.55 -83.8 2.70 perf-profile.children.cycles-pp.ext4_set_iomap
50.52 ± 12% -41.1 9.45 ± 3% perf-profile.children.cycles-pp._raw_read_lock
96.16 -10.6 85.61 perf-profile.children.cycles-pp.ext4_iomap_begin_report
4.60 ± 89% -4.6 0.00 perf-profile.children.cycles-pp.queued_read_lock_slowpath
97.20 -4.2 92.96 perf-profile.children.cycles-pp.iomap_iter
0.09 ± 14% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.update_process_times
0.10 ± 13% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler
0.06 ± 15% +0.0 0.09 perf-profile.children.cycles-pp.switch_fpu_return
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__switch_to_asm
0.00 +0.1 0.05 perf-profile.children.cycles-pp.pick_eevdf
0.00 +0.1 0.05 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.__switch_to
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.rseq_ip_fixup
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.00 +0.1 0.07 perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
0.00 +0.1 0.08 ± 4% perf-profile.children.cycles-pp.set_next_entity
0.00 +0.1 0.09 ± 4% perf-profile.children.cycles-pp.put_prev_entity
0.00 +0.1 0.09 perf-profile.children.cycles-pp.update_load_avg
0.06 ± 7% +0.1 0.18 ± 2% perf-profile.children.cycles-pp.do_sched_yield
0.00 +0.1 0.11 ± 8% perf-profile.children.cycles-pp.stress_fiemap
0.00 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.update_curr
0.00 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.00 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.clear_bhb_loop
0.20 ± 13% +0.1 0.33 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.00 +0.1 0.13 perf-profile.children.cycles-pp.yield_task_fair
0.07 ± 86% +0.1 0.22 ± 19% perf-profile.children.cycles-pp.ordered_events__queue
0.07 ± 86% +0.1 0.22 ± 19% perf-profile.children.cycles-pp.queue_event
0.00 +0.2 0.16 ± 3% perf-profile.children.cycles-pp.ext4_inode_block_valid
0.07 ± 87% +0.2 0.23 ± 17% perf-profile.children.cycles-pp.process_simple
0.17 ± 61% +0.2 0.38 ± 11% perf-profile.children.cycles-pp.reader__read_event
0.17 ± 60% +0.2 0.39 ± 11% perf-profile.children.cycles-pp.perf_session__process_events
0.17 ± 60% +0.2 0.39 ± 11% perf-profile.children.cycles-pp.record__finish_output
0.07 ± 5% +0.2 0.28 perf-profile.children.cycles-pp.pick_next_task_fair
0.47 ± 10% +0.3 0.74 perf-profile.children.cycles-pp.__schedule
0.48 ± 10% +0.3 0.75 perf-profile.children.cycles-pp.schedule
0.06 ± 7% +0.4 0.47 perf-profile.children.cycles-pp.iomap_to_fiemap
0.50 ± 17% +0.4 0.94 perf-profile.children.cycles-pp.__x64_sys_sched_yield
0.15 ± 7% +0.9 1.00 perf-profile.children.cycles-pp.ext4_sb_block_valid
0.71 ± 14% +0.9 1.64 perf-profile.children.cycles-pp.__sched_yield
0.17 ± 8% +1.0 1.21 perf-profile.children.cycles-pp._copy_to_user
0.23 ± 9% +1.3 1.52 perf-profile.children.cycles-pp.__check_block_validity
0.18 ± 9% +1.4 1.62 ± 10% perf-profile.children.cycles-pp._raw_spin_lock
0.44 ± 6% +2.7 3.15 perf-profile.children.cycles-pp.fiemap_fill_next_extent
0.84 ± 7% +5.2 6.02 perf-profile.children.cycles-pp.iomap_iter_advance
0.63 ± 6% +52.2 52.83 perf-profile.children.cycles-pp.percpu_counter_add_batch
8.75 ± 13% +68.4 77.12 perf-profile.children.cycles-pp.ext4_es_lookup_extent
9.35 ± 11% +71.6 81.00 perf-profile.children.cycles-pp.ext4_map_blocks
50.35 ± 12% -41.0 9.32 ± 3% perf-profile.self.cycles-pp._raw_read_lock
35.26 ± 7% -35.0 0.25 perf-profile.self.cycles-pp.jbd2_transaction_committed
0.06 ± 15% +0.0 0.10 perf-profile.self.cycles-pp.__schedule
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.__switch_to
0.00 +0.1 0.07 perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.00 +0.1 0.10 ± 8% perf-profile.self.cycles-pp.stress_fiemap
0.00 +0.1 0.10 ± 4% perf-profile.self.cycles-pp.ext4_inode_block_valid
0.17 ± 12% +0.1 0.27 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.1 0.12 ± 3% perf-profile.self.cycles-pp.clear_bhb_loop
0.06 ±107% +0.2 0.21 ± 19% perf-profile.self.cycles-pp.queue_event
0.05 ± 46% +0.3 0.37 perf-profile.self.cycles-pp.__check_block_validity
0.04 ± 45% +0.3 0.36 perf-profile.self.cycles-pp.iomap_to_fiemap
0.14 ± 8% +0.8 0.95 perf-profile.self.cycles-pp.ext4_sb_block_valid
0.14 ± 7% +0.8 0.99 perf-profile.self.cycles-pp.iomap_fiemap
0.17 ± 8% +1.0 1.18 perf-profile.self.cycles-pp._copy_to_user
0.20 ± 7% +1.2 1.44 perf-profile.self.cycles-pp.iomap_iter
0.27 ± 7% +1.7 1.95 perf-profile.self.cycles-pp.fiemap_fill_next_extent
0.28 ± 7% +1.8 2.10 perf-profile.self.cycles-pp.ext4_iomap_begin_report
0.30 ± 7% +1.9 2.24 perf-profile.self.cycles-pp.ext4_set_iomap
0.40 ± 9% +2.1 2.54 perf-profile.self.cycles-pp.ext4_map_blocks
0.82 ± 6% +5.1 5.91 perf-profile.self.cycles-pp.iomap_iter_advance
3.91 ± 12% +11.0 14.90 perf-profile.self.cycles-pp.ext4_es_lookup_extent
0.55 ± 8% +50.4 50.92 perf-profile.self.cycles-pp.percpu_counter_add_batch
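In the profile above, jbd2_transaction_committed() (called from ext4_set_iomap()) accounts for 86.24% of cycles (children) on the base commit and 0.42% with 7c73ddb758. The _raw_read_lock and queued_read_lock_slowpath frames beneath it disappear entirely. This is consistent with the commit taking a shared reader lock off this hot path: even an uncontended read lock writes to the shared lock word, so 64 CPUs calling it concurrently keep bouncing that cache line between them. The sketch below is a userspace model of one way to answer "has transaction tid committed?" without a lock. It assumes the answer can be derived from a single, monotonically advancing "last committed transaction ID" read atomically. The struct and function names are hypothetical and this is not the jbd2 code; only the wrap-safe comparison mirrors the kernel's tid_geq() helper in include/linux/jbd2.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tid_t;	/* transaction IDs are 32-bit and wrap around */

/*
 * Wrap-safe "x is at or after y". Converting an out-of-range unsigned
 * value to int32_t is implementation-defined in C; GCC and Clang wrap it.
 */
static bool tid_geq(tid_t x, tid_t y)
{
	return (int32_t)(x - y) >= 0;
}

/* Hypothetical model of the journal state that readers need. */
struct journal_model {
	_Atomic tid_t commit_sequence;	/* ID of the last committed transaction */
};

/* Committer side: publish that transaction tid is now committed. */
static void model_commit_done(struct journal_model *j, tid_t tid)
{
	atomic_store_explicit(&j->commit_sequence, tid, memory_order_release);
}

/* Reader side: one atomic load and a comparison, no lock taken. */
static bool model_transaction_committed(struct journal_model *j, tid_t tid)
{
	tid_t seq = atomic_load_explicit(&j->commit_sequence,
					 memory_order_acquire);
	return tid_geq(seq, tid);
}
```

Readers in this model never write shared state, so concurrent callers only share a read-mostly cache line that changes once per commit, instead of all modifying a lock word on every call.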
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki