Message-ID: <202507010457.3b3d3c33-lkp@intel.com>
Date: Tue, 1 Jul 2025 10:57:34 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Baokun Li <libaokun1@...wei.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-ext4@...r.kernel.org>,
<tytso@....edu>, <jack@...e.cz>, <adilger.kernel@...ger.ca>,
<ojaswin@...ux.ibm.com>, <linux-kernel@...r.kernel.org>,
<yi.zhang@...wei.com>, <yangerkun@...wei.com>, <libaokun1@...wei.com>,
<oliver.sang@...el.com>
Subject: Re: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update
s_mb_last_group
Hello,
kernel test robot noticed a 31.1% improvement of stress-ng.fsize.ops_per_sec on:
commit: ad0d50f30d3fe376a99fd0e392867c7ca9b619e3 ("[PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update s_mb_last_group")
url: https://github.com/intel-lab-lkp/linux/commits/Baokun-Li/ext4-add-ext4_try_lock_group-to-skip-busy-groups/20250623-155451
base: https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev
patch link: https://lore.kernel.org/all/20250623073304.3275702-4-libaokun1@huawei.com/
patch subject: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update s_mb_last_group
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:
nr_threads: 100%
disk: 1HDD
testtime: 60s
fs: ext4
test: fsize
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250701/202507010457.3b3d3c33-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/1HDD/ext4/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp4/fsize/stress-ng/60s
commit:
86f92bf2c0 ("ext4: remove unnecessary s_mb_last_start")
ad0d50f30d ("ext4: remove unnecessary s_md_lock on update s_mb_last_group")
86f92bf2c059852a ad0d50f30d3fe376a99fd0e3928
---------------- ---------------------------
%stddev %change %stddev
\ | \
5042 ± 4% -10.1% 4532 ± 2% meminfo.Dirty
100194 ± 63% +92.5% 192828 ± 32% numa-meminfo.node0.Shmem
5082 ± 3% +28.1% 6510 ± 5% vmstat.system.cs
71089 -17.1% 58900 ± 2% perf-c2c.DRAM.remote
44206 -13.4% 38284 ± 2% perf-c2c.HITM.remote
131696 -4.1% 126359 ± 2% perf-c2c.HITM.total
0.15 ± 18% +0.2 0.35 ± 14% mpstat.cpu.all.iowait%
0.32 ± 7% -0.0 0.28 ± 4% mpstat.cpu.all.irq%
0.05 ± 4% +0.0 0.07 ± 3% mpstat.cpu.all.soft%
0.50 ± 13% +0.2 0.69 ± 16% mpstat.cpu.all.usr%
14478005 ± 2% +32.7% 19217687 ± 4% numa-numastat.node0.local_node
14540770 ± 2% +32.6% 19285137 ± 4% numa-numastat.node0.numa_hit
14722680 +28.8% 18967713 numa-numastat.node1.local_node
14793059 +28.7% 19032805 numa-numastat.node1.numa_hit
918392 -38.4% 565297 ± 18% sched_debug.cpu.avg_idle.avg
356474 ± 5% -92.0% 28413 ± 90% sched_debug.cpu.avg_idle.min
2362 ± 2% +18.8% 2806 ± 4% sched_debug.cpu.nr_switches.avg
1027 +35.5% 1391 ± 6% sched_debug.cpu.nr_switches.min
25263 ± 63% +91.0% 48258 ± 31% numa-vmstat.node0.nr_shmem
14540796 ± 2% +32.5% 19271949 ± 4% numa-vmstat.node0.numa_hit
14478031 ± 2% +32.6% 19204499 ± 4% numa-vmstat.node0.numa_local
14792432 +28.6% 19020203 numa-vmstat.node1.numa_hit
14722053 +28.8% 18955111 numa-vmstat.node1.numa_local
3780 +30.9% 4950 ± 2% stress-ng.fsize.SIGXFSZ_signals_per_sec
643887 +31.0% 843807 ± 2% stress-ng.fsize.ops
10726 +31.1% 14059 ± 2% stress-ng.fsize.ops_per_sec
126167 ± 2% +8.7% 137085 ± 2% stress-ng.time.involuntary_context_switches
21.82 ± 2% +45.1% 31.66 ± 4% stress-ng.time.user_time
5144 ± 15% +704.0% 41366 ± 20% stress-ng.time.voluntary_context_switches
1272 ± 4% -10.8% 1135 ± 2% proc-vmstat.nr_dirty
59459 +8.1% 64288 proc-vmstat.nr_slab_reclaimable
1272 ± 4% -10.8% 1134 ± 2% proc-vmstat.nr_zone_write_pending
29335922 +30.6% 38319823 proc-vmstat.numa_hit
29202778 +30.8% 38187281 proc-vmstat.numa_local
35012787 +31.9% 46166245 ± 2% proc-vmstat.pgalloc_normal
34753289 +31.9% 45830460 ± 2% proc-vmstat.pgfree
120464 +2.3% 123212 proc-vmstat.pgpgout
0.35 ± 3% +0.1 0.41 ± 3% perf-stat.i.branch-miss-rate%
48059547 +21.7% 58484853 perf-stat.i.branch-misses
33.69 -1.8 31.91 perf-stat.i.cache-miss-rate%
1.227e+08 +13.5% 1.392e+08 ± 7% perf-stat.i.cache-misses
3.623e+08 +19.9% 4.342e+08 ± 7% perf-stat.i.cache-references
4958 ± 3% +30.4% 6467 ± 4% perf-stat.i.context-switches
6.10 -5.2% 5.79 ± 4% perf-stat.i.cpi
208.43 +22.0% 254.30 ± 5% perf-stat.i.cpu-migrations
3333 -11.4% 2954 ± 7% perf-stat.i.cycles-between-cache-misses
0.33 +0.1 0.39 ± 2% perf-stat.overall.branch-miss-rate%
33.87 -1.8 32.04 perf-stat.overall.cache-miss-rate%
6.16 -5.3% 5.83 ± 4% perf-stat.overall.cpi
3360 -11.5% 2973 ± 7% perf-stat.overall.cycles-between-cache-misses
0.16 +5.8% 0.17 ± 4% perf-stat.overall.ipc
47200442 +21.7% 57451126 perf-stat.ps.branch-misses
1.206e+08 +13.5% 1.369e+08 ± 7% perf-stat.ps.cache-misses
3.563e+08 +19.9% 4.271e+08 ± 7% perf-stat.ps.cache-references
4873 ± 3% +30.3% 6351 ± 4% perf-stat.ps.context-switches
204.75 +22.0% 249.75 ± 5% perf-stat.ps.cpu-migrations
6.583e+10 +5.7% 6.955e+10 ± 4% perf-stat.ps.instructions
4.046e+12 +5.5% 4.267e+12 ± 4% perf-stat.total.instructions
0.15 ± 24% +97.6% 0.31 ± 21% perf-sched.sch_delay.avg.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
0.69 ± 34% -45.3% 0.38 ± 24% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
0.04 ± 2% -11.0% 0.03 ± 7% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.09 ± 18% +104.1% 0.19 ± 38% perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.32 ± 59% +284.8% 1.24 ± 71% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_noprof.__filemap_get_folio
16.34 ± 81% -81.7% 2.99 ± 34% perf-sched.sch_delay.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
3.51 ± 11% +56.2% 5.48 ± 38% perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
0.06 ±223% +1443.8% 0.86 ± 97% perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.generic_update_time
0.47 ± 33% +337.5% 2.05 ± 67% perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_ext_insert_extent.ext4_ext_map_blocks.ext4_map_create_blocks
0.47 ± 64% +417.9% 2.43 ± 53% perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
7.30 ± 60% -53.7% 3.38 ± 22% perf-sched.sch_delay.max.ms.__cond_resched.__find_get_block_slow.find_get_block_common.bdev_getblk.ext4_read_block_bitmap_nowait
2.72 ± 34% +59.5% 4.33 ± 20% perf-sched.sch_delay.max.ms.__cond_resched.down_read.ext4_map_blocks.ext4_alloc_file_blocks.isra
0.08 ±138% +382.6% 0.37 ± 24% perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_truncate.do_ftruncate.do_sys_ftruncate
1.33 ± 90% +122.5% 2.96 ± 34% perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_alloc_file_blocks.isra.0
3.04 +93.7% 5.89 ± 82% perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_setattr.notify_change.do_truncate
3.66 ± 19% +52.6% 5.59 ± 31% perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_truncate.ext4_setattr.notify_change
0.41 ± 26% +169.4% 1.11 ± 78% perf-sched.sch_delay.max.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
6.93 ± 82% -65.5% 2.39 ± 49% perf-sched.sch_delay.max.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
0.23 ± 68% +357.9% 1.04 ± 82% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
0.26 ± 39% +205.8% 0.78 ± 73% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.ext4_mb_initialize_context.ext4_mb_new_blocks.ext4_ext_map_blocks
0.11 ± 93% +1390.4% 1.60 ± 62% perf-sched.sch_delay.max.ms.io_schedule.bit_wait_io.__wait_on_bit_lock.out_of_line_wait_on_bit_lock
0.30 ± 74% +2467.2% 7.58 ± 60% perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.__find_get_block_slow.find_get_block_common
2.66 ± 18% +29.4% 3.44 ± 7% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
2.64 ± 21% +197.3% 7.84 ± 53% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
87.11 ± 2% -15.3% 73.79 ± 4% perf-sched.total_wait_and_delay.average.ms
21561 ± 2% +18.5% 25553 ± 4% perf-sched.total_wait_and_delay.count.ms
86.95 ± 2% -15.4% 73.60 ± 4% perf-sched.total_wait_time.average.ms
0.76 ± 54% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
0.61 ± 47% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
168.47 ± 2% -10.4% 150.98 ± 4% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
125.33 ± 10% +72.2% 215.83 ± 8% perf-sched.wait_and_delay.count.__cond_resched.__ext4_handle_dirty_metadata.ext4_do_update_inode.isra.0
781.33 ± 3% -74.6% 198.83 ± 15% perf-sched.wait_and_delay.count.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
278.67 ± 13% +310.9% 1145 ± 20% perf-sched.wait_and_delay.count.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
1116 ± 3% -81.5% 206.33 ± 13% perf-sched.wait_and_delay.count.__cond_resched.__find_get_block_slow.find_get_block_common.bdev_getblk.ext4_read_block_bitmap_nowait
166.33 ± 8% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
115.50 ± 46% +298.7% 460.50 ± 16% perf-sched.wait_and_delay.count.__cond_resched.down_read.ext4_map_blocks.ext4_alloc_file_blocks.isra
138.33 ± 16% +290.7% 540.50 ± 18% perf-sched.wait_and_delay.count.__cond_resched.down_write.ext4_setattr.notify_change.do_truncate
310.17 ± 14% +263.9% 1128 ± 21% perf-sched.wait_and_delay.count.__cond_resched.down_write.ext4_truncate.ext4_setattr.notify_change
1274 ± 2% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
7148 ± 2% +11.9% 7998 ± 4% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
32.82 ± 80% -81.8% 5.99 ± 34% perf-sched.wait_and_delay.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
12.06 ± 22% +168.4% 32.36 ± 47% perf-sched.wait_and_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
20.55 ± 82% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
27.66 ± 20% +78.9% 49.49 ± 60% perf-sched.wait_and_delay.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_dirty_inode.__mark_inode_dirty
16.75 ± 64% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
0.19 ± 29% +191.5% 0.55 ± 29% perf-sched.wait_time.avg.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
0.15 ± 24% +98.1% 0.31 ± 21% perf-sched.wait_time.avg.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
168.44 ± 2% -10.4% 150.94 ± 4% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.36 ± 40% +392.9% 1.78 ± 71% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_noprof.__filemap_get_folio
17.42 ± 70% -82.4% 3.07 ± 34% perf-sched.wait_time.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
11.49 ± 26% +180.6% 32.23 ± 48% perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
0.06 ±223% +1443.8% 0.86 ± 97% perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.generic_update_time
0.47 ± 33% +411.8% 2.40 ± 56% perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_ext_insert_extent.ext4_ext_map_blocks.ext4_map_create_blocks
0.64 ±161% +244.6% 2.20 ± 61% perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_setattr.notify_change.do_truncate
0.47 ± 64% +968.9% 5.01 ± 83% perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
0.08 ±138% +382.6% 0.37 ± 24% perf-sched.wait_time.max.ms.__cond_resched.down_write.do_truncate.do_ftruncate.do_sys_ftruncate
0.41 ± 26% +169.4% 1.11 ± 78% perf-sched.wait_time.max.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
17.67 ± 25% +110.8% 37.26 ± 35% perf-sched.wait_time.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_dirty_inode.__mark_inode_dirty
2.23 ± 51% +360.3% 10.28 ± 71% perf-sched.wait_time.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_ext_remove_space.ext4_ext_truncate
84.33 ± 14% -46.9% 44.77 ± 72% perf-sched.wait_time.max.ms.__cond_resched.ext4_mb_load_buddy_gfp.ext4_process_freed_data.ext4_journal_commit_callback.jbd2_journal_commit_transaction
0.23 ± 68% +357.9% 1.04 ± 82% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
0.26 ± 39% +205.8% 0.78 ± 73% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.ext4_mb_initialize_context.ext4_mb_new_blocks.ext4_ext_map_blocks
276.82 ± 13% -22.2% 215.50 ± 13% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.30 ± 74% +9637.4% 28.76 ± 48% perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.__find_get_block_slow.find_get_block_common
1.44 ± 79% +11858.3% 172.80 ±219% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki