Message-ID: <202507010457.3b3d3c33-lkp@intel.com>
Date: Tue, 1 Jul 2025 10:57:34 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Baokun Li <libaokun1@...wei.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-ext4@...r.kernel.org>,
	<tytso@....edu>, <jack@...e.cz>, <adilger.kernel@...ger.ca>,
	<ojaswin@...ux.ibm.com>, <linux-kernel@...r.kernel.org>,
	<yi.zhang@...wei.com>, <yangerkun@...wei.com>, <libaokun1@...wei.com>,
	<oliver.sang@...el.com>
Subject: Re: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update
 s_mb_last_group



Hello,

kernel test robot noticed a 31.1% improvement of stress-ng.fsize.ops_per_sec on:


commit: ad0d50f30d3fe376a99fd0e392867c7ca9b619e3 ("[PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update s_mb_last_group")
url: https://github.com/intel-lab-lkp/linux/commits/Baokun-Li/ext4-add-ext4_try_lock_group-to-skip-busy-groups/20250623-155451
base: https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev
patch link: https://lore.kernel.org/all/20250623073304.3275702-4-libaokun1@huawei.com/
patch subject: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update s_mb_last_group

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: ext4
	test: fsize
	cpufreq_governor: performance
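
For a rough local approximation of the parameters above, something like the following should exercise the same stressor. This is a hedged sketch, not the canonical LKP job: the device path /dev/sdb and mount point are assumptions, and the authoritative job definitions ship with the lkp-tests repository linked at the end of this report.

```shell
# Approximate reproduction of the fsize stress-ng run above (assumptions:
# /dev/sdb is a scratch HDD; adjust to your hardware before running).
mkfs.ext4 -F /dev/sdb
mkdir -p /mnt/ext4 && mount /dev/sdb /mnt/ext4

# Pin the governor to "performance" as in the job parameters.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done

# nr_threads: 100% -> one worker per CPU; testtime: 60s.
stress-ng --fsize "$(nproc)" --timeout 60s \
    --temp-path /mnt/ext4 --metrics-brief
```

The `--metrics-brief` output includes the ops and ops-per-sec figures that correspond to the stress-ng.fsize.* rows in the comparison table below.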



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250701/202507010457.3b3d3c33-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/ext4/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp4/fsize/stress-ng/60s

commit: 
  86f92bf2c0 ("ext4: remove unnecessary s_mb_last_start")
  ad0d50f30d ("ext4: remove unnecessary s_md_lock on update s_mb_last_group")

86f92bf2c059852a ad0d50f30d3fe376a99fd0e3928 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      5042 ±  4%     -10.1%       4532 ±  2%  meminfo.Dirty
    100194 ± 63%     +92.5%     192828 ± 32%  numa-meminfo.node0.Shmem
      5082 ±  3%     +28.1%       6510 ±  5%  vmstat.system.cs
     71089           -17.1%      58900 ±  2%  perf-c2c.DRAM.remote
     44206           -13.4%      38284 ±  2%  perf-c2c.HITM.remote
    131696            -4.1%     126359 ±  2%  perf-c2c.HITM.total
      0.15 ± 18%      +0.2        0.35 ± 14%  mpstat.cpu.all.iowait%
      0.32 ±  7%      -0.0        0.28 ±  4%  mpstat.cpu.all.irq%
      0.05 ±  4%      +0.0        0.07 ±  3%  mpstat.cpu.all.soft%
      0.50 ± 13%      +0.2        0.69 ± 16%  mpstat.cpu.all.usr%
  14478005 ±  2%     +32.7%   19217687 ±  4%  numa-numastat.node0.local_node
  14540770 ±  2%     +32.6%   19285137 ±  4%  numa-numastat.node0.numa_hit
  14722680           +28.8%   18967713        numa-numastat.node1.local_node
  14793059           +28.7%   19032805        numa-numastat.node1.numa_hit
    918392           -38.4%     565297 ± 18%  sched_debug.cpu.avg_idle.avg
    356474 ±  5%     -92.0%      28413 ± 90%  sched_debug.cpu.avg_idle.min
      2362 ±  2%     +18.8%       2806 ±  4%  sched_debug.cpu.nr_switches.avg
      1027           +35.5%       1391 ±  6%  sched_debug.cpu.nr_switches.min
     25263 ± 63%     +91.0%      48258 ± 31%  numa-vmstat.node0.nr_shmem
  14540796 ±  2%     +32.5%   19271949 ±  4%  numa-vmstat.node0.numa_hit
  14478031 ±  2%     +32.6%   19204499 ±  4%  numa-vmstat.node0.numa_local
  14792432           +28.6%   19020203        numa-vmstat.node1.numa_hit
  14722053           +28.8%   18955111        numa-vmstat.node1.numa_local
      3780           +30.9%       4950 ±  2%  stress-ng.fsize.SIGXFSZ_signals_per_sec
    643887           +31.0%     843807 ±  2%  stress-ng.fsize.ops
     10726           +31.1%      14059 ±  2%  stress-ng.fsize.ops_per_sec
    126167 ±  2%      +8.7%     137085 ±  2%  stress-ng.time.involuntary_context_switches
     21.82 ±  2%     +45.1%      31.66 ±  4%  stress-ng.time.user_time
      5144 ± 15%    +704.0%      41366 ± 20%  stress-ng.time.voluntary_context_switches
      1272 ±  4%     -10.8%       1135 ±  2%  proc-vmstat.nr_dirty
     59459            +8.1%      64288        proc-vmstat.nr_slab_reclaimable
      1272 ±  4%     -10.8%       1134 ±  2%  proc-vmstat.nr_zone_write_pending
  29335922           +30.6%   38319823        proc-vmstat.numa_hit
  29202778           +30.8%   38187281        proc-vmstat.numa_local
  35012787           +31.9%   46166245 ±  2%  proc-vmstat.pgalloc_normal
  34753289           +31.9%   45830460 ±  2%  proc-vmstat.pgfree
    120464            +2.3%     123212        proc-vmstat.pgpgout
      0.35 ±  3%      +0.1        0.41 ±  3%  perf-stat.i.branch-miss-rate%
  48059547           +21.7%   58484853        perf-stat.i.branch-misses
     33.69            -1.8       31.91        perf-stat.i.cache-miss-rate%
 1.227e+08           +13.5%  1.392e+08 ±  7%  perf-stat.i.cache-misses
 3.623e+08           +19.9%  4.342e+08 ±  7%  perf-stat.i.cache-references
      4958 ±  3%     +30.4%       6467 ±  4%  perf-stat.i.context-switches
      6.10            -5.2%       5.79 ±  4%  perf-stat.i.cpi
    208.43           +22.0%     254.30 ±  5%  perf-stat.i.cpu-migrations
      3333           -11.4%       2954 ±  7%  perf-stat.i.cycles-between-cache-misses
      0.33            +0.1        0.39 ±  2%  perf-stat.overall.branch-miss-rate%
     33.87            -1.8       32.04        perf-stat.overall.cache-miss-rate%
      6.16            -5.3%       5.83 ±  4%  perf-stat.overall.cpi
      3360           -11.5%       2973 ±  7%  perf-stat.overall.cycles-between-cache-misses
      0.16            +5.8%       0.17 ±  4%  perf-stat.overall.ipc
  47200442           +21.7%   57451126        perf-stat.ps.branch-misses
 1.206e+08           +13.5%  1.369e+08 ±  7%  perf-stat.ps.cache-misses
 3.563e+08           +19.9%  4.271e+08 ±  7%  perf-stat.ps.cache-references
      4873 ±  3%     +30.3%       6351 ±  4%  perf-stat.ps.context-switches
    204.75           +22.0%     249.75 ±  5%  perf-stat.ps.cpu-migrations
 6.583e+10            +5.7%  6.955e+10 ±  4%  perf-stat.ps.instructions
 4.046e+12            +5.5%  4.267e+12 ±  4%  perf-stat.total.instructions
      0.15 ± 24%     +97.6%       0.31 ± 21%  perf-sched.sch_delay.avg.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
      0.69 ± 34%     -45.3%       0.38 ± 24%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.04 ±  2%     -11.0%       0.03 ±  7%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.09 ± 18%    +104.1%       0.19 ± 38%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.32 ± 59%    +284.8%       1.24 ± 71%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_noprof.__filemap_get_folio
     16.34 ± 81%     -81.7%       2.99 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
      3.51 ± 11%     +56.2%       5.48 ± 38%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
      0.06 ±223%   +1443.8%       0.86 ± 97%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.generic_update_time
      0.47 ± 33%    +337.5%       2.05 ± 67%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_ext_insert_extent.ext4_ext_map_blocks.ext4_map_create_blocks
      0.47 ± 64%    +417.9%       2.43 ± 53%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
      7.30 ± 60%     -53.7%       3.38 ± 22%  perf-sched.sch_delay.max.ms.__cond_resched.__find_get_block_slow.find_get_block_common.bdev_getblk.ext4_read_block_bitmap_nowait
      2.72 ± 34%     +59.5%       4.33 ± 20%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.ext4_map_blocks.ext4_alloc_file_blocks.isra
      0.08 ±138%    +382.6%       0.37 ± 24%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_truncate.do_ftruncate.do_sys_ftruncate
      1.33 ± 90%    +122.5%       2.96 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_alloc_file_blocks.isra.0
      3.04           +93.7%       5.89 ± 82%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_setattr.notify_change.do_truncate
      3.66 ± 19%     +52.6%       5.59 ± 31%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_truncate.ext4_setattr.notify_change
      0.41 ± 26%    +169.4%       1.11 ± 78%  perf-sched.sch_delay.max.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
      6.93 ± 82%     -65.5%       2.39 ± 49%  perf-sched.sch_delay.max.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
      0.23 ± 68%    +357.9%       1.04 ± 82%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
      0.26 ± 39%    +205.8%       0.78 ± 73%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.ext4_mb_initialize_context.ext4_mb_new_blocks.ext4_ext_map_blocks
      0.11 ± 93%   +1390.4%       1.60 ± 62%  perf-sched.sch_delay.max.ms.io_schedule.bit_wait_io.__wait_on_bit_lock.out_of_line_wait_on_bit_lock
      0.30 ± 74%   +2467.2%       7.58 ± 60%  perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.__find_get_block_slow.find_get_block_common
      2.66 ± 18%     +29.4%       3.44 ±  7%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      2.64 ± 21%    +197.3%       7.84 ± 53%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     87.11 ±  2%     -15.3%      73.79 ±  4%  perf-sched.total_wait_and_delay.average.ms
     21561 ±  2%     +18.5%      25553 ±  4%  perf-sched.total_wait_and_delay.count.ms
     86.95 ±  2%     -15.4%      73.60 ±  4%  perf-sched.total_wait_time.average.ms
      0.76 ± 54%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
      0.61 ± 47%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
    168.47 ±  2%     -10.4%     150.98 ±  4%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    125.33 ± 10%     +72.2%     215.83 ±  8%  perf-sched.wait_and_delay.count.__cond_resched.__ext4_handle_dirty_metadata.ext4_do_update_inode.isra.0
    781.33 ±  3%     -74.6%     198.83 ± 15%  perf-sched.wait_and_delay.count.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
    278.67 ± 13%    +310.9%       1145 ± 20%  perf-sched.wait_and_delay.count.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
      1116 ±  3%     -81.5%     206.33 ± 13%  perf-sched.wait_and_delay.count.__cond_resched.__find_get_block_slow.find_get_block_common.bdev_getblk.ext4_read_block_bitmap_nowait
    166.33 ±  8%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
    115.50 ± 46%    +298.7%     460.50 ± 16%  perf-sched.wait_and_delay.count.__cond_resched.down_read.ext4_map_blocks.ext4_alloc_file_blocks.isra
    138.33 ± 16%    +290.7%     540.50 ± 18%  perf-sched.wait_and_delay.count.__cond_resched.down_write.ext4_setattr.notify_change.do_truncate
    310.17 ± 14%    +263.9%       1128 ± 21%  perf-sched.wait_and_delay.count.__cond_resched.down_write.ext4_truncate.ext4_setattr.notify_change
      1274 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
      7148 ±  2%     +11.9%       7998 ±  4%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     32.82 ± 80%     -81.8%       5.99 ± 34%  perf-sched.wait_and_delay.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
     12.06 ± 22%    +168.4%      32.36 ± 47%  perf-sched.wait_and_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
     20.55 ± 82%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
     27.66 ± 20%     +78.9%      49.49 ± 60%  perf-sched.wait_and_delay.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_dirty_inode.__mark_inode_dirty
     16.75 ± 64%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
      0.19 ± 29%    +191.5%       0.55 ± 29%  perf-sched.wait_time.avg.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
      0.15 ± 24%     +98.1%       0.31 ± 21%  perf-sched.wait_time.avg.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
    168.44 ±  2%     -10.4%     150.94 ±  4%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.36 ± 40%    +392.9%       1.78 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_noprof.__filemap_get_folio
     17.42 ± 70%     -82.4%       3.07 ± 34%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
     11.49 ± 26%    +180.6%      32.23 ± 48%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
      0.06 ±223%   +1443.8%       0.86 ± 97%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.generic_update_time
      0.47 ± 33%    +411.8%       2.40 ± 56%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_ext_insert_extent.ext4_ext_map_blocks.ext4_map_create_blocks
      0.64 ±161%    +244.6%       2.20 ± 61%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_setattr.notify_change.do_truncate
      0.47 ± 64%    +968.9%       5.01 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
      0.08 ±138%    +382.6%       0.37 ± 24%  perf-sched.wait_time.max.ms.__cond_resched.down_write.do_truncate.do_ftruncate.do_sys_ftruncate
      0.41 ± 26%    +169.4%       1.11 ± 78%  perf-sched.wait_time.max.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
     17.67 ± 25%    +110.8%      37.26 ± 35%  perf-sched.wait_time.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_dirty_inode.__mark_inode_dirty
      2.23 ± 51%    +360.3%      10.28 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_ext_remove_space.ext4_ext_truncate
     84.33 ± 14%     -46.9%      44.77 ± 72%  perf-sched.wait_time.max.ms.__cond_resched.ext4_mb_load_buddy_gfp.ext4_process_freed_data.ext4_journal_commit_callback.jbd2_journal_commit_transaction
      0.23 ± 68%    +357.9%       1.04 ± 82%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
      0.26 ± 39%    +205.8%       0.78 ± 73%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.ext4_mb_initialize_context.ext4_mb_new_blocks.ext4_ext_map_blocks
    276.82 ± 13%     -22.2%     215.50 ± 13%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.30 ± 74%   +9637.4%      28.76 ± 48%  perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.__find_get_block_slow.find_get_block_common
      1.44 ± 79%  +11858.3%     172.80 ±219%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
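
The headline 31.1% figure can be sanity-checked directly from the raw ops_per_sec values in the stress-ng rows above (10726 for the parent commit 86f92bf2c0, 14059 for the patched commit ad0d50f30d):

```python
# Recompute the reported %change from the raw stress-ng.fsize.ops_per_sec
# values quoted in the comparison table above.
base_ops_per_sec = 10726     # 86f92bf2c0 ("ext4: remove unnecessary s_mb_last_start")
patched_ops_per_sec = 14059  # ad0d50f30d ("ext4: remove unnecessary s_md_lock ...")

change_pct = (patched_ops_per_sec - base_ops_per_sec) / base_ops_per_sec * 100
print(f"{change_pct:.1f}%")  # -> 31.1%, matching the robot's reported improvement
```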




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

