Message-ID: <9d6db7e9-318f-4242-9883-9eee8ee20f5e@suse.com>
Date: Wed, 3 Sep 2025 18:18:01 +0930
From: Qu Wenruo <wqu@...e.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
 David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org
Subject: Re: [linus:master] [btrfs] bddf57a707:
 stress-ng.sync-file.ops_per_sec 44.2% regression



On 2025/9/3 18:14, kernel test robot wrote:
> 
> Hello,
> 
> kernel test robot noticed a 44.2% regression of stress-ng.sync-file.ops_per_sec on:
> 
> 
> commit: bddf57a70781ef8821d415200bdbcb71f443993a ("btrfs: delay btrfs_open_devices() until super block is created")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> [still regression on      linus/master fb679c832b6497f19fffb8274c419783909c0912]
> [still regression on linux-next/master 3cace99d63192a7250461b058279a42d91075d0c]
> 
> testcase: stress-ng
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	nr_threads: 100%
> 	disk: 1HDD
> 	testtime: 60s
> 	fs: btrfs
> 	test: sync-file
> 	cpufreq_governor: performance
> 
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202509031643.303d114c-lkp@intel.com
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250903/202509031643.303d114c-lkp@intel.com
> 
> =========================================================================================
> compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>    gcc-12/performance/1HDD/btrfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/sync-file/stress-ng/60s
> 
> commit:
>    de339cbfb4 ("btrfs: call bdev_fput() to reclaim the blk_holder immediately")
>    bddf57a707 ("btrfs: delay btrfs_open_devices() until super block is created")

This doesn't sound sane to me.

The two commits only affect btrfs mounting/unmounting; I cannot make
sense of why they would affect performance.

Or does stress-ng do a lot of mounting/unmounting?
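
For context, my understanding is that the sync-file stressor itself
does no mounting at all; it mostly hammers sync_file_range(2) (and,
judging by the do_fsync call chains in the profile below, fsync) on a
scratch file. A minimal userspace sketch of that kind of loop, assuming
a btrfs scratch mount at the hypothetical path /mnt/btrfs, would be
something like:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	/* Hypothetical scratch file on the btrfs mount under test. */
	int fd = open("/mnt/btrfs/syncfile.tmp",
		      O_CREAT | O_RDWR | O_TRUNC, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0xaa, sizeof(buf));

	for (long ops = 0; ops < 10000; ops++) {
		if (pwrite(fd, buf, sizeof(buf), 0) != sizeof(buf))
			break;
		/* Flush the dirty range and wait for completion. */
		sync_file_range(fd, 0, sizeof(buf),
				SYNC_FILE_RANGE_WAIT_BEFORE |
				SYNC_FILE_RANGE_WRITE |
				SYNC_FILE_RANGE_WAIT_AFTER);
		/* The traces below also show fsync() in the mix. */
		if (ops % 64 == 0)
			fsync(fd);
	}
	close(fd);
	return 0;
}

So unless the test harness remounts the filesystem between iterations,
the mount/unmount path should be completely cold during the measured
window.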

Thanks,
Qu

> 
> de339cbfb4027957 bddf57a70781ef8821d415200bd
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>     1885182 ±  2%     -35.0%    1226241        cpuidle..usage
>        1.35 ±  3%     +26.8%       1.71 ± 31%  iostat.cpu.iowait
>      114330           -10.0%     102922        meminfo.Shmem
>       17680 ±  2%     -39.7%      10656 ±  2%  vmstat.system.cs
>       32084 ±  3%     -33.6%      21290 ±  2%  vmstat.system.in
>        0.08 ±  2%      -0.0        0.05 ±  2%  mpstat.cpu.all.irq%
>        0.03 ±  6%      -0.0        0.02 ±  5%  mpstat.cpu.all.soft%
>        0.66 ±  3%      -0.2        0.45 ±  2%  mpstat.cpu.all.sys%
>      311692 ±  9%     -17.9%     255869 ± 12%  numa-numastat.node0.numa_hit
>      304181 ±  8%     -24.2%     230456 ± 20%  numa-numastat.node1.local_node
>      331109 ±  6%     -19.3%     267048 ± 11%  numa-numastat.node1.numa_hit
>      311531 ±  9%     -17.9%     255766 ± 13%  numa-vmstat.node0.numa_hit
>      330584 ±  6%     -19.3%     266623 ± 10%  numa-vmstat.node1.numa_hit
>      303656 ±  8%     -24.2%     230030 ± 20%  numa-vmstat.node1.numa_local
>       59.00 ± 13%     -41.5%      34.50 ± 10%  perf-c2c.DRAM.local
>        1139 ±  4%     -46.1%     613.67 ±  5%  perf-c2c.DRAM.remote
>        1254 ±  5%     -45.3%     686.50 ±  2%  perf-c2c.HITM.local
>      681.33 ±  3%     -45.8%     369.50 ±  6%  perf-c2c.HITM.remote
>        1.33 ± 41%     -93.8%       0.08 ±223%  sched_debug.cfs_rq:/.runnable_avg.min
>        1.33 ± 41%     -93.8%       0.08 ±223%  sched_debug.cfs_rq:/.util_avg.min
>       10502           -34.4%       6886        sched_debug.cpu.nr_switches.avg
>        8094 ±  2%     -41.8%       4710 ±  2%  sched_debug.cpu.nr_switches.min
>       21146 ±  2%     -44.2%      11809        stress-ng.sync-file.ops
>      352.20 ±  2%     -44.2%     196.65        stress-ng.sync-file.ops_per_sec
>       34.00 ±  2%     -43.6%      19.17        stress-ng.time.percent_of_cpu_this_job_got
>       20.20 ±  2%     -43.6%      11.38        stress-ng.time.system_time
>      513054 ±  2%     -45.5%     279629        stress-ng.time.voluntary_context_switches
>       28437           -10.3%      25522        proc-vmstat.nr_shmem
>       25303            -1.0%      25040        proc-vmstat.nr_slab_reclaimable
>      644388           -18.6%     524319        proc-vmstat.numa_hit
>      578153           -20.8%     458095        proc-vmstat.numa_local
>      682807           -18.2%     558809        proc-vmstat.pgalloc_normal
>      675599           -18.3%     551960 ±  2%  proc-vmstat.pgfree
>        1.61            -5.0%       1.53        perf-stat.i.MPKI
>   6.692e+08 ±  3%      -8.2%  6.144e+08 ±  6%  perf-stat.i.branch-instructions
>       23.54            -2.2       21.29        perf-stat.i.cache-miss-rate%
>     2665211 ±  3%     -27.0%    1946091 ±  4%  perf-stat.i.cache-misses
>    12037045 ±  3%     -18.2%    9840696 ±  3%  perf-stat.i.cache-references
>       18418 ±  3%     -40.1%      11025        perf-stat.i.context-switches
>        2.13            -5.4%       2.01        perf-stat.i.cpi
>   3.964e+09 ±  3%     -19.8%  3.177e+09 ±  4%  perf-stat.i.cpu-cycles
>      181.54 ±  3%     -23.8%     138.31 ±  4%  perf-stat.i.cpu-migrations
>        1472            +7.4%       1581        perf-stat.i.cycles-between-cache-misses
>   3.216e+09 ±  3%      -7.6%  2.972e+09 ±  6%  perf-stat.i.instructions
>        0.65            +8.4%       0.71 ±  2%  perf-stat.i.ipc
>        0.83           -20.9%       0.66 ±  2%  perf-stat.overall.MPKI
>        4.24            +0.3        4.58 ±  2%  perf-stat.overall.branch-miss-rate%
>       22.13            -2.4       19.76        perf-stat.overall.cache-miss-rate%
>        1.23           -13.1%       1.07 ±  2%  perf-stat.overall.cpi
>        1488            +9.8%       1634        perf-stat.overall.cycles-between-cache-misses
>        0.81           +15.1%       0.93 ±  2%  perf-stat.overall.ipc
>   6.587e+08 ±  3%      -8.2%  6.047e+08 ±  6%  perf-stat.ps.branch-instructions
>     2623092 ±  3%     -27.0%    1915109 ±  4%  perf-stat.ps.cache-misses
>    11851537 ±  3%     -18.3%    9688099 ±  3%  perf-stat.ps.cache-references
>       18125 ±  3%     -40.2%      10847        perf-stat.ps.context-switches
>   3.903e+09 ±  3%     -19.8%  3.129e+09 ±  4%  perf-stat.ps.cpu-cycles
>      178.73 ±  3%     -23.8%     136.12 ±  4%  perf-stat.ps.cpu-migrations
>   3.166e+09 ±  3%      -7.6%  2.925e+09 ±  6%  perf-stat.ps.instructions
>   2.004e+11            -9.3%  1.818e+11 ±  5%  perf-stat.total.instructions
>        0.00 ±223%   +4160.0%       0.04 ± 35%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.btrfs_create_pending_block_groups
>        0.01          -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.btrfs_sync_log.btrfs_sync_file.do_fsync
>        0.01 ± 15%    +246.8%       0.03 ± 96%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.barrier_all_devices.write_all_supers.btrfs_sync_log
>        0.00 ±223%   +4180.0%       0.04 ± 35%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.btrfs_create_pending_block_groups
>        0.02 ± 98%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_timeout.btrfs_sync_log.btrfs_sync_file.do_fsync
>        0.16 ±106%     -77.8%       0.04 ± 39%  perf-sched.sch_delay.max.ms.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
>       27.42 ±  3%     +53.9%      42.21 ±  4%  perf-sched.total_wait_and_delay.average.ms
>       40831 ±  3%     -36.6%      25906 ±  4%  perf-sched.total_wait_and_delay.count.ms
>       27.41 ±  3%     +54.0%      42.21 ±  4%  perf-sched.total_wait_time.average.ms
>      229.23 ±  2%     +51.7%     347.78 ± 15%  perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       12.64 ±  3%     +56.9%      19.84 ±  3%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
>        2.33 ± 11%     +63.7%       3.81 ± 18%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
>        6.94 ±  2%     +29.6%       9.00 ± 11%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        0.31 ±  5%    +421.7%       1.64 ± 25%  perf-sched.wait_and_delay.avg.ms.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
>       18.67 ±  5%     -35.7%      12.00 ± 16%  perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       22342 ±  4%     -40.1%      13375 ±  4%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
>        9405 ±  4%     -40.8%       5564 ±  4%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
>      666.83 ±  2%     -22.5%     516.50 ± 10%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        4582 ±  4%     -37.4%       2866 ±  5%  perf-sched.wait_and_delay.count.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
>        5.34 ± 21%    +756.6%      45.72 ±  4%  perf-sched.wait_time.avg.ms.io_schedule.bit_wait_io.__wait_on_bit.out_of_line_wait_on_bit
>       22.83 ±  2%     +15.9%      26.46 ±  8%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.write_all_supers.btrfs_sync_log
>      229.23 ±  2%     +51.6%     347.59 ± 15%  perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       12.63 ±  3%     +57.1%      19.83 ±  3%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
>        2.32 ± 12%     +64.0%       3.81 ± 18%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
>        8.58 ±  9%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_timeout.btrfs_sync_log.btrfs_sync_file.do_fsync
>        6.94 ±  2%     +29.6%       8.99 ± 11%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        0.31 ±  5%    +427.4%       1.63 ± 25%  perf-sched.wait_time.avg.ms.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
>      101.57 ± 20%     +56.6%     159.06 ± 22%  perf-sched.wait_time.max.ms.io_schedule.bit_wait_io.__wait_on_bit.out_of_line_wait_on_bit
>      116.41 ± 27%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_timeout.btrfs_sync_log.btrfs_sync_file.do_fsync
> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 

