[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <202509031643.303d114c-lkp@intel.com>
Date: Wed, 3 Sep 2025 16:44:49 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Qu Wenruo <wqu@...e.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
David Sterba <dsterba@...e.com>, <linux-btrfs@...r.kernel.org>,
<oliver.sang@...el.com>
Subject: [linus:master] [btrfs] bddf57a707: stress-ng.sync-file.ops_per_sec
44.2% regression
Hello,
kernel test robot noticed a 44.2% regression of stress-ng.sync-file.ops_per_sec on:
commit: bddf57a70781ef8821d415200bdbcb71f443993a ("btrfs: delay btrfs_open_devices() until super block is created")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[still regression on linus/master fb679c832b6497f19fffb8274c419783909c0912]
[still regression on linux-next/master 3cace99d63192a7250461b058279a42d91075d0c]
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
disk: 1HDD
testtime: 60s
fs: btrfs
test: sync-file
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202509031643.303d114c-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250903/202509031643.303d114c-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/1HDD/btrfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/sync-file/stress-ng/60s
commit:
de339cbfb4 ("btrfs: call bdev_fput() to reclaim the blk_holder immediately")
bddf57a707 ("btrfs: delay btrfs_open_devices() until super block is created")
de339cbfb4027957 bddf57a70781ef8821d415200bd
---------------- ---------------------------
%stddev %change %stddev
\ | \
1885182 ± 2% -35.0% 1226241 cpuidle..usage
1.35 ± 3% +26.8% 1.71 ± 31% iostat.cpu.iowait
114330 -10.0% 102922 meminfo.Shmem
17680 ± 2% -39.7% 10656 ± 2% vmstat.system.cs
32084 ± 3% -33.6% 21290 ± 2% vmstat.system.in
0.08 ± 2% -0.0 0.05 ± 2% mpstat.cpu.all.irq%
0.03 ± 6% -0.0 0.02 ± 5% mpstat.cpu.all.soft%
0.66 ± 3% -0.2 0.45 ± 2% mpstat.cpu.all.sys%
311692 ± 9% -17.9% 255869 ± 12% numa-numastat.node0.numa_hit
304181 ± 8% -24.2% 230456 ± 20% numa-numastat.node1.local_node
331109 ± 6% -19.3% 267048 ± 11% numa-numastat.node1.numa_hit
311531 ± 9% -17.9% 255766 ± 13% numa-vmstat.node0.numa_hit
330584 ± 6% -19.3% 266623 ± 10% numa-vmstat.node1.numa_hit
303656 ± 8% -24.2% 230030 ± 20% numa-vmstat.node1.numa_local
59.00 ± 13% -41.5% 34.50 ± 10% perf-c2c.DRAM.local
1139 ± 4% -46.1% 613.67 ± 5% perf-c2c.DRAM.remote
1254 ± 5% -45.3% 686.50 ± 2% perf-c2c.HITM.local
681.33 ± 3% -45.8% 369.50 ± 6% perf-c2c.HITM.remote
1.33 ± 41% -93.8% 0.08 ±223% sched_debug.cfs_rq:/.runnable_avg.min
1.33 ± 41% -93.8% 0.08 ±223% sched_debug.cfs_rq:/.util_avg.min
10502 -34.4% 6886 sched_debug.cpu.nr_switches.avg
8094 ± 2% -41.8% 4710 ± 2% sched_debug.cpu.nr_switches.min
21146 ± 2% -44.2% 11809 stress-ng.sync-file.ops
352.20 ± 2% -44.2% 196.65 stress-ng.sync-file.ops_per_sec
34.00 ± 2% -43.6% 19.17 stress-ng.time.percent_of_cpu_this_job_got
20.20 ± 2% -43.6% 11.38 stress-ng.time.system_time
513054 ± 2% -45.5% 279629 stress-ng.time.voluntary_context_switches
28437 -10.3% 25522 proc-vmstat.nr_shmem
25303 -1.0% 25040 proc-vmstat.nr_slab_reclaimable
644388 -18.6% 524319 proc-vmstat.numa_hit
578153 -20.8% 458095 proc-vmstat.numa_local
682807 -18.2% 558809 proc-vmstat.pgalloc_normal
675599 -18.3% 551960 ± 2% proc-vmstat.pgfree
1.61 -5.0% 1.53 perf-stat.i.MPKI
6.692e+08 ± 3% -8.2% 6.144e+08 ± 6% perf-stat.i.branch-instructions
23.54 -2.2 21.29 perf-stat.i.cache-miss-rate%
2665211 ± 3% -27.0% 1946091 ± 4% perf-stat.i.cache-misses
12037045 ± 3% -18.2% 9840696 ± 3% perf-stat.i.cache-references
18418 ± 3% -40.1% 11025 perf-stat.i.context-switches
2.13 -5.4% 2.01 perf-stat.i.cpi
3.964e+09 ± 3% -19.8% 3.177e+09 ± 4% perf-stat.i.cpu-cycles
181.54 ± 3% -23.8% 138.31 ± 4% perf-stat.i.cpu-migrations
1472 +7.4% 1581 perf-stat.i.cycles-between-cache-misses
3.216e+09 ± 3% -7.6% 2.972e+09 ± 6% perf-stat.i.instructions
0.65 +8.4% 0.71 ± 2% perf-stat.i.ipc
0.83 -20.9% 0.66 ± 2% perf-stat.overall.MPKI
4.24 +0.3 4.58 ± 2% perf-stat.overall.branch-miss-rate%
22.13 -2.4 19.76 perf-stat.overall.cache-miss-rate%
1.23 -13.1% 1.07 ± 2% perf-stat.overall.cpi
1488 +9.8% 1634 perf-stat.overall.cycles-between-cache-misses
0.81 +15.1% 0.93 ± 2% perf-stat.overall.ipc
6.587e+08 ± 3% -8.2% 6.047e+08 ± 6% perf-stat.ps.branch-instructions
2623092 ± 3% -27.0% 1915109 ± 4% perf-stat.ps.cache-misses
11851537 ± 3% -18.3% 9688099 ± 3% perf-stat.ps.cache-references
18125 ± 3% -40.2% 10847 perf-stat.ps.context-switches
3.903e+09 ± 3% -19.8% 3.129e+09 ± 4% perf-stat.ps.cpu-cycles
178.73 ± 3% -23.8% 136.12 ± 4% perf-stat.ps.cpu-migrations
3.166e+09 ± 3% -7.6% 2.925e+09 ± 6% perf-stat.ps.instructions
2.004e+11 -9.3% 1.818e+11 ± 5% perf-stat.total.instructions
0.00 ±223% +4160.0% 0.04 ± 35% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.btrfs_create_pending_block_groups
0.01 -100.0% 0.00 perf-sched.sch_delay.avg.ms.schedule_timeout.btrfs_sync_log.btrfs_sync_file.do_fsync
0.01 ± 15% +246.8% 0.03 ± 96% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.barrier_all_devices.write_all_supers.btrfs_sync_log
0.00 ±223% +4180.0% 0.04 ± 35% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.btrfs_create_pending_block_groups
0.02 ± 98% -100.0% 0.00 perf-sched.sch_delay.max.ms.schedule_timeout.btrfs_sync_log.btrfs_sync_file.do_fsync
0.16 ±106% -77.8% 0.04 ± 39% perf-sched.sch_delay.max.ms.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
27.42 ± 3% +53.9% 42.21 ± 4% perf-sched.total_wait_and_delay.average.ms
40831 ± 3% -36.6% 25906 ± 4% perf-sched.total_wait_and_delay.count.ms
27.41 ± 3% +54.0% 42.21 ± 4% perf-sched.total_wait_time.average.ms
229.23 ± 2% +51.7% 347.78 ± 15% perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
12.64 ± 3% +56.9% 19.84 ± 3% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
2.33 ± 11% +63.7% 3.81 ± 18% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
6.94 ± 2% +29.6% 9.00 ± 11% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.31 ± 5% +421.7% 1.64 ± 25% perf-sched.wait_and_delay.avg.ms.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
18.67 ± 5% -35.7% 12.00 ± 16% perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
22342 ± 4% -40.1% 13375 ± 4% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
9405 ± 4% -40.8% 5564 ± 4% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
666.83 ± 2% -22.5% 516.50 ± 10% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
4582 ± 4% -37.4% 2866 ± 5% perf-sched.wait_and_delay.count.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
5.34 ± 21% +756.6% 45.72 ± 4% perf-sched.wait_time.avg.ms.io_schedule.bit_wait_io.__wait_on_bit.out_of_line_wait_on_bit
22.83 ± 2% +15.9% 26.46 ± 8% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.write_all_supers.btrfs_sync_log
229.23 ± 2% +51.6% 347.59 ± 15% perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
12.63 ± 3% +57.1% 19.83 ± 3% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
2.32 ± 12% +64.0% 3.81 ± 18% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
8.58 ± 9% -100.0% 0.00 perf-sched.wait_time.avg.ms.schedule_timeout.btrfs_sync_log.btrfs_sync_file.do_fsync
6.94 ± 2% +29.6% 8.99 ± 11% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.31 ± 5% +427.4% 1.63 ± 25% perf-sched.wait_time.avg.ms.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
101.57 ± 20% +56.6% 159.06 ± 22% perf-sched.wait_time.max.ms.io_schedule.bit_wait_io.__wait_on_bit.out_of_line_wait_on_bit
116.41 ± 27% -100.0% 0.00 perf-sched.wait_time.max.ms.schedule_timeout.btrfs_sync_log.btrfs_sync_file.do_fsync
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Powered by blists - more mailing lists