Message-ID: <aBq7LIVVPQ9wg0Hs@xsang-OptiPlex-9020>
Date: Wed, 7 May 2025 09:45:16 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: Christian Loehle <christian.loehle@....com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>, <linux-pm@...r.kernel.org>,
	<oliver.sang@...el.com>
Subject: Re: [linus:master] [cpuidle] 38f83090f5: fsmark.files_per_sec 5.1%
 regression

Hi Christian Loehle,

Sorry for the late reply; we are just back from public holidays.

On Thu, May 01, 2025 at 09:57:27AM +0100, Christian Loehle wrote:
> Hi Oliver,
> 
> On 4/24/25 06:49, kernel test robot wrote:
> > 
> > Hello,
> > 
> > Back in October, we reported
> > "[linux-next:master] [cpuidle]  38f83090f5:  fsmark.app_overhead 51.9% regression"
> > (https://lore.kernel.org/all/202410072214.11d18a3c-oliver.sang@intel.com/)
> > but there was no obvious fsmark.files_per_sec difference at that time.
> > 
> > Now, on a different platform and with different fsmark parameters, we notice
> > a small regression of fsmark.files_per_sec, but no obvious fsmark.app_overhead
> > difference this time (so it does not show in the detail table below).
> 
> Any idea what's different on it in terms of cpuidle?
> There's a good chance 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
> is more useful than iowait metrics for such a workload, worth a try anyway.
> Any chance you could try that?

We ran the same tests with 85975daeaa4d and its parent:
* 85975daeaa4d6 cpuidle: menu: Avoid discarding useful information
* 8de7606f0fe2b cpuidle: menu: Eliminate outliers on both ends of the sample set

However, we found no performance difference in either fsmark.files_per_sec or
fsmark.app_overhead:

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/9B/btrfs/8/x86_64-rhel-9.4/16d/256fpd/4/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-spr-2sp4/16G/fsmark

8de7606f0fe2bf5a 85975daeaa4d6ec560bfcd354fc
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   1553377            +0.7%    1564776        fsmark.app_overhead
     19741            -0.5%      19648        fsmark.files_per_sec


The config used to test 85975daeaa4d and its parent is the same, as attached.

(BTW, the config used for 38f83090f5 and its parent v6.12-rc1 was uploaded to
https://download.01.org/0day-ci/archive/20250424/202504241314.fe89a536-lkp@intel.com/config-6.12.0-rc1-00001-g38f83090f515)


> With the governor infrastructure you could just check out menu at parent/38f83090f5
> and mainline, rename the governor, and then switch between the two at runtime.
> Or I can send a patch for mainline that checks out menu of parent/38f83090f5.
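
(Our reading of the first suggestion, sketched below as a guess rather than a
tested patch: keep the mainline "menu" governor, and additionally register the
parent/38f83090f5 copy of menu.c under a new name. The name "menu_old" and the
wiring are our assumptions, modeled on the current
drivers/cpuidle/governors/menu.c; please correct us if you meant something
else.)

  /*
   * Sketch only -- not a tested patch.
   * menu_old_{enable_device,select,reflect} would be the menu.c callbacks
   * as of parent/38f83090f5, copied into this file under new names.
   */
  #include <linux/cpuidle.h>

  static struct cpuidle_governor menu_old_governor = {
          .name    = "menu_old",          /* hypothetical name */
          .rating  = 20,                  /* same rating as "menu" */
          .enable  = menu_old_enable_device,
          .select  = menu_old_select,
          .reflect = menu_old_reflect,
  };

  static int __init init_menu_old(void)
  {
          return cpuidle_register_governor(&menu_old_governor);
  }
  postcore_initcall(init_menu_old);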

We are not sure about the second option. Do you mean the patch is for the
config? If so, we would just need to apply it to parent/38f83090f5 and then
redo the tests?

Our infrastructure runs automatically, so we do not fully understand
"switching between the two at runtime". Could you elaborate? Then we will
check how to run it in our automated flow. Thanks!
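
For example, if you mean the cpuidle sysfs interface, is it something like
the following? (These are the standard cpuidle sysfs nodes; the governor name
"menu_old" is hypothetical, and whether current_governor is writable on our
kernels is something we would need to check.)

  # list registered governors
  cat /sys/devices/system/cpu/cpuidle/available_governors

  # run the workload once under each governor
  echo menu_old > /sys/devices/system/cpu/cpuidle/current_governor
  <run fsmark>
  echo menu > /sys/devices/system/cpu/cpuidle/current_governor
  <run fsmark>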

> 
> 
> > 
> > The last October report seems to have caused some confusion, so for this
> > one we rebuilt the kernels and ran more iterations to confirm that the
> > configs are the same for parent/38f83090f5 and that the data is stable.
> > 
> > Just FYI, this is what we observed in our tests.
> > 
> > 
> > kernel test robot noticed a 5.1% regression of fsmark.files_per_sec on:
> > 
> > 
> > commit: 38f83090f515b4b5d59382dfada1e7457f19aa47 ("cpuidle: menu: Remove iowait influence")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > [test failed on linus/master      6fea5fabd3323cd27b2ab5143263f37ff29550cb]
> > [test failed on linux-next/master bc8aa6cdadcc00862f2b5720e5de2e17f696a081]
> > 
> > testcase: fsmark
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> > parameters:
> > 
> > 	iterations: 8
> > 	disk: 1SSD
> > 	nr_threads: 4
> > 	fs: btrfs
> > 	filesize: 9B
> > 	test_size: 16G
> > 	sync_method: fsyncBeforeClose
> > 	nr_directories: 16d
> > 	nr_files_per_directory: 256fpd
> > 	cpufreq_governor: performance
> > 
> > 
> > 
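(For reference, our understanding of how the above parameters map onto an
fs_mark command line, using fs_mark's own option names; the mount point and
the -n value are placeholders, since the job script derives the number of
files from test_size/filesize:

  fs_mark -d <mountpoint> -D 16 -N 256 -s 9 -t 4 -L 8 -S 1 -n <files-per-thread>

-S 1 is fs_mark's "fsyncBeforeClose" sync method.)
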
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add the following tags
> > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202504241314.fe89a536-lkp@intel.com
> > 
> > 
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250424/202504241314.fe89a536-lkp@intel.com
> > 
> > =========================================================================================
> > compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> >   gcc-12/performance/1SSD/9B/btrfs/8/x86_64-rhel-9.4/16d/256fpd/4/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-spr-2sp4/16G/fsmark
> > 
> > commit: 
> >   v6.12-rc1
> >   38f83090f5 ("cpuidle: menu: Remove iowait influence")
> > 
> >        v6.12-rc1 38f83090f515b4b5d59382dfada 
> > ---------------- --------------------------- 
> >          %stddev     %change         %stddev
> >              \          |                \  
> >       0.12 ±  4%      +0.0        0.14 ±  2%  mpstat.cpu.all.iowait%
> >     940771            +2.8%     967141        proc-vmstat.pgfault
> >     120077            -5.7%     113267        vmstat.system.cs
> >      72197 ±  2%     -14.2%      61965 ±  2%  vmstat.system.in
> >      20496            -5.1%      19456        fsmark.files_per_sec
> >     219.58            +5.0%     230.50        fsmark.time.elapsed_time
> >     219.58            +5.0%     230.50        fsmark.time.elapsed_time.max
> >       0.02 ±  9%     +22.5%       0.03 ± 10%  perf-sched.wait_and_delay.avg.ms.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
> >      15676 ±  9%     -25.2%      11732 ± 14%  perf-sched.wait_and_delay.count.btrfs_sync_log.btrfs_sync_file.do_fsync.__x64_sys_fsync
> >     130219 ±  7%     -14.8%     110942 ±  8%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
> >       0.02 ± 10%     +27.8%       0.02 ± 12%  perf-sched.wait_time.avg.ms.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
> >  1.457e+09            -4.4%  1.393e+09        perf-stat.i.branch-instructions
> >   14767792            -4.6%   14093766        perf-stat.i.branch-misses
> >   71872875            -3.4%   69406875        perf-stat.i.cache-references
> >     121769            -5.8%     114744        perf-stat.i.context-switches
> >  8.574e+09            -7.2%   7.96e+09        perf-stat.i.cpu-cycles
> >  7.979e+09            -3.6%  7.691e+09        perf-stat.i.instructions
> >       3773            -2.0%       3697        perf-stat.i.minor-faults
> >       3773            -2.0%       3697        perf-stat.i.page-faults
> >   1.45e+09            -4.4%  1.386e+09        perf-stat.ps.branch-instructions
> >   14670086            -4.5%   14003599        perf-stat.ps.branch-misses
> >   71521482            -3.4%   69080547        perf-stat.ps.cache-references
> >     121170            -5.7%     114204        perf-stat.ps.context-switches
> >  8.537e+09            -7.2%  7.925e+09        perf-stat.ps.cpu-cycles
> >  7.938e+09            -3.6%  7.654e+09        perf-stat.ps.instructions
> >       3717            -1.9%       3645        perf-stat.ps.minor-faults
> >       3717            -1.9%       3645        perf-stat.ps.page-faults
> >       9.65 ± 14%      -9.4        0.22 ±123%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >      18.61 ±  6%      -4.7       13.95 ±  6%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> >      18.05 ±  7%      -4.5       13.51 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
> >      18.73 ±  7%      -4.5       14.22 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >      22.60 ±  4%      -4.5       18.11 ±  3%  perf-profile.calltrace.cycles-pp.common_startup_64
> >      21.55 ±  5%      -4.5       17.08 ±  4%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >      21.57 ±  5%      -4.5       17.11 ±  4%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
> >      21.58 ±  5%      -4.5       17.12 ±  4%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
> >       2.39 ±  6%      -0.6        1.78 ±  7%  perf-profile.calltrace.cycles-pp.btrfs_clone_write_end_io.blk_mq_end_request_batch.nvme_irq.__handle_irq_event_percpu.handle_irq_event
> >       2.32 ±  7%      -0.6        1.72 ±  7%  perf-profile.calltrace.cycles-pp.btrfs_orig_write_end_io.btrfs_clone_write_end_io.blk_mq_end_request_batch.nvme_irq.__handle_irq_event_percpu
> >       2.21 ±  7%      -0.6        1.64 ±  7%  perf-profile.calltrace.cycles-pp.end_bbio_meta_write.btrfs_orig_write_end_io.btrfs_clone_write_end_io.blk_mq_end_request_batch.nvme_irq
> >       0.52 ± 27%      +0.1        0.63 ±  6%  perf-profile.calltrace.cycles-pp.__blk_flush_plug.blk_finish_plug.btrfs_sync_log.btrfs_sync_file.do_fsync
> >       0.52 ± 27%      +0.1        0.63 ±  6%  perf-profile.calltrace.cycles-pp.blk_finish_plug.btrfs_sync_log.btrfs_sync_file.do_fsync.__x64_sys_fsync
> >       0.95 ±  6%      +0.2        1.10 ±  7%  perf-profile.calltrace.cycles-pp.__btrfs_wait_marked_extents.btrfs_wait_tree_log_extents.btrfs_sync_log.btrfs_sync_file.do_fsync
> >       1.04 ±  6%      +0.2        1.21 ±  7%  perf-profile.calltrace.cycles-pp.btrfs_wait_tree_log_extents.btrfs_sync_log.btrfs_sync_file.do_fsync.__x64_sys_fsync
> >       0.51 ± 27%      +0.2        0.68 ±  5%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >       0.36 ± 70%      +0.3        0.62 ±  6%  perf-profile.calltrace.cycles-pp.btrfs_free_tree_block.btrfs_force_cow_block.btrfs_cow_block.btrfs_search_slot.btrfs_insert_empty_items
> >       0.36 ± 70%      +0.3        0.65 ±  5%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> >       2.20 ±  8%      +0.6        2.77 ± 11%  perf-profile.calltrace.cycles-pp.copy_extent_buffer_full.btrfs_force_cow_block.btrfs_cow_block.btrfs_search_slot.btrfs_insert_empty_items
> >       0.00            +0.8        0.76 ± 10%  perf-profile.calltrace.cycles-pp.folio_end_writeback.end_bbio_meta_write.btrfs_orig_write_end_io.blk_mq_end_request_batch.nvme_irq
> >       0.00            +0.8        0.85 ±  9%  perf-profile.calltrace.cycles-pp.end_bbio_meta_write.btrfs_orig_write_end_io.blk_mq_end_request_batch.nvme_irq.__handle_irq_event_percpu
> >       0.00            +0.9        0.89 ±  9%  perf-profile.calltrace.cycles-pp.btrfs_orig_write_end_io.blk_mq_end_request_batch.nvme_irq.__handle_irq_event_percpu.handle_irq_event
> >       7.54 ±  5%      +1.1        8.69 ±  4%  perf-profile.calltrace.cycles-pp.btrfs_force_cow_block.btrfs_cow_block.btrfs_search_slot.btrfs_insert_empty_items.copy_items
> >       7.56 ±  5%      +1.1        8.71 ±  4%  perf-profile.calltrace.cycles-pp.btrfs_cow_block.btrfs_search_slot.btrfs_insert_empty_items.copy_items.copy_inode_items_to_log
> >       3.57 ±  5%      +1.3        4.87 ±  7%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >       1.99 ± 13%      +2.8        4.76 ± 10%  perf-profile.calltrace.cycles-pp.handle_edge_irq.__sysvec_posted_msi_notification.sysvec_posted_msi_notification.asm_sysvec_posted_msi_notification.cpuidle_enter_state
> >       2.25 ± 14%      +2.9        5.16 ± 11%  perf-profile.calltrace.cycles-pp.__sysvec_posted_msi_notification.sysvec_posted_msi_notification.asm_sysvec_posted_msi_notification.cpuidle_enter_state.cpuidle_enter
> >       2.35 ± 14%      +3.0        5.32 ± 11%  perf-profile.calltrace.cycles-pp.sysvec_posted_msi_notification.asm_sysvec_posted_msi_notification.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> >       2.51 ± 15%      +3.1        5.60 ± 11%  perf-profile.calltrace.cycles-pp.asm_sysvec_posted_msi_notification.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >      50.39 ±  2%      +4.4       54.84 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_fsync.do_syscall_64.entry_SYSCALL_64_after_hwframe.fsync
> >      50.38 ±  2%      +4.4       54.82 ±  2%  perf-profile.calltrace.cycles-pp.do_fsync.__x64_sys_fsync.do_syscall_64.entry_SYSCALL_64_after_hwframe.fsync
> >      50.35 ±  2%      +4.4       54.80 ±  2%  perf-profile.calltrace.cycles-pp.btrfs_sync_file.do_fsync.__x64_sys_fsync.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      50.84 ±  2%      +4.5       55.30        perf-profile.calltrace.cycles-pp.fsync
> >      50.57 ±  2%      +4.5       55.03 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fsync
> >      50.57 ±  2%      +4.5       55.03 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fsync
> >       9.92 ± 13%      -9.4        0.52 ± 12%  perf-profile.children.cycles-pp.poll_idle
> >      18.84 ±  7%      -4.6       14.26 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter_state
> >      18.85 ±  7%      -4.6       14.27 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter
> >      19.59 ±  6%      -4.5       15.04 ±  6%  perf-profile.children.cycles-pp.cpuidle_idle_call
> >      22.60 ±  4%      -4.5       18.11 ±  3%  perf-profile.children.cycles-pp.common_startup_64
> >      22.60 ±  4%      -4.5       18.11 ±  3%  perf-profile.children.cycles-pp.cpu_startup_entry
> >      22.58 ±  4%      -4.5       18.09 ±  3%  perf-profile.children.cycles-pp.do_idle
> >      21.58 ±  5%      -4.5       17.12 ±  4%  perf-profile.children.cycles-pp.start_secondary
> >       2.46 ±  6%      -0.6        1.86 ±  7%  perf-profile.children.cycles-pp.btrfs_clone_write_end_io
> >       0.12 ± 13%      -0.1        0.06 ± 15%  perf-profile.children.cycles-pp.local_clock_noinstr
> >       0.22 ±  8%      +0.1        0.28 ±  6%  perf-profile.children.cycles-pp.__xa_set_mark
> >       0.57 ±  5%      +0.1        0.64 ±  5%  perf-profile.children.cycles-pp.blk_finish_plug
> >       0.59 ±  5%      +0.1        0.67 ±  5%  perf-profile.children.cycles-pp.__blk_flush_plug
> >       0.77 ±  5%      +0.1        0.87 ±  4%  perf-profile.children.cycles-pp.__folio_start_writeback
> >       0.60 ±  6%      +0.1        0.70 ±  6%  perf-profile.children.cycles-pp.pin_down_extent
> >       0.75 ±  6%      +0.1        0.88 ±  7%  perf-profile.children.cycles-pp.btrfs_free_tree_block
> >       0.96 ±  5%      +0.1        1.11 ±  7%  perf-profile.children.cycles-pp.__btrfs_wait_marked_extents
> >       1.11 ±  6%      +0.2        1.29 ±  6%  perf-profile.children.cycles-pp.set_extent_bit
> >       1.21 ±  5%      +0.2        1.38 ±  6%  perf-profile.children.cycles-pp.__set_extent_bit
> >       0.83 ±  5%      +0.2        1.01 ±  6%  perf-profile.children.cycles-pp.__folio_mark_dirty
> >       1.25 ±  6%      +0.2        1.47 ±  6%  perf-profile.children.cycles-pp.__folio_end_writeback
> >       2.28 ±  4%      +0.3        2.56 ±  4%  perf-profile.children.cycles-pp.__write_extent_buffer
> >       3.81 ±  5%      +0.5        4.34 ±  6%  perf-profile.children.cycles-pp.btrfs_alloc_tree_block
> >       3.73 ±  6%      +0.7        4.42 ±  8%  perf-profile.children.cycles-pp.copy_extent_buffer_full
> >       5.21 ±  4%      +0.8        5.97 ±  6%  perf-profile.children.cycles-pp.__memcpy
> >       3.62 ±  4%      +1.3        4.92 ±  7%  perf-profile.children.cycles-pp.intel_idle
> >      10.48 ±  4%      +1.5       11.94 ±  4%  perf-profile.children.cycles-pp.btrfs_force_cow_block
> >      10.51 ±  4%      +1.5       11.97 ±  4%  perf-profile.children.cycles-pp.btrfs_cow_block
> >      50.40 ±  2%      +4.4       54.84 ±  2%  perf-profile.children.cycles-pp.__x64_sys_fsync
> >      50.38 ±  2%      +4.4       54.83 ±  2%  perf-profile.children.cycles-pp.do_fsync
> >      50.36 ±  2%      +4.4       54.81 ±  2%  perf-profile.children.cycles-pp.btrfs_sync_file
> >      50.86 ±  2%      +4.5       55.32        perf-profile.children.cycles-pp.fsync
> >      69.02 ±  2%      +4.6       73.66        perf-profile.children.cycles-pp.do_syscall_64
> >      69.05 ±  2%      +4.6       73.69        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       6.48 ± 13%      -6.0        0.48 ± 12%  perf-profile.self.cycles-pp.poll_idle
> >       0.32 ±  8%      +0.1        0.39 ± 11%  perf-profile.self.cycles-pp.folio_mark_accessed
> >       5.10 ±  4%      +0.7        5.84 ±  6%  perf-profile.self.cycles-pp.__memcpy
> >       3.62 ±  4%      +1.3        4.92 ±  7%  perf-profile.self.cycles-pp.intel_idle
> > 
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> > 
> > 
> 

View attachment "config-6.13.0-rc7-00017-g85975daeaa4d" of type "text/plain" (242699 bytes)
