Message-ID: <202503261501.2a99ac6e-lkp@intel.com>
Date: Wed, 26 Mar 2025 16:04:48 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Pan Deng <pan.deng@...el.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Christian Brauner <brauner@...nel.org>, Lipeng Zhu <lipeng.zhu@...el.com>,
	Tianyou Li <tianyou.li@...el.com>, Tim Chen <tim.c.chen@...ux.intel.com>,
	<linux-fsdevel@...r.kernel.org>, <oliver.sang@...el.com>
Subject: [linus:master] [fs] e249056c91: stress-ng.mq.ops_per_sec 94.3% improvement



Hello,

kernel test robot noticed a 94.3% improvement of stress-ng.mq.ops_per_sec on:


commit: e249056c91a2f14ee40de2bf24cf72d8e68101f5 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
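
For context, the commit title describes the classic false-sharing pattern: a
write-hot reference count sitting on the same cache line as read-mostly fields
of struct file, so every refcount update invalidates the line in other CPUs'
caches. A minimal userspace sketch of that pattern and of the layout-based fix
follows; the field names are made up for illustration and are not the real
struct file members.

  #include <stdatomic.h>

  #define CACHELINE 64

  /* Before: the write-hot refcount shares a cache line with read-mostly
   * fields, so each refcount update forces CPUs that only read ops/flags
   * to refetch the whole line. (Illustrative only.) */
  struct before {
          const void   *ops;     /* read-mostly */
          unsigned int  flags;   /* read-mostly */
          atomic_long   ref;     /* write-hot */
  };

  /* After: the refcount is pushed onto its own cache line, so readers of
   * ops/flags no longer take misses on every refcount update. */
  struct after {
          const void   *ops;
          unsigned int  flags;
          _Alignas(CACHELINE) atomic_long ref;
  };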


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: mq
	cpufreq_governor: performance
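
The canonical reproduce scripts are in the archive linked below; a roughly
equivalent manual run of the parameters above (a hedged sketch only, the exact
flags LKP passes to stress-ng may differ) would be:

  # pin CPUs to the performance governor, as in cpufreq_governor above
  cpupower frequency-set -g performance
  # one mq worker per CPU (nr_threads: 100%) for 60 seconds, with throughput metrics
  stress-ng --mq $(nproc) --timeout 60s --metrics-brief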



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250326/202503261501.2a99ac6e-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/mq/stress-ng/60s

commit: 
  d3a194d95f ("epoll: simplify ep_busy_loop by removing always 0 argument")
  e249056c91 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")

d3a194d95fc8d535 e249056c91a2f14ee40de2bf24c 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  16952975 ±113%    +479.0%   98151856 ± 20%  cpuidle..usage
   6298915 ± 11%     +61.8%   10188752 ±  5%  vmstat.system.cs
    522158 ± 18%     +59.9%     835109 ±  6%  vmstat.system.in
      0.43 ± 32%      +0.2        0.67 ±  2%  mpstat.cpu.all.irq%
      0.06 ± 11%      -0.0        0.06 ±  7%  mpstat.cpu.all.soft%
      6.40 ±  2%      +1.3        7.74 ±  5%  mpstat.cpu.all.usr%
    143216 ± 23%     -71.1%      41346 ± 87%  numa-numastat.node0.other_node
   1203882 ± 13%     +73.8%    2092918 ± 29%  numa-numastat.node1.numa_hit
     55987 ± 58%    +180.9%     157244 ± 23%  numa-numastat.node1.other_node
      1042 ± 35%     -82.7%     180.83 ± 21%  perf-c2c.DRAM.local
     40886 ± 71%    +138.2%      97387 ± 23%  perf-c2c.HITM.local
     46261 ± 60%    +119.4%     101476 ± 23%  perf-c2c.HITM.total
   1835281 ± 25%    +151.0%    4606463 ± 38%  numa-meminfo.node1.Active
   1835281 ± 25%    +151.0%    4606463 ± 38%  numa-meminfo.node1.Active(anon)
    300616 ± 82%     +63.6%     491945 ± 44%  numa-meminfo.node1.AnonPages
   1535692 ± 22%    +168.0%    4115480 ± 41%  numa-meminfo.node1.Shmem
 2.507e+08 ±  9%     +94.3%  4.871e+08 ±  5%  stress-ng.mq.ops
   4178927 ±  9%     +94.3%    8118700 ±  5%  stress-ng.mq.ops_per_sec
     18053 ±  3%      -7.4%      16709        stress-ng.time.percent_of_cpu_this_job_got
     10197 ±  3%      -9.4%       9242        stress-ng.time.system_time
    688.89 ±  2%     +19.7%     824.66 ±  5%  stress-ng.time.user_time
 2.076e+08 ±  8%     +64.9%  3.423e+08 ±  5%  stress-ng.time.voluntary_context_switches
   2440860 ± 12%    +105.3%    5012226 ± 35%  meminfo.Active
   2440860 ± 12%    +105.3%    5012226 ± 35%  meminfo.Active(anon)
   5221055 ±  5%     +48.7%    7762119 ± 22%  meminfo.Cached
   7184748 ±  3%     +36.1%    9777020 ± 18%  meminfo.Committed_AS
    361568 ±  3%     +47.5%     533427 ± 23%  meminfo.Mapped
   9552329 ±  3%     +28.1%   12232469 ± 14%  meminfo.Memused
   1692979 ± 17%    +150.1%    4234070 ± 41%  meminfo.Shmem
   9605594 ±  2%     +28.3%   12319244 ± 14%  meminfo.max_used_kB
      4885 ± 48%     +33.7%       6532 ± 38%  numa-vmstat.node0.nr_page_table_pages
    143216 ± 23%     -71.1%      41345 ± 87%  numa-vmstat.node0.numa_other
    460013 ± 25%    +149.0%    1145233 ± 38%  numa-vmstat.node1.nr_active_anon
     75283 ± 82%     +63.0%     122733 ± 44%  numa-vmstat.node1.nr_anon_pages
    384991 ± 22%    +165.7%    1022742 ± 41%  numa-vmstat.node1.nr_shmem
    460006 ± 25%    +149.0%    1145231 ± 38%  numa-vmstat.node1.nr_zone_active_anon
   1204935 ± 13%     +73.1%    2086163 ± 29%  numa-vmstat.node1.numa_hit
     55987 ± 58%    +180.9%     157244 ± 23%  numa-vmstat.node1.numa_other
      0.05 ± 72%     -79.9%       0.01 ± 67%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc.d_alloc_parallel
      0.06 ±129%     -80.3%       0.01 ± 56%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.mqueue_alloc_inode.alloc_inode.new_inode
    530.28 ± 31%     -54.3%     242.29 ± 61%  perf-sched.sch_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
      0.28 ± 16%     -68.9%       0.09 ± 70%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend
      0.09 ± 88%     -81.0%       0.02 ± 64%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc.d_alloc_parallel
      1868 ± 67%     -73.6%     492.86 ±104%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1269 ± 25%     -49.9%     635.66 ± 59%  perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
      0.77 ± 15%     -60.9%       0.30 ± 75%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend
      3770 ± 66%     -73.8%     989.52 ±103%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.21 ±102%     -88.8%       0.02 ± 93%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.mqueue_alloc_inode.alloc_inode.new_inode
    739.22 ± 22%     -46.8%     393.37 ± 58%  perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
      1919 ± 64%     -73.7%     504.99 ±100%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    608977 ± 11%    +105.5%    1251722 ± 35%  proc-vmstat.nr_active_anon
    188245            +4.0%     195771 ±  2%  proc-vmstat.nr_anon_pages
   1303984 ±  5%     +48.7%    1939187 ± 22%  proc-vmstat.nr_file_pages
     90548 ±  4%     +47.9%     133892 ± 23%  proc-vmstat.nr_mapped
    421965 ± 16%    +150.5%    1057174 ± 41%  proc-vmstat.nr_shmem
     41883            +4.5%      43768 ±  2%  proc-vmstat.nr_slab_reclaimable
    122762            +1.7%     124807        proc-vmstat.nr_slab_unreclaimable
    608977 ± 11%    +105.5%    1251722 ± 35%  proc-vmstat.nr_zone_active_anon
     39944 ± 15%    +190.1%     115861 ± 59%  proc-vmstat.numa_hint_faults
     27410 ± 19%    +281.4%     104548 ± 69%  proc-vmstat.numa_hint_faults_local
   1684470 ±  8%     +58.2%    2665570 ± 23%  proc-vmstat.numa_hit
   1485253 ±  9%     +66.1%    2466944 ± 25%  proc-vmstat.numa_local
    102341 ± 28%     +62.0%     165807 ± 36%  proc-vmstat.numa_pte_updates
   1751319 ±  7%     +57.3%    2754572 ± 23%  proc-vmstat.pgalloc_normal
    609827 ±  2%     +16.2%     708345 ± 11%  proc-vmstat.pgfault
      0.45 ±  7%     +20.7%       0.55        sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.43 ±  6%     +18.6%       0.51 ±  2%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
    267.72 ± 12%     +27.2%     340.40 ±  3%  sched_debug.cfs_rq:/.util_est.stddev
    586830 ±  4%     -10.6%     524667 ±  3%  sched_debug.cpu.avg_idle.avg
   1735827 ± 29%     -33.7%    1150356 ±  8%  sched_debug.cpu.avg_idle.max
     15839 ±143%     -77.0%       3638 ± 10%  sched_debug.cpu.avg_idle.min
    139.91 ± 42%     -86.4%      19.08 ± 21%  sched_debug.cpu.clock.stddev
     24838 ± 35%     +87.7%      46614 ±  8%  sched_debug.cpu.curr->pid.max
      2342 ± 22%     +63.1%       3820 ± 10%  sched_debug.cpu.curr->pid.stddev
    631455 ± 10%     -19.1%     510552        sched_debug.cpu.max_idle_balance_cost.avg
   1697254 ± 18%     -53.8%     784378 ± 19%  sched_debug.cpu.max_idle_balance_cost.max
    175055 ± 25%     -80.0%      35047 ± 47%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ± 48%     -81.3%       0.00 ± 35%  sched_debug.cpu.next_balance.stddev
      0.44 ±  9%     +23.6%       0.54 ±  4%  sched_debug.cpu.nr_running.stddev
   1043526 ± 11%     +59.3%    1662017 ±  5%  sched_debug.cpu.nr_switches.avg
   1390822 ±  6%     +62.8%    2263820 ± 13%  sched_debug.cpu.nr_switches.max
      5.88 ±  8%     +31.3%       7.72 ± 10%  sched_debug.cpu.nr_uninterruptible.stddev
 1.336e+10 ±  5%     +63.8%  2.188e+10 ±  4%  perf-stat.i.branch-instructions
 1.059e+08 ±  7%     +66.9%  1.767e+08 ±  5%  perf-stat.i.branch-misses
  11257488 ±  7%     +73.7%   19553240 ± 19%  perf-stat.i.cache-misses
  1.11e+08 ± 87%    +281.9%  4.239e+08 ± 11%  perf-stat.i.cache-references
   6566144 ± 12%     +62.1%   10640456 ±  5%  perf-stat.i.context-switches
      9.35 ±  4%     -43.6%       5.28 ±  5%  perf-stat.i.cpi
    119675 ±145%    +424.8%     628084 ±  8%  perf-stat.i.cpu-migrations
     55311 ±  7%     -40.5%      32929 ± 13%  perf-stat.i.cycles-between-cache-misses
 6.609e+10 ±  5%     +64.8%  1.089e+11 ±  4%  perf-stat.i.instructions
      0.13 ±  8%     +56.9%       0.21 ±  4%  perf-stat.i.ipc
     34.67 ± 10%     +69.3%      58.71 ±  5%  perf-stat.i.metric.K/sec
      0.10 ± 45%    +102.6%       0.21 ±  4%  perf-stat.overall.ipc
 1.074e+10 ± 45%     +99.5%  2.144e+10 ±  4%  perf-stat.ps.branch-instructions
  85818545 ± 45%    +102.0%  1.733e+08 ±  5%  perf-stat.ps.branch-misses
   9043150 ± 45%    +111.7%   19144861 ± 19%  perf-stat.ps.cache-misses
 1.008e+08 ±101%    +313.5%  4.169e+08 ± 11%  perf-stat.ps.cache-references
   5233955 ± 46%    +100.1%   10474383 ±  5%  perf-stat.ps.context-switches
    117697 ±146%    +425.9%     618947 ±  8%  perf-stat.ps.cpu-migrations
 5.317e+10 ± 45%    +100.8%  1.067e+11 ±  4%  perf-stat.ps.instructions
      5717 ± 44%     +52.3%       8706 ± 16%  perf-stat.ps.minor-faults
      5717 ± 44%     +52.3%       8707 ± 16%  perf-stat.ps.page-faults
 3.319e+12 ± 44%     +98.7%  6.593e+12 ±  5%  perf-stat.total.instructions
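
The perf-c2c rows above (HITM.local / HITM.total) come from LKP's
cache-to-cache monitor. A generic way to look at the same contended-cacheline
signal by hand (a sketch; exact options and output vary by perf version) is:

  # sample cache-to-cache transfers system-wide while the workload runs
  perf c2c record -a -- sleep 60
  # list contended cache lines and loads that hit modified lines (HITM)
  perf c2c report --stdio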




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

