Message-ID: <202506102254.13cda0af-lkp@intel.com>
Date: Tue, 10 Jun 2025 22:38:45 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Suren Baghdasaryan <surenb@...gle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>, Lorenzo Stoakes
<lorenzo.stoakes@...cle.com>, Shivank Garg <shivankg@....com>, "Vlastimil
Babka" <vbabka@...e.cz>, Christian Brauner <brauner@...nel.org>, "David
Hildenbrand" <david@...hat.com>, David Howells <dhowells@...hat.com>,
Davidlohr Bueso <dave@...olabs.net>, Hugh Dickins <hughd@...gle.com>, "Jann
Horn" <jannh@...gle.com>, Johannes Weiner <hannes@...xchg.org>, "Jonathan
Corbet" <corbet@....net>, Klara Modin <klarasmodin@...il.com>, "Liam R.
Howlett" <Liam.Howlett@...cle.com>, Lokesh Gidra <lokeshgidra@...gle.com>,
Mateusz Guzik <mjguzik@...il.com>, Matthew Wilcox <willy@...radead.org>, "Mel
Gorman" <mgorman@...hsingularity.net>, Michal Hocko <mhocko@...e.com>,
"Minchan Kim" <minchan@...gle.com>, Oleg Nesterov <oleg@...hat.com>, Pasha
Tatashin <pasha.tatashin@...een.com>, "Paul E . McKenney"
<paulmck@...nel.org>, "Peter Xu" <peterx@...hat.com>, Peter Zijlstra
<peterz@...radead.org>, Shakeel Butt <shakeel.butt@...ux.dev>, Sourav Panda
<souravpanda@...gle.com>, Wei Yang <richard.weiyang@...il.com>, Will Deacon
<will@...nel.org>, Heiko Carstens <hca@...ux.ibm.com>, Stephen Rothwell
<sfr@...b.auug.org.au>, <linux-mm@...ck.org>, <oliver.sang@...el.com>
Subject: [linus:master] [mm] 6bef4c2f97: stress-ng.mlockmany.ops_per_sec 5.2% improvement
Hello,
kernel test robot noticed a 5.2% improvement of stress-ng.mlockmany.ops_per_sec on:
commit: 6bef4c2f97221f3b595d08c8656eb5845ef80fe9 ("mm: move lesser used vma_area_struct members into the last cacheline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: mlockmany
cpufreq_governor: performance
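Assuming stress-ng is installed, the parameters above roughly correspond to the standalone invocation below. This is an approximation: the lkp-tests harness also handles setup such as the performance cpufreq governor, so results will not match the harness exactly.

```shell
# Approximate standalone reproduction of the tabled parameters.
# nr_threads: 100% -> 0 workers = one stress-ng worker per online CPU
# testtime:   60s  -> --timeout 60s
# test: mlockmany  -> the mlockmany stressor
stress-ng --mlockmany 0 --timeout 60s --metrics-brief
```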
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250610/202506102254.13cda0af-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mlockmany/stress-ng/60s
commit:
f35ab95ca0 ("mm: replace vm_lock and detached flag with a reference count")
6bef4c2f97 ("mm: move lesser used vma_area_struct members into the last cacheline")
f35ab95ca0af7a27            6bef4c2f97221f3b595d08c8656
----------------            ---------------------------
         %stddev     %change         %stddev
0.66 ± 5% -0.1 0.57 ± 9% mpstat.cpu.all.soft%
27183 +1.9% 27708 vmstat.system.cs
264643 +5.2% 278326 stress-ng.mlockmany.ops
4406 +5.2% 4634 stress-ng.mlockmany.ops_per_sec
314509 +4.9% 329874 stress-ng.time.voluntary_context_switches
343582 -3.7% 330742 ± 2% proc-vmstat.nr_active_anon
454064 -2.7% 441886 proc-vmstat.nr_anon_pages
54743 -3.5% 52828 proc-vmstat.nr_slab_unreclaimable
343583 -3.7% 330741 ± 2% proc-vmstat.nr_zone_active_anon
1.99 ± 8% -14.0% 1.72 ± 12% sched_debug.cfs_rq:/.h_nr_queued.stddev
1.98 ± 8% -13.9% 1.71 ± 12% sched_debug.cfs_rq:/.h_nr_runnable.stddev
0.00 ± 18% -24.8% 0.00 ± 20% sched_debug.cpu.next_balance.stddev
1.99 ± 8% -13.8% 1.72 ± 12% sched_debug.cpu.nr_running.stddev
0.25 +0.0 0.25 perf-stat.i.branch-miss-rate%
21663531 +1.7% 22033919 perf-stat.i.branch-misses
27855 +1.8% 28352 perf-stat.i.context-switches
0.25 +0.0 0.25 perf-stat.overall.branch-miss-rate%
21319615 +1.7% 21691011 perf-stat.ps.branch-misses
27388 +1.7% 27866 perf-stat.ps.context-switches
19.64 ± 7% -18.7% 15.97 ± 11% perf-sched.sch_delay.avg.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range_noprof.alloc_thread_stack_node.dup_task_struct
11.34 ± 8% -13.5% 9.80 ± 6% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.__mm_populate.do_mlock.__x64_sys_mlock
17.11 ± 4% -8.2% 15.70 ± 5% perf-sched.sch_delay.avg.ms.__cond_resched.mlock_pte_range.walk_pmd_range.isra.0
10.51 ± 10% +35.6% 14.26 ± 15% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
52.76 ± 22% -31.2% 36.28 ± 18% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
50.19 ± 7% -26.9% 36.68 ± 45% perf-sched.wait_and_delay.avg.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range_noprof.alloc_thread_stack_node.dup_task_struct
23.36 ± 9% -14.2% 20.03 ± 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.down_read.__mm_populate.do_mlock.__x64_sys_mlock
51.05 ± 10% -34.3% 33.53 ± 45% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
245.67 ± 6% -47.6% 128.83 ± 4% perf-sched.wait_and_delay.count.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
286.83 ± 7% -21.0% 226.67 ± 5% perf-sched.wait_and_delay.count.__cond_resched.down_write.anon_vma_clone.anon_vma_fork.dup_mmap
120.67 ± 9% +32.6% 160.00 ± 8% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.anon_vma_fork
225.41 ± 31% -33.7% 149.44 ± 7% perf-sched.wait_and_delay.max.ms.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
77.77 ± 73% +79.0% 139.22 ± 15% perf-sched.wait_and_delay.max.ms.__cond_resched.uprobe_start_dup_mmap.dup_mm.constprop.0
12.02 ± 11% -14.9% 10.23 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.down_read.__mm_populate.do_mlock.__x64_sys_mlock
31.78 ± 18% -31.9% 21.63 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
16.57 ± 5% -9.3% 15.03 ± 5% perf-sched.wait_time.avg.ms.__cond_resched.mlock_pte_range.walk_pmd_range.isra.0
25.21 ± 7% +12.4% 28.34 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm
24.68 ± 29% +39.0% 34.31 ± 15% perf-sched.wait_time.avg.ms.__cond_resched.uprobe_start_dup_mmap.dup_mm.constprop.0
207.48 ± 35% -32.5% 140.02 ± 6% perf-sched.wait_time.max.ms.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
70.62 ± 41% +75.6% 124.03 ± 15% perf-sched.wait_time.max.ms.__cond_resched.uprobe_start_dup_mmap.dup_mm.constprop.0
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki