Message-ID: <8f5e8ae0-dfaf-4b33-ae79-ca6065dc96ec@lucifer.local>
Date: Tue, 8 Oct 2024 09:44:24 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Oliver Sang <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mark Brown <broonie@...nel.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Bert Karwatzki <spasswolf@....de>,
        Jeff Xu <jeffxu@...omium.org>, Jiri Olsa <olsajiri@...il.com>,
        Kees Cook <kees@...nel.org>, Lorenzo Stoakes <lstoakes@...il.com>,
        Matthew Wilcox <willy@...radead.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Paul Moore <paul@...l-moore.com>,
        Sidhartha Kumar <sidhartha.kumar@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org,
        ying.huang@...el.com, feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec
 -5.0% regression

On Tue, Oct 08, 2024 at 04:31:59PM +0800, Oliver Sang wrote:
> hi, Lorenzo Stoakes,
>
> sorry for the late reply, we were on holiday last week.
>
> On Mon, Sep 30, 2024 at 09:21:52AM +0100, Lorenzo Stoakes wrote:
> > On Mon, Sep 30, 2024 at 10:21:27AM GMT, kernel test robot wrote:
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -5.0% regression of aim9.brk_test.ops_per_sec on:
> > >
> > >
> > > commit: cacded5e42b9609b07b22d80c10f0076d439f7d1 ("mm: avoid using vma_merge() for new VMAs")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > testcase: aim9
> > > test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> >
> > Hm, quite an old microarchitecture no?
> >
> > Would it be possible to try this on a range of uarchs, especially more
> > recent ones, with some repeated runs to rule out statistical noise? Much
> > appreciated!
>
> we ran this test on the platforms below and observed a similar regression.
> one thing I want to mention is that for performance tests we run each commit
> at least 6 times. for this aim9 test the data is quite stable, so there is no
> %stddev value in our table; we don't show this value if it's <2%.

Thanks, though going forward I'd suggest showing the number even if it's
<2%, or at least noting that it was omitted for that reason. I found its
absence quite misleading.

Also, might I suggest reporting the most recent uarch first? The report
appearing to be Ivy Bridge-only delayed my response (not to sound
ungrateful for the report, which is very useful, but it'd be great if you
guys could test in -next, as this was there for weeks with no apparent
issues).

I will look into this now. If I provide patches, would you be able to test
them on the same boxes? It'd be much appreciated!

Thanks, Lorenzo

>
> (1)
>
> model: Granite Rapids
> nr_node: 1
> nr_cpu: 240
> memory: 192G
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-gnr-1ap1/brk_test/aim9/300s
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    3220697            -6.0%    3028867        aim9.brk_test.ops_per_sec
>
>
> (2)
>
> model: Emerald Rapids
> nr_node: 4
> nr_cpu: 256
> memory: 256G
> brand: INTEL(R) XEON(R) PLATINUM 8592+
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/brk_test/aim9/300s
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    3669298            -6.5%    3430070        aim9.brk_test.ops_per_sec
>
>
> (3)
>
> model: Sapphire Rapids
> nr_node: 2
> nr_cpu: 224
> memory: 512G
> brand: Intel(R) Xeon(R) Platinum 8480CTDX
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    3540976            -6.4%    3314159        aim9.brk_test.ops_per_sec
>
>
> (4)
>
> model: Ice Lake
> nr_node: 2
> nr_cpu: 64
> memory: 256G
> brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/brk_test/aim9/300s
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    2667734            -5.6%    2518021        aim9.brk_test.ops_per_sec
>
>
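As a quick cross-check of the four tables above, the %change column can be recomputed from the two ops_per_sec figures in each. This is only a minimal sketch: the dictionary layout and the helper name `percent_change` are illustrative and not part of the lkp tooling.

```python
# ops_per_sec figures copied from the four per-platform tables above:
# (parent commit fc21959f74bc1138, commit under test cacded5e42b9609b07b22d80c10)
results = {
    "Granite Rapids": (3220697, 3028867),
    "Emerald Rapids": (3669298, 3430070),
    "Sapphire Rapids": (3540976, 3314159),
    "Ice Lake": (2667734, 2518021),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {model: percent_change(b, a) for model, (b, a) in results.items()}

for model, change in changes.items():
    print(f"{model:16s} {change:+.1f}%")
```

Rounded to one decimal place, the recomputed values agree with the reported -6.0%, -6.5%, -6.4% and -5.6%.
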
> >
> > > parameters:
> > >
> > > 	testtime: 300s
> > > 	test: brk_test
> > > 	cpufreq_governor: performance
> > >
> > >
> > >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202409301043.629bea78-oliver.sang@intel.com
> > >
> > >
> > > Details are as below:
> > > -------------------------------------------------------------------------------------------------->
> > >
> > >
> > > The kernel config and materials to reproduce are available at:
> > > https://download.01.org/0day-ci/archive/20240930/202409301043.629bea78-oliver.sang@intel.com
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> > >   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s
> > >
> > > commit:
> > >   fc21959f74 ("mm: abstract vma_expand() to use vma_merge_struct")
> > >   cacded5e42 ("mm: avoid using vma_merge() for new VMAs")
> >
> > Yup this results in a different code path for brk(), but local testing
> > indicated no regression (a prior revision of the series had encountered
> > one, so I carefully assessed this, found the bug, and noted no clear
> > regression after this - but a lot of variance in the numbers).
> >
> > >
> > > fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >    1322908            -5.0%    1256536        aim9.brk_test.ops_per_sec
> >
> > Unfortunate there's no stddev figure here, and 5% feels borderline on noise
> > - as above it'd be great to get some multiple runs going to rule out
> > noise. Thanks!
>
> as mentioned above, the reason there is no %stddev here is that it's <2%.
>
> the raw data is listed below FYI.
>
> for cacded5e42b9609b07b22d80c10
>
>   "aim9.brk_test.ops_per_sec": [
>     1268030.0,
>     1277110.76,
>     1226452.45,
>     1275850.0,
>     1249628.35,
>     1242148.6
>   ],
>
>
> for fc21959f74bc1138
>
>   "aim9.brk_test.ops_per_sec": [
>     1351624.95,
>     1316322.79,
>     1330363.33,
>     1289563.33,
>     1314100.0,
>     1335475.48
>   ],
>
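The raw samples above are enough to recompute the mean, the relative standard deviation and the change between the two commits. The sketch below assumes the sample standard deviation (n - 1 denominator); the report does not say which formula the test robot uses, and the variable names are illustrative.

```python
import statistics

# Raw aim9.brk_test.ops_per_sec samples copied from the report above.
parent = [1351624.95, 1316322.79, 1330363.33,
          1289563.33, 1314100.0, 1335475.48]      # fc21959f74bc1138
patched = [1268030.0, 1277110.76, 1226452.45,
           1275850.0, 1249628.35, 1242148.6]      # cacded5e42b9609b07b22d80c10


def summarize(samples):
    """Return (mean, relative standard deviation in percent)."""
    mean = statistics.mean(samples)
    # Assumption: sample standard deviation (n - 1 denominator).
    rel_stddev = statistics.stdev(samples) / mean * 100.0
    return mean, rel_stddev


parent_mean, parent_rsd = summarize(parent)
patched_mean, patched_rsd = summarize(patched)
change = (patched_mean - parent_mean) / parent_mean * 100.0

# True if even the slowest parent run is faster than the fastest patched run.
no_overlap = min(parent) > max(patched)

print(f"parent : mean={parent_mean:.0f} stddev={parent_rsd:.2f}%")
print(f"patched: mean={patched_mean:.0f} stddev={patched_rsd:.2f}%")
print(f"change : {change:+.1f}%  non-overlapping runs: {no_overlap}")
```

Under that assumption both relative standard deviations come out below 2%, the change rounds to -5.0%, and the two sets of runs do not overlap.
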
>
> >
> > >     201.54            +2.9%     207.44        aim9.time.system_time
> > >      97.58            -6.0%      91.75        aim9.time.user_time
> > >       0.04 ± 82%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       0.10 ± 60%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       0.04 ± 82%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       0.10 ± 60%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
> > >   8.33e+08            +3.9%  8.654e+08        perf-stat.i.branch-instructions
> > >       1.15            -0.1        1.09        perf-stat.i.branch-miss-rate%
> > >   12964626            -1.9%   12711922        perf-stat.i.branch-misses
> > >       1.11            -7.4%       1.03        perf-stat.i.cpi
> > >  3.943e+09            +6.0%   4.18e+09        perf-stat.i.instructions
> > >       0.91            +7.9%       0.98        perf-stat.i.ipc
> > >       0.29 ±  2%      -9.1%       0.27 ±  4%  perf-stat.overall.MPKI
> > >       1.56            -0.1        1.47        perf-stat.overall.branch-miss-rate%
> > >       1.08            -6.8%       1.01        perf-stat.overall.cpi
> > >       0.92            +7.2%       0.99        perf-stat.overall.ipc
> > >  8.303e+08            +3.9%  8.627e+08        perf-stat.ps.branch-instructions
> > >   12931205            -2.0%   12678170        perf-stat.ps.branch-misses
> > >   3.93e+09            +6.0%  4.167e+09        perf-stat.ps.instructions
> > >  1.184e+12            +6.1%  1.256e+12        perf-stat.total.instructions
> > >       7.16 ±  2%      -0.4        6.76 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.brk
> > >       5.72 ±  2%      -0.4        5.35 ±  3%  perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       6.13 ±  2%      -0.3        5.84 ±  3%  perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       0.83 ± 11%      -0.1        0.71 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       0.00            +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
> > >      16.73 ±  2%      +0.6       17.34        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
> > >       0.00            +0.7        0.66 ±  6%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
> > >      24.21            +0.7       24.90        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
> > >      23.33            +0.7       24.05 ±  2%  perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
> > >       0.00            +0.8        0.82 ±  4%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
> > >       0.00            +0.9        0.87 ±  5%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
> > >       0.00            +1.1        1.07 ±  9%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
> > >       0.00            +1.1        1.10 ±  6%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
> > >       0.00            +2.3        2.26 ±  5%  perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
> > >       0.00            +7.6        7.56 ±  3%  perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       0.00            +8.6        8.62 ±  4%  perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       7.74 ±  2%      -0.4        7.30 ±  4%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > >       5.81 ±  2%      -0.4        5.43 ±  3%  perf-profile.children.cycles-pp.perf_event_mmap_event
> > >       6.18 ±  2%      -0.3        5.88 ±  3%  perf-profile.children.cycles-pp.perf_event_mmap
> > >       3.93            -0.2        3.73 ±  3%  perf-profile.children.cycles-pp.perf_iterate_sb
> > >       0.22 ± 29%      -0.1        0.08 ± 17%  perf-profile.children.cycles-pp.may_expand_vm
> > >       0.96 ±  3%      -0.1        0.83 ±  4%  perf-profile.children.cycles-pp.vma_complete
> > >       0.61 ± 14%      -0.1        0.52 ±  7%  perf-profile.children.cycles-pp.percpu_counter_add_batch
> > >       0.15 ±  7%      -0.1        0.08 ± 20%  perf-profile.children.cycles-pp.brk_test
> > >       0.08 ± 11%      +0.0        0.12 ± 14%  perf-profile.children.cycles-pp.mas_prev_setup
> > >       0.17 ± 12%      +0.1        0.27 ± 10%  perf-profile.children.cycles-pp.mas_wr_store_entry
> > >       0.00            +0.2        0.15 ± 11%  perf-profile.children.cycles-pp.mas_next_range
> > >       0.19 ±  8%      +0.2        0.38 ± 10%  perf-profile.children.cycles-pp.mas_next_slot
> > >       0.34 ± 17%      +0.3        0.64 ±  6%  perf-profile.children.cycles-pp.mas_prev_slot
> > >      23.40            +0.7       24.12 ±  2%  perf-profile.children.cycles-pp.__do_sys_brk
> > >       0.00            +7.6        7.59 ±  3%  perf-profile.children.cycles-pp.vma_expand
> > >       0.00            +8.7        8.66 ±  4%  perf-profile.children.cycles-pp.vma_merge_new_range
> > >       1.61 ± 10%      -0.9        0.69 ±  8%  perf-profile.self.cycles-pp.do_brk_flags
> > >       7.64 ±  2%      -0.4        7.20 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > >       0.22 ± 30%      -0.1        0.08 ± 17%  perf-profile.self.cycles-pp.may_expand_vm
> > >       0.57 ± 15%      -0.1        0.46 ±  6%  perf-profile.self.cycles-pp.percpu_counter_add_batch
> > >       0.15 ±  7%      -0.1        0.08 ± 20%  perf-profile.self.cycles-pp.brk_test
> > >       0.20 ±  5%      -0.0        0.18 ±  4%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
> > >       0.07 ± 18%      +0.0        0.10 ± 18%  perf-profile.self.cycles-pp.mas_prev_setup
> > >       0.00            +0.1        0.09 ± 12%  perf-profile.self.cycles-pp.mas_next_range
> > >       0.36 ±  8%      +0.1        0.45 ±  6%  perf-profile.self.cycles-pp.perf_event_mmap
> > >       0.15 ± 13%      +0.1        0.25 ± 14%  perf-profile.self.cycles-pp.mas_wr_store_entry
> > >       0.17 ± 11%      +0.2        0.37 ± 11%  perf-profile.self.cycles-pp.mas_next_slot
> > >       0.34 ± 17%      +0.3        0.64 ±  6%  perf-profile.self.cycles-pp.mas_prev_slot
> > >       0.00            +0.3        0.33 ±  5%  perf-profile.self.cycles-pp.vma_merge_new_range
> > >       0.00            +0.8        0.81 ±  9%  perf-profile.self.cycles-pp.vma_expand
> > >
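One rough way to read the perf-stat rows above together with the throughput figure is to estimate instructions retired per brk_test operation. The sketch below is back-of-the-envelope arithmetic only: it assumes all counted instructions are attributable to the benchmark's operations, which the report does not state, and the variable names are illustrative.

```python
# Figures copied from the Ivy Bridge-EP table above.
ops_per_sec = {"parent": 1322908, "patched": 1256536}    # aim9.brk_test.ops_per_sec
instr_per_sec = {"parent": 3.93e9, "patched": 4.167e9}   # perf-stat.ps.instructions

# Assumption: every counted instruction belongs to a brk_test operation.
instr_per_op = {k: instr_per_sec[k] / ops_per_sec[k] for k in ops_per_sec}

increase = (instr_per_op["patched"] / instr_per_op["parent"] - 1.0) * 100.0

for commit, value in instr_per_op.items():
    print(f"{commit:8s} ~{value:.0f} instructions per op")
print(f"increase: {increase:+.1f}%")
```

Under that assumption the estimate rises by roughly 11 to 12%, which would be consistent with the new vma_merge_new_range() and vma_expand() entries in the profile, even though IPC improves.
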
> > >
> > >
> > >
> > > Disclaimer:
> > > Results have been estimated based on internal Intel analysis and are provided
> > > for informational purposes only. Any difference in system hardware or software
> > > design or configuration may affect actual performance.
> > >
> > >
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki
> > >
> >
> > Overall, previously we special-cased brk() to avoid regression, but the
> > special-casing is horribly duplicative and bug-prone so, while we can
> > revert to doing that again, I'd really, really like to avoid it if we
> > possibly can :)
