Message-ID: <168f6e8d-926e-4a6c-9554-e2d606ebbead@paulmck-laptop>
Date: Wed, 29 Jan 2025 09:28:04 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Rik van Riel <riel@...riel.com>
Cc: Matthew Wilcox <willy@...radead.org>,
Qi Zheng <zhengqi.arch@...edance.com>,
Peter Zijlstra <peterz@...radead.org>,
David Hildenbrand <david@...hat.com>,
kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
David Rientjes <rientjes@...gle.com>,
Hugh Dickins <hughd@...gle.com>, Jann Horn <jannh@...gle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Mel Gorman <mgorman@...e.de>, Muchun Song <muchun.song@...ux.dev>,
Peter Xu <peterx@...hat.com>, Will Deacon <will@...nel.org>,
Zach O'Keefe <zokeefe@...gle.com>,
Dan Carpenter <dan.carpenter@...aro.org>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>
Subject: Re: [linus:master] [x86] 4817f70c25: stress-ng.mmapaddr.ops_per_sec
63.0% regression
On Wed, Jan 29, 2025 at 08:36:12AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 29, 2025 at 11:14:29AM -0500, Rik van Riel wrote:
> > On Wed, 2025-01-29 at 16:12 +0000, Matthew Wilcox wrote:
> > > On Wed, Jan 29, 2025 at 10:59:20AM -0500, Rik van Riel wrote:
> > > > Below is a tentative fix for the issue. It is kind of a big hammer,
> > > > and maybe the RCU people have a better idea on how to solve this
> > > > problem, but it may be worth giving this a try to see if it helps
> > > > with the regression you identified.
> > >
> > > Perhaps better to do:
> > >
> > > +++ b/mm/page_alloc.c
> > > @@ -97,8 +97,8 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
> > > * On SMP, spin_trylock is sufficient protection.
> > > * On PREEMPT_RT, spin_trylock is equivalent on both SMP and UP.
> > > */
> > > -#define pcp_trylock_prepare(flags) do { } while (0)
> > > -#define pcp_trylock_finish(flag) do { } while (0)
> > > +#define pcp_trylock_prepare(flags) rcu_read_lock()
> > > +#define pcp_trylock_finish(flag) rcu_read_unlock()
> > > #else
> > >
> > > /* UP spin_trylock always succeeds so disable IRQs to prevent re-entrancy. */
> > >
> > > with appropriate comment changes
> > >
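For concreteness, a sketch of how that block might read with the
"appropriate comment changes" Matthew mentions.  This is not part of the
posted diff; the added comment text is based on Paul's explanation further
down, and the #else branch is reproduced from mm/page_alloc.c as of this
thread:

#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
/*
 * On SMP, spin_trylock is sufficient protection.
 * On PREEMPT_RT, spin_trylock is equivalent on both SMP and UP.
 *
 * Holding an RCU read-side critical section while the pcp lock is held
 * keeps grace periods that start in the meantime from completing, so
 * only a bounded amount of deferred RCU freeing can be invoked during
 * the critical section.
 */
#define pcp_trylock_prepare(flags)	rcu_read_lock()
#define pcp_trylock_finish(flag)	rcu_read_unlock()
#else
/* UP spin_trylock always succeeds so disable IRQs to prevent re-entrancy. */
#define pcp_trylock_prepare(flags)	local_irq_save(flags)
#define pcp_trylock_finish(flags)	local_irq_restore(flags)
#endif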
> > Agreed. Assuming this change even works :)
> >
> > Paul, does this look like it could do the trick,
> > or do we need something else to make RCU freeing
> > happy again?
>
> I don't claim to fully understand the issue, but this would prevent
> any RCU grace periods starting subsequently from completing. It would
> not prevent RCU callbacks from being invoked for RCU grace periods that
> started earlier.
>
> So it won't prevent RCU callbacks from being invoked.
>
> It *will* ensure that only a finite number of RCU callbacks get invoked.
> For some perhaps rather large value of "finite".
>
> Does that help, or is more required?
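To make the ordering described just above concrete, here is a small
illustration.  It is not from the thread; struct foo, free_foo(), and
pcp_critical_section() are made-up names standing in for the pcp fast
path:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	struct rcu_head rcu;
};

static void free_foo(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct foo, rcu));
}

static void pcp_critical_section(struct foo *newly_freed)
{
	rcu_read_lock();			/* pcp_trylock_prepare() per the patch */

	call_rcu(&newly_freed->rcu, free_foo);	/* needs a grace period starting now */

	/*
	 * That grace period cannot complete until rcu_read_unlock(), so
	 * free_foo() for newly_freed cannot run inside this window.
	 * Callbacks for grace periods that started before this window may
	 * still be invoked from softirq here, but only a finite number of
	 * them can already be queued.
	 */

	rcu_read_unlock();			/* pcp_trylock_finish() per the patch */
}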
Would it make sense to force softirq processing to ksoftirqd on the
current CPU during the time that the pcp lock is held?  (I am not sure
that we have an API to do this, but it might be simpler than hacking every
code sequence during which mass freeing of memory from back-of-interrupt
softirq context can occur, even if that API needs to be implemented from
scratch.)
Thanx, Paul
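No such interface exists as of this thread (as noted above), so the
following is purely a hypothetical sketch of what a call site in the pcp
fast path might look like if one were added.  The two defer/undefer
helpers are invented names, and the surrounding code only paraphrases the
pcp_spin_trylock()/pcp_spin_unlock() pattern in mm/page_alloc.c:

	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
	if (pcp) {
		softirq_defer_to_ksoftirqd();	/* invented: hand this CPU's softirqs
						 * (including RCU callback processing)
						 * to ksoftirqd for now */

		/* ... allocate or free pages against the per-CPU lists ... */

		softirq_undefer_to_ksoftirqd();	/* invented: restore back-of-interrupt
						 * softirq handling */
		pcp_spin_unlock(pcp);
	}

The closest existing primitive is probably local_bh_disable()/
local_bh_enable(), which keeps softirqs from running on the CPU for the
duration, but that defers them rather than handing them to ksoftirqd and
comes with its own context restrictions, so it is not quite what is being
asked for above.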