Message-ID: <20250129115320.1334ad5f@fangorn>
Date: Wed, 29 Jan 2025 11:53:20 -0500
From: Rik van Riel <riel@...riel.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Matthew Wilcox <willy@...radead.org>, Qi Zheng
<zhengqi.arch@...edance.com>, Peter Zijlstra <peterz@...radead.org>, David
Hildenbrand <david@...hat.com>, kernel test robot <oliver.sang@...el.com>,
oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org, Andrew
Morton <akpm@...ux-foundation.org>, Dave Hansen
<dave.hansen@...ux.intel.com>, Andy Lutomirski <luto@...nel.org>, Catalin
Marinas <catalin.marinas@....com>, David Rientjes <rientjes@...gle.com>,
Hugh Dickins <hughd@...gle.com>, Jann Horn <jannh@...gle.com>, Lorenzo
Stoakes <lorenzo.stoakes@...cle.com>, Mel Gorman <mgorman@...e.de>, Muchun
Song <muchun.song@...ux.dev>, Peter Xu <peterx@...hat.com>, Will Deacon
<will@...nel.org>, Zach O'Keefe <zokeefe@...gle.com>, Dan Carpenter
<dan.carpenter@...aro.org>, Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>
Subject: Re: [linus:master] [x86] 4817f70c25: stress-ng.mmapaddr.ops_per_sec
63.0% regression
On Wed, 29 Jan 2025 08:36:12 -0800
"Paul E. McKenney" <paulmck@...nel.org> wrote:
> On Wed, Jan 29, 2025 at 11:14:29AM -0500, Rik van Riel wrote:
> > Paul, does this look like it could do the trick,
> > or do we need something else to make RCU freeing
> > happy again?
>
> I don't claim to fully understand the issue, but this would prevent
> any RCU grace periods starting subsequently from completing. It would
> not prevent RCU callbacks from being invoked for RCU grace periods that
> started earlier.
>
> So it won't prevent RCU callbacks from being invoked.
That makes things clear! I guess we need a different approach.
Qi, does the patch below resolve the regression for you?
---8<---
From 5de4fa686fca15678a7e0a186852f921166854a3 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@...riel.com>
Date: Wed, 29 Jan 2025 10:51:51 -0500
Subject: [PATCH 2/2] mm,rcu: prevent RCU callbacks from running with pcp lock
held
Enabling MMU_GATHER_RCU_TABLE_FREE can create contention on the
zone->lock. This turns out to be because in some configurations
RCU callbacks are called when IRQs are re-enabled inside
rmqueue_bulk, while the CPU is still holding the per-cpu pages lock.
That leaves the RCU callbacks unable to grab the PCP lock, so
they fall back to the slow path and take the zone->lock for
each item freed.
Speed things up by blocking RCU callbacks while holding the
PCP lock.
Signed-off-by: Rik van Riel <riel@...riel.com>
Suggested-by: Paul McKenney <paulmck@...nel.org>
Reported-by: Qi Zheng <zhengqi.arch@...edance.com>
---
mm/page_alloc.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e469c7ef9a4..73e334f403fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -94,11 +94,15 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
 /*
- * On SMP, spin_trylock is sufficient protection.
+ * On SMP, spin_trylock is sufficient protection against recursion.
  * On PREEMPT_RT, spin_trylock is equivalent on both SMP and UP.
+ *
+ * Block softirq execution to prevent RCU frees from running in softirq
+ * context while this CPU holds the PCP lock, which could result in a whole
+ * bunch of frees contending on the zone->lock.
  */
-#define pcp_trylock_prepare(flags)	do { } while (0)
-#define pcp_trylock_finish(flag)	do { } while (0)
+#define pcp_trylock_prepare(flags)	local_bh_disable()
+#define pcp_trylock_finish(flag)	local_bh_enable()
 #else
 /* UP spin_trylock always succeeds so disable IRQs to prevent re-entrancy. */
--
2.47.1