Message-ID: <20250129115320.1334ad5f@fangorn>
Date: Wed, 29 Jan 2025 11:53:20 -0500
From: Rik van Riel <riel@...riel.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Matthew Wilcox <willy@...radead.org>, Qi Zheng
<zhengqi.arch@...edance.com>, Peter Zijlstra <peterz@...radead.org>, David
Hildenbrand <david@...hat.com>, kernel test robot <oliver.sang@...el.com>,
oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org, Andrew
Morton <akpm@...ux-foundation.org>, Dave Hansen
<dave.hansen@...ux.intel.com>, Andy Lutomirski <luto@...nel.org>, Catalin
Marinas <catalin.marinas@....com>, David Rientjes <rientjes@...gle.com>,
Hugh Dickins <hughd@...gle.com>, Jann Horn <jannh@...gle.com>, Lorenzo
Stoakes <lorenzo.stoakes@...cle.com>, Mel Gorman <mgorman@...e.de>, Muchun
Song <muchun.song@...ux.dev>, Peter Xu <peterx@...hat.com>, Will Deacon
<will@...nel.org>, Zach O'Keefe <zokeefe@...gle.com>, Dan Carpenter
<dan.carpenter@...aro.org>, Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>
Subject: Re: [linus:master] [x86] 4817f70c25: stress-ng.mmapaddr.ops_per_sec
63.0% regression
On Wed, 29 Jan 2025 08:36:12 -0800
"Paul E. McKenney" <paulmck@...nel.org> wrote:
> On Wed, Jan 29, 2025 at 11:14:29AM -0500, Rik van Riel wrote:
> > Paul, does this look like it could do the trick,
> > or do we need something else to make RCU freeing
> > happy again?
>
> I don't claim to fully understand the issue, but this would prevent
> any RCU grace periods starting subsequently from completing. It would
> not prevent RCU callbacks from being invoked for RCU grace periods that
> started earlier.
>
> So it won't prevent RCU callbacks from being invoked.
That makes things clear! I guess we need a different approach.
Qi, does the patch below resolve the regression for you?
---8<---
From 5de4fa686fca15678a7e0a186852f921166854a3 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@...riel.com>
Date: Wed, 29 Jan 2025 10:51:51 -0500
Subject: [PATCH 2/2] mm,rcu: prevent RCU callbacks from running with pcp lock
held
Enabling MMU_GATHER_RCU_TABLE_FREE can create contention on the
zone->lock. This turns out to be because in some configurations
RCU callbacks are called when IRQs are re-enabled inside
rmqueue_bulk, while the CPU is still holding the per-cpu pages lock.
That leaves the RCU callbacks unable to grab the PCP lock, so
they fall back to the slow path and take the zone->lock for
each item freed.
Speed things up by blocking RCU callbacks while holding the
PCP lock.
Signed-off-by: Rik van Riel <riel@...riel.com>
Suggested-by: Paul McKenney <paulmck@...nel.org>
Reported-by: Qi Zheng <zhengqi.arch@...edance.com>
---
mm/page_alloc.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e469c7ef9a4..73e334f403fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -94,11 +94,15 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
 /*
- * On SMP, spin_trylock is sufficient protection.
+ * On SMP, spin_trylock is sufficient protection against recursion.
  * On PREEMPT_RT, spin_trylock is equivalent on both SMP and UP.
+ *
+ * Block softirq execution to prevent RCU frees from running in softirq
+ * context while this CPU holds the PCP lock, which could result in a whole
+ * bunch of frees contending on the zone->lock.
  */
-#define pcp_trylock_prepare(flags)	do { } while (0)
-#define pcp_trylock_finish(flag)	do { } while (0)
+#define pcp_trylock_prepare(flags)	local_bh_disable()
+#define pcp_trylock_finish(flag)	local_bh_enable()
 #else
 /* UP spin_trylock always succeeds so disable IRQs to prevent re-entrancy. */
--
2.47.1