Message-ID: <20250129105920.7a4bffa1@fangorn>
Date: Wed, 29 Jan 2025 10:59:20 -0500
From: Rik van Riel <riel@...riel.com>
To: Qi Zheng <zhengqi.arch@...edance.com>
Cc: Peter Zijlstra <peterz@...radead.org>, David Hildenbrand
<david@...hat.com>, kernel test robot <oliver.sang@...el.com>,
oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org, Andrew
Morton <akpm@...ux-foundation.org>, Dave Hansen
<dave.hansen@...ux.intel.com>, Andy Lutomirski <luto@...nel.org>, Catalin
Marinas <catalin.marinas@....com>, David Rientjes <rientjes@...gle.com>,
Hugh Dickins <hughd@...gle.com>, Jann Horn <jannh@...gle.com>, Lorenzo
Stoakes <lorenzo.stoakes@...cle.com>, Matthew Wilcox <willy@...radead.org>,
Mel Gorman <mgorman@...e.de>, Muchun Song <muchun.song@...ux.dev>, Peter Xu
<peterx@...hat.com>, Will Deacon <will@...nel.org>, Zach O'Keefe
<zokeefe@...gle.com>, Dan Carpenter <dan.carpenter@...aro.org>, "Paul E.
McKenney" <paulmck@...nel.org>, Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>
Subject: Re: [linus:master] [x86] 4817f70c25: stress-ng.mmapaddr.ops_per_sec
63.0% regression
On Wed, 29 Jan 2025 16:14:01 +0800
Qi Zheng <zhengqi.arch@...edance.com> wrote:
>
> It seems that the pcp lock is held when tlb_remove_table_rcu() runs, so
> the trylock fails; the PCP is then bypassed and free_one_page() is called
> directly, which leads to the zone lock hotspot.
Below is a tentative fix for the issue. It is kind of a big hammer,
and maybe the RCU people have a better idea on how to solve this
problem, but it may be worth giving this a try to see if it helps
with the regression you identified.
---8<---
From 2b0302f821d1fc94c968ac533dcc62b9ffe00c38 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@...riel.com>
Date: Wed, 29 Jan 2025 10:51:51 -0500
Subject: [PATCH 2/2] mm,rcu: prevent RCU callbacks from running with pcp lock
held
Enabling MMU_GATHER_RCU_TABLE_FREE can create contention on the
zone->lock. This turns out to be because in some configurations
RCU callbacks are called when IRQs are re-enabled inside
rmqueue_bulk, while the CPU is still holding the per-cpu pages lock.
That results in the RCU callbacks being unable to grab the
PCP lock, so they fall back to the slow path, taking the
zone->lock for each item freed.
Speed things up by blocking RCU callbacks while holding the
PCP lock.
Signed-off-by: Rik van Riel <riel@...riel.com>
Reported-by: Qi Zheng <zhengqi.arch@...edance.com>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e469c7ef9a4..b3c4002ab0ab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3036,6 +3036,13 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
return NULL;
}
+ /*
+ * Prevent RCU callbacks from being run from the spin_lock_irqrestore
+ * inside rmqueue_bulk, while the pcp lock is held; that would result
+ * in each RCU free taking the zone->lock, which can be very slow.
+ */
+ rcu_read_lock();
+
/*
* On allocation, reduce the number of pages that are batch freed.
* See nr_pcp_free() where free_factor is increased for subsequent
@@ -3046,6 +3053,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
pcp_spin_unlock(pcp);
pcp_trylock_finish(UP_flags);
+ rcu_read_unlock();
if (page) {
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
zone_statistics(preferred_zone, zone, 1);
--
2.47.1