Message-ID: <20250819152853.2055256-1-joshua.hahnjy@gmail.com>
Date: Tue, 19 Aug 2025 08:28:52 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Kiryl Shutsemau <kirill@...temov.name>
Cc: Johannes Weiner <hannes@...xchg.org>,
	Chris Mason <clm@...com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Suren Baghdasaryan <surenb@...le.com>,
	Michal Hocko <mhocko@...e.com>,
	Brendan Jackman <jackmanb@...gle.com>,
	Zi Yan <ziy@...dia.com>,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	kernel-team@...a.com
Subject: Re: [PATCH] mm/page_alloc: Occasionally relinquish zone lock in batch freeing

On Tue, 19 Aug 2025 10:15:13 +0100 Kiryl Shutsemau <kirill@...temov.name> wrote:

Hello Kiryl,

Thank you for your review!

> On Mon, Aug 18, 2025 at 11:58:03AM -0700, Joshua Hahn wrote:
> > While testing workloads with high sustained memory pressure on large machines
> > (1TB memory, 316 CPUs), we saw an unexpectedly high number of softlockups.
> > Further investigation showed that the zone lock in free_pcppages_bulk was
> > being held for a long time, in some cases across the freeing of 2k+ pages.
> > 
> > Instead of holding the lock for the entirety of the freeing, check whether
> > the zone lock is contended every pcp->batch pages. If there is contention,
> > relinquish the lock so that other processors have a chance to grab the lock
> > and perform critical work.
> 
> Hm. It doesn't necessarily have to be contention on the lock; it may just be
> that you are holding the lock for too long, so the CPU is not available to
> the scheduler.

I see, I think that also makes sense. So you are suggesting that the issue is
not lock contention, but rather that the CPU is held for too long and cannot
be used to perform more critical work?

I can definitely test this idea and let you know what I see. With my very
limited understanding, though, I wonder whether one busy CPU will make a big
difference on a machine with 316 CPUs. That is, my instinct tells me that the
zone lock is a more hotly contended resource than the CPU is -- but I will run
some tests to figure out which of the two is the real bottleneck.
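
For my own reference, the two failure modes map to different checks. A quick
sketch of the distinction as I understand it (not a patch, the unlock/relock
pattern is just illustrative):

		/* Contention: another CPU is spinning on zone->lock. */
		if (spin_is_contended(&zone->lock)) {
			spin_unlock_irqrestore(&zone->lock, flags);
			/* a waiter can take the lock here */
			spin_lock_irqsave(&zone->lock, flags);
		}

		/* Hold time: the scheduler wants this CPU back. */
		if (need_resched()) {
			spin_unlock_irqrestore(&zone->lock, flags);
			cond_resched();	/* let pending work run on this CPU */
			spin_lock_irqsave(&zone->lock, flags);
		}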

[...snip...]

> > ---
> >  mm/page_alloc.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a8a84c3b5fe5..bd7a8da3e159 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1238,6 +1238,8 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  	 * below while (list_empty(list)) loop.
> >  	 */
> >  	count = min(pcp->count, count);
> > +	if (!count)
> > +		return;
> >  
> >  	/* Ensure requested pindex is drained first. */
> >  	pindex = pindex - 1;
> > @@ -1247,6 +1249,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  	while (count > 0) {
> >  		struct list_head *list;
> >  		int nr_pages;
> > +		int batch = min(count, pcp->batch);
> >  
> >  		/* Remove pages from lists in a round-robin fashion. */
> >  		do {
> > @@ -1267,12 +1270,22 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  
> >  			/* must delete to avoid corrupting pcp list */
> >  			list_del(&page->pcp_list);
> > +			batch -= nr_pages;
> >  			count -= nr_pages;
> >  			pcp->count -= nr_pages;
> >  
> >  			__free_one_page(page, pfn, zone, order, mt, FPI_NONE);
> >  			trace_mm_page_pcpu_drain(page, order, mt);
> > -		} while (count > 0 && !list_empty(list));
> > +		} while (batch > 0 && !list_empty(list));
> > +
> > +		/*
> > +		 * Prevent starving the lock for other users; every pcp->batch
> > +		 * pages freed, relinquish the zone lock if it is contended.
> > +		 */
> > +		if (count && spin_is_contended(&zone->lock)) {
> 
> I would rather drop the count thing and do something like this:
> 
> 		if (need_resched() || spin_needbreak(&zone->lock)) {
> 			spin_unlock_irqrestore(&zone->lock, flags);
> 			cond_resched();
> 			spin_lock_irqsave(&zone->lock, flags);
> 		}

Thank you for this idea, Kiryl. I think adding the cond_resched() is absolutely
necessary here, as Andrew has also kindly pointed out in his response. I also
like the idea of adding the need_resched() and spin_needbreak() checks here.
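
One thing I noticed while looking at this: if I am reading
include/linux/sched.h correctly, spin_needbreak() only reports contention on
preemptible kernels, roughly:

	static inline int spin_needbreak(spinlock_t *lock)
	{
	#ifdef CONFIG_PREEMPTION
		return spin_is_contended(lock);
	#else
		return 0;
	#endif
	}

so on !CONFIG_PREEMPTION builds the need_resched() half of the check would be
doing all of the work. Please correct me if I have this wrong.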

I still think having the if (count) check is important here, since I don't
think we want to stall the exit of this function if there are no more pages
remaining to be freed; in that case we can simply spin_unlock_irqrestore() and
exit the function. So maybe something like this?

		if (count && (need_resched() || spin_needbreak(&zone->lock)))
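
That is, in context, the tail of the while (count > 0) loop would look
something like this (your suggestion plus the count check; untested):

		/*
		 * Prevent starving the lock (and this CPU); every pcp->batch
		 * pages freed, give lock waiters and the scheduler a chance
		 * to run.
		 */
		if (count && (need_resched() || spin_needbreak(&zone->lock))) {
			spin_unlock_irqrestore(&zone->lock, flags);
			cond_resched();
			spin_lock_irqsave(&zone->lock, flags);
		}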

Thank you again for your review, Kiryl! I hope you have a great day :-)
Joshua
