lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250919210612.1975459-1-joshua.hahnjy@gmail.com>
Date: Fri, 19 Sep 2025 14:06:11 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Johannes Weiner <hannes@...xchg.org>,
	Chris Mason <clm@...com>,
	Kiryl Shutsemau <kirill@...temov.name>,
	"Liam R. Howlett" <Liam.Howlett@...cle.com>,
	Brendan Jackman <jackmanb@...gle.com>,
	David Hildenbrand <david@...hat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Michal Hocko <mhocko@...e.com>,
	Mike Rapoport <rppt@...nel.org>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Zi Yan <ziy@...dia.com>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	kernel-team@...a.com
Subject: Re: [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk

Hello Andrew,

Thank you for your review, as always!

On Fri, 19 Sep 2025 13:06:44 -0700 Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Fri, 19 Sep 2025 12:52:18 -0700 Joshua Hahn <joshua.hahnjy@...il.com> wrote:
> 
> > While testing workloads with high sustained memory pressure on large machines
> > (1TB memory, 316 CPUs), we saw an unexpectedly high number of softlockups.
> > Further investigation showed that the lock in free_pcppages_bulk was being held
> > for a long time, even being held while 2k+ pages were being freed [1].
> 
> What problems are caused by this, apart from a warning which can
> presumably be suppressed in some fashion?

There are softlockup panics that we (Meta) saw in the fleet previously. For
some reason I can't get it to reproduce again, but let me try to find a way
to trigger these softlockups again. 

> > This causes starvation in other processes for both the pcp and zone locks,
> > which can lead to softlockups that cause the system to stall [2].
> 
> [2] doesn't describe such stalls.

You're absolutely right -- I was revising this cover letter a bit and I was
going to link the below message separately, but decided to put it in the
message and fogot to remove the footnote. The message below isn't a softlockup,
but let me try and get one to add to the cover letter in a reply to this.

> >
> > ...
> >
> > In our fleet, we have seen that performing batched lock freeing has led to
> > significantly lower rates of softlockups, while incurring relatively small
> > regressions (relative to the workload and relative to the variation).
> 
> "our" == Meta?

Yes -- sorry, I think I made this same mistake in the original version as well.
I'll be more careful about this!

Thank you again for your feedback Andrew, I hope you have a great day!
Joshua

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ