Message-ID: <20250926172532.2814977-1-joshua.hahnjy@gmail.com>
Date: Fri, 26 Sep 2025 10:25:16 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Chris Mason <clm@...com>,
	Kiryl Shutsemau <kirill@...temov.name>,
	Brendan Jackman <jackmanb@...gle.com>,
	Michal Hocko <mhocko@...e.com>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Zi Yan <ziy@...dia.com>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	kernel-team@...a.com
Subject: Re: [PATCH v2 2/4] mm/page_alloc: Perform appropriate batching in drain_pages_zone

On Fri, 26 Sep 2025 09:21:29 -0700 (PDT) "Christoph Lameter (Ampere)" <cl@...two.org> wrote:

> On Thu, 25 Sep 2025, Joshua Hahn wrote:
> 
> > > So we need an explanation as to why there is such high contention on the
> > > lock first before changing the logic here.
> > >
> > > The current logic seems to be designed to prevent the lock contention you
> > > are seeing.
> >
> > This is true, but my concern was mostly with the value that is being used
> > for the batching (2048 seems too high). But as I explain below, it seems
> > like the min(2048, count) operation is a no-op anyways, since it is never
> > called with count > 1000 (at least from the benchmarks that I was running,
> > on my machine).
> 

Hello Christoph, thank you for your feedback!

> The problem is that you likely increase zone lock contention with a
> reduced batch size.

I see, so is the suggestion here that by reducing the batch size, we
increase zone lock contention because we take the lock more often and
end up spending more time contending on the zone lock overall?
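
Just to make sure I am picturing the mechanics correctly, here is roughly
how I read the drain loop in question (paraphrased from memory, so the
details may be slightly off from the current code):

	do {
		spin_lock(&pcp->lock);
		count = pcp->count;
		if (count) {
			int to_drain = min(count,
					   pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);

			/* free_pcppages_bulk() takes zone->lock internally */
			free_pcppages_bulk(zone, to_drain, pcp, 0);
			count -= to_drain;
		}
		spin_unlock(&pcp->lock);
	} while (count);

So with a smaller batch we go around this loop -- and through zone->lock
inside free_pcppages_bulk() -- more times for the same number of pages,
and that is the extra contention you are worried about?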

> Actually that there is a lock in the pcp structure is weird and causes
> cacheline bouncing on such hot paths. Access should be only from the cpu
> that owns this structure. Remote cleaning (if needed) can be triggered via
> IPIs.
> 
> This is the way it used to be and the way it was tested for high core
> counts years ago.
> 
> You seem to run 176 cores here so it's similar to what we tested way back
> when. If all cores are accessing the pcp structure then you have
> significant cacheline bouncing. Removing the lock and going back to the
> IPI solution would likely remove the problem.
> 
> The cachelines of the allocator per cpu structures are usually very hot
> and should only be touched in rare circumstances from other cpus.
> 
> Having a loop over all processors accessing all the hot per-cpu structures
> is likely causing significant performance issues, and therefore the issues
> that you are seeing here.

So just to be explicit about this -- in my last response, I noted that
drain_pages_zone really isn't a hot path; in fact, it is never called
with count > 1000. The real offenders are in the last 2 patches of this
series (decay_pcp_high and free_frozen_page_commit), and I believe those
callers don't have the same problem of iterating over all CPUs:
free_frozen_page_commit only iterates over the zones (from
free_unref_folios), so I don't think it has the same cacheline bouncing
issue as drain_pages_zone.
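
On the IPI approach itself, just so I understand the shape of it: I
imagine it would look something like the below (purely a sketch with
made-up function names, other than on_each_cpu() and the existing pcp
fields), where the drain always runs on the CPU that owns the pcp lists:

	/* Sketch only: each CPU drains its own lists, no remote access */
	static void drain_own_pcp(void *info)
	{
		struct zone *zone = info;
		struct per_cpu_pages *pcp = this_cpu_ptr(zone->per_cpu_pageset);

		/* Runs on the owning CPU, so no pcp->lock would be needed */
		if (pcp->count)
			free_pcppages_bulk(zone, pcp->count, pcp, 0);
	}

	static void drain_all_pcp_via_ipi(struct zone *zone)
	{
		/* One IPI per CPU instead of touching remote pcp cachelines */
		on_each_cpu(drain_own_pcp, zone, 1);
	}

Is that roughly what you have in mind?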

So perhaps drain_pages_zone is a good candidate for leaving the batching
size as is, i.e. pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX. What do you
think? Again, I am happy to remove this patch from the series, since all
of the performance gains and lock contention reduction reported in the
cover letter come from the last 2 patches of this series.
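
(For reference, and assuming I have the defaults right: with
CONFIG_PCP_BATCH_SCALE_MAX at its default of 5 and a typical pcp->batch
of around 64, that cap works out to roughly 64 << 5 = 2048 pages, which
is where the 2048 figure earlier in the thread comes from.)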

Thank you again for your thoughtful response, have a great day!
Joshua
