linux-kernel - Re: [PATCH v3 10/11] mm: vmalloc: Set nr

Message-ID: <ZamX5QFjCTGJf52x@dread.disaster.area>
Date: Fri, 19 Jan 2024 08:28:05 +1100
From: Dave Chinner <david@...morbit.com>
To: Uladzislau Rezki <urezki@...il.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, Baoquan He <bhe@...hat.com>,
	Lorenzo Stoakes <lstoakes@...il.com>,
	Christoph Hellwig <hch@...radead.org>,
	Matthew Wilcox <willy@...radead.org>,
	"Liam R . Howlett" <Liam.Howlett@...cle.com>,
	"Paul E . McKenney" <paulmck@...nel.org>,
	Joel Fernandes <joel@...lfernandes.org>,
	Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>
Subject: Re: [PATCH v3 10/11] mm: vmalloc: Set nr_nodes based on CPUs in a
 system

On Thu, Jan 18, 2024 at 07:23:47PM +0100, Uladzislau Rezki wrote:
> On Wed, Jan 17, 2024 at 09:06:02AM +1100, Dave Chinner wrote:
> > On Mon, Jan 15, 2024 at 08:09:29PM +0100, Uladzislau Rezki wrote:
> > > We can easily set nr_nodes to num_possible_cpus() and let it scale for
> > > anyone. But before doing this, I would like to give it a try as a first
> > > step because I have not tested it well on really big NUMA systems.
> > 
> > I don't think you need to have large NUMA systems to test it. We
> > have the "fakenuma" feature for a reason.  Essentially, once you
> > have enough CPU cores that catastrophic lock contention can be
> > generated in a fast path (can take as few as 4-5 CPU cores), then
> > you can effectively test NUMA scalability with fakenuma by creating
> > nodes with >=8 CPUs each.
> > 
> > This is how I've done testing of NUMA-aware algorithms (like
> > shrinkers!) for the past decade - I haven't had direct access to a
> > big NUMA machine since 2008, yet it's relatively trivial to test
> > NUMA based scalability algorithms without them these days.
> > 
> I see your point. NUMA-aware scalability requires reworking and adding an
> extra layer that allows such scaling.
> 
> If the socket has 256 CPUs, how do we scale VAs inside that node among
> those CPUs?

It's called "sub-NUMA clustering" and is a BIOS option that presents
large-core-count CPU packages as multiple NUMA nodes. See:

https://www.intel.com/content/www/us/en/developer/articles/technical/fourth-generation-xeon-scalable-family-overview.html

Essentially, large-core-count CPUs are a cluster of smaller core
groups with their own resources and memory controllers. This is how
they are laid out either on a single die (Intel) or as a collection
of smaller dies (AMD compute complexes) that are tied together by
the interconnect between the LLCs and memory controllers. They only
appear as a "unified" CPU because they are configured that way by
the BIOS, but they can also be configured to actually expose their
inner non-uniform memory access topology to operating systems and
application stacks that are NUMA-aware (like Linux).
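A toy model makes the point concrete: the same package can be reported to the OS as one node or as several, and only the CPU-to-node mapping changes. This is a minimal user-space sketch. `struct package_topology`, `visible_nodes()` and `cpu_node()` are hypothetical names, and the sketch assumes CPU ids are contiguous within each cluster, which real firmware does not promise. The kernel gets the real mapping from ACPI SRAT and exposes it via `cpu_to_node()`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one CPU package. Assumption: CPU ids are
 * numbered contiguously within each core cluster. Real firmware does
 * not guarantee this; the kernel takes the actual CPU-to-node mapping
 * from ACPI SRAT and reports it through cpu_to_node().
 */
struct package_topology {
    int nr_cores;     /* cores in the package */
    int nr_clusters;  /* core groups, each with its own memory controller */
    bool snc_enabled; /* BIOS exposes each cluster as its own NUMA node */
};

/* Number of NUMA nodes the OS sees for this package. */
static int visible_nodes(const struct package_topology *t)
{
    return t->snc_enabled ? t->nr_clusters : 1;
}

/* Package-relative NUMA node that CPU `cpu` is reported in. */
static int cpu_node(const struct package_topology *t, int cpu)
{
    int nodes = visible_nodes(t);

    /* The model only handles clusters of equal size. */
    assert(t->nr_cores % nodes == 0);
    assert(cpu >= 0 && cpu < t->nr_cores);
    return cpu / (t->nr_cores / nodes);
}
```

For a hypothetical 64-core package with 4 clusters, `cpu_node()` places CPU 40 in node 0 when SNC is off and in node 2 when it is on.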

This means a "256 core" CPU would probably present as 16 smaller 16-core
CPUs, each with their own L1/2/3 caches and memory controllers.
IOWs, a single socket appears to the kernel as a 16-node NUMA system
with 16 cores per node. Most NUMA-aware scalability algorithms will
work just fine with this sort of setup - it's just another set of
numbers in the NUMA distance table...
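To illustrate why the algorithm itself need not change, here is a hedged sketch of a NUMA-aware pool that picks the nearest node with free objects, for a 16-node, 16-cores-per-node layout. Everything here is hypothetical. `pick_node()`, `cpu_to_node_id()`, the `free_objs` counters, and the remote distances (12 within an assumed 4-node quadrant, 21 otherwise) are made up for illustration. Only the local distance of 10 follows the ACPI SLIT convention. In the kernel, the table would come from firmware via `node_distance()`.

```c
#include <assert.h>
#include <limits.h>

#define NR_NODES 16          /* e.g. one 256-core socket in SNC mode */
#define CORES_PER_NODE 16
#define LOCAL_DISTANCE 10    /* ACPI SLIT convention for "local" */

/*
 * Hypothetical distance table. Only LOCAL_DISTANCE follows a real
 * convention; the remote values are made up for illustration.
 */
static int distance[NR_NODES][NR_NODES];

/* Hypothetical per-node free-object counts for some NUMA-aware pool. */
static int free_objs[NR_NODES];

static void init_distances(void)
{
    for (int a = 0; a < NR_NODES; a++) {
        for (int b = 0; b < NR_NODES; b++) {
            if (a == b)
                distance[a][b] = LOCAL_DISTANCE;
            else if (a / 4 == b / 4)  /* assumed: same quadrant of the die */
                distance[a][b] = 12;
            else
                distance[a][b] = 21;
        }
    }
}

/* Node a CPU belongs to (assumes contiguous CPU numbering). */
static int cpu_to_node_id(int cpu)
{
    return cpu / CORES_PER_NODE;
}

/*
 * Pick the closest node to `cpu` that still has free objects. The local
 * node always wins if it has any; ties go to the lowest node id.
 * Returns -1 if every pool is empty.
 */
static int pick_node(int cpu)
{
    int home = cpu_to_node_id(cpu);
    int best = -1;
    int best_dist = INT_MAX;

    for (int n = 0; n < NR_NODES; n++) {
        if (free_objs[n] > 0 && distance[home][n] < best_dist) {
            best = n;
            best_dist = distance[home][n];
        }
    }
    return best;
}
```

With every pool non-empty, CPU 255 is served from node 15; if node 15 runs dry, it falls back to node 12, the lowest-numbered node in its assumed quadrant. A different topology only changes `distance[][]`, not `pick_node()`.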

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com