Message-ID:
<AS2PR08MB97863FD18B6D7D779CBE9B9AF75AA@AS2PR08MB9786.eurprd08.prod.outlook.com>
Date: Mon, 28 Jul 2025 06:14:45 +0000
From: Justin He <Justin.He@....com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
CC: "Rafael J. Wysocki" <rafael@...nel.org>, Danilo Krummrich
<dakr@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] mm: percpu: Introduce normalized CPU-to-NUMA node mapping
to reduce max_distance
Hi Greg,
> -----Original Message-----
> From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> Sent: Monday, July 28, 2025 12:28 PM
> To: Justin He <Justin.He@....com>
> Cc: Rafael J. Wysocki <rafael@...nel.org>; Danilo Krummrich
> <dakr@...nel.org>; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH] mm: percpu: Introduce normalized CPU-to-NUMA node
> mapping to reduce max_distance
>
> On Mon, Jul 28, 2025 at 02:54:42AM +0000, Justin He wrote:
> > Hi Greg
> >
> > > -----Original Message-----
> > > From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> > > Sent: Tuesday, July 22, 2025 1:45 PM
> > > To: Justin He <Justin.He@....com>
> > > Cc: Rafael J. Wysocki <rafael@...nel.org>; Danilo Krummrich
> > > <dakr@...nel.org>; linux-kernel@...r.kernel.org
> > > Subject: Re: [PATCH] mm: percpu: Introduce normalized CPU-to-NUMA
> > > node
>
> Odd quoting, please fix your email client :(
>
> > > > In this configuration, pcpu_embed_first_chunk() computes a large
> > > > max_distance:
> > > > percpu: max_distance=0x5fffbfac0000 too large for vmalloc space
> > > > 0x7bff70000000
> > > >
> > > > As a result, the allocator falls back to pcpu_page_first_chunk(),
> > > > which uses page-by-page allocation with nr_groups = 1, leading to
> > > > degraded performance.
> > >
> > > But that's intentional, you don't want to go across the nodes, right?
> > My intention is to
>
> Did something get dropped?
>
Sorry, the previous text should have been:
My intention is to optimize the percpu allocation path so that it does not
fall back to pcpu_page_first_chunk() before retrying with normalization.
> > > > This patch introduces a normalized CPU-to-NUMA node mapping to
> > > > mitigate the issue. Distances of 10 and 16 are treated as local
> > > > (LOCAL_DISTANCE),
> > >
> > > Why? What is this going to now break on those systems that assumed
> > > that those were NOT local?
> > The normalization only affects percpu allocations - possibly only dynamic
> ones.
>
> "possibly" doesn't instill much confidence here...
>
> > Other mechanisms, such as cpu_to_node_map, remain unaffected and
> > continue to function as before in those contexts.
>
> percpu allocations are the "hottest" path we have, so without testing this on
> systems that were working well before your change, I don't think we could
> ever accept this, right?
>
> > > What did you test this on?
> > >
> > This was conducted on an Arm64 N2 server with 256 CPUs and 64 GB of
> memory.
> > (Apologies, but I am not authorized to disclose the exact hardware
> > specifications.)
>
> That's fine, but why didn't you test this on older systems that this code was
> originally written for? You don't want to have regressions on them, right?
Besides the N2 server I mentioned in the commit message, I tested this on a
legacy Arm64 N2 system with 2 nodes, 128 CPUs, and 128 GB of memory.
That system works well both with and without the patch.
The updated pseudo-code logic is as follows:
- Attempt pcpu_embed_first_chunk() (original logic).
- If it fails and normalization is worthwhile, retry pcpu_embed_first_chunk()
  in normalization mode (added by this patch).
- If it still fails, fall back to pcpu_page_first_chunk().
In practice, I believe most legacy systems will never enter normalization
mode; my N2 server is the exception.
---
Cheers,
Justin He (Jia He)