Message-ID: <aS_JzWHHn8hBHSCe@gourry-fedora-PF4VCD3F>
Date: Wed, 3 Dec 2025 00:25:33 -0500
From: Gregory Price <gourry@...rry.net>
To: Balbir Singh <balbirs@...dia.com>
Cc: linux-mm@...ck.org, kernel-team@...a.com, linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev,
linux-fsdevel@...r.kernel.org, cgroups@...r.kernel.org,
dave@...olabs.net, jonathan.cameron@...wei.com,
dave.jiang@...el.com, alison.schofield@...el.com,
vishal.l.verma@...el.com, ira.weiny@...el.com,
dan.j.williams@...el.com, longman@...hat.com,
akpm@...ux-foundation.org, david@...hat.com,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
osalvador@...e.de, ziy@...dia.com, matthew.brost@...el.com,
joshua.hahnjy@...il.com, rakie.kim@...com, byungchul@...com,
ying.huang@...ux.alibaba.com, apopple@...dia.com, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, tj@...nel.org, hannes@...xchg.org,
mkoutny@...e.com, kees@...nel.org, muchun.song@...ux.dev,
roman.gushchin@...ux.dev, shakeel.butt@...ux.dev,
rientjes@...gle.com, jackmanb@...gle.com, cl@...two.org,
harry.yoo@...cle.com, axelrasmussen@...gle.com, yuanchu@...gle.com,
weixugc@...gle.com, zhengqi.arch@...edance.com,
yosry.ahmed@...ux.dev, nphamcs@...il.com, chengming.zhou@...ux.dev,
fabio.m.de.francesco@...ux.intel.com, rrichter@....com,
ming.li@...omail.com, usamaarif642@...il.com, brauner@...nel.org,
oleg@...hat.com, namcao@...utronix.de, escape@...ux.alibaba.com,
dongjoo.seo1@...sung.com
Subject: Re: [RFC LPC2026 PATCH v2 00/11] Specific Purpose Memory NUMA Nodes
On Wed, Dec 03, 2025 at 03:36:33PM +1100, Balbir Singh wrote:
> > - I discussed in my note to David that this is probably the right
> > way to go about doing it. I think N_MEMORY can still be set, if
> > a new global-default-node policy is created.
> >
>
> I still think N_MEMORY as a flag should mean something different from
> N_SPM_NODE_MEMORY because their characteristics are different
>
... snip ... (I agree, see later)
> > - Instead, I can see either per-component policies (reclaim->nodes)
> > or a global policy that covers all of those components (similar to
> > my sysram_nodes). Drivers would then be responsible to register
> > their hotplugged memory nodes with those components accordingly.
> >
>
> To me node zonelists provide the right abstraction of where to allocate from
> and how to fallback as needed. I'll read your patches to figure out how your
> approach is different. I wanted the isolation at allocation time
>
... snip ... (I agree, see later)
>
> Yes, we should look at the pros and cons. To be honest, I'd wouldn't be
> opposed to having kswapd and reclaim look different for these nodes, it
> would also mean that we'd need pagecache hooks if we want page cache on
> these nodes. Everything else, including move_pages() should just work.
>
Basically my series does (roughly) the same as yours, but adds the
cpusets controls and a GFP flag. The MHP extension should ultimately
be converted to N_SPM_NODE_MEMORY (or whatever we decide to name it).

After some more time to think, I think we want all of it.

- N_SPM_NODE_MEMORY (or whatever we call it) handles filtering out
  SPM at allocation time by default and protects all current users
  of N_MEMORY from exposure to SPM.

- cpusets controls allow userland isolation control and a default sysram
  mask (I think cpusets.sysram_nodes doesn't even need to be exposed via
  sysfs to be honest). cpusets fix is needed due to task->mems_allowed
  being used as a default nodemask on systems using cgroups/cpusets.

- GFP_SP_NODE protects against someone doing something like:

     get_page_from_freelist(..., node_states[N_POSSIBLE])

  or

     numactl --interleave --all ./my_program

  While providing a way to punch an explicit hole in the isolation
  (GFP_SP_NODE means "Use N_SPM_NODE_MEMORY instead of N_MEMORY")

  This could be argued against so long as we restrict mempolicy.c
  to N_MEMORY nodes (to avoid `--interleave --all` issues), but this
  limitation may not be preferable.

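To make the intended GFP_SP_NODE semantics concrete, here is a rough
userspace sketch. It is a toy model, not kernel code and not what either
series implements: the types, the MODEL_GFP_SP_NODE bit and
model_effective_nodes() are all made up for illustration, and a 64-bit
integer stands in for nodemask_t. The only thing it tries to capture is
"filter the caller's nodemask against N_MEMORY by default, and against
N_SPM_NODE_MEMORY when the flag is passed".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one bit per NUMA node id (hypothetical stand-in for nodemask_t). */
typedef uint64_t nodemask_model_t;

/* Hypothetical stand-in for the proposed GFP_SP_NODE flag. */
#define MODEL_GFP_SP_NODE 0x1u

struct node_model {
	nodemask_model_t n_memory;     /* models node_states[N_MEMORY] */
	nodemask_model_t n_spm_memory; /* models node_states[N_SPM_NODE_MEMORY] */
};

/*
 * Return the nodes an allocation may actually use.
 * Without the flag, the request is clipped to N_MEMORY, so an overly
 * broad mask (e.g. every possible node) cannot reach SPM by accident.
 * With the flag, N_SPM_NODE_MEMORY is used instead of N_MEMORY.
 */
static nodemask_model_t model_effective_nodes(const struct node_model *st,
					      nodemask_model_t requested,
					      unsigned int gfp_flags)
{
	nodemask_model_t base;

	assert(st != NULL);
	base = (gfp_flags & MODEL_GFP_SP_NODE) ? st->n_spm_memory
					       : st->n_memory;
	return requested & base;
}
```

With nodes 0-1 as sysram and node 2 as SPM (n_memory = 0x3,
n_spm_memory = 0x4), a caller passing "all possible nodes" (0x7) gets
0x3 back by default, and only gets 0x4 when it passes the flag.
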
My concern is for breaking existing userland software that happens
to run on a system with SPM - but you can probably imagine many more
bad scenarios.
~Gregory