Message-ID: <aWUs8Fx2CG07F81e@yury>
Date: Mon, 12 Jan 2026 12:18:40 -0500
From: Yury Norov <ynorov@...dia.com>
To: Gregory Price <gourry@...rry.net>
Cc: Balbir Singh <balbirs@...dia.com>, linux-mm@...ck.org,
cgroups@...r.kernel.org, linux-cxl@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, kernel-team@...a.com,
longman@...hat.com, tj@...nel.org, hannes@...xchg.org,
mkoutny@...e.com, corbet@....net, gregkh@...uxfoundation.org,
rafael@...nel.org, dakr@...nel.org, dave@...olabs.net,
jonathan.cameron@...wei.com, dave.jiang@...el.com,
alison.schofield@...el.com, vishal.l.verma@...el.com,
ira.weiny@...el.com, dan.j.williams@...el.com,
akpm@...ux-foundation.org, vbabka@...e.cz, surenb@...gle.com,
mhocko@...e.com, jackmanb@...gle.com, ziy@...dia.com,
david@...nel.org, lorenzo.stoakes@...cle.com,
Liam.Howlett@...cle.com, rppt@...nel.org, axelrasmussen@...gle.com,
yuanchu@...gle.com, weixugc@...gle.com, yury.norov@...il.com,
linux@...musvillemoes.dk, rientjes@...gle.com,
shakeel.butt@...ux.dev, chrisl@...nel.org, kasong@...cent.com,
shikemeng@...weicloud.com, nphamcs@...il.com, bhe@...hat.com,
baohua@...nel.org, yosry.ahmed@...ux.dev, chengming.zhou@...ux.dev,
roman.gushchin@...ux.dev, muchun.song@...ux.dev, osalvador@...e.de,
matthew.brost@...el.com, joshua.hahnjy@...il.com, rakie.kim@...com,
byungchul@...com, ying.huang@...ux.alibaba.com, apopple@...dia.com,
cl@...two.org, harry.yoo@...cle.com, zhengqi.arch@...edance.com
Subject: Re: [RFC PATCH v3 0/8] mm,numa: N_PRIVATE node isolation for
device-managed memory
On Mon, Jan 12, 2026 at 09:36:49AM -0500, Gregory Price wrote:
> On Mon, Jan 12, 2026 at 10:12:23PM +1100, Balbir Singh wrote:
> > On 1/9/26 06:37, Gregory Price wrote:
> > > This series introduces N_PRIVATE, a new node state for memory nodes
> > > whose memory is not intended for general system consumption. Today,
> > > device drivers (CXL, accelerators, etc.) hotplug their memory to access
> > > mm/ services like page allocation and reclaim, but this exposes general
> > > workloads to memory with different characteristics and reliability
> > > guarantees than system RAM.
> > >
> > > N_PRIVATE provides isolation by default while enabling explicit access
> > > via __GFP_THISNODE for subsystems that understand how to manage these
> > > specialized memory regions.
> > >
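If I read the model right, an opted-in consumer has to target such a
node explicitly, along the lines of the untested sketch below, where
private_nid stands for whatever node id the driver registered:

	/* explicit opt-in: __GFP_THISNODE forbids fallback to other nodes */
	struct page *page;

	page = alloc_pages_node(private_nid,
				GFP_KERNEL | __GFP_THISNODE | __GFP_NOWARN, 0);
	if (!page)
		return -ENOMEM;	/* the caller owns its fallback policy */
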
> >
> > I assume each class of N_PRIVATE is a separate set of NUMA nodes;
> > could these be real or virtual memory nodes?
> >
>
> This has been the topic of a long, long discussion on the CXL Discord:
> how do we get extra nodes if we intend to make HPA space flexibly
> configurable by "intended use"?
>
> tl;dr: open to discussion. As of right now, there's no way (that I
> know of) to allocate additional NUMA nodes at boot without having some
> indication that one is needed in the ACPI table (SRAT touches a PXM, or
> CEDT defines a region not present in SRAT).
>
> Best idea we have right now is to have a build config that reserves some
> extra nodes which can be used later (they're in N_POSSIBLE but otherwise
> not used by anything).
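If you go the reserved-nodes route, I'd expect something like the
sketch below early at boot. CONFIG_NR_SPARE_NODES is a made-up knob
here; first_unset_node() and node_possible_map are the existing
nodemask helpers:

	int i, nid;

	/* mark spare node ids possible so they can be claimed later */
	for (i = 0; i < CONFIG_NR_SPARE_NODES; i++) {
		nid = first_unset_node(node_possible_map);
		if (nid >= MAX_NUMNODES)
			break;
		node_set(nid, node_possible_map);
	}
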
>
> > > Design
> > > ======
> > >
> > > The series introduces:
> > >
> > > 1. N_PRIVATE node state (mutually exclusive with N_MEMORY)
> >
> > We should call it N_PRIVATE_MEMORY
> >
>
> Dan Williams convinced me to go with N_PRIVATE, but this is really a
> bikeshed topic
No, it's not. To me (OK, an almost random reader in this discussion),
N_PRIVATE is a pretty confusing name. It doesn't answer the question:
private what? N_PRIVATE_MEMORY is better in that department, isn't it?
But taking into account isolcpus, maybe N_ISOLMEM?
> - we could call it N_BOBERT until we find consensus.
Please give it a proper name that clearly describes the scope and
purpose of the new restriction policy before moving forward.
> > > enum private_memtype {
> > > NODE_MEM_NOTYPE, /* No type assigned (invalid state) */
> > > NODE_MEM_ZSWAP, /* Swap compression target */
> > > NODE_MEM_COMPRESSED, /* General compressed RAM */
> > > NODE_MEM_ACCELERATOR, /* Accelerator-attached memory */
> > > NODE_MEM_DEMOTE_ONLY, /* Memory-tier demotion target only */
> > > NODE_MAX_MEMTYPE,
> > > };
> > >
> > > These types serve as policy hints for subsystems:
> > >
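As a strawman for how a subsystem would consume these hints - here
node_memtype() and zswap_use_node() are hypothetical helpers, not
something quoted from the series:

	/* e.g. zswap selecting a compression target node */
	if (node_state(nid, N_PRIVATE) &&
	    node_memtype(nid) == NODE_MEM_ZSWAP)
		zswap_use_node(nid);
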
> >
> > Do these nodes have fallback(s)? Are these nodes prone to OOM when
> > memory is exhausted in one class of N_PRIVATE node(s)?
> >
>
> Right now, these nodes do not have fallbacks, and even if they did, the
> use of __GFP_THISNODE would prevent this. That's intended.
>
> In theory you could have nodes of similar types fall back to each other,
> but that feels like increased complexity for questionable value. The
> service requesting __GFP_THISNODE should be aware that it needs to manage
> fallback.
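Agreed that the caller should own that policy. For nodes of the same
type it would presumably be a trivial loop, e.g. (untested; a real
version would also match the node's memtype):

	/* walk private nodes; __GFP_THISNODE keeps each attempt contained */
	for_each_node_state(nid, N_PRIVATE) {
		page = alloc_pages_node(nid, GFP_KERNEL | __GFP_THISNODE |
					__GFP_NOWARN, order);
		if (page)
			return page;
	}
	return NULL;
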
Yeah, and most __GFP_THISNODE users also pass __GFP_NOWARN, which makes
it look more like an emergency feature. Maybe add a symmetric
__GFP_PRIVATE flag that would allow for more flexibility and highlight
the intention better?
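Roughly, at the point where the allocator walks the zonelist, such a
flag could gate private nodes explicitly (illustrative only -
__GFP_PRIVATE doesn't exist today):

	/* ordinary requests skip private nodes */
	if (node_state(zone_to_nid(zone), N_PRIVATE) &&
	    !(gfp_mask & __GFP_PRIVATE))
		continue;
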
> > What about page cache allocation from these nodes? Since default
> > allocations never use them, a file system would need to do additional
> > work to allocate on them, if there were ever a desire to use them.
>
> Yes, in fact that is the intent. Anything requesting memory from these
> nodes would need to be aware of how to manage them.
>
> Similar to ZONE_DEVICE memory - which is wholly unmanaged by the page
This is quite the opposite of what you are saying in the motivation
section:

    Several emerging memory technologies require kernel memory management
    services but should not be used for general allocations

So, is it a completely unmanaged node, or is it only isolated from
general allocations?
> allocator. There's potential for re-using some of the ZONE_DEVICE or
> HMM callback infrastructure to implement the callbacks for N_PRIVATE
> instead of re-inventing it.
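If the ZONE_DEVICE route is taken, I'd guess the shape ends up close
to dev_pagemap_ops, i.e. a per-node ops table along these purely
illustrative lines:

	/* hypothetical; loosely modeled on struct dev_pagemap_ops */
	struct private_node_ops {
		void (*page_free)(struct page *page);
		vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
	};
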
>
> > Would memory migration work between N_PRIVATE and N_MEMORY using
> > move_pages()?
> >
>
> N_PRIVATE -> N_MEMORY would probably be trivial, but it could also be
> gated by a controllable bit.
>
> A side-discussion not present in these notes has been whether memtype
> should be an enum or a bitfield.
>
> N_MEMORY -> N_PRIVATE via migrate.c would probably require some changes
> to migration_target_control, and the alloc callback (in vmscan.c, see
> alloc_migrate_folio) would need to be N_PRIVATE-aware.
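For the N_MEMORY -> N_PRIVATE direction, I'd imagine an opt-in bit in
the control structure, e.g. (existing fields abridged from memory;
allow_private and the check are hypothetical):

	struct migration_target_control {
		int nid;			/* preferred node id */
		nodemask_t *nmask;
		gfp_t gfp_mask;
		bool allow_private;		/* opt in to N_PRIVATE targets */
	};

	/* in the allocation callback */
	if (node_state(mtc->nid, N_PRIVATE) && !mtc->allow_private)
		return NULL;
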
>
>
> Thanks for taking a look,
> ~Gregory
Thanks,
Yury