Message-ID: <aWUHAboKw28XepWr@gourry-fedora-PF4VCD3F>
Date: Mon, 12 Jan 2026 09:36:49 -0500
From: Gregory Price <gourry@...rry.net>
To: Balbir Singh <balbirs@...dia.com>
Cc: linux-mm@...ck.org, cgroups@...r.kernel.org, linux-cxl@...r.kernel.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, kernel-team@...a.com,
	longman@...hat.com, tj@...nel.org, hannes@...xchg.org,
	mkoutny@...e.com, corbet@....net, gregkh@...uxfoundation.org,
	rafael@...nel.org, dakr@...nel.org, dave@...olabs.net,
	jonathan.cameron@...wei.com, dave.jiang@...el.com,
	alison.schofield@...el.com, vishal.l.verma@...el.com,
	ira.weiny@...el.com, dan.j.williams@...el.com,
	akpm@...ux-foundation.org, vbabka@...e.cz, surenb@...gle.com,
	mhocko@...e.com, jackmanb@...gle.com, ziy@...dia.com,
	david@...nel.org, lorenzo.stoakes@...cle.com,
	Liam.Howlett@...cle.com, rppt@...nel.org, axelrasmussen@...gle.com,
	yuanchu@...gle.com, weixugc@...gle.com, yury.norov@...il.com,
	linux@...musvillemoes.dk, rientjes@...gle.com,
	shakeel.butt@...ux.dev, chrisl@...nel.org, kasong@...cent.com,
	shikemeng@...weicloud.com, nphamcs@...il.com, bhe@...hat.com,
	baohua@...nel.org, yosry.ahmed@...ux.dev, chengming.zhou@...ux.dev,
	roman.gushchin@...ux.dev, muchun.song@...ux.dev, osalvador@...e.de,
	matthew.brost@...el.com, joshua.hahnjy@...il.com, rakie.kim@...com,
	byungchul@...com, ying.huang@...ux.alibaba.com, apopple@...dia.com,
	cl@...two.org, harry.yoo@...cle.com, zhengqi.arch@...edance.com
Subject: Re: [RFC PATCH v3 0/8] mm,numa: N_PRIVATE node isolation for
 device-managed memory

On Mon, Jan 12, 2026 at 10:12:23PM +1100, Balbir Singh wrote:
> On 1/9/26 06:37, Gregory Price wrote:
> > This series introduces N_PRIVATE, a new node state for memory nodes 
> > whose memory is not intended for general system consumption.  Today,
> > device drivers (CXL, accelerators, etc.) hotplug their memory to access
> > mm/ services like page allocation and reclaim, but this exposes general
> > workloads to memory with different characteristics and reliability
> > guarantees than system RAM.
> > 
> > N_PRIVATE provides isolation by default while enabling explicit access
> > via __GFP_THISNODE for subsystems that understand how to manage these
> > specialized memory regions.
> > 
> 
> I assume each class of N_PRIVATE is a separate set of NUMA nodes, these
> could be real or virtual memory nodes?
>

This has been the topic of a long, long discussion on the CXL discord -
how do we get extra nodes if we intend to make HPA space flexibly
configurable by "intended use".

tl;dr:  open to discussion.  As of right now, there's no way (that I
know of) to allocate additional NUMA nodes at boot without having some
indication that one is needed in the ACPI tables (SRAT touches a PXM, or
CEDT defines a region not present in SRAT).

Best idea we have right now is to have a build config that reserves some
extra nodes which can be used later (they're in N_POSSIBLE but otherwise
not used by anything).
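
To make the reserved-node idea concrete, here is a rough userspace model
of it.  This is only a sketch: the struct, the masks, and
model_claim_spare_node() are all hypothetical names for illustration,
not kernel code and not what the series does today.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_NODES 16

/* Hypothetical stand-in for the kernel's node state masks. */
struct model_nodemasks {
	uint32_t possible;	/* models N_POSSIBLE */
	uint32_t in_use;	/* nodes already claimed (firmware tables, drivers) */
};

/*
 * Claim the lowest-numbered node that is possible but not yet in use.
 * Returns the node id, or -1 if no spare node was reserved.
 */
static int model_claim_spare_node(struct model_nodemasks *m)
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
		uint32_t bit = UINT32_C(1) << nid;

		if ((m->possible & bit) && !(m->in_use & bit)) {
			m->in_use |= bit;
			return nid;
		}
	}
	return -1;
}
```

The point of the model is just that the spare nodes have to exist in the
possible mask before anyone needs them; once they run out, a late
request has nowhere to go.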

> > Design
> > ======
> > 
> > The series introduces:
> > 
> >   1. N_PRIVATE node state (mutually exclusive with N_MEMORY)
> 
> We should call it N_PRIVATE_MEMORY
>

Dan Williams convinced me to go with N_PRIVATE, but this is really a
bikeshed topic - we could call it N_BOBERT until we find consensus.

> > 
> >   enum private_memtype {
> >       NODE_MEM_NOTYPE,      /* No type assigned (invalid state) */
> >       NODE_MEM_ZSWAP,       /* Swap compression target */
> >       NODE_MEM_COMPRESSED,  /* General compressed RAM */
> >       NODE_MEM_ACCELERATOR, /* Accelerator-attached memory */
> >       NODE_MEM_DEMOTE_ONLY, /* Memory-tier demotion target only */
> >       NODE_MAX_MEMTYPE,
> >   };
> > 
> > These types serve as policy hints for subsystems:
> > 
> 
> Do these nodes have fallback(s)? Are these nodes prone to OOM when memory is exhausted
> in one class of N_PRIVATE node(s)?
> 

Right now, these nodes do not have fallbacks, and even if they did the
use of __GFP_THISNODE would prevent this.  That's intended.

In theory you could have nodes of similar types fall back to each other,
but that feels like increased complexity for questionable value.  The
service requesting __GFP_THISNODE should be aware that it needs to manage
fallback.
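
As a rough illustration of "the caller manages fallback", here is a
userspace model.  Everything in it is an assumption made for the sketch:
model_alloc_thisnode() only imitates the "this node or fail" behavior of
a __GFP_THISNODE request, and none of these names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4

/* Hypothetical per-node state: just a count of free pages. */
struct model_node {
	size_t free_pages;
};

/* Imitates a __GFP_THISNODE request: succeed on exactly this node or fail. */
static bool model_alloc_thisnode(struct model_node *nodes, int nid)
{
	if (nid < 0 || nid >= MODEL_MAX_NODES || nodes[nid].free_pages == 0)
		return false;
	nodes[nid].free_pages--;
	return true;
}

/*
 * The caller owns the fallback policy: try the private node first, then
 * a node the caller picked itself.  Returns the node id the page came
 * from, or -1 if both are exhausted.
 */
static int model_alloc_with_fallback(struct model_node *nodes,
				     int private_nid, int fallback_nid)
{
	if (model_alloc_thisnode(nodes, private_nid))
		return private_nid;
	if (model_alloc_thisnode(nodes, fallback_nid))
		return fallback_nid;
	return -1;
}
```

The allocator itself never walks a fallback list here; the second
attempt is an explicit decision made by the requesting service.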

> 
> What about page cache allocation from these nodes? Since default allocations
> never use them, a file system would need to do additional work to allocate
> on them, if there was ever a desire to use them. 

Yes, in fact that is the intent.  Anything requesting memory from these
nodes would need to be aware of how to manage them.

This is similar to ZONE_DEVICE memory, which is wholly unmanaged by the
page allocator.  There's potential for re-using some of the ZONE_DEVICE or
HMM callback infrastructure to implement the callbacks for N_PRIVATE
instead of re-inventing it.

> Would memory
> migration work between N_PRIVATE and N_MEMORY using move_pages()?
> 

N_PRIVATE -> N_MEMORY would probably be easy and trivial, but could also
be a controllable bit.

A side-discussion not present in these notes has been whether memtype
should be an enum or a bitfield.
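
For what the bitfield variant might look like, here is a small sketch.
The flag names just mirror the enum from the cover letter with a MODEL_
prefix, and MODEL_MEM_MIGRATE_OUT is a made-up example of a
"controllable bit"; none of this is in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag encoding of the memtypes (not part of the series). */
#define MODEL_MEM_ZSWAP		(1u << 0)
#define MODEL_MEM_COMPRESSED	(1u << 1)
#define MODEL_MEM_ACCELERATOR	(1u << 2)
#define MODEL_MEM_DEMOTE_ONLY	(1u << 3)
/* Hypothetical capability bit: pages may migrate out to N_MEMORY. */
#define MODEL_MEM_MIGRATE_OUT	(1u << 4)

/* True only if every bit in 'wanted' is set on the node. */
static bool model_node_allows(unsigned int node_flags, unsigned int wanted)
{
	return (node_flags & wanted) == wanted;
}
```

With an enum a node has exactly one type; with flags a node could be,
say, a compressed node that also permits migration out, at the cost of
having to define which combinations are valid.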

N_MEMORY -> N_PRIVATE via migrate.c would probably require some changes
to migration_target_control and the alloc callback (in vmscan.c, see
alloc_migrate_folio) would need to be N_PRIVATE aware.


Thanks for taking a look,
~Gregory
