Message-ID: <aS_JzWHHn8hBHSCe@gourry-fedora-PF4VCD3F>
Date: Wed, 3 Dec 2025 00:25:33 -0500
From: Gregory Price <gourry@...rry.net>
To: Balbir Singh <balbirs@...dia.com>
Cc: linux-mm@...ck.org, kernel-team@...a.com, linux-cxl@...r.kernel.org,
	linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev,
	linux-fsdevel@...r.kernel.org, cgroups@...r.kernel.org,
	dave@...olabs.net, jonathan.cameron@...wei.com,
	dave.jiang@...el.com, alison.schofield@...el.com,
	vishal.l.verma@...el.com, ira.weiny@...el.com,
	dan.j.williams@...el.com, longman@...hat.com,
	akpm@...ux-foundation.org, david@...hat.com,
	lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
	rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
	osalvador@...e.de, ziy@...dia.com, matthew.brost@...el.com,
	joshua.hahnjy@...il.com, rakie.kim@...com, byungchul@...com,
	ying.huang@...ux.alibaba.com, apopple@...dia.com, mingo@...hat.com,
	peterz@...radead.org, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
	vschneid@...hat.com, tj@...nel.org, hannes@...xchg.org,
	mkoutny@...e.com, kees@...nel.org, muchun.song@...ux.dev,
	roman.gushchin@...ux.dev, shakeel.butt@...ux.dev,
	rientjes@...gle.com, jackmanb@...gle.com, cl@...two.org,
	harry.yoo@...cle.com, axelrasmussen@...gle.com, yuanchu@...gle.com,
	weixugc@...gle.com, zhengqi.arch@...edance.com,
	yosry.ahmed@...ux.dev, nphamcs@...il.com, chengming.zhou@...ux.dev,
	fabio.m.de.francesco@...ux.intel.com, rrichter@....com,
	ming.li@...omail.com, usamaarif642@...il.com, brauner@...nel.org,
	oleg@...hat.com, namcao@...utronix.de, escape@...ux.alibaba.com,
	dongjoo.seo1@...sung.com
Subject: Re: [RFC LPC2026 PATCH v2 00/11] Specific Purpose Memory NUMA Nodes

On Wed, Dec 03, 2025 at 03:36:33PM +1100, Balbir Singh wrote:
> >    - I discussed in my note to David that this is probably the right
> >      way to go about doing it. I think N_MEMORY can still be set, if
> >      a new global-default-node policy is created.
> > 
> 
> I still think N_MEMORY as a flag should mean something different from
> N_SPM_NODE_MEMORY because their characteristics are different
> 
... snip ...  (I agree, see later)

> >    - Instead, I can see either per-component policies (reclaim->nodes)
> >      or a global policy that covers all of those components (similar to
> >      my sysram_nodes).  Drivers would then be responsible to register
> >      their hotplugged memory nodes with those components accordingly.
> > 
> 
> To me node zonelists provide the right abstraction of where to allocate from
> and how to fallback as needed. I'll read your patches to figure out how your
> approach is different. I wanted the isolation at allocation time
>
... snip ... (I agree, see later)

> 
> Yes, we should look at the pros and cons. To be honest, I wouldn't be
> opposed to having kswapd and reclaim look different for these nodes, it
> would also mean that we'd need pagecache hooks if we want page cache on
> these nodes. Everything else, including move_pages() should just work.
> 

Basically my series does (roughly) the same as yours, but adds the
cpusets controls and a GFP flag.  The MHP extension should ultimately
be converted to N_SPM_NODE_MEMORY (or whatever we decide to name it).

After some more time to think, I think we want all of it.

- N_SPM_NODE_MEMORY (or whatever we call it) handles filtering out
  SPM at allocation time by default and protects all current users
  of N_MEMORY from exposure to SPM.

- cpusets controls allow userland isolation control and a default sysram
  mask (I think cpusets.sysram_nodes doesn't even need to be exposed via
  sysfs to be honest).  The cpusets fix is needed because
  task->mems_allowed is used as the default nodemask on systems using
  cgroups/cpusets.

- GFP_SP_NODE protects against someone doing something like:
      get_page_from_freelist(..., node_states[N_POSSIBLE])
      or
      numactl --interleave --all ./my_program

  While providing a way to punch an explicit hole in the isolation
  (GFP_SP_NODE means "Use N_SPM_NODE_MEMORY instead of N_MEMORY")

  This could be argued against so long as we restrict mempolicy.c
  to N_MEMORY nodes (to avoid `--interleave --all` issues), but this
  limitation may not be preferable.

  My concern is for breaking existing userland software that happens
  to run on a system with SPM - but you can probably imagine many more
  bad scenarios.
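
To make the three pieces above concrete, here is a rough userspace model
of the filtering, not kernel code.  Everything in it is an assumption
for illustration: the 64-bit toy nodemask, the GFP_SP_NODE bit value,
the struct layout, the helper name spm_filter_nodes(), and the choice to
treat the sysram and SPM node sets as disjoint.

```c
#include <stdint.h>

/* Toy nodemask: bit n set means node n is in the mask (assumption). */
typedef uint64_t toy_nodemask_t;

/* Hypothetical flag bit; the real value, if any, is not decided here. */
#define GFP_SP_NODE 0x1u

/* Hypothetical snapshot of the two node states discussed above. */
struct toy_node_states {
	toy_nodemask_t n_memory;          /* ordinary system RAM nodes */
	toy_nodemask_t n_spm_node_memory; /* specific purpose memory nodes */
};

/*
 * Filter a caller-supplied nodemask.  By default only N_MEMORY nodes
 * survive, so a mask built from "all possible nodes" cannot reach SPM.
 * With GFP_SP_NODE the SPM set is used instead of N_MEMORY, which is
 * the explicit hole in the isolation.
 */
static toy_nodemask_t spm_filter_nodes(const struct toy_node_states *st,
				       toy_nodemask_t requested,
				       unsigned int gfp)
{
	toy_nodemask_t base = (gfp & GFP_SP_NODE) ? st->n_spm_node_memory
						  : st->n_memory;

	return requested & base;
}
```

With nodes 0-1 as sysram and node 2 as SPM, an "all nodes" request
without the flag is reduced to nodes 0-1, the same request with
GFP_SP_NODE is reduced to node 2, and a request naming only node 2
without the flag comes back empty.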

~Gregory
