Message-ID: <20251107224956.477056-1-gourry@gourry.net>
Date: Fri,  7 Nov 2025 17:49:45 -0500
From: Gregory Price <gourry@...rry.net>
To: linux-mm@...ck.org
Cc: linux-cxl@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	nvdimm@...ts.linux.dev,
	linux-fsdevel@...r.kernel.org,
	cgroups@...r.kernel.org,
	dave@...olabs.net,
	jonathan.cameron@...wei.com,
	dave.jiang@...el.com,
	alison.schofield@...el.com,
	vishal.l.verma@...el.com,
	ira.weiny@...el.com,
	dan.j.williams@...el.com,
	longman@...hat.com,
	akpm@...ux-foundation.org,
	david@...hat.com,
	lorenzo.stoakes@...cle.com,
	Liam.Howlett@...cle.com,
	vbabka@...e.cz,
	rppt@...nel.org,
	surenb@...gle.com,
	mhocko@...e.com,
	osalvador@...e.de,
	ziy@...dia.com,
	matthew.brost@...el.com,
	joshua.hahnjy@...il.com,
	rakie.kim@...com,
	byungchul@...com,
	gourry@...rry.net,
	ying.huang@...ux.alibaba.com,
	apopple@...dia.com,
	mingo@...hat.com,
	peterz@...radead.org,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org,
	dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com,
	tj@...nel.org,
	hannes@...xchg.org,
	mkoutny@...e.com,
	kees@...nel.org,
	muchun.song@...ux.dev,
	roman.gushchin@...ux.dev,
	shakeel.butt@...ux.dev,
	rientjes@...gle.com,
	jackmanb@...gle.com,
	cl@...two.org,
	harry.yoo@...cle.com,
	axelrasmussen@...gle.com,
	yuanchu@...gle.com,
	weixugc@...gle.com,
	zhengqi.arch@...edance.com,
	yosry.ahmed@...ux.dev,
	nphamcs@...il.com,
	chengming.zhou@...ux.dev,
	fabio.m.de.francesco@...ux.intel.com,
	rrichter@....com,
	ming.li@...omail.com,
	usamaarif642@...il.com,
	brauner@...nel.org,
	oleg@...hat.com,
	namcao@...utronix.de,
	escape@...ux.alibaba.com,
	dongjoo.seo1@...sung.com
Subject: [RFC LPC2026 PATCH 0/9] Protected Memory NUMA Nodes

Author Note
-----------
This is a code RFC for discussion related to

"Mempolicy is dead, long live memory policy!"
https://lpc.events/event/19/contributions/2143/

Given the subtlety of some of these changes and the upcoming holidays,
I wanted to publish this well ahead of time for discussion. This is
the baseline patch set that lays the groundwork for a new kind of
mempolicy based on NUMA node memory features - which can be defined by
the components adding memory to such NUMA nodes.

Included is an example of a Compressed Memory Node, and how compressed
RAM could be managed by zswap.  Compressed memory is its own rabbit
hole - I recommend not getting hung up on the example. 

The core discussion should be around whether such a "Protected Node"
based system is reasonable - and whether there are sufficient potential
users to warrant support.

Also, please do not get hung up on naming. "Protected" just means
"Not-System-RAM".  If you see "Default", just assume "System RAM".

base-commit: 1c353dc8d962de652bc7ad2ba2e63f553331391c
-----------

With this set, we aim to enable allocation of "special purpose memory"
with the page allocator (mm/page_alloc.c) without exposing the same
memory as "Typical System RAM".  Unless a non-userland component
explicitly asks for the node, and does so with a GFP_PROTECTED flag,
memory on that node cannot be "accidentally" used as normal RAM.
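
As a rough userspace illustration of the rule above (not code from this
series), the check can be modelled with one bit per node.  Everything
named sketch_* / SKETCH_* is hypothetical, the flag value is made up,
and the NULL fallback reflects the default_sysram_nodes behaviour
described under change 1 below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one bit per NUMA node.  This is not the kernel's nodemask_t. */
typedef uint64_t nodemask_bits;

/* Hypothetical flag value, for illustration only. */
#define SKETCH_GFP_PROTECTED (1u << 0)

/*
 * May an allocation be satisfied from node `nid`?
 *  - a NULL request falls back to the default sysram set
 *  - sysram nodes inside the requested mask are always eligible
 *  - protected nodes need BOTH an explicit request naming the node
 *    and the protected flag
 */
static bool sketch_node_eligible(int nid, nodemask_bits sysram_nodes,
                                 const nodemask_bits *requested,
                                 unsigned int gfp)
{
    assert(nid >= 0 && nid < 64);

    nodemask_bits bit = (nodemask_bits)1 << nid;
    nodemask_bits mask = requested ? *requested : sysram_nodes;

    if (!(mask & bit))
        return false;           /* node was never asked for */
    if (sysram_nodes & bit)
        return true;            /* ordinary system RAM */
    return (gfp & SKETCH_GFP_PROTECTED) != 0;
}
```

In this model a caller that passes NULL, or that names a protected node
without the flag, can never be handed protected memory.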

We present an example of using this mechanism within ZSWAP, as-if
a "compressed memory node" was present.  How to describe the features
of memory present on nodes is left up to comment here and at LPC '26.

Important Note: Since userspace interfaces are restricted by the
default_node mask (sysram), nothing in userspace can explicitly
request memory from protected nodes.  Instead, the intent is to
create new components which understand different node features,
abstracting the hardware complexity away from userland.

The ZSWAP example demonstrates this with `mt_compressed_nodemask`
which is simply a hack to demonstrate the idea.

There are 4 major changes in this set:

1) Introducing default_sysram_nodes in mm/memory-tiers.c which denotes
   the set of default nodes which are eligible for use as normal sysram

   Some existing users now pass default_sysram_nodes into the page
   allocator instead of NULL, but passing a NULL pointer in will simply
   have it replaced by default_sysram_nodes anyway.

   default_sysram_nodes is always guaranteed to contain the N_MEMORY
   nodes that were present at boot time, and so it can never be empty.


2) The addition of `cpuset.mems.default` which restricts cgroups to
   using `default_sysram_nodes` by default, while allowing non-sysram
   nodes into mems_effective (mems_allowed).

   This is done to allow separate control over sysram and protected node
   sets by cgroups while maintaining the hierarchical rules.

   current cpuset configuration
   cpuset.mems_allowed
    |.mems_effective         < (mems_allowed ∩ parent.mems_effective)
    |->tasks.mems_allowed    < cpuset.mems_effective

   new cpuset configuration
   cpuset.mems_allowed
    |.mems_effective        < (mems_allowed ∩ parent.mems_effective)
    |.mems_default          < (mems_effective ∩ default_sysram_nodes)
      |->task.mems_default  < cpuset.mems_default - (note renamed)

3) Addition of MHP_PROTECTED_MEMORY flag to denote to memory-hotplug
   that the memory capacity being added should mark the node as a
   protected memory node.  A node is either SysRAM or Protected, and
   cannot contain both (adding protected to an existing SysRAM node
   will result in EINVAL).

   DAX and CXL are made aware of the bit and have `protected_memory`
   bits added to their relevant subsystems.

4) Adding GFP_PROTECTED - which allows the page allocator
   (mm/page_alloc.c) to allocate from a protected node when the caller
   names it in the provided node or nodemask.  It changes the behavior
   of the cpuset mems_allowed check.

   Probably there needs to be some additional work done here to
   restrict non-cgroup kernels.
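
To make changes 2 and 3 concrete, here is a minimal userspace sketch
(again, not code from this series) of the cpuset mask derivation and of
the "SysRAM or Protected, never both" hotplug rule.  All sketch_* names
are hypothetical.  Rejecting SysRAM on an already-protected node with
the same error is an assumption; the text above only states the EINVAL
case for adding protected memory to a SysRAM node.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model: one bit per NUMA node.  This is not the kernel's nodemask_t. */
typedef uint64_t nodemask_bits;

struct sketch_cpuset {
    nodemask_bits mems_allowed;    /* configured by the user */
    nodemask_bits mems_effective;  /* mems_allowed ∩ parent.mems_effective */
    nodemask_bits mems_default;    /* mems_effective ∩ default_sysram_nodes */
};

/* Recompute a child's derived masks from its parent and the sysram set. */
static void sketch_update_masks(struct sketch_cpuset *cs,
                                const struct sketch_cpuset *parent,
                                nodemask_bits default_sysram_nodes)
{
    /* Boot-time N_MEMORY nodes are always included, so never empty. */
    assert(default_sysram_nodes != 0);

    cs->mems_effective = cs->mems_allowed & parent->mems_effective;
    cs->mems_default = cs->mems_effective & default_sysram_nodes;
}

enum sketch_node_kind { NODE_EMPTY, NODE_SYSRAM, NODE_PROTECTED };

/*
 * Model of adding capacity to a node.  Mixing kinds fails with -EINVAL.
 * The reverse direction (SysRAM onto a protected node) is an assumption.
 */
static int sketch_hotplug_add(enum sketch_node_kind *kind, int protected_mem)
{
    enum sketch_node_kind want = protected_mem ? NODE_PROTECTED : NODE_SYSRAM;

    if (*kind != NODE_EMPTY && *kind != want)
        return -EINVAL;
    *kind = want;
    return 0;
}
```

With sysram nodes {0,1} and a protected node 2, a child cpuset allowed
{1,2} ends up with mems_effective = {1,2} but mems_default = {1}, so
ordinary allocations stay on node 1 while node 2 remains reachable for
components that ask for it explicitly.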

Gregory Price (9):
  gfp: Add GFP_PROTECTED for protected-node allocations
  memory-tiers: create default_sysram_nodes
  mm: default slub, oom_kill, compaction, and page_alloc to sysram
  mm,cpusets: rename task->mems_allowed to task->mems_default
  cpuset: introduce cpuset.mems.default
  mm/memory_hotplug: add MHP_PROTECTED_MEMORY flag
  drivers/dax: add protected memory bit to dev_dax
  drivers/cxl: add protected_memory bit to cxl region
  [HACK] mm/zswap: compressed ram integration example

 drivers/cxl/core/region.c       |  30 ++++++
 drivers/cxl/cxl.h               |   2 +
 drivers/dax/bus.c               |  39 ++++++++
 drivers/dax/bus.h               |   1 +
 drivers/dax/cxl.c               |   1 +
 drivers/dax/dax-private.h       |   1 +
 drivers/dax/kmem.c              |   2 +
 fs/proc/array.c                 |   2 +-
 include/linux/cpuset.h          |  52 +++++------
 include/linux/gfp_types.h       |   3 +
 include/linux/memory-tiers.h    |   4 +
 include/linux/memory_hotplug.h  |  10 ++
 include/linux/mempolicy.h       |   2 +-
 include/linux/sched.h           |   6 +-
 init/init_task.c                |   2 +-
 kernel/cgroup/cpuset-internal.h |   8 ++
 kernel/cgroup/cpuset-v1.c       |   7 ++
 kernel/cgroup/cpuset.c          | 157 +++++++++++++++++++++-----------
 kernel/fork.c                   |   2 +-
 kernel/sched/fair.c             |   4 +-
 mm/hugetlb.c                    |   8 +-
 mm/memcontrol.c                 |   2 +-
 mm/memory-tiers.c               |  25 ++++-
 mm/memory_hotplug.c             |  25 +++++
 mm/mempolicy.c                  |  34 +++----
 mm/migrate.c                    |   4 +-
 mm/oom_kill.c                   |  11 ++-
 mm/page_alloc.c                 |  28 +++---
 mm/show_mem.c                   |   2 +-
 mm/slub.c                       |   4 +-
 mm/vmscan.c                     |   2 +-
 mm/zswap.c                      |  65 ++++++++++++-
 32 files changed, 411 insertions(+), 134 deletions(-)

-- 
2.51.1

