Message-ID: <48078454-f441-4699-9c50-db93783f00fd@nvidia.com>
Date: Wed, 26 Nov 2025 14:23:23 +1100
From: Balbir Singh <balbirs@...dia.com>
To: Gregory Price <gourry@...rry.net>, linux-mm@...ck.org
Cc: kernel-team@...a.com, linux-cxl@...r.kernel.org,
 linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev,
 linux-fsdevel@...r.kernel.org, cgroups@...r.kernel.org, dave@...olabs.net,
 jonathan.cameron@...wei.com, dave.jiang@...el.com,
 alison.schofield@...el.com, vishal.l.verma@...el.com, ira.weiny@...el.com,
 dan.j.williams@...el.com, longman@...hat.com, akpm@...ux-foundation.org,
 david@...hat.com, lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com,
 vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
 osalvador@...e.de, ziy@...dia.com, matthew.brost@...el.com,
 joshua.hahnjy@...il.com, rakie.kim@...com, byungchul@...com,
 ying.huang@...ux.alibaba.com, apopple@...dia.com, mingo@...hat.com,
 peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, vschneid@...hat.com, tj@...nel.org, hannes@...xchg.org,
 mkoutny@...e.com, kees@...nel.org, muchun.song@...ux.dev,
 roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, rientjes@...gle.com,
 jackmanb@...gle.com, cl@...two.org, harry.yoo@...cle.com,
 axelrasmussen@...gle.com, yuanchu@...gle.com, weixugc@...gle.com,
 zhengqi.arch@...edance.com, yosry.ahmed@...ux.dev, nphamcs@...il.com,
 chengming.zhou@...ux.dev, fabio.m.de.francesco@...ux.intel.com,
 rrichter@....com, ming.li@...omail.com, usamaarif642@...il.com,
 brauner@...nel.org, oleg@...hat.com, namcao@...utronix.de,
 escape@...ux.alibaba.com, dongjoo.seo1@...sung.com
Subject: Re: [RFC LPC2026 PATCH v2 00/11] Specific Purpose Memory NUMA Nodes

On 11/13/25 06:29, Gregory Price wrote:
> This is a code RFC for discussion related to
> 
> "Mempolicy is dead, long live memory policy!"
> https://lpc.events/event/19/contributions/2143/
> 

:)

I am trying to read through your series, but in the past I tried
https://lwn.net/Articles/720380/

> base-commit: 24172e0d79900908cf5ebf366600616d29c9b417
> (version notes at end)
> 
> At LPC 2026, I plan to discuss:
> - Why? (In short: shunting to DAX is a failed pattern for users)
> - Other designs I considered (mempolicy, cpusets, zone_device)
> - Why mempolicy.c and cpusets as-is are insufficient
> - SPM types seeking this form of interface (Accelerator, Compression)
> - Platform extensions that would be nice to see (SPM-only Bits)
> 
> Open Questions
> - Single SPM nodemask, or multiple based on features?
> - Apply SPM/SysRAM bit on-boot only or at-hotplug?
> - Allocate extra "possible" NUMA nodes for flexibility?
> - Should SPM Nodes be zone-restricted? (MOVABLE only?)
> - How should reclaim and compaction be handled on these nodes?
> 
> 
> With this set, we aim to enable allocation of "special purpose memory"
> with the page allocator (mm/page_alloc.c) without exposing the same
> memory as "System RAM".  Unless a non-userland component requests it, and
> does so with the GFP_SPM_NODE flag, memory on these nodes cannot be allocated.
> 
> This isolation mechanism is a requirement for memory policies which
> depend on certain sets of memory never being used outside special
> interfaces (such as a specific mm/component or driver).
> 
> We present an example of using this mechanism within ZSWAP, as-if
> a "compressed memory node" was present.  How to describe the features
> of memory present on nodes is left up to comment here and at LPC '26.
> 
> Userspace-driven allocations are restricted by the sysram_nodes mask;
> nothing in userspace can explicitly request memory from SPM nodes.
> 
> Instead, the intent is to create new components which understand memory
> features and register those nodes with those components. This abstracts
> the hardware complexity away from userland while also not requiring new
> memory innovations to carry entirely new allocators.
> 
> The ZSwap example demonstrates this with the `mt_spm_nodemask`.  This
> hack treats all spm nodes as-if they are compressed memory nodes, and
> we bypass the software compression logic in zswap in favor of simply
> copying memory directly to the allocated page.  In a real design
> 
> There are 4 major changes in this set:
> 
> 1) Introducing mt_sysram_nodelist in mm/memory-tiers.c which denotes
>    the set of nodes which are eligible for use as normal system ram
> 
>    Some existing users now pass mt_sysram_nodelist into the page
>    allocator instead of NULL, but passing a NULL pointer in will simply
>    have it replaced by mt_sysram_nodelist anyway.  Should a fully NULL
>    pointer still make it to the page allocator, without GFP_SPM_NODE
>    SPM node zones will simply be skipped.
> 
>    mt_sysram_nodelist is always guaranteed to contain the N_MEMORY nodes
>    present during __init, but if empty the use of mt_sysram_nodes()
>    will return a NULL to preserve current behavior.
> 
> 
> 2) The addition of `cpuset.mems.sysram` which restricts allocations to
>    `mt_sysram_nodes` unless GFP_SPM_NODE is used.
> 
>    SPM Nodes are still allowed in cpuset.mems.allowed and effective.
> 
>    This is done to allow separate control over sysram and SPM node sets
>    by cgroups while maintaining the existing hierarchical rules.
> 
>    current cpuset configuration
>    cpuset.mems_allowed
>     |.mems_effective         < (mems_allowed ∩ parent.mems_effective)
>     |->tasks.mems_allowed    < cpuset.mems_effective
> 
>    new cpuset configuration
>    cpuset.mems_allowed
>     |.mems_effective         < (mems_allowed ∩ parent.mems_effective)
>     |.sysram_nodes           < (mems_effective ∩ default_sys_nodemask)
>     |->task.sysram_nodes     < cpuset.sysram_nodes
> 
>    This means mems_allowed still restricts all node usage in any given
>    task context, which is the existing behavior.
> 
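
The mask derivation in the "new cpuset configuration" diagram above can be sketched in plain userspace C. This is only a toy model, not code from the series: it assumes at most 64 nodes packed into a uint64_t, and every name prefixed with toy_ is hypothetical. The sysram_default parameter stands in for default_sys_nodemask.

```c
#include <assert.h>
#include <stdint.h>

/* Toy userspace model: one bit per NUMA node (assumes <= 64 nodes). */
typedef uint64_t toy_nodemask;

/* Hypothetical stand-in for the per-cpuset masks named in the diagram. */
struct toy_cpuset {
	toy_nodemask mems_allowed;   /* configured by the administrator */
	toy_nodemask mems_effective; /* mems_allowed & parent's effective mask */
	toy_nodemask sysram_nodes;   /* mems_effective & default SysRAM mask */
};

/* Build a single-node mask. */
toy_nodemask toy_node_bit(unsigned int node)
{
	assert(node < 64);
	return (toy_nodemask)1 << node;
}

/*
 * Recompute the derived masks for one cpuset.
 * sysram_default plays the role of default_sys_nodemask.
 */
void toy_cpuset_update(struct toy_cpuset *cs,
		       toy_nodemask parent_effective,
		       toy_nodemask sysram_default)
{
	cs->mems_effective = cs->mems_allowed & parent_effective;
	cs->sysram_nodes = cs->mems_effective & sysram_default;
}
```

With nodes 0-1 as SysRAM and node 2 as SPM, a cpuset that allows nodes 0-2 keeps all three nodes in mems_effective, while sysram_nodes contains only nodes 0-1.
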
> 3) Addition of MHP_SPM_NODE flag to instruct memory_hotplug.c that the
>    capacity being added should mark the node as an SPM Node. 
> 
>    A node is either SysRAM or SPM - never both.  Attempting to add
>    incompatible memory to a node results in hotplug failure.
> 
>    DAX and CXL are made aware of the bit and have `spm_node` bits added
>    to their relevant subsystems.
> 
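
The "either SysRAM or SPM, never both" rule in (3) can be modelled as a per-node latch. The sketch below is a userspace illustration only: the toy_ names, the 64-node limit, and the choice of -EINVAL as the failure code are all assumptions, since the cover letter says only that the hotplug fails.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_NODES    64
#define TOY_MHP_SPM_NODE 0x1u /* hypothetical stand-in for MHP_SPM_NODE */

enum toy_node_type {
	TOY_NODE_UNSET = 0, /* no memory added yet */
	TOY_NODE_SYSRAM,
	TOY_NODE_SPM,
};

/* The type latched by the first hotplug on each node. */
static enum toy_node_type toy_node_types[TOY_MAX_NODES];

/*
 * Model adding memory to a node. The first add latches the node type;
 * a later add of the other type is rejected.
 * Returns 0 on success, -EINVAL (assumed error code) on a mismatch.
 */
int toy_add_memory(unsigned int node, unsigned int mhp_flags)
{
	enum toy_node_type want = (mhp_flags & TOY_MHP_SPM_NODE) ?
				  TOY_NODE_SPM : TOY_NODE_SYSRAM;

	assert(node < TOY_MAX_NODES);

	if (toy_node_types[node] == TOY_NODE_UNSET) {
		toy_node_types[node] = want;
		return 0;
	}
	return toy_node_types[node] == want ? 0 : -EINVAL;
}
```

In this model, adding SPM capacity to node 2 and then trying to add ordinary SysRAM to the same node fails, while a second SPM add on node 2 succeeds.
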
> 4) Adding GFP_SPM_NODE - which allows page_alloc.c to request memory
>    from the provided node or nodemask.  It changes the behavior of
>    the cpuset mems_allowed and mt_node_allowed() checks.
> 
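
Taken together, (1) and (4) describe a per-node eligibility check. The sketch below is one reading of that check as a userspace toy, not the logic in mm/page_alloc.c: the toy_ names and the 64-node bitmask are assumptions. A NULL nodemask is replaced by the SysRAM mask, an empty SysRAM mask yields NULL (meaning "any node"), and SPM nodes are skipped unless the caller passes the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_nodemask;  /* one bit per node, <= 64 nodes */
#define TOY_GFP_SPM_NODE 0x1u   /* hypothetical stand-in for GFP_SPM_NODE */

struct toy_topology {
	toy_nodemask sysram; /* models mt_sysram_nodelist */
	toy_nodemask spm;    /* nodes marked as SPM */
};

/* Models mt_sysram_nodes(): NULL when the SysRAM mask is empty. */
const toy_nodemask *toy_sysram_nodes(const struct toy_topology *t)
{
	return t->sysram ? &t->sysram : NULL;
}

/* May an allocation with these gfp flags and nodemask use this node? */
bool toy_node_allowed(const struct toy_topology *t, unsigned int node,
		      const toy_nodemask *requested, unsigned int gfp)
{
	toy_nodemask bit;

	assert(node < 64);
	bit = (toy_nodemask)1 << node;

	/* A NULL mask is replaced by the SysRAM mask (which may be NULL). */
	if (!requested)
		requested = toy_sysram_nodes(t);

	/* SPM nodes are skipped unless the caller opted in explicitly. */
	if ((t->spm & bit) && !(gfp & TOY_GFP_SPM_NODE))
		return false;

	/* A mask that is still NULL places no further restriction. */
	return requested ? (*requested & bit) != 0 : true;
}
```

In this model a caller reaches an SPM node only by passing both the flag and a nodemask that contains the node. The flag alone with a NULL mask still resolves to SysRAM nodes.
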
> v1->v2:
> - naming improvements
>     default_node -> sysram_node
>     protected    -> spm (Specific Purpose Memory)
> - add missing constify patch
> - add patch to update callers of __cpuset_zone_allowed
> - add additional logic to the mm sysram_nodes patch
> - fix bot build issues (ifdef config builds)
> - fix out-of-tree driver build issues (function renames)
> - change compressed_nodelist to spm_nodelist
> - add latch mechanism for sysram/spm nodes (Dan Williams)
>   this drops some extra memory-hotplug logic which is nice
> v1: https://lore.kernel.org/linux-mm/20251107224956.477056-1-gourry@gourry.net/
> 
> Gregory Price (11):
>   mm: constify oom_control, scan_control, and alloc_context nodemask
>   mm: change callers of __cpuset_zone_allowed to cpuset_zone_allowed
>   gfp: Add GFP_SPM_NODE for Specific Purpose Memory (SPM) allocations
>   memory-tiers: Introduce SysRAM and Specific Purpose Memory Nodes
>   mm: restrict slub, oom, compaction, and page_alloc to sysram by
>     default
>   mm,cpusets: rename task->mems_allowed to task->sysram_nodes
>   cpuset: introduce cpuset.mems.sysram
>   mm/memory_hotplug: add MHP_SPM_NODE flag
>   drivers/dax: add spm_node bit to dev_dax
>   drivers/cxl: add spm_node bit to cxl region
>   [HACK] mm/zswap: compressed ram integration example
> 
>  drivers/cxl/core/region.c       |  30 ++++++
>  drivers/cxl/cxl.h               |   2 +
>  drivers/dax/bus.c               |  39 ++++++++
>  drivers/dax/bus.h               |   1 +
>  drivers/dax/cxl.c               |   1 +
>  drivers/dax/dax-private.h       |   1 +
>  drivers/dax/kmem.c              |   2 +
>  fs/proc/array.c                 |   2 +-
>  include/linux/cpuset.h          |  62 +++++++------
>  include/linux/gfp_types.h       |   5 +
>  include/linux/memory-tiers.h    |  47 ++++++++++
>  include/linux/memory_hotplug.h  |  10 ++
>  include/linux/mempolicy.h       |   2 +-
>  include/linux/mm.h              |   4 +-
>  include/linux/mmzone.h          |   6 +-
>  include/linux/oom.h             |   2 +-
>  include/linux/sched.h           |   6 +-
>  include/linux/swap.h            |   2 +-
>  init/init_task.c                |   2 +-
>  kernel/cgroup/cpuset-internal.h |   8 ++
>  kernel/cgroup/cpuset-v1.c       |   7 ++
>  kernel/cgroup/cpuset.c          | 158 ++++++++++++++++++++------------
>  kernel/fork.c                   |   2 +-
>  kernel/sched/fair.c             |   4 +-
>  mm/compaction.c                 |  10 +-
>  mm/hugetlb.c                    |   8 +-
>  mm/internal.h                   |   2 +-
>  mm/memcontrol.c                 |   3 +-
>  mm/memory-tiers.c               |  66 ++++++++++++-
>  mm/memory_hotplug.c             |   7 ++
>  mm/mempolicy.c                  |  34 +++----
>  mm/migrate.c                    |   4 +-
>  mm/mmzone.c                     |   5 +-
>  mm/oom_kill.c                   |  11 ++-
>  mm/page_alloc.c                 |  57 +++++++-----
>  mm/show_mem.c                   |  11 ++-
>  mm/slub.c                       |  15 ++-
>  mm/vmscan.c                     |   6 +-
>  mm/zswap.c                      |  66 ++++++++++++-
>  39 files changed, 532 insertions(+), 178 deletions(-)
> 

Balbir
