Message-ID: <aWUSeFzxouq2vwg8@gourry-fedora-PF4VCD3F>
Date: Mon, 12 Jan 2026 10:25:44 -0500
From: Gregory Price <gourry@...rry.net>
To: Michal Koutný <mkoutny@...e.com>
Cc: linux-mm@...ck.org, cgroups@...r.kernel.org, linux-cxl@...r.kernel.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, kernel-team@...a.com,
	longman@...hat.com, tj@...nel.org, hannes@...xchg.org,
	corbet@....net, gregkh@...uxfoundation.org, rafael@...nel.org,
	dakr@...nel.org, dave@...olabs.net, jonathan.cameron@...wei.com,
	dave.jiang@...el.com, alison.schofield@...el.com,
	vishal.l.verma@...el.com, ira.weiny@...el.com,
	dan.j.williams@...el.com, akpm@...ux-foundation.org, vbabka@...e.cz,
	surenb@...gle.com, mhocko@...e.com, jackmanb@...gle.com,
	ziy@...dia.com, david@...nel.org, lorenzo.stoakes@...cle.com,
	Liam.Howlett@...cle.com, rppt@...nel.org, axelrasmussen@...gle.com,
	yuanchu@...gle.com, weixugc@...gle.com, yury.norov@...il.com,
	linux@...musvillemoes.dk, rientjes@...gle.com,
	shakeel.butt@...ux.dev, chrisl@...nel.org, kasong@...cent.com,
	shikemeng@...weicloud.com, nphamcs@...il.com, bhe@...hat.com,
	baohua@...nel.org, yosry.ahmed@...ux.dev, chengming.zhou@...ux.dev,
	roman.gushchin@...ux.dev, muchun.song@...ux.dev, osalvador@...e.de,
	matthew.brost@...el.com, joshua.hahnjy@...il.com, rakie.kim@...com,
	byungchul@...com, ying.huang@...ux.alibaba.com, apopple@...dia.com,
	cl@...two.org, harry.yoo@...cle.com, zhengqi.arch@...edance.com
Subject: Re: [RFC PATCH v3 5/8] Documentation/admin-guide/cgroups: update
 docs for mems_allowed

On Mon, Jan 12, 2026 at 03:30:26PM +0100, Michal Koutný wrote:
> Hello.
> 
> On Thu, Jan 08, 2026 at 03:37:52PM -0500, Gregory Price <gourry@...rry.net> wrote:
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -2530,8 +2530,11 @@ Cpuset Interface Files
> >  	cpuset-enabled cgroups.
> >  
> >  	It lists the onlined memory nodes that are actually granted to
> > -	this cgroup by its parent. These memory nodes are allowed to
> > -	be used by tasks within the current cgroup.
> > +	this cgroup by its parent.  This includes both regular SystemRAM
> > +	nodes (N_MEMORY) and Private Nodes (N_PRIVATE) that provide
> > +	device-specific memory not intended for general consumption.
> > +	Tasks within this cgroup may access Private Nodes using explicit
> > +	__GFP_THISNODE allocations if the node is in this mask.
> 
> Notice that these files are exposed for userspace. Hence I'm not sure
> they'd be able to ask for allocations like this (or even need to know
> about this implementation detail).
>

Fair, I can drop this; the intent is actually to keep user space from
needing to know about this at all.

> >  
> >  	If "cpuset.mems" is empty, it shows all the memory nodes from the
> >  	parent cgroup that will be available to be used by this cgroup.
> > @@ -2541,6 +2544,25 @@ Cpuset Interface Files
> >  
> >  	Its value will be affected by memory nodes hotplug events.
> >  
> > +  cpuset.mems.sysram
> > +	A read-only multiple values file which exists on all
> > +	cpuset-enabled cgroups.
> > +
> > +	It lists the SystemRAM nodes (N_MEMORY) that are available for
> > +	general memory allocation by tasks within this cgroup.  This is
> > +	a subset of "cpuset.mems.effective" that excludes Private Nodes.
> > +
> > +	Normal page allocations are restricted to nodes in this mask.
> > +	The kernel page allocator, slab allocator, and compaction only
> > +	consider SystemRAM nodes when allocating memory for tasks.
> > +
> > +	Private Nodes are excluded from this mask because their memory
> > +	is managed by device drivers for specific purposes (e.g., CXL
> > +	compressed memory, accelerator memory) and should not be used
> > +	for general allocations.
> 
> So I wonder whether the N_PRIVATE nodes should be included in
> cpuset.mems[.effective] at all.

I think keeping them in makes the control path easier (both more
intuitive for users and simpler to write in the cpuset code), but I can
take another look at this.

That said, I think omitting them from .effective would prevent the user
from controlling whether their memory ends up on that node.

For example, the user might know that node N is backed by compressed
memory and have a cgroup that should never use node N; if N is not
included in mems.allowed / mems.effective, they have no way to express this.
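
To make the control argument concrete, here is a rough toy model in
Python (not kernel code). Python sets stand in for nodemask_t; the node
numbers, the N_MEMORY / N_PRIVATE membership, and the helper names are
hypothetical. The effective-mask logic is simplified from the documented
cpuset.mems.effective rules, and mems.sysram is computed as the patch
describes it: the effective mask restricted to SystemRAM nodes.

```python
# Toy model of cpuset node masks; NOT kernel code.
# Python sets stand in for nodemask_t. The node numbers are hypothetical
# example data: nodes 0-1 are SystemRAM, node 2 is a Private Node
# (e.g. CXL compressed memory).
N_MEMORY = frozenset({0, 1})
N_PRIVATE = frozenset({2})


def effective_mems(requested, parent_effective):
    """Simplified cpuset.mems.effective: the requested nodes the parent
    can grant, falling back to the parent's mask when that intersection
    is empty (which includes an empty cpuset.mems)."""
    granted = frozenset(requested) & parent_effective
    return granted if granted else frozenset(parent_effective)


def sysram_mems(effective):
    """cpuset.mems.sysram as described in the patch: the SystemRAM
    subset of the effective mask, i.e. Private Nodes excluded."""
    return effective & N_MEMORY


def private_mems(effective):
    """Private Nodes this cgroup could still be placed on."""
    return effective & N_PRIVATE


if __name__ == "__main__":
    root = N_MEMORY | N_PRIVATE
    # Cgroup A accepts node 2; cgroup B's cpuset.mems leaves it out.
    for name, requested in (("A", {0, 1, 2}), ("B", {0, 1})):
        eff = effective_mems(requested, root)
        print(name, sorted(eff), sorted(sysram_mems(eff)), sorted(private_mems(eff)))
```

In this model cgroup A can still reach private node 2 while cgroup B
cannot, and the only knob that distinguishes them is whether node 2 is
listed in cpuset.mems.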

> (It resembles CPU isolation to me a bit ~ cpuset.cpus.isolated.)
> Maybe you only want to expose it on the root cpuset cg and inverted like
> cpuset.mems.private?
>

Hm, I had not considered adding a separate .private mask instead of
.sysram.

If all we actually need to change is the allowed() callback to check an
additional nodemask, that might end up cleaner.
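
A rough sketch of what an allowed()-style check against one extra
nodemask could look like, in the spirit of the inverted
cpuset.mems.private idea above. This is a hypothetical Python model, not
the kernel's cpuset callback: toy_node_allowed(), normal_candidates(),
and the explicit_request flag (a stand-in for an explicit request such
as __GFP_THISNODE) are invented for illustration.

```python
# Hypothetical allowed()-style check, modelled in Python; NOT the
# kernel's cpuset callback. Node sets stand in for nodemask_t.


def toy_node_allowed(node, mems_allowed, private_mask, explicit_request):
    """Return True if an allocation may use `node`.

    The node must be in the task's mems_allowed. If it is also in the
    extra `private_mask`, it is only allowed when the caller explicitly
    asked for that node (`explicit_request`, a stand-in for something
    like __GFP_THISNODE).
    """
    if node not in mems_allowed:
        return False
    if node in private_mask:
        return explicit_request
    return True


def normal_candidates(mems_allowed, private_mask):
    """Nodes an ordinary (non-explicit) allocation may fall back to."""
    return [n for n in sorted(mems_allowed)
            if toy_node_allowed(n, mems_allowed, private_mask, False)]


if __name__ == "__main__":
    mems_allowed = {0, 1, 2}
    private_mask = {2}  # e.g. a root-level cpuset.mems.private
    print(normal_candidates(mems_allowed, private_mask))          # [0, 1]
    print(toy_node_allowed(2, mems_allowed, private_mask, True))  # True
```

With node 2 marked private, an ordinary allocation only considers nodes
0 and 1, while an explicit request for node 2 still passes, and
mems_allowed itself is left unchanged.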

Thank you, I'll take another look at this piece.

~Gregory
