Message-ID: <aWUSeFzxouq2vwg8@gourry-fedora-PF4VCD3F>
Date: Mon, 12 Jan 2026 10:25:44 -0500
From: Gregory Price <gourry@...rry.net>
To: Michal Koutný <mkoutny@...e.com>
Cc: linux-mm@...ck.org, cgroups@...r.kernel.org, linux-cxl@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, kernel-team@...a.com,
longman@...hat.com, tj@...nel.org, hannes@...xchg.org,
corbet@....net, gregkh@...uxfoundation.org, rafael@...nel.org,
dakr@...nel.org, dave@...olabs.net, jonathan.cameron@...wei.com,
dave.jiang@...el.com, alison.schofield@...el.com,
vishal.l.verma@...el.com, ira.weiny@...el.com,
dan.j.williams@...el.com, akpm@...ux-foundation.org, vbabka@...e.cz,
surenb@...gle.com, mhocko@...e.com, jackmanb@...gle.com,
ziy@...dia.com, david@...nel.org, lorenzo.stoakes@...cle.com,
Liam.Howlett@...cle.com, rppt@...nel.org, axelrasmussen@...gle.com,
yuanchu@...gle.com, weixugc@...gle.com, yury.norov@...il.com,
linux@...musvillemoes.dk, rientjes@...gle.com,
shakeel.butt@...ux.dev, chrisl@...nel.org, kasong@...cent.com,
shikemeng@...weicloud.com, nphamcs@...il.com, bhe@...hat.com,
baohua@...nel.org, yosry.ahmed@...ux.dev, chengming.zhou@...ux.dev,
roman.gushchin@...ux.dev, muchun.song@...ux.dev, osalvador@...e.de,
matthew.brost@...el.com, joshua.hahnjy@...il.com, rakie.kim@...com,
byungchul@...com, ying.huang@...ux.alibaba.com, apopple@...dia.com,
cl@...two.org, harry.yoo@...cle.com, zhengqi.arch@...edance.com
Subject: Re: [RFC PATCH v3 5/8] Documentation/admin-guide/cgroups: update
docs for mems_allowed
On Mon, Jan 12, 2026 at 03:30:26PM +0100, Michal Koutný wrote:
> Hello.
>
> On Thu, Jan 08, 2026 at 03:37:52PM -0500, Gregory Price <gourry@...rry.net> wrote:
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -2530,8 +2530,11 @@ Cpuset Interface Files
> > cpuset-enabled cgroups.
> >
> > It lists the onlined memory nodes that are actually granted to
> > - this cgroup by its parent. These memory nodes are allowed to
> > - be used by tasks within the current cgroup.
> > + this cgroup by its parent. This includes both regular SystemRAM
> > + nodes (N_MEMORY) and Private Nodes (N_PRIVATE) that provide
> > + device-specific memory not intended for general consumption.
> > + Tasks within this cgroup may access Private Nodes using explicit
> > + __GFP_THISNODE allocations if the node is in this mask.
>
> Notice that these files are exposed for userspace. Hence I'm not sure
> they'd be able to ask for allocations like this (or even need to know
> about this implementation detail).
>
Fair, I can drop this. The intent is actually to keep user space from
needing to know about this at all.
> >
> > If "cpuset.mems" is empty, it shows all the memory nodes from the
> > parent cgroup that will be available to be used by this cgroup.
> > @@ -2541,6 +2544,25 @@ Cpuset Interface Files
> >
> > Its value will be affected by memory nodes hotplug events.
> >
> > + cpuset.mems.sysram
> > + A read-only multiple values file which exists on all
> > + cpuset-enabled cgroups.
> > +
> > + It lists the SystemRAM nodes (N_MEMORY) that are available for
> > + general memory allocation by tasks within this cgroup. This is
> > + a subset of "cpuset.mems.effective" that excludes Private Nodes.
> > +
> > + Normal page allocations are restricted to nodes in this mask.
> > + The kernel page allocator, slab allocator, and compaction only
> > + consider SystemRAM nodes when allocating memory for tasks.
> > +
> > + Private Nodes are excluded from this mask because their memory
> > + is managed by device drivers for specific purposes (e.g., CXL
> > + compressed memory, accelerator memory) and should not be used
> > + for general allocations.
>
> So I wonder whether the N_PRIVATE nodes should be included in
> cpuset.mems[.effective] at all.
I think it makes the control path easier (both more intuitive and easier
to write in the cpuset code), but I can take another look at this.
That said, I think omitting them from .effective prevents the user from
controlling whether their memory ends up on that node.
For example, the user might be aware that they have compressed memory on
node N, and they have a cgroup that they don't want on node N. Not having
it included in mems.allowed / mems.effective means they can't control this.
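As a rough illustration of that concern, here is a userspace model, not
the cpuset code. The type and helper names (nodemask_model_t,
effective_mems, sysram_mems, node_in_mask) are hypothetical, and a 64-bit
integer stands in for a real nodemask. It only shows that a private node
can be excluded per cgroup when it is part of the effective mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N set means node N is in the mask. */
typedef uint64_t nodemask_model_t;

/*
 * Nodes granted to a cgroup: what the parent allows, narrowed by the
 * cgroup's own request. An empty request inherits the parent's nodes,
 * mirroring the documented behaviour of an empty "cpuset.mems".
 */
static nodemask_model_t effective_mems(nodemask_model_t parent,
                                       nodemask_model_t requested)
{
    return requested ? (parent & requested) : parent;
}

/* SystemRAM subset of the effective mask: drop the private nodes. */
static nodemask_model_t sysram_mems(nodemask_model_t effective,
                                    nodemask_model_t private_nodes)
{
    return effective & ~private_nodes;
}

/* True if node nid is present in mask (model is limited to 64 nodes). */
static bool node_in_mask(nodemask_model_t mask, unsigned int nid)
{
    return nid < 64 && ((mask >> nid) & 1u) != 0;
}
```

With nodes 0-2 granted by the parent and node 2 private, a cgroup that
requests only nodes 0-1 ends up without node 2 in its effective mask. If
private nodes were never part of that mask, there would be nothing for
the user to exclude.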
> (It resembles CPU isolation to me a bit ~ cpuset.cpus.isolated.)
> Maybe you only want to expose it on the root cpuset cg and inverted like
> cpuset.mems.private?
>
Hm, I had not considered adding the separate mask for .private as
opposed to sysram.
If all we actually need to change is the allowed() callback to check an
additional nodemask, that might end up cleaner.
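A minimal sketch of what such a check could look like, again as a
userspace model and not the actual cpuset callback. struct cpuset_model,
node_allowed_model() and the explicit_request flag are hypothetical. The
flag stands in for an explicit __GFP_THISNODE-style request as described
in the quoted patch text, and the separate private mask is assumed to be
a single root-level mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuset; bit N set means node N is granted. */
struct cpuset_model {
    uint64_t mems_effective;
};

/*
 * Model of an "allowed" check that consults one extra nodemask.
 * private_nodes is the assumed root-level mask of private nodes.
 */
static bool node_allowed_model(const struct cpuset_model *cs,
                               uint64_t private_nodes,
                               unsigned int nid,
                               bool explicit_request)
{
    uint64_t bit;

    if (nid >= 64)
        return false;
    bit = UINT64_C(1) << nid;

    /* The node must be granted to this cgroup in the first place. */
    if (!(cs->mems_effective & bit))
        return false;

    /* Private nodes are only usable when explicitly requested. */
    if (private_nodes & bit)
        return explicit_request;

    return true;
}
```

In this model the effective mask keeps its current meaning, and the
private mask only adds one extra test on the allocation path.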
Thank you, I'll take another look at this piece.
~Gregory