linux-kernel - Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Fri, 10 Nov 2023 17:05:50 -1000
From:   "tj@...nel.org" <tj@...nel.org>
To:     Gregory Price <gregory.price@...verge.com>
Cc:     John Groves <john@...alactic.com>,
        Gregory Price <gourry.memverge@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "ying.huang@...el.com" <ying.huang@...el.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mhocko@...nel.org" <mhocko@...nel.org>,
        "lizefan.x@...edance.com" <lizefan.x@...edance.com>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "corbet@....net" <corbet@....net>,
        "roman.gushchin@...ux.dev" <roman.gushchin@...ux.dev>,
        "shakeelb@...gle.com" <shakeelb@...gle.com>,
        "muchun.song@...ux.dev" <muchun.song@...ux.dev>,
        "jgroves@...ron.com" <jgroves@...ron.com>
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control

Hello, Gregory.

On Fri, Nov 10, 2023 at 05:29:25PM -0500, Gregory Price wrote:
> I did originally implement it this way, but note that it will either
> require some creative extension of set_mempolicy or even set_mempolicy2
> as proposed here:
> 
> https://lore.kernel.org/all/20231003002156.740595-1-gregory.price@memverge.com/
> 
> One of the problems to consider is task migration.  If a task is
> migrated from one socket to another, for example by being moved to a new
> cgroup with a different cpuset - the weights might be completely nonsensical
> for the new allowed topology.
> 
> Unfortunately mpol has no way of being changed from outside the task
> itself once it's applied, other than changing its nodemasks via cpusets.

Maybe it's time to add one?

> So one concrete use case: kubernetes might like change cpusets or move
> tasks from one cgroup to another, or a vm might be migrated from one set
> of nodes to enother (technically not mutually exclusive here).  Some
> memory policy settings (like weights) may no longer apply when this
> happens, so it would be preferable to have a way to change them.

Neither covers all use cases. As you noted in your mempolicy message, if the
application wants finer grained control, cgroup interface isn't great. In
general, any changes which are dynamically initiated by the application
itself isn't a great fit for cgroup.

I'm generally pretty awry of adding non-resource group configuration
interface especially when they don't have counter part in the regular
per-process/thread API for a few reasons:

1. The reason why people try to add those through cgroup somtimes is because
   it seems easier to add those new features through cgroup, which may be
   true to some degree, but shortcuts often aren't very conducive to long
   term maintainability.

2. As noted above, just having cgroup often excludes a signficant portion of
   use cases. Not all systems enable cgroups and programatic accesses from
   target processes / threads are coarse-grained and can be really awakward.

3. Cgroup can be convenient when group config change is necessary. However,
   we really don't want to keep adding kernel interface just for changing
   configs for a group of threads. For config changes which aren't high
   frequency, userspace iterating the member processes and applying the
   changes if possible is usually good enough which usually involves looping
   until no new process is found. If the looping is problematic, cgroup
   freezer can be used to atomically stop all member threads to provide
   atomicity too.

Thanks.

-- 
tejun