Message-ID: <aBtp98E9q37FLeMv@localhost.localdomain>
Date: Wed, 7 May 2025 16:11:03 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Xi Wang <xii@...gle.com>
Cc: Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
David Rientjes <rientjes@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Waiman Long <longman@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Koutný <mkoutny@...e.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Vlastimil Babka <vbabka@...e.cz>,
Dan Carpenter <dan.carpenter@...aro.org>,
Chen Yu <yu.c.chen@...el.com>, Kees Cook <kees@...nel.org>,
Yu-Chun Lin <eleanor15x@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Mickaël Salaün <mic@...ikod.net>
Subject: Re: [RFC/PATCH] sched: Support moving kthreads into cpuset cgroups
On Tue, May 06, 2025 at 08:43:57PM -0700, Xi Wang wrote:
> On Tue, May 6, 2025 at 5:17 PM Tejun Heo <tj@...nel.org> wrote:
> For the use cases, there are two major requirements at the moment:
>
> Dynamic cpu affinity based isolation: CPUs running latency sensitive threads
> (vcpu threads) can change over time. We'd like to configure kernel thread
> affinity at run time too.
I would expect such latency-sensitive applications to run on isolated
partitions. And those already don't pull unbound kthreads.
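For reference, such a partition can be set up at run time through the cgroup v2
cpuset interface (the cgroup name and CPU range below are examples only; recent
kernels may additionally require cpuset.cpus.exclusive to be set):

```shell
# Illustrative sketch: carve CPUs 2-3 out as an isolated cpuset partition.
mkdir /sys/fs/cgroup/vcpus
echo 2-3 > /sys/fs/cgroup/vcpus/cpuset.cpus
echo isolated > /sys/fs/cgroup/vcpus/cpuset.cpus.partition
# On success this reads back "isolated"; the CPUs lose their scheduling
# domains and unbound kthreads are not pulled onto them.
cat /sys/fs/cgroup/vcpus/cpuset.cpus.partition
```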
> Changing cpu affinity at run time requires cpumask
> calculations and thread migrations. Sharing cpuset code would be nice.
There is already some (recent) affinity management in the kthread subsystem:
kthreads that have a preferred affinity (but not PF_NO_SETAFFINITY) are kept
on a list, and their affinity is automatically re-applied against hotplug
events and the housekeeping state.
>
> Support numa based memory daemon affinity: We'd like to restrict kernel memory
> daemons but maintain their numa affinity at the same time. cgroup hierarchies
> can be helpful, e.g. create kernel, kernel/node0 and kernel/node1 and move the
> daemons to the right cgroup.
The kthread subsystem also handles node affinity. See kswapd / kcompactd. And it
takes care of that while still honouring isolated / isolcpus partitions:
d1a89197589c ("kthread: Default affine kthread to its preferred NUMA node")
>
> Workqueue coverage is optional. kworker threads can use their separate
> mechanisms too.
>
> Since the goal is isolation, we'd like to restrict as many kthreads as possible,
> even the ones that don't directly interact with user applications.
>
> The kthreadd case is handled - a new kthread can be forked inside a non root
> cgroup, but based on flags it can move itself to the root cgroup before threadfn
> is called.
kthreadd and other kthreads that don't have a preferred affinity are also kept
affine to CPUs outside isolcpus/nohz_full. And since isolated cpuset partitions
create NULL domains, those kthreads won't run there either.
What am I missing?
Thanks.
--
Frederic Weisbecker
SUSE Labs