Message-ID: <aBtp98E9q37FLeMv@localhost.localdomain>
Date: Wed, 7 May 2025 16:11:03 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Xi Wang <xii@...gle.com>
Cc: Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
David Rientjes <rientjes@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Waiman Long <longman@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Koutný <mkoutny@...e.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Vlastimil Babka <vbabka@...e.cz>,
Dan Carpenter <dan.carpenter@...aro.org>,
Chen Yu <yu.c.chen@...el.com>, Kees Cook <kees@...nel.org>,
Yu-Chun Lin <eleanor15x@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Mickaël Salaün <mic@...ikod.net>
Subject: Re: [RFC/PATCH] sched: Support moving kthreads into cpuset cgroups
On Tue, May 06, 2025 at 08:43:57PM -0700, Xi Wang wrote:
> On Tue, May 6, 2025 at 5:17 PM Tejun Heo <tj@...nel.org> wrote:
> For the use cases, there are two major requirements at the moment:
>
> Dynamic cpu affinity based isolation: CPUs running latency sensitive threads
> (vcpu threads) can change over time. We'd like to configure kernel thread
> affinity at run time too.
I would expect such latency-sensitive applications to run on isolated
partitions. And those already don't pull unbound kthreads.
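For reference, such a partition can be set up at run time through the cgroup v2
cpuset interface (the cgroup name and CPU range below are examples only; recent
kernels may additionally require cpuset.cpus.exclusive to be set):

```shell
# Illustrative sketch: carve CPUs 2-3 out as an isolated cpuset partition.
mkdir /sys/fs/cgroup/vcpus
echo 2-3 > /sys/fs/cgroup/vcpus/cpuset.cpus
echo isolated > /sys/fs/cgroup/vcpus/cpuset.cpus.partition
# On success this reads back "isolated"; the CPUs lose their scheduling
# domains and unbound kthreads are not pulled onto them.
cat /sys/fs/cgroup/vcpus/cpuset.cpus.partition
```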
> Changing cpu affinity at run time requires cpumask
> calculations and thread migrations. Sharing cpuset code would be nice.
There is already some (recent) affinity management in the kthread subsystem:
kthreads that have a preferred affinity (but not PF_NO_SETAFFINITY) are kept
on a list, and their affinity is automatically re-applied against hotplug
events and the housekeeping state.
>
> Support numa based memory daemon affinity: We'd like to restrict kernel memory
> daemons but maintain their numa affinity at the same time. cgroup hierarchies
> can be helpful, e.g. create kernel, kernel/node0 and kernel/node1 and move the
> daemons to the right cgroup.
The kthread subsystem also handles node affinity. See kswapd / kcompactd. And it
takes care of that while still honouring isolated / isolcpus partitions:
d1a89197589c ("kthread: Default affine kthread to its preferred NUMA node")
>
> Workqueue coverage is optional. kworker threads can use their separate
> mechanisms too.
>
> Since the goal is isolation, we'd like to restrict as many kthreads as possible,
> even the ones that don't directly interact with user applications.
>
> The kthreadd case is handled - a new kthread can be forked inside a non root
> cgroup, but based on flags it can move itself to the root cgroup before threadfn
> is called.
kthreadd and other kthreads that don't have a preferred affinity are also kept
affine to CPUs outside isolcpus/nohz_full. And since isolated cpuset partitions
create NULL domains, those kthreads won't run there either.
What am I missing?
Thanks.
--
Frederic Weisbecker
SUSE Labs