Message-ID: <b7aa4b10-1afb-476f-ac5d-d8db7151d866@redhat.com>
Date: Thu, 8 May 2025 15:34:56 -0400
From: Waiman Long <llong@...hat.com>
To: Xi Wang <xii@...gle.com>, Frederic Weisbecker <frederic@...nel.org>
Cc: Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
David Rientjes <rientjes@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>, Vlastimil Babka <vbabka@...e.cz>,
Dan Carpenter <dan.carpenter@...aro.org>, Chen Yu <yu.c.chen@...el.com>,
Kees Cook <kees@...nel.org>, Yu-Chun Lin <eleanor15x@...il.com>,
Thomas Gleixner <tglx@...utronix.de>, Mickaël Salaün
<mic@...ikod.net>, jiangshanlai@...il.com
Subject: Re: [RFC/PATCH] sched: Support moving kthreads into cpuset cgroups
On 5/8/25 1:51 PM, Xi Wang wrote:
> I think our problem spaces are different. Perhaps your problems are closer to
> hard real-time systems but our problems are about improving latency of existing
> systems while maintaining efficiency (max supported cpu util).
>
> For hard real-time systems we sometimes throw cores at the problem and run no
> more than one thread per cpu. But if we want efficiency we have to share cpus
> with scheduling policies. Disconnecting the cpu scheduler with isolcpus results
> in losing too much of the machine capacity. CPU scheduling is needed for both
> kernel and userspace threads.
>
> For our use case we need to move kernel threads away from certain vcpu threads,
> but other vcpu threads can share cpus with kernel threads. The ratio changes
> from time to time. Permanently putting aside a few cpus results in a reduction
> in machine capacity.
>
> The PF_NO_SETAFFINITY case is already handled by the patch. These threads will
> run in the root cgroup with the same affinities as before.
>
> The original justification for the cpuset feature is here, and the reasons
> still apply:
>
> "The management of large computer systems, with many processors (CPUs), complex
> memory cache hierarchies and multiple Memory Nodes having non-uniform access
> times (NUMA) presents additional challenges for the efficient scheduling and
> memory placement of processes."
>
> "But larger systems, which benefit more from careful processor and memory
> placement to reduce memory access times and contention.."
>
> "These subsets, or “soft partitions” must be able to be dynamically adjusted, as
> the job mix changes, without impacting other concurrently executing jobs."
>
> https://docs.kernel.org/admin-guide/cgroup-v1/cpusets.html
>
> -Xi
>
If you create a cpuset root partition, we push some kthreads away from
the CPUs dedicated to the newly created partition, which has its own
scheduling domain separate from the cgroup root. I do realize that the
current way of excluding only per-cpu kthreads isn't quite right, so I
am sending out a new patch to extend the exclusion to all
PF_NO_SETAFFINITY kthreads. Instead of putting kthreads into the
dedicated cpuset, we still keep them in the root cgroup, and a separate
cpuset partition can be created to run the workload without
interference from the background kthreads. Will that functionality suit
your current need?
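
For illustration only, here is a minimal sketch of the cgroup v2 writes
involved in creating such a partition. It assumes cgroup v2 is mounted
at /sys/fs/cgroup. The cgroup name "workload" and the CPU range "8-15"
are hypothetical and not taken from this thread. The control files
cgroup.subtree_control, cpuset.cpus and cpuset.cpus.partition are the
standard cgroup v2 interface.

```python
from pathlib import Path


def partition_plan(cgroup_root, name, cpus):
    """Return the ordered (file, value) writes for a cpuset root partition.

    `name` and `cpus` are placeholders chosen by the caller (hypothetical
    values in this sketch); nothing here touches the filesystem.
    """
    root = Path(cgroup_root)
    child = root / name
    return [
        # Enable the cpuset controller for children of the cgroup root.
        (root / "cgroup.subtree_control", "+cpuset"),
        # Give the child cgroup the CPUs it should own.
        (child / "cpuset.cpus", cpus),
        # Turn the child into a partition root with its own sched domain.
        (child / "cpuset.cpus.partition", "root"),
    ]


def apply_plan(plan):
    """Apply the writes in order; needs root on a real cgroup v2 mount."""
    for path, value in plan:
        # Creating the directory is what creates the child cgroup.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(value)
```

Calling apply_plan(partition_plan("/sys/fs/cgroup", "workload", "8-15"))
as root would attempt the setup. Whether the kernel accepts the
partition depends on the CPUs being available exclusively to it, so
reading cpuset.cpus.partition back afterwards is advisable.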
Cheers,
Longman