Message-ID: <aB4AmUtEM-qQ1Xoa@localhost.localdomain>
Date: Fri, 9 May 2025 15:18:17 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Waiman Long <longman@...hat.com>
Cc: Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Koutný <mkoutny@...e.com>,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
Xi Wang <xii@...gle.com>
Subject: Re: [PATCH v2] cgroup/cpuset: Extend kthread_is_per_cpu() check to
all PF_NO_SETAFFINITY tasks
On Thu, May 08, 2025 at 03:24:13PM -0400, Waiman Long wrote:
> Commit ec5fbdfb99d1 ("cgroup/cpuset: Enable update_tasks_cpumask()
> on top_cpuset") enabled us to pull CPUs dedicated to child partitions
> from tasks in top_cpuset by ignoring per cpu kthreads. However, there
> can be other kthreads that are not per cpu but have the PF_NO_SETAFFINITY
> flag set to indicate that we shouldn't mess with their CPU affinity.
> Those kthreads currently have their affinity changed to skip CPUs dedicated
> to child partitions, whether the partition is an isolated or a scheduling one.
>
> As all the per cpu kthreads have PF_NO_SETAFFINITY set, the
> PF_NO_SETAFFINITY tasks are essentially a superset of per cpu kthreads.
> Fix this issue by dropping the kthread_is_per_cpu() check and checking
> the PF_NO_SETAFFINITY flag instead.
>
> Fixes: ec5fbdfb99d1 ("cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset")
> Signed-off-by: Waiman Long <longman@...hat.com>
> ---
> kernel/cgroup/cpuset.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index d0143b3dce47..967603300ee3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1130,9 +1130,11 @@ void cpuset_update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>
> if (top_cs) {
> /*
> - * Percpu kthreads in top_cpuset are ignored
> + * PF_NO_SETAFFINITY tasks are ignored.
> + * All per cpu kthreads should have PF_NO_SETAFFINITY
> + * flag set, see kthread_set_per_cpu().
> */
> - if (kthread_is_per_cpu(task))
> + if (task->flags & PF_NO_SETAFFINITY)
> continue;
> cpumask_andnot(new_cpus, possible_mask, subpartitions_cpus);
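For readability, here is roughly how the hunk reads with the patch applied
(reconstructed from the diff above, surrounding lines elided):

	if (top_cs) {
		/*
		 * PF_NO_SETAFFINITY tasks are ignored.
		 * All per cpu kthreads should have PF_NO_SETAFFINITY
		 * flag set, see kthread_set_per_cpu().
		 */
		if (task->flags & PF_NO_SETAFFINITY)
			continue;
		cpumask_andnot(new_cpus, possible_mask, subpartitions_cpus);
	}
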
Acked-by: Frederic Weisbecker <frederic@...nel.org>

But this makes me realize I overlooked that case when I introduced the
centralized affinity handling for unbound kthreads.

cpuset_update_tasks_cpumask() seems to blindly affine tasks to subpartitions_cpus,
while unbound kthreads might have their own preferences (per-node or custom
cpumasks). So I need to make that pass through the kthread API.
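Not a concrete proposal, just to illustrate the direction; the helper name
below (kthread_apply_cpuset_mask()) and the kthread_preferred_mask() accessor
are made up for the sake of the sketch:

	/*
	 * Hypothetical sketch, not existing code: instead of cpuset calling
	 * set_cpus_allowed_ptr() directly on an unbound kthread, it would
	 * hand the candidate mask over to kthread code, which intersects it
	 * with the kthread's own preference (per-node or a custom cpumask)
	 * and falls back to the full candidate mask when the intersection
	 * is empty.
	 */
	static void kthread_apply_cpuset_mask(struct task_struct *p,
					      const struct cpumask *cpuset_allowed)
	{
		cpumask_var_t mask;

		if (!alloc_cpumask_var(&mask, GFP_KERNEL))
			return;

		/*
		 * kthread_preferred_mask() is a placeholder for whatever
		 * kthread code would expose as the task's preferred affinity.
		 */
		cpumask_and(mask, cpuset_allowed, kthread_preferred_mask(p));
		if (cpumask_empty(mask))
			cpumask_copy(mask, cpuset_allowed);

		set_cpus_allowed_ptr(p, mask);
		free_cpumask_var(mask);
	}

That way cpuset would only narrow the choice, while the final mask selection
would stay in kthread code.
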
It seems that subpartitions_cpus doesn't contain nohz_full= CPUs,
but it does exclude isolcpus=. And it's usually sane to assume that
nohz_full= CPUs are isolated.

I think I can just rename update_unbound_workqueue_cpumask() to
update_unbound_kthreads_cpumask(), handle unbound kthreads from there
along with workqueues, and then completely ignore kthreads in
cpuset_update_tasks_cpumask().
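To make that a bit more concrete, a rough sketch (the workqueue call is what
update_unbound_workqueue_cpumask() already does today, if I remember the code
correctly; the kthread-side hook is hypothetical):

	static void update_unbound_kthreads_cpumask(bool isolcpus_updated)
	{
		int ret;

		lockdep_assert_cpus_held();

		if (!isolcpus_updated)
			return;

		/* Existing behaviour: exclude isolated CPUs from unbound workqueues. */
		ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
		WARN_ON_ONCE(ret < 0);

		/*
		 * Hypothetical hook: also re-evaluate unbound kthread
		 * affinities against the isolated CPUs.
		 */
		kthreads_exclude_cpumask(isolated_cpus);
	}
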
Let me think about it (but feel free to apply the current patch meanwhile).
Thanks.
--
Frederic Weisbecker
SUSE Labs