Message-ID: <00914d25-f0ae-4b00-9608-2457821c071c@suse.cz>
Date: Mon, 5 Aug 2024 16:53:51 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>, Kees Cook <kees@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
"Paul E. McKenney" <paulmck@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>, Boqun Feng <boqun.feng@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, rcu@...r.kernel.org
Subject: Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
On 8/5/24 16:28, Frederic Weisbecker wrote:
> On Tue, Jul 30, 2024 at 05:49:51PM +0200, Vlastimil Babka wrote:
>> Acked-by: Vlastimil Babka <vbabka@...e.cz>
>>
>> Nit:
>>
>> > +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
>> > +{
>> > + struct kthread *kthread = to_kthread(p);
>> > + cpumask_var_t affinity;
>> > + unsigned long flags;
>> > + int ret;
>> > +
>> > + if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
>> > + WARN_ON(1);
>> > + return -EINVAL;
>> > + }
>> > +
>>
>> Should we also fail if kthread->preferred_affinity already exists, in
>> case somebody calls this twice?
>
> Good point!
>
>>
>> Also for some of the use cases (kswapd, kcompactd) it would make sense
>> to be able to add the CPUs of a node as they are onlined. We don't seem
>> to do that, except for some corner-case handling in kcompactd, but maybe
>> we should? I also wonder whether onlining a completely new node with
>> CPUs currently ends up with the right affinity as a result of the
>> individual onlining operations, or whether we end up affined to a single
>> CPU (or none).
>>
>> But that would need some kind of kthread_affine_preferred_update()
>> implementation?
>
> So you mean that the "for_each_node_state()" loop in kcompactd doesn't
> handle all possible nodes but only those online when it's called? Or
> am I confused?
If you mean the loop in kcompactd_init() then indeed, but we also have a
hook in online_pages() to start new threads on newly onlined nodes, so
that's not a problem.
The problem (I think) I see is that cpumask_of_node(pgdat->node_id) is a
snapshot of the CPUs running on the NUMA node at the time, and is never
updated later as new CPUs are brought up.
kcompactd_cpu_online() does try to update that when CPUs are onlined (in a
clumsy way); there was never anything like that for kswapd, and after your
series this update is removed for kcompactd as well.
> If all users of preferred affinity were to use NUMA nodes, it could be
> a good idea to do a flavour of kernel/smpboot.c which would handle
> per-node kthreads instead of per-cpu kthreads. I initially thought
> about that. It would have handled all the lifecycle of those kthreads,
> including creation, against hotplug. Unfortunately RCU doesn't rely on
> per-NUMA-node kthreads but rather uses its own tree.
>
> If there were more users of real per-NUMA-node kthreads than kswapd and
> kcompactd, that would of course be well worth considering.
Yeah, it's not that compelling, but a way to update the preferred affinity
mask in response to CPU hotplug events, which kswapd and kcompactd could use,
would be sufficient. And maybe more widely useful.
I guess a callback could be defined for a kthread to provide a new
preferred_affinity, which you'd call from kthreads_hotplug_update()?
And kcompactd and kswapd could both use the same callback, which interprets
kthread_data() as the pgdat and fetches a new cpumask from it?
> Thanks.