Message-ID: <3f86acec-8aa0-4448-843f-509a182b5459@suse.cz>
Date: Tue, 17 Sep 2024 09:14:40 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Michal Hocko <mhocko@...e.com>
Cc: Frederic Weisbecker <frederic@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>, Kees Cook <kees@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
linux-mm@...ck.org, "Paul E. McKenney" <paulmck@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>, Boqun Feng <boqun.feng@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, rcu@...r.kernel.org
Subject: Re: [PATCH 12/19] kthread: Default affine kthread to its preferred
NUMA node
On 9/17/24 9:05 AM, Michal Hocko wrote:
> On Tue 17-09-24 09:01:08, Vlastimil Babka wrote:
>> On 9/17/24 8:26 AM, Michal Hocko wrote:
>>> On Tue 17-09-24 00:49:16, Frederic Weisbecker wrote:
>>>> Kthreads attached to a preferred NUMA node for their task structure
>>>> allocation can also be assumed to run preferably within that same node.
>>>>
>>>> A more precise affinity is usually notified by calling
>>>> kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.
>>>>
>>>> For the others, a default affinity to the node is desired and sometimes
>>>> implemented with more or less success when it comes to dealing with hotplug
>>>> events and nohz_full / CPU Isolation interactions:
>>>>
>>>> - kcompactd is affine to its node and handles hotplug but not CPU Isolation
>>>> - kswapd is affine to its node and ignores hotplug and CPU Isolation
>>>> - A bunch of drivers create their kthreads on a specific node and
>>>> don't take care of affining them further.
>>>>
>>>> Handle that default node affinity preference at the generic level
>>>> instead, provided a kthread is created on an actual node and doesn't
>>>> apply any specific affinity such as a given CPU or a custom cpumask to
>>>> bind to before its first wake-up.
>>>
>>> Makes sense.
>>>
>>>> This generic handling is aware of CPU hotplug events and CPU isolation
>>>> such that:
>>>>
>>>> * When a housekeeping CPU goes up and is part of the node of a given
>>>> kthread, it is added to its applied affinity set (and
>>>> possibly the default last resort online housekeeping set is removed
>>>> from the set).
>>>>
>>>> * When a housekeeping CPU goes down while it is part of the node of a
>>>> kthread, it is removed from the kthread's applied
>>>> affinity. The last resort is to affine the kthread to all online
>>>> housekeeping CPUs.
>>>
>>> But I am not really sure about this part. Sure, it makes sense to set the
>>> affinity to exclude isolated CPUs, but why do we care about hotplug
>>> events at all? Let's say we offline all CPUs from a given node (or
>>> all but the isolated CPUs are offline - is this even a
>>> realistic/reasonable use case?). Wouldn't the scheduler ignore the kthread's
>>> affinity in such a case? In other words, how is that different from
>>> tasksetting a userspace task to a CPU that goes offline? We still do
>>> allow such a task to run, right? We just do not care about affinity
>>> anymore.
>>
>> AFAIU it handles better the situation where all housekeeping CPUs from
>> the preferred node go down: then it affines to housekeeping CPUs from any
>> node vs. any CPU including isolated ones.
>
> Doesn't that happen automagically? Or can it end up on a random
> isolated cpu?
Good question, perhaps it can and there's no automagic, as I see code like:
+ /* Make sure the kthread never gets re-affined globally */
+ set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
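
To illustrate - a rough, untested sketch of what I'd expect the per-node
default with a housekeeping fallback to look like (kthread_node_affine()
is just a made-up name for illustration, not what the patch uses):

static void kthread_node_affine(struct task_struct *tsk, int node)
{
	const struct cpumask *hk = housekeeping_cpumask(HK_TYPE_KTHREAD);
	cpumask_var_t mask;

	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
		return;

	/* Prefer online housekeeping CPUs of the kthread's node... */
	cpumask_and(mask, cpumask_of_node(node), hk);
	cpumask_and(mask, mask, cpu_online_mask);

	/* ...last resort: all online housekeeping CPUs, never isolated ones. */
	if (cpumask_empty(mask))
		cpumask_and(mask, hk, cpu_online_mask);

	set_cpus_allowed_ptr(tsk, mask);
	free_cpumask_var(mask);
}

But then the cpuhp callbacks still have to redo that intersection whenever
CPUs of the node come and go, which I guess is the point of the hotplug
handling described above.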