Message-ID: <Z5EmtNh_lryVj0S3@localhost.localdomain>
Date: Wed, 22 Jan 2025 18:11:16 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Costa Shulyupin <costa.shul@...hat.com>
Cc: Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Koutný <mkoutny@...e.com>,
Dan Carpenter <dan.carpenter@...aro.org>,
Vlastimil Babka <vbabka@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Chen Yu <yu.c.chen@...el.com>, Kees Cook <kees@...nel.org>,
Randy Dunlap <rdunlap@...radead.org>, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org
Subject: Re: [RFC PATCH v1] Add kthreads_update_affinity()
Hi Costa,
On Mon, Jan 13, 2025 at 09:08:54PM +0200, Costa Shulyupin wrote:
> Changing the set of housekeeping and isolated CPUs currently requires a
> reboot.
>
> The goal is to change CPU isolation dynamically without reboot.
>
> This patch is based on the parent patch
> cgroup/cpuset: Exclude isolated CPUs from housekeeping CPU masks
> https://lore.kernel.org/lkml/20240821142312.236970-3-longman@redhat.com/
> Its purpose is to update isolation cpumasks.
>
> However, some subsystems may use outdated housekeeping CPU masks. To
> prevent the use of these isolated CPUs, it is essential to explicitly
> propagate updates to the housekeeping masks across all subsystems that
> depend on them.
>
> This patch is not intended to be merged, as it would disrupt the kernel.
> It is still a proof-of-concept for research purposes.
>
> The questions are:
> - Is this the right direction, or should I explore an alternative approach?
> - What factors need to be considered?
> - Any suggestions or advice?
> - Have similar attempts been made before?
Since the kthreads preferred affinity patchset just got merged,
I don't think anything needs to be done anymore on the kthreads front to
toggle nohz_full through cpusets safely. Let's see the details below:
>
> Update the affinity of kthreadd and trigger the recalculation of kthread
> affinities using kthreads_online_cpu().
>
> The argument passed to kthreads_online_cpu() is irrelevant, as the
> function reassigns affinities of kthreads based on their
> preferred_affinity and the updated housekeeping_cpumask(HK_TYPE_KTHREAD).
>
> Currently only RCU uses kthread_affine_preferred().
>
> I dared to call kthread_affine_preferred() from kthread_run() to set
> preferred_affinity to cpu_possible_mask for kthreads without a specific
> affinity, enabling their management through kthreads_online_cpu().
>
> Any objections?
>
> For details about kthreads affinity patterns please see:
> https://lore.kernel.org/lkml/20241211154035.75565-16-frederic@kernel.org/
>
> Signed-off-by: Costa Shulyupin <costa.shul@...hat.com>
> ---
> include/linux/kthread.h | 5 ++++-
> kernel/cgroup/cpuset.c | 1 +
> kernel/kthread.c | 6 ++++++
> 3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 8d27403888ce9..b43c5aeb2cfd7 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -52,8 +52,10 @@ bool kthread_is_per_cpu(struct task_struct *k);
> ({ \
> struct task_struct *__k \
> = kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
> - if (!IS_ERR(__k)) \
> + if (!IS_ERR(__k)) { \
> + kthread_affine_preferred(__k, cpu_possible_mask); \
> wake_up_process(__k); \
> + } \
> __k; \
> })
>
> @@ -270,4 +272,5 @@ struct cgroup_subsys_state *kthread_blkcg(void);
> #else
> static inline void kthread_associate_blkcg(struct cgroup_subsys_state *css) { }
> #endif
> +void kthreads_update_affinity(void);
> #endif /* _LINUX_KTHREAD_H */
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 65658a5c2ac81..7d71acc7f46b6 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1355,6 +1355,7 @@ static void update_isolation_cpumasks(bool isolcpus_updated)
> trl();
> ret = housekeeping_exlude_isolcpus(isolated_cpus, HOUSEKEEPING_FLAGS);
> WARN_ON_ONCE((ret < 0) && (ret != -EOPNOTSUPP));
> + kthreads_update_affinity();
A few things to consider:

1) update_isolation_cpumasks() will be called with cpus_read_lock() held
(cf: sched_partition_write() and cpuset_write_resmask()). Since CPU hotplug,
and therefore the kthreads_online_cpu() callback, runs under
cpus_write_lock(), kthreads_online_cpu() can't run concurrently.
2) The constraint for turning nohz_full on or off for a CPU will be that
the target CPU is offline.

So when the target CPU later comes back online, kthreads_online_cpu() will
pick up the newly modified housekeeping mask (which will be visible because
cpus_write_lock() is held) and apply the appropriate effective affinity
accordingly.
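To illustrate, the online callback from the merged preferred affinity
series does roughly the following (a simplified paraphrase from memory,
not the verbatim kernel code; the NUMA-node fallback for kthreads without
a preferred affinity is omitted):

/*
 * Simplified sketch of the CPU online callback: recompute each
 * registered kthread's effective affinity from its preferred mask,
 * falling back to the housekeeping mask when nothing remains online.
 */
static int kthreads_online_cpu(unsigned int cpu)
{
	cpumask_var_t affinity;
	struct kthread *k;

	guard(mutex)(&kthreads_hotplug_lock);

	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
		return -ENOMEM;

	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
		/* Preferred affinity restricted to online CPUs... */
		cpumask_and(affinity, k->preferred_affinity, cpu_online_mask);
		/* ...or fall back to the housekeeping set. */
		if (cpumask_empty(affinity))
			cpumask_copy(affinity,
				     housekeeping_cpumask(HK_TYPE_KTHREAD));
		set_cpus_allowed_ptr(k->task, affinity);
	}

	free_cpumask_var(affinity);
	return 0;
}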
The only thing that might need to be done by update_isolation_cpumasks() is
to hold kthreads_hotplug_lock so that a subsequent call to
kthread_affine_preferred() sees the "freshest" update to the housekeeping
mask. But even that may not be mandatory, because the concerned CPUs are
offline and kthreads_online_cpu() will fix up the race when they boot.

Oh, and the special case of kthreadd might need a direct affinity update,
for instance as sketched below.
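Something along these lines, perhaps (an untested sketch for
kernel/kthread.c; kthreads_update_affinity() is the helper this patch
proposes, and kthreads_hotplug_lock comes from the merged preferred
affinity series):

/*
 * Untested sketch: propagate a housekeeping update to kthreadd directly,
 * since it is not covered by the kthreads_online_cpu() fixup path.
 */
void kthreads_update_affinity(void)
{
	/*
	 * Keep concurrent kthread_affine_preferred() callers from
	 * sampling a stale housekeeping mask while kthreadd is updated.
	 */
	mutex_lock(&kthreads_hotplug_lock);
	set_cpus_allowed_ptr(kthreadd_task,
			     housekeeping_cpumask(HK_TYPE_KTHREAD));
	mutex_unlock(&kthreads_hotplug_lock);
}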
So the good news is that we have a lot of things sorted out to prepare
for that cpuset interface:
_ RCU NOCB is ready
_ kthreads are ready
_ timers are fine since we toggle only offline CPUs and timer_migration.c
should work without changes.
Still, some details need to be taken care of:

* scheduler (see the housekeeping mask references, especially the ilb
  (idle load balancer), which is my biggest worry;
  get_nohz_timer_target() shouldn't be an issue)
* posix cpu timers (make the tick_dep unconditional?)
* kernel/watchdog.c (make proc_watchdog_cpumask() and
  lockup_detector_online_cpu() safe against update_isolation_cpumasks())
* workqueue unbound mask (just apply the new one?)
* some RCU tick_dep to handle (make them unconditional, since they apply
  on slow paths anyway?)
* other things? (grep for tick_nohz_full_cpu(), housekeeping_* and
  tick_dep_*)
But we are getting closer!
Thanks!