Message-ID: <Z5EmtNh_lryVj0S3@localhost.localdomain>
Date: Wed, 22 Jan 2025 18:11:16 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Costa Shulyupin <costa.shul@...hat.com>
Cc: Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Koutný <mkoutny@...e.com>,
	Dan Carpenter <dan.carpenter@...aro.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Chen Yu <yu.c.chen@...el.com>, Kees Cook <kees@...nel.org>,
	Randy Dunlap <rdunlap@...radead.org>, linux-kernel@...r.kernel.org,
	cgroups@...r.kernel.org
Subject: Re: [RFC PATCH v1] Add kthreads_update_affinity()

Hi Costa,

On Mon, Jan 13, 2025 at 09:08:54PM +0200, Costa Shulyupin wrote:
> Changing the housekeeping and isolated CPU sets currently requires a reboot.
> 
> The goal is to change CPU isolation dynamically without reboot.
> 
> This patch is based on the parent patch
> cgroup/cpuset: Exclude isolated CPUs from housekeeping CPU masks
> https://lore.kernel.org/lkml/20240821142312.236970-3-longman@redhat.com/
> Its purpose is to update isolation cpumasks.
> 
> However, some subsystems may use outdated housekeeping CPU masks. To
> prevent the use of these isolated CPUs, it is essential to explicitly
> propagate updates to the housekeeping masks across all subsystems that
> depend on them.
> 
> This patch is not intended to be merged, as it would disrupt the kernel.
> It is still a proof-of-concept for research purposes.
> 
> The questions are:
> - Is this the right direction, or should I explore an alternative approach?
> - What factors need to be considered?
> - Any suggestions or advice?
> - Have similar attempts been made before?

Since the kthreads preferred affinity patchset just got merged,
I don't think anything needs to be done anymore on the kthreads front to toggle
nohz_full through cpusets safely. Let's see the details below:

> 
> Update the affinity of kthreadd and trigger the recalculation of kthread
> affinities using kthreads_online_cpu().
> 
> The argument passed to kthreads_online_cpu() is irrelevant, as the
> function reassigns affinities of kthreads based on their
> preferred_affinity and the updated housekeeping_cpumask(HK_TYPE_KTHREAD).
> 
> Currently only RCU uses kthread_affine_preferred().
> 
> I took the liberty of calling kthread_affine_preferred() from kthread_run()
> to set preferred_affinity to cpu_possible_mask for kthreads without a
> specific affinity, enabling their management through
> kthreads_online_cpu().
> 
> Any objections?
> 
> For details about kthreads affinity patterns please see:
> https://lore.kernel.org/lkml/20241211154035.75565-16-frederic@kernel.org/
> 
> Signed-off-by: Costa Shulyupin <costa.shul@...hat.com>
> ---
>  include/linux/kthread.h | 5 ++++-
>  kernel/cgroup/cpuset.c  | 1 +
>  kernel/kthread.c        | 6 ++++++
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 8d27403888ce9..b43c5aeb2cfd7 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -52,8 +52,10 @@ bool kthread_is_per_cpu(struct task_struct *k);
>  ({									   \
>  	struct task_struct *__k						   \
>  		= kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
> -	if (!IS_ERR(__k))						   \
> +	if (!IS_ERR(__k)) {						   \
> +		kthread_affine_preferred(__k, cpu_possible_mask);	   \
>  		wake_up_process(__k);					   \
> +	}								   \
>  	__k;								   \
>  })
>  
> @@ -270,4 +272,5 @@ struct cgroup_subsys_state *kthread_blkcg(void);
>  #else
>  static inline void kthread_associate_blkcg(struct cgroup_subsys_state *css) { }
>  #endif
> +void kthreads_update_affinity(void);
>  #endif /* _LINUX_KTHREAD_H */
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 65658a5c2ac81..7d71acc7f46b6 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1355,6 +1355,7 @@ static void update_isolation_cpumasks(bool isolcpus_updated)
>  	trl();
>  	ret = housekeeping_exlude_isolcpus(isolated_cpus, HOUSEKEEPING_FLAGS);
>  	WARN_ON_ONCE((ret < 0) && (ret != -EOPNOTSUPP));
> +	kthreads_update_affinity();

A few things to consider:

1) update_isolation_cpumasks() will be called with cpus_read_lock()
  (cf: sched_partition_write() and cpuset_write_resmask()), therefore
  kthreads_online_cpu() can't run concurrently.

2) The constraint to turn on/off a CPU as nohz_full will be that the
   target CPU is offline.

So when the target CPU later comes back online, kthreads_online_cpu()
will take care of the newly modified housekeeping_mask() (which will be visible
because cpus_write_lock() is held) and apply the appropriate effective
affinity accordingly.

The only thing that might need to be done by update_isolation_cpumasks() is
to hold kthreads_hotplug_lock so that a subsequent call to
kthread_affine_preferred() sees the "freshest" update to housekeeping_mask().
But even that may not be mandatory because the concerned CPUs are offline
and kthreads_online_cpu() will fix the race when they boot.

Oh, and the special case of kthreadd might need a direct affinity update.

So the good news is that we have a lot of things sorted out to prepare
for that cpuset interface:

_ RCU NOCB is ready
_ kthreads are ready
_ timers are fine since we toggle only offline CPUs and timer_migration.c
  should work without changes.

Still some details need to be taken care of:

* scheduler (see the housekeeping_mask() references, especially the ilb, which
  is my biggest worry; get_nohz_timer_target() shouldn't be an issue)
  
* posix cpu timers (make tick_dep unconditional ?)

* kernel/watchdog.c (make proc_watchdog_cpumask() and
  lockup_detector_online_cpu() safe against update_isolation_cpumasks())
  
* workqueue unbound mask (just apply the new one?)

* some RCU tick_dep to handle (make them unconditional since they
  apply on slow path anyway?)
  
* other things? (grep for tick_nohz_full_cpu(), housekeeping_* and tick_dep_*)

But we are getting closer!

Thanks!
