Message-ID: <3bf95ee2-1340-41b1-9f5c-1563f953c6eb@redhat.com>
Date: Mon, 23 Jun 2025 13:34:58 -0400
From: Waiman Long <llong@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>,
LKML <linux-kernel@...r.kernel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Marco Crivellari <marco.crivellari@...e.com>, Michal Hocko
<mhocko@...e.com>, Peter Zijlstra <peterz@...radead.org>,
Tejun Heo <tj@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH 02/27] sched/isolation: Introduce housekeeping per-cpu
rwsem
On 6/20/25 11:22 AM, Frederic Weisbecker wrote:
> The HK_TYPE_DOMAIN isolation cpumask, and further the
> HK_TYPE_KERNEL_NOISE cpumask will be made modifiable at runtime in the
> future.
>
> The affected subsystems will need to synchronize against those cpumask
> changes so that:
>
> * The readers get a coherent snapshot
> * The housekeeping subsystem can safely propagate a cpumask update to
> the subsystems after it has been published.
>
> Protect against readsides that can sleep with per-cpu rwsem. Updates are
> expected to be very rare given that CPU isolation is a niche usecase and
> related cpuset setup happens only in preparation work. On the other hand
> read sides can occur in more frequent paths.
>
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>

Thanks for the patch series; it certainly has some good ideas.
However, I am a bit concerned about the overhead of using percpu-rwsem for
synchronization, especially when the readers have to wait for the
writer side to complete. From my point of view, during the
transition period when new isolated CPUs are being added or old ones
are being removed, a reader will get either the old CPU data or the new
data, depending on the exact timing. The effect of the CPU selection may
persist for a while after the end of the critical section.

Can we just rely on RCU to make sure that a reader gets either the new
mask or the old one, but nothing in between, without the additional overhead?

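To make the "old or new, nothing in between" property concrete, here is a
minimal userspace sketch, not kernel code. It models the RCU publish step
with a C11 atomic pointer swap. The names hk_mask, hk_current, hk_snapshot()
and hk_publish() are hypothetical, and the model has no grace period: the
caller gets the old mask back and has to decide when freeing it is safe,
which is the part that synchronize_rcu() or kfree_rcu() would handle in the
kernel.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a housekeeping cpumask: one bit per CPU. */
struct hk_mask {
	uint64_t bits;
};

/* Currently published mask; models an RCU-protected pointer. */
static _Atomic(struct hk_mask *) hk_current;

/*
 * Reader side, modelling rcu_dereference(). The pointer is loaded once,
 * so the caller sees one complete mask, either the old or the new one.
 * Returns 0 if nothing has been published yet.
 */
static uint64_t hk_snapshot(void)
{
	struct hk_mask *m = atomic_load_explicit(&hk_current,
						 memory_order_acquire);

	return m ? m->bits : 0;
}

/*
 * Writer side, modelling rcu_assign_pointer(). The new mask is fully
 * initialised before it becomes visible. On success the previous mask
 * (NULL on first publish) is stored in *old and 0 is returned; on
 * allocation failure -1 is returned and nothing changes. This model does
 * NOT wait for readers, so freeing *old safely is left to the caller.
 */
static int hk_publish(uint64_t bits, struct hk_mask **old)
{
	struct hk_mask *next = malloc(sizeof(*next));

	if (!next)
		return -1;
	next->bits = bits;
	*old = atomic_exchange_explicit(&hk_current, next,
					memory_order_acq_rel);
	return 0;
}
```

Readers never block here; the whole cost of an update sits on the writer,
which is the trade-off being asked about.
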
My current thinking is to make use of CPU hotplug to enable better CPU
isolation. IOW, I would shut down the affected CPUs, change the
housekeeping masks, and then bring them back online again. That means the
writer side will take a while to complete.

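The ordering I have in mind can be sketched as a toy model on plain
bitmasks. This is only an illustration: struct hk_state and
hk_update_via_hotplug() are hypothetical names, and the real offline and
online steps are far more involved than clearing and setting a bit.

```c
#include <stdint.h>

/* Toy system state; every name here is hypothetical. */
struct hk_state {
	uint64_t online;	/* CPUs currently online */
	uint64_t housekeeping;	/* CPUs allowed to run housekeeping work */
};

/*
 * Isolate (isolate != 0) or de-isolate (isolate == 0) the CPUs in
 * @affected:
 *   1. take the affected CPUs offline,
 *   2. update the housekeeping mask while none of them can run,
 *   3. bring back the ones that were online before.
 * Returns 0 on success, or -1 without changing anything if the update
 * would leave no housekeeping CPU at all.
 */
static int hk_update_via_hotplug(struct hk_state *s, uint64_t affected,
				 int isolate)
{
	uint64_t next = isolate ? (s->housekeeping & ~affected)
				: (s->housekeeping | affected);
	uint64_t was_online = s->online & affected;

	if (!next)
		return -1;

	s->online &= ~affected;		/* step 1: stands in for CPU offline */
	s->housekeeping = next;		/* step 2: no affected CPU is running */
	s->online |= was_online;	/* step 3: stands in for CPU online */
	return 0;
}
```

In this model nothing running on an affected CPU can observe the mask
mid-update, because those CPUs are offline during step 2.
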
Cheers,
Longman
> ---
> include/linux/sched/isolation.h | 7 +++++++
> kernel/sched/isolation.c | 12 ++++++++++++
> kernel/sched/sched.h | 1 +
> 3 files changed, 20 insertions(+)
>
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index f98ba0d71c52..8de4f625a5c1 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -41,6 +41,9 @@ static inline bool housekeeping_cpu(int cpu, enum hk_type type)
> return true;
> }
>
> +extern void housekeeping_lock(void);
> +extern void housekeeping_unlock(void);
> +
> extern void __init housekeeping_init(void);
>
> #else
> @@ -73,6 +76,8 @@ static inline bool housekeeping_cpu(int cpu, enum hk_type type)
> return true;
> }
>
> +static inline void housekeeping_lock(void) { }
> +static inline void housekeeping_unlock(void) { }
> static inline void housekeeping_init(void) { }
> #endif /* CONFIG_CPU_ISOLATION */
>
> @@ -84,4 +89,6 @@ static inline bool cpu_is_isolated(int cpu)
> cpuset_cpu_is_isolated(cpu);
> }
>
> +DEFINE_LOCK_GUARD_0(housekeeping, housekeeping_lock(), housekeeping_unlock())
> +
> #endif /* _LINUX_SCHED_ISOLATION_H */
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 83cec3853864..8c02eeccea3b 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -18,12 +18,24 @@ static cpumask_var_t housekeeping_cpumasks[HK_TYPE_MAX];
> unsigned long housekeeping_flags;
> EXPORT_SYMBOL_GPL(housekeeping_flags);
>
> +DEFINE_STATIC_PERCPU_RWSEM(housekeeping_pcpu_lock);
> +
> bool housekeeping_enabled(enum hk_type type)
> {
> return !!(housekeeping_flags & BIT(type));
> }
> EXPORT_SYMBOL_GPL(housekeeping_enabled);
>
> +void housekeeping_lock(void)
> +{
> +	percpu_down_read(&housekeeping_pcpu_lock);
> +}
> +
> +void housekeeping_unlock(void)
> +{
> +	percpu_up_read(&housekeeping_pcpu_lock);
> +}
> +
> int housekeeping_any_cpu(enum hk_type type)
> {
> int cpu;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 475bb5998295..0cdb560ef2f3 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -46,6 +46,7 @@
> #include <linux/mm.h>
> #include <linux/module.h>
> #include <linux/mutex_api.h>
> +#include <linux/percpu-rwsem.h>
> #include <linux/plist.h>
> #include <linux/poll.h>
> #include <linux/proc_fs.h>