Message-ID: <cabaaa35-5207-439d-b09d-bea741194535@gmail.com>
Date: Fri, 25 Jul 2025 15:33:28 -0700
From: Doug Berger <opendmb@...il.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Florian Fainelli <florian.fainelli@...adcom.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/topology: clear freecpu bit on detach

I have observed a separate hazard, which can occur when offlining a CPU,
that this workaround does not address.

I intend to submit a more targeted solution to this issue in the near
future, so please continue to disregard this submission :).

Thanks,
Doug

On 4/22/2025 12:48 PM, Doug Berger wrote:
> There is a hazard in the deadline scheduler where an offlined CPU
> can have its free_cpus bit left set in the def_root_domain when
> the schedutil cpufreq governor is used. This can allow a deadline
> thread to be pushed to the runqueue of a powered-down CPU, which
> breaks scheduling. The details can be found here:
> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
>
> The free_cpus mask is expected to be cleared by set_rq_offline();
> however, the hazard occurs before the root domain is made online
> during CPU hotplug so that function is not invoked for the CPU
> that is being made active.
>
> This commit works around the issue by ensuring the free_cpus bit
> for a CPU is always cleared when the CPU is removed from a
> root_domain. This likely makes the call of cpudl_clear_freecpu()
> in rq_offline_dl() fully redundant, but I have not removed it
> here because I am not certain of all flows.
>
> It seems likely that a better solution is possible from someone
> more familiar with the scheduler implementation, but this
> approach is minimally invasive from someone who is not.
>
> Signed-off-by: Doug Berger <opendmb@...il.com>
> ---
> kernel/sched/topology.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index a2a38e1b6f18..c10c5385031f 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -496,6 +496,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
>  			set_rq_offline(rq);
>
>  		cpumask_clear_cpu(rq->cpu, old_rd->span);
> +		cpudl_clear_freecpu(&old_rd->cpudl, rq->cpu);
>
>  		/*
>  		 * If we don't want to free the old_rd yet then
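
The effect of the one-line change above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not kernel
code: struct toy_root_domain, toy_detach() and toy_stale_free_bit() are
hypothetical names, each mask is a plain unsigned long with one bit per CPU,
and how the free bit came to be set (the schedutil case described in the
commit message) is not modeled. It shows only that when the CPU is not marked
online in the old domain, the offline path is skipped and the free bit
survives the detach unless it is cleared unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a root domain: one bit per CPU in each mask. */
struct toy_root_domain {
	unsigned long span;      /* CPUs attached to this domain */
	unsigned long online;    /* CPUs marked online in this domain */
	unsigned long free_cpus; /* CPUs considered free for deadline tasks */
};

/*
 * Detach a CPU from the domain. The shape loosely follows the hunk above:
 * the offline path (assumed here to clear the free bit) only runs when the
 * CPU is marked online, while the span bit is always cleared.
 * clear_on_detach models the line added by the patch.
 */
static void toy_detach(struct toy_root_domain *rd, int cpu, bool clear_on_detach)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(unsigned long)));
	unsigned long bit = 1UL << cpu;

	if (rd->online & bit) {
		rd->online &= ~bit;
		rd->free_cpus &= ~bit;
	}

	rd->span &= ~bit;

	if (clear_on_detach)
		rd->free_cpus &= ~bit;
}

/* Returns true if the free bit is still set after the CPU is detached. */
static bool toy_stale_free_bit(bool marked_online, bool clear_on_detach)
{
	const int cpu = 3;
	const unsigned long bit = 1UL << cpu;
	struct toy_root_domain rd = { .span = bit, .online = 0, .free_cpus = bit };

	if (marked_online)
		rd.online |= bit;

	toy_detach(&rd, cpu, clear_on_detach);
	return (rd.free_cpus & bit) != 0;
}
```

In this model, toy_stale_free_bit(false, false) reports a stale bit, while
either marking the CPU online first or clearing on detach removes it. The
separate hazard mentioned in the reply is not covered by this sketch.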