Message-ID: <a50c223f-f439-4cbf-b061-ed1015e1ee68@gmail.com>
Date: Wed, 19 Feb 2025 12:35:23 -0800
From: Doug Berger <opendmb@...il.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Florian Fainelli <florian.fainelli@...adcom.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/topology: clear freecpu bit on detach
Does anyone have any additional feedback or suggestions for a better
solution to this issue, or should I resubmit without the RFC prefix?
Thanks,
Doug
On 1/14/2025 3:04 PM, Doug Berger wrote:
> There is a hazard in the deadline scheduler where an offlined CPU
> can have its free_cpus bit left set in the def_root_domain when
> the schedutil cpufreq governor is used. This can allow a deadline
> thread to be pushed to the runqueue of a powered-down CPU, which
> breaks scheduling. The details can be found here:
> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
>
> The free_cpus mask is expected to be cleared by set_rq_offline();
> however, the hazard occurs before the root domain is made online
> during CPU hotplug, so that function is not invoked for the CPU
> that is being made active.
>
> This commit works around the issue by ensuring the free_cpus bit
> for a CPU is always cleared when the CPU is removed from a
> root_domain. This likely makes the call of cpudl_clear_freecpu()
> in rq_offline_dl() fully redundant, but I have not removed it
> here because I am not certain of all flows.
>
> It seems likely that a better solution is possible from someone
> more familiar with the scheduler implementation, but this
> approach is a minimally invasive one from someone who is not.
>
> Fixes: 120455c514f7 ("sched: Fix hotplug vs CPU bandwidth control")
> Signed-off-by: Doug Berger <opendmb@...il.com>
> ---
> kernel/sched/topology.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index da33ec9e94ab..3cbc14953c36 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -499,6 +499,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
>  			set_rq_offline(rq);
>  
>  		cpumask_clear_cpu(rq->cpu, old_rd->span);
> +		cpudl_clear_freecpu(&old_rd->cpudl, rq->cpu);
>  
>  		/*
>  		 * If we don't want to free the old_rd yet then
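
For readers following the thread, the hazard described in the quoted commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct toy_root_domain` and the `toy_*` helpers are hypothetical names invented for illustration, the 64-bit masks stand in for the kernel's cpumasks, and attaching a CPU is assumed to mark it free immediately, which simplifies the real online/offline flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a root domain. NOT the kernel's struct root_domain. */
struct toy_root_domain {
	uint64_t span;      /* CPUs currently attached to this domain */
	uint64_t free_cpus; /* CPUs advertised as free for deadline tasks */
};

/* Bit for a given CPU number; this model supports CPUs 0..63 only. */
static inline uint64_t toy_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (uint64_t)1 << cpu;
}

/* Attach a CPU and (simplifying assumption) advertise it as free. */
static void toy_attach(struct toy_root_domain *rd, unsigned int cpu)
{
	rd->span |= toy_bit(cpu);
	rd->free_cpus |= toy_bit(cpu);
}

/*
 * Detach a CPU from the domain. With clear_free == false the free bit
 * is left behind, as in the hazard; with clear_free == true the bit is
 * cleared on detach, mirroring the idea of the proposed one-line change.
 */
static void toy_detach(struct toy_root_domain *rd, unsigned int cpu,
		       bool clear_free)
{
	rd->span &= ~toy_bit(cpu);
	if (clear_free)
		rd->free_cpus &= ~toy_bit(cpu);
}

/* True if some CPU is still marked free although it left the span. */
static bool toy_has_stale_free_cpu(const struct toy_root_domain *rd)
{
	return (rd->free_cpus & ~rd->span) != 0;
}
```

In this model, attaching CPU 3 and then detaching it with `clear_free` set to false leaves a stale free bit that a push could select, while detaching with `clear_free` set to true does not.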