Message-ID: <a50c223f-f439-4cbf-b061-ed1015e1ee68@gmail.com>
Date: Wed, 19 Feb 2025 12:35:23 -0800
From: Doug Berger <opendmb@...il.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 Daniel Bristot de Oliveira <bristot@...hat.com>,
 Florian Fainelli <florian.fainelli@...adcom.com>,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/topology: clear freecpu bit on detach

Does anyone have any additional feedback or suggestions for a better 
solution to this issue, or should I resubmit without the RFC prefix?

Thanks,
     Doug

On 1/14/2025 3:04 PM, Doug Berger wrote:
> There is a hazard in the deadline scheduler where an offlined CPU
> can have its free_cpus bit left set in the def_root_domain when
> the schedutil cpufreq governor is used. This can allow a deadline
> thread to be pushed to the runqueue of a powered-down CPU, which
> breaks scheduling. The details can be found here:
> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
> 
> The free_cpus mask is expected to be cleared by set_rq_offline();
> however, the hazard occurs before the root domain is made online
> during CPU hotplug so that function is not invoked for the CPU
> that is being made active.
> 
> This commit works around the issue by ensuring the free_cpus bit
> for a CPU is always cleared when the CPU is removed from a
> root_domain. This likely makes the call of cpudl_clear_freecpu()
> in rq_offline_dl() fully redundant, but I have not removed it
> here because I am not certain of all flows.
> 
> Someone more familiar with the scheduler implementation can
> likely find a better solution, but this approach is a minimally
> invasive one from someone who is not.
> 
> Fixes: 120455c514f7 ("sched: Fix hotplug vs CPU bandwidth control")
> Signed-off-by: Doug Berger <opendmb@...il.com>
> ---
>   kernel/sched/topology.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index da33ec9e94ab..3cbc14953c36 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -499,6 +499,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
>   			set_rq_offline(rq);
>   
>   		cpumask_clear_cpu(rq->cpu, old_rd->span);
> +		cpudl_clear_freecpu(&old_rd->cpudl, rq->cpu);
>   
>   		/*
>   		 * If we don't want to free the old_rd yet then
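
For anyone skimming the thread, here is a rough user-space sketch of the
hazard and of what the one-line change does. It is a toy model, not
kernel code: struct toy_root_domain and the toy_* helpers are
hypothetical names, a 64-bit word stands in for each cpumask (so cpu
must be below 64), and the clear_free flag stands in for the added
cpudl_clear_freecpu() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a root domain: one bit per CPU, cpu < 64 assumed. */
struct toy_root_domain {
	uint64_t span;      /* CPUs currently attached to this domain */
	uint64_t free_cpus; /* CPUs advertised as free for deadline tasks */
};

static uint64_t toy_bit(unsigned int cpu)
{
	return UINT64_C(1) << cpu;
}

/* Attach a CPU to the domain. */
static void toy_attach(struct toy_root_domain *rd, unsigned int cpu)
{
	rd->span |= toy_bit(cpu);
}

/* Advertise a CPU as free, as the deadline code may do while it is up. */
static void toy_mark_free(struct toy_root_domain *rd, unsigned int cpu)
{
	rd->free_cpus |= toy_bit(cpu);
}

/*
 * Detach a CPU from the domain. With clear_free == false this models the
 * old behaviour (only the span bit is cleared); with clear_free == true
 * it models the patched behaviour (the free_cpus bit is cleared as well).
 */
static void toy_detach(struct toy_root_domain *rd, unsigned int cpu,
		       bool clear_free)
{
	rd->span &= ~toy_bit(cpu);
	if (clear_free)
		rd->free_cpus &= ~toy_bit(cpu);
}

/* True if some CPU is advertised as free but is no longer in the span. */
static bool toy_has_stale_free_cpu(const struct toy_root_domain *rd)
{
	return (rd->free_cpus & ~rd->span) != 0;
}
```

In this model, detaching with clear_free == false leaves a free_cpus bit
set for a CPU outside the span, which corresponds to the stale bit a
deadline push could pick up; detaching with clear_free == true never
leaves such a bit behind.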

