lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <cabaaa35-5207-439d-b09d-bea741194535@gmail.com>
Date: Fri, 25 Jul 2025 15:33:28 -0700
From: Doug Berger <opendmb@...il.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 Florian Fainelli <florian.fainelli@...adcom.com>,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/topology: clear freecpu bit on detach

I have observed a separate hazard that can occur when offlining a CPU
that is not addressed by this work around.

I intend to submit a more targeted solution to this issue in the near 
future, so please continue to disregard this submission :).

Thanks,
     Doug

On 4/22/2025 12:48 PM, Doug Berger wrote:
> There is a hazard in the deadline scheduler where an offlined CPU
> can have its free_cpus bit left set in the def_root_domain when
> the schedutil cpufreq governor is used. This can allow a deadline
> thread to be pushed to the runqueue of a powered down CPU which
> breaks scheduling. The details can be found here:
> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
> 
> The free_cpus mask is expected to be cleared by set_rq_offline();
> however, the hazard occurs before the root domain is made online
> during CPU hotplug so that function is not invoked for the CPU
> that is being made active.
> 
> This commit works around the issue by ensuring the free_cpus bit
> for a CPU is always cleared when the CPU is removed from a
> root_domain. This likely makes the call of cpudl_clear_freecpu()
> in rq_offline_dl() fully redundant, but I have not removed it
> here because I am not certain of all flows.
> 
> It seems likely that a better solution is possible from someone
> more familiar with the scheduler implementation, but this
> approach is minimally invasive from someone who is not.
> 
> Signed-off-by: Doug Berger <opendmb@...il.com>
> ---
>   kernel/sched/topology.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index a2a38e1b6f18..c10c5385031f 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -496,6 +496,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
>   			set_rq_offline(rq);
>   
>   		cpumask_clear_cpu(rq->cpu, old_rd->span);
> +		cpudl_clear_freecpu(&old_rd->cpudl, rq->cpu);
>   
>   		/*
>   		 * If we don't want to free the old_rd yet then


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ