Message-ID: <ce8ec4491cfd17a374177918537a6b3be34dbb43.camel@mediatek.com>
Date: Wed, 2 Apr 2025 06:53:24 +0000
From: Kuyo Chang (張建文) <Kuyo.Chang@...iatek.com>
To: "frederic@...nel.org" <frederic@...nel.org>,
Walter Chang (張維哲) <Walter.Chang@...iatek.com>
CC: wsd_upstream <wsd_upstream@...iatek.com>, "boqun.feng@...il.com"
<boqun.feng@...il.com>, "vlad.wing@...il.com" <vlad.wing@...il.com>,
Cheng-Jui Wang (王正睿)
<Cheng-Jui.Wang@...iatek.com>, "kernel-team@...a.com" <kernel-team@...a.com>,
Alex Hoh (賀振坤) <Alex.Hoh@...iatek.com>,
"usamaarif642@...il.com" <usamaarif642@...il.com>, "anna-maria@...utronix.de"
<anna-maria@...utronix.de>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "neeraj.upadhyay@....com"
<neeraj.upadhyay@....com>, "leitao@...ian.org" <leitao@...ian.org>,
Freddy Hsin (辛恒豐) <Freddy.Hsin@...iatek.com>,
"urezki@...il.com" <urezki@...il.com>, "tglx@...utronix.de"
<tglx@...utronix.de>, "qiang.zhang1211@...il.com"
<qiang.zhang1211@...il.com>, "paulmck@...nel.org" <paulmck@...nel.org>,
Xinghua Yang (杨兴华) <Xinghua.Yang@...iatek.com>,
"joel@...lfernandes.org" <joel@...lfernandes.org>, "rcu@...r.kernel.org"
<rcu@...r.kernel.org>, Chun-Hung Wu (巫駿宏)
<Chun-hung.Wu@...iatek.com>
Subject: Re: [PATCH v4] hrtimers: Force migrate away hrtimers queued after
CPUHP_AP_HRTIMERS_DYING
On Wed, 2025-03-26 at 17:44 +0100, Frederic Weisbecker wrote:
> Hi Walter Chang,
>
> On Wed, Mar 26, 2025 at 05:46:38AM +0000, Walter Chang (張維哲) wrote:
> > On Tue, 2025-01-21 at 09:08 -0800, Paul E. McKenney wrote:
> > > On Sat, Jan 18, 2025 at 12:24:33AM +0100, Frederic Weisbecker
> > > wrote:
> > > > hrtimers are migrated away from the dying CPU to any online target
> > > > at the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth
> > > > timers handling tasks involved in the CPU hotplug forward progress.
> > > >
> > > > However wake ups can still be performed by the outgoing CPU after
> > > > CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers
> > > > being armed. Depending on several considerations (crystal ball
> > > > power management based election, earliest timer already enqueued,
> > > > timer migration enabled or not), the target may eventually be the
> > > > current CPU even if offline. If that happens, the timer is
> > > > eventually ignored.
> > > >
> > > > The most notable example is RCU which had to deal with each and
> > > > every one of those wake-ups by deferring them to an online CPU,
> > > > along with related workarounds:
> > > >
> > > > _ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
> > > > _ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from
> > > >   offline CPU)
> > > > _ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
> > > >
> > > > The problem isn't confined to RCU though, as the stop machine
> > > > kthread (which runs CPUHP_AP_HRTIMERS_DYING) reports its completion
> > > > at the end of its work through cpu_stop_signal_done() and performs
> > > > a wake up that eventually arms the deadline server timer:
> > > >
> > > >    WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
> > > >    CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
> > > >    Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
> > > > RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
> > > > Call Trace:
> > > > <TASK>
> > > > ? hrtimer_start_range_ns
> > > > start_dl_timer
> > > > enqueue_dl_entity
> > > > dl_server_start
> > > > enqueue_task_fair
> > > > enqueue_task
> > > > ttwu_do_activate
> > > > try_to_wake_up
> > > > complete
> > > > cpu_stopper_thread
> > > > smpboot_thread_fn
> > > > kthread
> > > > ret_from_fork
> > > > ret_from_fork_asm
> > > > </TASK>
> > > >
> > > > Instead of providing yet another bandaid to work around the
> > > > situation, fix it in the hrtimers infrastructure instead: always
> > > > migrate away a timer to an online target whenever it is enqueued
> > > > from an offline CPU.
> > > >
> > > > This will also allow reverting all the above RCU disgraceful hacks.
> > > >
> > > > Reported-by: Vlad Poenaru <vlad.wing@...il.com>
> > > > Reported-by: Usama Arif <usamaarif642@...il.com>
> > > > Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > > > Closes: 20241213203739.1519801-1-usamaarif642@...il.com
> > > > Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> > > > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > >
> > > This passes over-holiday rcutorture testing, so, perhaps redundantly:
> > >
> > > Tested-by: Paul E. McKenney <paulmck@...nel.org>
> >
> > Hi,
> >
> > I encountered the same issue even after applying this patch.
> > Below are the details of the warning and call trace.
> >
> >
> > migration/3: ------------[ cut here ]------------
> > migration/3: WARNING: CPU: 3 PID: 42 at kernel/time/hrtimer.c:1125 enqueue_hrtimer+0x7c/0xec
> > migration/3: CPU: 3 UID: 0 PID: 42 Comm: migration/3 Tainted: G OE 6.12.18-android16-0-g59cb5a849beb-4k #1 0b440e43fa7b24aaa3b7e6e5d2b938948e0cacdb
> > migration/3: Stopper: multi_cpu_stop+0x0/0x184 <- stop_machine_cpuslocked+0xc0/0x15c
>
> It's not the first time I have received such a report on an out-of-tree
> kernel. The problem is I don't know if the tainted modules are
> involved. But something is probably making an offline CPU visible
> within the hierarchy in get_nohz_timer_target(). And that new warning
> made it visible.
>
Hi,

After reviewing get_nohz_timer_target(), it probably does make an
offline CPU visible among the timer candidates. Maybe this patch could
fix it?

[PATCH] sched/core: Exclude offline CPUs from the timer candidates

The timer target is chosen from the HK_TYPE_KERNEL_NOISE housekeeping
mask. However, the candidate may be an offline CPU, so exclude offline
CPUs and choose only from online CPUs.

Signed-off-by: kuyo chang <kuyo.chang@...iatek.com>
---
kernel/sched/core.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cfaca3040b2f..efcc2576e622 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1182,7 +1182,7 @@ int get_nohz_timer_target(void)
 	struct sched_domain *sd;
 	const struct cpumask *hk_mask;
 
-	if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE)) {
+	if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) && cpu_online(cpu)) {
 		if (!idle_cpu(cpu))
 			return cpu;
 		default_cpu = cpu;
@@ -1197,13 +1197,16 @@ int get_nohz_timer_target(void)
 			if (cpu == i)
 				continue;
 
-			if (!idle_cpu(i))
+			if (!idle_cpu(i) && cpu_online(i))
 				return i;
 		}
 	}
 
-	if (default_cpu == -1)
+	if (default_cpu == -1) {
 		default_cpu = housekeeping_any_cpu(HK_TYPE_KERNEL_NOISE);
+		if (!cpu_online(default_cpu))
+			default_cpu = cpumask_any(cpu_online_mask);
+	}
 
 	return default_cpu;
 }
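
For clarity, the selection order the patch above enforces can be
sketched as standalone userspace C. This is only a model of the logic,
not kernel code: pick_timer_target(), the online[]/idle[] arrays and
the hk_default parameter are hypothetical stand-ins for the kernel's
cpumask and housekeeping machinery.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Per-CPU state for the model: online and idle flags. */
static bool online[NR_CPUS];
static bool idle[NR_CPUS];

/*
 * Model of the patched selection: prefer a busy online CPU,
 * and never fall back to an offline default.
 */
static int pick_timer_target(int cur, int hk_default)
{
	int def = -1;

	/* Patched first check: the current CPU must also be online. */
	if (online[cur]) {
		if (!idle[cur])
			return cur;
		def = cur;
	}

	/* Scan the other CPUs for a busy *online* candidate. */
	for (int i = 0; i < NR_CPUS; i++) {
		if (i == cur)
			continue;
		if (!idle[i] && online[i])
			return i;
	}

	/* Patched fallback: replace an offline default with any online CPU. */
	if (def == -1) {
		def = hk_default;
		if (!online[def]) {
			for (int i = 0; i < NR_CPUS; i++) {
				if (online[i]) {
					def = i;
					break;
				}
			}
		}
	}
	return def;
}
```

With CPU 1 offline and CPU 2 the only busy CPU, the model returns CPU 2
from an idle caller; with every CPU idle and an offline housekeeping
default, it falls back to an online CPU instead of the offline one.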
> Can you try this and tell us if the warning fires?
>
> Thanks.
>
> diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
> index 6d67e9a5af6b..f49512628269 100644
> --- a/include/linux/sched/nohz.h
> +++ b/include/linux/sched/nohz.h
> @@ -9,6 +9,7 @@
> #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
> extern void nohz_balance_enter_idle(int cpu);
> extern int get_nohz_timer_target(void);
> +extern void assert_domain_online(void);
> #else
> static inline void nohz_balance_enter_idle(int cpu) { }
> #endif
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 07455d25329c..98c8f8408403 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -13,6 +13,7 @@
> #include <linux/sched/isolation.h>
> #include <linux/sched/task.h>
> #include <linux/sched/smt.h>
> +#include <linux/sched/nohz.h>
> #include <linux/unistd.h>
> #include <linux/cpu.h>
> #include <linux/oom.h>
> @@ -1277,6 +1278,7 @@ static int take_cpu_down(void *_param)
>  	if (err < 0)
>  		return err;
>  
> +	assert_domain_online();
>  	/*
>  	 * Must be called from CPUHP_TEARDOWN_CPU, which means, as we are going
>  	 * down, that the current state is CPUHP_TEARDOWN_CPU - 1.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 175a5a7ac107..88157b1645cc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1163,6 +1163,20 @@ void resched_cpu(int cpu)
>  
>  #ifdef CONFIG_SMP
>  #ifdef CONFIG_NO_HZ_COMMON
> +void assert_domain_online(void)
> +{
> +	int cpu = smp_processor_id();
> +	int i;
> +	struct sched_domain *sd;
> +
> +	guard(rcu)();
> +
> +	for_each_domain(cpu, sd) {
> +		for_each_cpu(i, sched_domain_span(sd)) {
> +			WARN_ON_ONCE(cpu_is_offline(i));
> +		}
> +	}
> +}
>  /*
>   * In the semi idle case, use the nearest busy CPU for migrating timers
>   * from an idle CPU. This is good for power-savings.
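
The invariant that debug patch asserts, that no offline CPU remains
visible in any sched-domain span walked from the dying CPU, can also be
modeled in plain C. Again this is only a sketch: the struct domain type
and domains_all_online() below are hypothetical stand-ins for the
kernel's per-CPU sched-domain hierarchy and cpumask iteration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Per-CPU online state for the model. */
static bool cpu_online_flag[NR_CPUS];

/*
 * A toy "domain" is just a flat set of CPUs; the real sched domains
 * form a per-CPU hierarchy walked under RCU via for_each_domain().
 */
struct domain {
	int span[NR_CPUS];
	int nr;
};

/*
 * Model of assert_domain_online(): true iff every CPU visible in the
 * given domain spans is online (the kernel version WARNs instead).
 */
static bool domains_all_online(const struct domain *doms, int ndoms)
{
	for (int d = 0; d < ndoms; d++) {
		for (int k = 0; k < doms[d].nr; k++) {
			if (!cpu_online_flag[doms[d].span[k]])
				return false;
		}
	}
	return true;
}
```

If a CPU is marked offline while a wider domain span still contains it,
the check fails, which is exactly the condition the WARN_ON_ONCE() in
the debug patch is meant to surface during take_cpu_down().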