Message-ID: <a7cb64fb-1c17-4316-abf8-e6a7e07ba4d1@paulmck-laptop>
Date: Tue, 21 Jan 2025 09:08:01 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, vlad.wing@...il.com,
rcu@...r.kernel.org, boqun.feng@...il.com, joel@...lfernandes.org,
neeraj.upadhyay@....com, urezki@...il.com,
qiang.zhang1211@...il.com, Cheng-Jui.Wang@...iatek.com,
leitao@...ian.org, kernel-team@...a.com,
Usama Arif <usamaarif642@...il.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>
Subject: Re: [PATCH v4] hrtimers: Force migrate away hrtimers queued after
CPUHP_AP_HRTIMERS_DYING
On Sat, Jan 18, 2025 at 12:24:33AM +0100, Frederic Weisbecker wrote:
> hrtimers are migrated away from the dying CPU to any online target at
> the CPUHP_AP_HRTIMERS_DYING stage, so as not to delay bandwidth timers
> that handle tasks involved in the CPU hotplug forward progress.
>
> However, wake-ups can still be performed by the outgoing CPU after
> CPUHP_AP_HRTIMERS_DYING, and these can again result in bandwidth timers
> being armed. Depending on several considerations (crystal-ball
> power-management-based election, whether an earlier timer is already
> enqueued, whether timer migration is enabled), the elected target may
> end up being the current CPU even though it is offline. If that
> happens, the timer is silently ignored.
>
> The most notable example is RCU, which had to deal with each and every
> one of those wake-ups by deferring them to an online CPU, along with
> related workarounds:
>
> _ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
> _ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
> _ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
>
> The problem isn't confined to RCU, though: the stop-machine kthread
> (which runs the CPUHP_AP_HRTIMERS_DYING step) reports its completion at
> the end of its work through cpu_stop_signal_done(), and the resulting
> wake-up can arm the deadline server timer:
>
> WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
> CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
> Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
> RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
> Call Trace:
> <TASK>
> ? hrtimer_start_range_ns
> start_dl_timer
> enqueue_dl_entity
> dl_server_start
> enqueue_task_fair
> enqueue_task
> ttwu_do_activate
> try_to_wake_up
> complete
> cpu_stopper_thread
> smpboot_thread_fn
> kthread
> ret_from_fork
> ret_from_fork_asm
> </TASK>
>
> Instead of providing yet another band-aid to work around the situation,
> fix it in the hrtimers infrastructure: always migrate a timer to an
> online target whenever it is enqueued from an offline CPU.
>
> This will also allow reverting all of the above disgraceful RCU hacks.
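
For my own understanding, a minimal userspace model of the failure mode
being fixed -- toy code, all names are mine rather than the kernel's:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy stand-in for hrtimer_cpu_base; "online" models base->online. */
    struct toy_cpu_base { int cpu; bool online; };

    /*
     * Pre-fix behavior: target election may hand back the current CPU
     * even when it is offline, and a timer enqueued there never fires.
     */
    static struct toy_cpu_base *elect_target(struct toy_cpu_base *cur)
    {
            return cur;     /* e.g. earliest timer already enqueued here */
    }

    int main(void)
    {
            struct toy_cpu_base dying = { .cpu = 94, .online = false };
            struct toy_cpu_base *target = elect_target(&dying);

            if (!target->online)
                    printf("timer armed on offline CPU %d: lost\n",
                           target->cpu);
            return 0;
    }
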
>
> Reported-by: Vlad Poenaru <vlad.wing@...il.com>
> Reported-by: Usama Arif <usamaarif642@...il.com>
> Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> Closes: 20241213203739.1519801-1-usamaarif642@...il.com
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
This passes over-holiday rcutorture testing, so, perhaps redundantly:
Tested-by: Paul E. McKenney <paulmck@...nel.org>
> ---
> include/linux/hrtimer_defs.h | 1 +
> kernel/time/hrtimer.c | 92 +++++++++++++++++++++++++++++-------
> 2 files changed, 75 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/hrtimer_defs.h b/include/linux/hrtimer_defs.h
> index c3b4b7ed7c16..84a5045f80f3 100644
> --- a/include/linux/hrtimer_defs.h
> +++ b/include/linux/hrtimer_defs.h
> @@ -125,6 +125,7 @@ struct hrtimer_cpu_base {
> ktime_t softirq_expires_next;
> struct hrtimer *softirq_next_timer;
> struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
> + call_single_data_t csd;
> } ____cacheline_aligned;
>
>
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 030426c8c944..082d4b687fa1 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -58,6 +58,8 @@
> #define HRTIMER_ACTIVE_SOFT (HRTIMER_ACTIVE_HARD << MASK_SHIFT)
> #define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
>
> +static void retrigger_next_event(void *arg);
> +
> /*
> * The timer bases:
> *
> @@ -111,7 +113,8 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
> .clockid = CLOCK_TAI,
> .get_time = &ktime_get_clocktai,
> },
> - }
> + },
> + .csd = CSD_INIT(retrigger_next_event, NULL)
> };
>
> static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
> @@ -124,6 +127,14 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
> [CLOCK_TAI] = HRTIMER_BASE_TAI,
> };
>
> +static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
> +{
> + if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> + return true;
> + else
> + return likely(base->online);
> +}
> +
> /*
> * Functions and macros which are different for UP/SMP systems are kept in a
> * single place
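
A purely optional style nit: the new helper could be a single
expression. A sketch, assuming no behavioral change is intended:

    static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
    {
            /* With CPU hotplug disabled, a base can never go offline. */
            return !IS_ENABLED(CONFIG_HOTPLUG_CPU) || likely(base->online);
    }

Either form works for me.
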
> @@ -183,27 +194,58 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
> }
>
> /*
> - * We do not migrate the timer when it is expiring before the next
> - * event on the target cpu. When high resolution is enabled, we cannot
> - * reprogram the target cpu hardware and we would cause it to fire
> - * late. To keep it simple, we handle the high resolution enabled and
> - * disabled case similar.
> + * Check if the elected target is suitable considering its next
> + * event and the hotplug state of the current CPU.
> + *
> + * If the elected target is remote and its next event is after the timer
> + * to queue, then a remote reprogram is necessary. However, there is no
> + * guarantee that the IPI handling the operation will arrive in time to
> + * meet the high resolution deadline. In this case the local CPU becomes
> + * the preferred target, unless it is offline.
> + *
> + * High and low resolution modes are handled the same way for simplicity.
> *
> * Called with cpu_base->lock of target cpu held.
> */
> -static int
> -hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
> +static bool
> +hrtimer_suitable_target(struct hrtimer *timer,
> + struct hrtimer_clock_base *new_base,
> + struct hrtimer_cpu_base *new_cpu_base,
> + struct hrtimer_cpu_base *this_cpu_base)
> {
> ktime_t expires;
>
> + /*
> + * The local CPU clockevent can be reprogrammed. Also get_target_base()
> + * guarantees it is online.
> + */
> + if (new_cpu_base == this_cpu_base)
> + return true;
> +
> + /*
> + * The offline local CPU can't be the default target if the
> + * next remote target event is after this timer. Keep the
> + * elected new base. An IPI will be issued to reprogram
> + * it as a last resort.
> + */
> + if (!hrtimer_base_is_online(this_cpu_base))
> + return true;
> +
> expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
> - return expires < new_base->cpu_base->expires_next;
> +
> + return expires >= new_base->cpu_base->expires_next;
> }
>
> static inline
> struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
> int pinned)
> {
> + if (!hrtimer_base_is_online(base)) {
> + int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
> +
> + return &per_cpu(hrtimer_bases, cpu);
> + }
> +
> #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
> if (static_branch_likely(&timers_migration_enabled) && !pinned)
> return &per_cpu(hrtimer_bases, get_nohz_timer_target());
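
To check my reading of the new decision logic (an offline CPU now
elects an online housekeeping target up front, and the suitability
test gains two early-accept cases), here is a toy userspace model.
All names are mine, not the kernel's:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of hrtimer_suitable_target()'s three cases. */
    struct toy_base { bool online; int64_t expires_next; };

    static bool toy_suitable_target(int64_t expires,
                                    const struct toy_base *new,
                                    const struct toy_base *local)
    {
            if (new == local)
                    return true;    /* local clockevent is reprogrammable */
            if (!local->online)
                    return true;    /* offline local CPU is no fallback */
            return expires >= new->expires_next; /* no remote reprogram */
    }

    int main(void)
    {
            struct toy_base local = { .online = false, .expires_next = 100 };
            struct toy_base remote = { .online = true, .expires_next = 50 };

            /*
             * A timer at t=10 would previously bounce back to the local
             * CPU; the offline local base must now keep the remote target.
             */
            printf("remote target suitable: %d\n",
                   toy_suitable_target(10, &remote, &local));
            return 0;
    }
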
> @@ -254,8 +296,8 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
> raw_spin_unlock(&base->cpu_base->lock);
> raw_spin_lock(&new_base->cpu_base->lock);
>
> - if (new_cpu_base != this_cpu_base &&
> - hrtimer_check_target(timer, new_base)) {
> + if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
> + this_cpu_base)) {
> raw_spin_unlock(&new_base->cpu_base->lock);
> raw_spin_lock(&base->cpu_base->lock);
> new_cpu_base = this_cpu_base;
> @@ -264,8 +306,8 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
> }
> WRITE_ONCE(timer->base, new_base);
> } else {
> - if (new_cpu_base != this_cpu_base &&
> - hrtimer_check_target(timer, new_base)) {
> + if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
> + this_cpu_base)) {
> new_cpu_base = this_cpu_base;
> goto again;
> }
> @@ -716,8 +758,6 @@ static inline int hrtimer_is_hres_enabled(void)
> return hrtimer_hres_enabled;
> }
>
> -static void retrigger_next_event(void *arg);
> -
> /*
> * Switch to high resolution mode
> */
> @@ -1206,6 +1246,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
> u64 delta_ns, const enum hrtimer_mode mode,
> struct hrtimer_clock_base *base)
> {
> + struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
> struct hrtimer_clock_base *new_base;
> bool force_local, first;
>
> @@ -1217,9 +1258,15 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
> * and enforce reprogramming after it is queued no matter whether
> * it is the new first expiring timer again or not.
> */
> - force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
> + force_local = base->cpu_base == this_cpu_base;
> force_local &= base->cpu_base->next_timer == timer;
>
> + /*
> + * Don't force local queuing if this enqueue happens on an unplugged
> + * CPU after hrtimer_cpu_dying() has been invoked.
> + */
> + force_local &= this_cpu_base->online;
> +
> /*
> * Remove an active timer from the queue. In case it is not queued
> * on the current CPU, make sure that remove_hrtimer() updates the
> @@ -1249,8 +1296,17 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
> }
>
> first = enqueue_hrtimer(timer, new_base, mode);
> - if (!force_local)
> - return first;
> + if (!force_local) {
> + if (hrtimer_base_is_online(this_cpu_base))
> + return first;
> +
> + if (first) {
> + struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
> +
> + smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
> + }
> + return 0;
> + }
>
> /*
> * Timer was forced to stay on the current CPU to avoid
> --
> 2.46.0
>
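
One more note for anyone following along: the shape of the
async-reprogram path, as I read it, condensed from the patch (these
are the real kernel APIs, with all surrounding context elided):

    /*
     * Each per-CPU base carries a csd that runs retrigger_next_event()
     * on its owner:
     *
     *      .csd = CSD_INIT(retrigger_next_event, NULL)
     *
     * When an offline CPU enqueues what becomes the first expiring
     * timer on the elected online base, it cannot reprogram that
     * base's clockevent locally, so it kicks the owning CPU instead:
     */
    if (first)
            smp_call_function_single_async(new_cpu_base->cpu,
                                           &new_cpu_base->csd);

    /*
     * Returning 0 rather than "first" then keeps the caller from
     * attempting a clockevent reprogram on the dead local CPU.
     */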