lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 12 Jan 2024 15:40:27 +0800
From: Yu Liao <liaoyu15@...wei.com>
To: <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>
CC: <liwei391@...wei.com>, <wangxiongfeng2@...wei.com>, <frederic@...nel.org>,
	<mingo@...nel.org>
Subject: Re: [PATCH] tick/broadcast-hrtimer: Prevent the timer device on
 broadcast duty CPU from being disabled

Hi Thomas,

Kindly ping..

On 2023/12/18 10:58, Yu Liao wrote:
> It was found that running the LTP hotplug stress test on a aarch64
> system could produce rcu_sched stall warnings.
> 
> The issue is the following:
> 
> CPU1 (owns the broadcast hrtimer)	CPU2
> 
> 				tick_broadcast_enter()
> 				//shut down local timer device
> 				...
> 				tick_broadcast_exit()
> 				//exits with tick_broadcast_force_mask set,
> 				timer device remains disabled
> 
> 				initiates offlining of CPU1
> take_cpu_down()
> //CPU1 shuts down and does
> not send broadcast IPI anymore
> 				takedown_cpu()
> 				  hotplug_cpu__broadcast_tick_pull()
> 				  //move broadcast hrtimer to this CPU
> 				    clockevents_program_event()
> 				      bc_set_next()
> 					hrtimer_start()
> 					//does not call hrtimer_reprogram()
> 					to program timer device if expires
> 					equals dev->next_event, so the timer
> 					device remains disabled.
> 
> CPU2 takes over the broadcast duty but local timer device is disabled,
> causing many CPUs to become stuck.
> 
> Fix this by calling tick_program_event() to reprogram the local timer
> device in this scenario.
> 
> Signed-off-by: Yu Liao <liaoyu15@...wei.com>
> ---
>  kernel/time/tick-broadcast-hrtimer.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
> index e28f9210f8a1..6a4a612581fb 100644
> --- a/kernel/time/tick-broadcast-hrtimer.c
> +++ b/kernel/time/tick-broadcast-hrtimer.c
> @@ -42,10 +42,22 @@ static int bc_shutdown(struct clock_event_device *evt)
>   */
>  static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
>  {
> +	ktime_t next_event = this_cpu_ptr(&tick_cpu_device)->evtdev->next_event;
> +
>  	/*
> -	 * This is called either from enter/exit idle code or from the
> -	 * broadcast handler. In all cases tick_broadcast_lock is held.
> -	 *
> +	 * This can be called from CPU offline operation to move broadcast
> +	 * assignment. If tick_broadcast_force_mask is set, the CPU local
> +	 * timer device may be disabled. And hrtimer_reprogram() will not
> +	 * called if the timer is not the first expiring timer. Reprogram
> +	 * the cpu local timer device to ensure we can take over the
> +	 * broadcast duty.
> +	 */
> +	if (tick_check_broadcast_expired() && expires >= next_event)
> +		tick_program_event(next_event, 1);
> +
> +	/*
> +	 * This is called from enter/exit idle code, broadcast handler or
> +	 * CPU offline operation. In all cases tick_broadcast_lock is held.
>  	 * hrtimer_cancel() cannot be called here neither from the
>  	 * broadcast handler nor from the enter/exit idle code. The idle
>  	 * code can run into the problem described in bc_shutdown() and the


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ