Message-ID: <20231218025844.55675-1-liaoyu15@huawei.com>
Date: Mon, 18 Dec 2023 10:58:44 +0800
From: Yu Liao <liaoyu15@...wei.com>
To: <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>
CC: <liaoyu15@...wei.com>, <liwei391@...wei.com>, <wangxiongfeng2@...wei.com>,
	<frederic@...nel.org>, <mingo@...nel.org>
Subject: [PATCH] tick/broadcast-hrtimer: Prevent the timer device on broadcast duty CPU from being disabled

Running the LTP hotplug stress test on an aarch64 system can produce
rcu_sched stall warnings.

The issue is the following:

CPU1 (owns the broadcast hrtimer)	CPU2

				tick_broadcast_enter()
				//shut down local timer device
				...
				tick_broadcast_exit()
				//exits with tick_broadcast_force_mask set,
				timer device remains disabled

				initiates offlining of CPU1
take_cpu_down()
//CPU1 shuts down and does
not send broadcast IPI anymore
				takedown_cpu()
				  hotplug_cpu__broadcast_tick_pull()
				  //move broadcast hrtimer to this CPU
				    clockevents_program_event()
				      bc_set_next()
					hrtimer_start()
					//does not call hrtimer_reprogram()
					to program timer device if expires
					equals dev->next_event, so the timer
					device remains disabled.

CPU2 takes over the broadcast duty, but its local timer device is still
disabled, so the broadcast hrtimer never fires and the CPUs waiting for a
broadcast wakeup become stuck.

Fix this by calling tick_program_event() to reprogram the local timer
device in this scenario.

Signed-off-by: Yu Liao <liaoyu15@...wei.com>
---
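For illustration, the failure and the fix can be reproduced in a small
user-space model. This is only a sketch under simplifying assumptions, not
kernel code: struct model_cpu and all model_* names are hypothetical, the
per-CPU state is reduced to three fields, and "arming the device" stands in
for both hrtimer_reprogram() and tick_program_event().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-CPU timer state. */
struct model_cpu {
	bool dev_armed;       /* is the local timer device programmed? */
	int64_t next_event;   /* expiry already recorded for the device */
	bool force_mask_set;  /* models this CPU's tick_broadcast_force_mask bit */
};

/*
 * Models hrtimer_start(): the device is only reprogrammed when the new
 * timer expires before the expiry that is already recorded.
 */
static void model_hrtimer_start(struct model_cpu *cpu, int64_t expires)
{
	if (expires < cpu->next_event) {
		cpu->next_event = expires;
		cpu->dev_armed = true;
	}
}

/* Models bc_set_next(), with or without the check added by this patch. */
static void model_bc_set_next(struct model_cpu *cpu, int64_t expires,
			      bool with_fix)
{
	/* Stand-in for tick_program_event(next_event, 1). */
	if (with_fix && cpu->force_mask_set && expires >= cpu->next_event)
		cpu->dev_armed = true;

	model_hrtimer_start(cpu, expires);
}

/*
 * Models CPU2 taking over the broadcast duty while its device is still
 * disabled and the broadcast expiry equals the recorded next_event.
 * Returns whether the local timer device ends up armed.
 */
static bool model_takeover(bool with_fix)
{
	struct model_cpu cpu2 = {
		.dev_armed = false,
		.next_event = 100,
		.force_mask_set = true,
	};

	model_bc_set_next(&cpu2, 100, with_fix);
	return cpu2.dev_armed;
}
```

In this model, model_takeover(false) leaves the device disabled, which
corresponds to the stall described above, while model_takeover(true) arms it.
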
 kernel/time/tick-broadcast-hrtimer.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index e28f9210f8a1..6a4a612581fb 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -42,10 +42,22 @@ static int bc_shutdown(struct clock_event_device *evt)
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
+	ktime_t next_event = this_cpu_ptr(&tick_cpu_device)->evtdev->next_event;
+
 	/*
-	 * This is called either from enter/exit idle code or from the
-	 * broadcast handler. In all cases tick_broadcast_lock is held.
-	 *
+	 * This can be called from CPU offline operation to move broadcast
+	 * assignment. If tick_broadcast_force_mask is set, the CPU local
+	 * timer device may be disabled. And hrtimer_reprogram() will not be
+	 * called if the timer is not the first expiring timer. Reprogram
+	 * the cpu local timer device to ensure we can take over the
+	 * broadcast duty.
+	 */
+	if (tick_check_broadcast_expired() && expires >= next_event)
+		tick_program_event(next_event, 1);
+
+	/*
+	 * This is called from enter/exit idle code, broadcast handler or
+	 * CPU offline operation. In all cases tick_broadcast_lock is held.
 	 * hrtimer_cancel() cannot be called here neither from the
 	 * broadcast handler nor from the enter/exit idle code. The idle
 	 * code can run into the problem described in bc_shutdown() and the
-- 
2.33.0

