Message-ID: <20231218025844.55675-1-liaoyu15@huawei.com>
Date: Mon, 18 Dec 2023 10:58:44 +0800
From: Yu Liao <liaoyu15@...wei.com>
To: <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>
CC: <liaoyu15@...wei.com>, <liwei391@...wei.com>, <wangxiongfeng2@...wei.com>,
<frederic@...nel.org>, <mingo@...nel.org>
Subject: [PATCH] tick/broadcast-hrtimer: Prevent the timer device on broadcast duty CPU from being disabled
It was found that running the LTP hotplug stress test on an aarch64
system could produce rcu_sched stall warnings.
The issue is the following:
CPU1 (owns the broadcast hrtimer)       CPU2

                                        tick_broadcast_enter()
                                          //shut down local timer device
                                        ...
                                        tick_broadcast_exit()
                                          //exits with tick_broadcast_force_mask set,
                                          //timer device remains disabled

                                        initiates offlining of CPU1
take_cpu_down()
  //CPU1 shuts down and does
  //not send broadcast IPI anymore
                                        takedown_cpu()
                                          hotplug_cpu__broadcast_tick_pull()
                                            //move broadcast hrtimer to this CPU
                                            clockevents_program_event()
                                              bc_set_next()
                                                hrtimer_start()
                                                  //does not call hrtimer_reprogram()
                                                  //to program the timer device if
                                                  //expires equals dev->next_event,
                                                  //so the timer device remains
                                                  //disabled.
CPU2 takes over the broadcast duty, but its local timer device is still
disabled, so the broadcast hrtimer does not fire and the CPUs relying on
the broadcast wakeup become stuck.
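To make the failure mode concrete, here is a minimal user-space C model of
CPU2's side of the sequence above. It is a sketch under stated assumptions,
not kernel code: struct mock_clockevent, mock_hrtimer_start(),
mock_pull_broadcast() and check_pull_scenarios() are hypothetical names, and
the hrtimer base's cached expiry and the device's next_event are collapsed
into a single field. The only rule modelled is the one from the diagram:
starting a timer whose expiry is not earlier than next_event leaves the
hardware untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of a CPU-local clock event device.
 * Not kernel code: the hrtimer base's cached expiry and the device's
 * next_event are collapsed into a single field.
 */
struct mock_clockevent {
	int64_t next_event; /* expiry the device was last asked to fire at */
	bool armed;         /* false: the hardware will not raise an interrupt */
};

/*
 * Simplified "first expiring timer" rule: the hardware is only touched
 * when the new expiry is strictly earlier than the recorded next_event.
 */
static void mock_hrtimer_start(struct mock_clockevent *dev, int64_t expires)
{
	if (expires >= dev->next_event)
		return; /* not the first expiring timer: device left as-is */
	dev->next_event = expires;
	dev->armed = true;
}

/*
 * Replays CPU2's side of the diagram: the idle exit path left the device
 * unarmed while keeping a stale next_event, then the broadcast hrtimer is
 * started on this CPU. Returns whether the device ends up armed.
 */
static bool mock_pull_broadcast(int64_t stale_next_event, int64_t bc_expires)
{
	struct mock_clockevent dev = {
		.next_event = stale_next_event,
		.armed = false, /* tick_broadcast_exit() with the force mask set */
	};

	mock_hrtimer_start(&dev, bc_expires);
	return dev.armed;
}

/* Expected outcomes of the model; also serves as a usage example. */
void check_pull_scenarios(void)
{
	assert(!mock_pull_broadcast(1000, 1000)); /* equal expiry: stays disabled */
	assert(!mock_pull_broadcast(1000, 1500)); /* later expiry: stays disabled */
	assert(mock_pull_broadcast(1000, 500));   /* earlier expiry: device armed */
}
```

In this model only an earlier broadcast expiry gets the device armed again;
the equal-expiry case from the diagram leaves it disabled.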
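Fix this by having bc_set_next() call tick_program_event() to reprogram
the local timer device when tick_check_broadcast_expired() reports that
this CPU is in tick_broadcast_force_mask and the broadcast expiry is not
earlier than the device's next_event.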
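The guard added to bc_set_next() can be illustrated with a similar
user-space model. Again this is a hypothetical sketch, not the kernel
implementation: in_force_mask stands in for tick_check_broadcast_expired(),
mock_tick_program_event() stands in for tick_program_event(expires, 1) and
simply arms the device, and the other mock_* names are invented for
illustration. In this simplified model the expires >= next_event condition
is sufficient, because an earlier broadcast expiry already gets the device
armed through the hrtimer path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified CPU-local clock event device (not kernel code). */
struct mock_clockevent {
	int64_t next_event;
	bool armed;
};

/* Simplified hrtimer rule: only an earlier expiry reaches the hardware. */
static void mock_hrtimer_start(struct mock_clockevent *dev, int64_t expires)
{
	if (expires >= dev->next_event)
		return;
	dev->next_event = expires;
	dev->armed = true;
}

/* Stand-in for tick_program_event(expires, 1): always arms the device. */
static void mock_tick_program_event(struct mock_clockevent *dev, int64_t expires)
{
	dev->next_event = expires;
	dev->armed = true;
}

/*
 * Mirrors the guard the patch adds to bc_set_next(). in_force_mask plays
 * the role of tick_check_broadcast_expired(). Returns true if the explicit
 * re-arm path was taken.
 */
static bool mock_bc_set_next(struct mock_clockevent *dev, int64_t expires,
			     bool in_force_mask)
{
	bool rearmed = false;

	if (in_force_mask && expires >= dev->next_event) {
		mock_tick_program_event(dev, dev->next_event);
		rearmed = true;
	}
	mock_hrtimer_start(dev, expires);
	return rearmed;
}

/* Expected outcomes of the model; also serves as a usage example. */
void check_bc_set_next_guard(void)
{
	/* Equal expiry: the hrtimer path alone would not arm the device. */
	struct mock_clockevent equal = { .next_event = 1000, .armed = false };
	assert(mock_bc_set_next(&equal, 1000, true));
	assert(equal.armed && equal.next_event == 1000);

	/* Earlier expiry: the hrtimer path arms the device by itself. */
	struct mock_clockevent earlier = { .next_event = 1000, .armed = false };
	assert(!mock_bc_set_next(&earlier, 500, true));
	assert(earlier.armed && earlier.next_event == 500);
}
```

Re-arming at the device's own next_event, rather than at the broadcast
expiry, mirrors the tick_program_event(next_event, 1) call in the diff.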
Signed-off-by: Yu Liao <liaoyu15@...wei.com>
---
kernel/time/tick-broadcast-hrtimer.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index e28f9210f8a1..6a4a612581fb 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -42,10 +42,22 @@ static int bc_shutdown(struct clock_event_device *evt)
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
+	ktime_t next_event = this_cpu_ptr(&tick_cpu_device)->evtdev->next_event;
+
 	/*
-	 * This is called either from enter/exit idle code or from the
-	 * broadcast handler. In all cases tick_broadcast_lock is held.
-	 *
+	 * This can be called from CPU offline operation to move broadcast
+	 * assignment. If tick_broadcast_force_mask is set, the CPU local
+	 * timer device may be disabled. And hrtimer_reprogram() will not
+	 * be called if the timer is not the first expiring timer. Reprogram
+	 * the CPU local timer device to ensure we can take over the
+	 * broadcast duty.
+	 */
+	if (tick_check_broadcast_expired() && expires >= next_event)
+		tick_program_event(next_event, 1);
+
+	/*
+	 * This is called from enter/exit idle code, broadcast handler or
+	 * CPU offline operation. In all cases tick_broadcast_lock is held.
 	 * hrtimer_cancel() cannot be called here neither from the
 	 * broadcast handler nor from the enter/exit idle code. The idle
 	 * code can run into the problem described in bc_shutdown() and the
--
2.33.0