Message-Id: <20200203170739.20736-1-longman@redhat.com>
Date: Mon, 3 Feb 2020 12:07:39 -0500
From: Waiman Long <longman@...hat.com>
To: Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>
Cc: linux-kernel@...r.kernel.org,
Jeremy Linton <jeremy.linton@....com>, pbunyan@...hat.com,
Waiman Long <longman@...hat.com>
Subject: [RFC PATCH] tick: Make tick_periodic() check for missing ticks
The tick_periodic() function is used for timekeeping in the early part
of the boot process, while the other clock sources are being
initialized.
The current code assumes that all timer interrupts are handled in a
timely manner, with no missed ticks. That is not always true. When
ticks are missed, the tick time (jiffies) falls behind the timestamp
reported in the kernel log. Some systems are more prone to missing
ticks than others. In the extreme case, the discrepancy can cause the
watchdog kthread to print a soft lockup message. For example, on a
Cavium ThunderX2 Sabre arm64 system:
[ 25.496379] watchdog: BUG: soft lockup - CPU#14 stuck for 22s!
On that system, missed ticks are especially prevalent during the
smp_init() phase of the boot process. With an instrumented kernel,
the tick was found to accumulate only about 4s of time over about 24s
of timestamp time.
Investigation and bisection done by others seemed to point to commit
73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack
thereof") as the culprit. It could also be a firmware issue, as new
firmware that would fix it has been promised.
To properly address this problem, tick_periodic() cannot assume that
no ticks are missed. The function is now modified to follow the example
of tick_do_update_jiffies64() by using another reference clock to check
for missed ticks. Since the watchdog timer uses running_clock(), it is
used here as the reference. With this patch applied, the soft lockup
problem on the arm64 system is gone and tick time tracks the timestamp
time much more closely.
Signed-off-by: Waiman Long <longman@...hat.com>
---
kernel/time/tick-common.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 7e5d3524e924..831e87ef134f 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -16,6 +16,7 @@
#include <linux/profile.h>
#include <linux/sched.h>
#include <linux/module.h>
+#include <linux/sched/clock.h>
#include <trace/events/power.h>
#include <asm/irq_regs.h>
@@ -84,12 +85,28 @@ int tick_is_oneshot_available(void)
static void tick_periodic(int cpu)
{
	if (tick_do_timer_cpu == cpu) {
+		/*
+		 * Use running_clock() as reference to check for missing ticks.
+		 */
+		static u64 last_update;
+		u64 now, delta;
+		int ticks = 1;
+
+		now = running_clock();
+		if (last_update) {
+			delta = ktime_sub(now, last_update);
+
+			/* Compute missed ticks */
+			ticks = max((int)(delta / ktime_to_ns(tick_period)), 1);
+		}
+		last_update = now;
+
		write_seqlock(&jiffies_lock);
		/* Keep track of the next tick event */
-		tick_next_period = ktime_add(tick_next_period, tick_period);
-
-		do_timer(1);
+		tick_next_period = ktime_add(tick_next_period,
+					     ticks * tick_period);
+		do_timer(ticks);
		write_sequnlock(&jiffies_lock);
		update_wall_time();
	}
--
2.18.1