[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20221114232827.835599-1-paulmck@kernel.org>
Date: Mon, 14 Nov 2022 15:28:24 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: tglx@...utronix.de
Cc: linux-kernel@...r.kernel.org, john.stultz@...aro.org,
sboyd@...nel.org, corbet@....net, Mark.Rutland@....com,
maz@...nel.org, kernel-team@...a.com, neeraju@...eaurora.org,
ak@...ux.intel.com, feng.tang@...el.com, zhengjun.xing@...el.com,
"Paul E. McKenney" <paulmck@...nel.org>,
Chris Mason <clm@...a.com>, John Stultz <jstultz@...gle.com>,
Waiman Long <longman@...hat.com>
Subject: [PATCH clocksource 1/3] clocksource: Reject bogus watchdog clocksource measurements
One remaining clocksource-skew issue involves extreme CPU overcommit,
which can cause the clocksource watchdog measurements to be delayed by
tens of seconds. This in turn means that a clock-skew criterion that
is appropriate for a 500-millisecond interval will instead give lots of
false positives.
Therefore, check for the watchdog clocksource reporting much larger or
much less than the time specified by WATCHDOG_INTERVAL. In these cases,
print a pr_warn() warning and refrain from marking the clocksource under
test as being unstable.
Reported-by: Chris Mason <clm@...a.com>
Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
Cc: John Stultz <jstultz@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Stephen Boyd <sboyd@...nel.org>
Cc: Feng Tang <feng.tang@...el.com>
Cc: Waiman Long <longman@...hat.com>
---
kernel/time/clocksource.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 8058bec87acee..dcaf38c062161 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -386,7 +386,7 @@ EXPORT_SYMBOL_GPL(clocksource_verify_percpu);
static void clocksource_watchdog(struct timer_list *unused)
{
- u64 csnow, wdnow, cslast, wdlast, delta;
+ u64 csnow, wdnow, cslast, wdlast, delta, wdi;
int next_cpu, reset_pending;
int64_t wd_nsec, cs_nsec;
struct clocksource *cs;
@@ -440,6 +440,17 @@ static void clocksource_watchdog(struct timer_list *unused)
if (atomic_read(&watchdog_reset_pending))
continue;
+ /* Check for bogus measurements. */
+ wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
+ if (wd_nsec < (wdi >> 2)) {
+ pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+ continue;
+ }
+ if (wd_nsec > (wdi << 2)) {
+ pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+ continue;
+ }
+
/* Check the deviation from the watchdog clocksource. */
md = cs->uncertainty_margin + watchdog->uncertainty_margin;
if (abs(cs_nsec - wd_nsec) > md) {
--
2.31.1.189.g2e36527f23
Powered by blists - more mailing lists