Message-ID: <87bjjxc9dq.ffs@tglx>
Date: Wed, 17 Dec 2025 18:21:05 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Daniel J Blueman <daniel@...ra.org>, "Paul E. McKenney"
<paulmck@...nel.org>, John Stultz <jstultz@...gle.com>, Waiman Long
<longman@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Dave Hansen
<dave.hansen@...ux.intel.com>, Tony Luck <tony.luck@...el.com>, Borislav
Petkov <bp@...en8.de>, Stephen Boyd <sboyd@...nel.org>, Scott Hamilton
<scott.hamilton@...den.com>
Subject: clocksource: Reduce watchdog readout delay limit to prevent false
positives
The "valid" readout delay between the two reads of the watchdog is larger
than the valid delta between the resulting watchdog and clocksource
intervals, which results in false positive watchdog results.

Assume TSC is the clocksource and HPET is the watchdog and both have an
uncertainty margin of 250us (default). The watchdog readout does (a
simplified userspace model is sketched after the steps below):

    1) wdnow = read(HPET);
    2) csnow = read(TSC);
    3) wdend = read(HPET);
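
For illustration only, here is a minimal userspace C sketch of that read
sequence and the pre-fix readout-delay check. The fake_ns counter, the
read_hpet()/read_tsc()/disturb() helpers and the hardcoded 250us margins
are made up for the example; this is not the kernel's cs_watchdog_read():

#include <stdint.h>
#include <stdio.h>

#define WD_MARGIN_NS	250000UL	/* HPET (watchdog) uncertainty margin */
#define CS_MARGIN_NS	250000UL	/* TSC (clocksource) uncertainty margin */

/* Fake clock: both "HPET" and "TSC" read the same nanosecond counter. */
static uint64_t fake_ns;
static uint64_t read_hpet(void) { return fake_ns; }
static uint64_t read_tsc(void)  { return fake_ns; }
/* Simulated disturbance (SMI, NMI, interconnect congestion) between reads. */
static void disturb(uint64_t ns) { fake_ns += ns; }

int main(void)
{
	uint64_t wdnow, csnow, wdend;

	wdnow = read_hpet();		/* #1 */
	disturb(100000);		/* 100us delay before the TSC read */
	csnow = read_tsc();		/* #2 */
	wdend = read_hpet();		/* #3 */
	(void)csnow;	/* would be compared against the previous cycle's value */

	/* Pre-fix validity limit: 2 * watchdog margin + clocksource margin = 750us */
	if (wdend - wdnow <= 2 * WD_MARGIN_NS + CS_MARGIN_NS)
		printf("readout considered valid (delay %llu ns)\n",
		       (unsigned long long)(wdend - wdnow));
	return 0;
}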

The valid window for the delta between #1 and #3 is calculated from the
uncertainty margins of the watchdog and the clocksource:

    m = 2 * watchdog.uncertainty_margin + cs.uncertainty_margin;

which results in 750us for the TSC/HPET case.

The actual interval comparison uses a smaller margin:

    m = watchdog.uncertainty_margin + cs.uncertainty_margin;

which results in 500us for the TSC/HPET case.
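
Purely as a worked illustration of the mismatch between the two limits,
assuming the 250us default margins mentioned above (standalone userspace
C, not kernel code):

#include <stdio.h>

int main(void)
{
	long wd_margin = 250000;	/* watchdog (HPET) uncertainty margin, ns */
	long cs_margin = 250000;	/* clocksource (TSC) uncertainty margin, ns */

	/* Limit applied to the #1 -> #3 readout delay before the fix */
	long readout_limit = 2 * wd_margin + cs_margin;
	/* Margin applied to the watchdog vs. clocksource interval comparison */
	long interval_margin = wd_margin + cs_margin;

	printf("readout delay limit:        %ld ns\n", readout_limit);   /* 750000 */
	printf("interval comparison margin: %ld ns\n", interval_margin); /* 500000 */
	return 0;
}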

That means the following scenario will trigger the watchdog:

Watchdog cycle N:

    1) wdnow[N] = read(HPET);
    2) csnow[N] = read(TSC);
    3) wdend[N] = read(HPET);

Assume the delay between #1 and #2 is 100us and the delay between #1 and
#3 is within the 750us margin, i.e. the readout is considered valid.

Watchdog cycle N + 1:

    4) wdnow[N + 1] = read(HPET);
    5) csnow[N + 1] = read(TSC);
    6) wdend[N + 1] = read(HPET);

If the delay between #4 and #6 is within the 750us margin, then any delay
between #4 and #5 which is larger than 600us will fail the interval check
and mark the TSC unstable, because the intervals are calculated against
the previous cycle's values:

    wd_int = wdnow[N + 1] - wdnow[N];
    cs_int = csnow[N + 1] - csnow[N];

Putting the above delays in place, this results in:

    cs_int = (wdnow[N + 1] + 610us) - (wdnow[N] + 100us);
    -> cs_int = wd_int + 510us;

which is obviously larger than the allowed 500us margin and results in
marking the TSC unstable.
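
A minimal sketch of that arithmetic, assuming the 100us/610us
disturbances from the scenario and the 250us default margins (standalone
userspace C, not kernel code):

#include <stdio.h>

int main(void)
{
	long wd_margin = 250000, cs_margin = 250000;	/* default margins, ns */
	long readout_limit   = 2 * wd_margin + cs_margin;	/* 750us, pre-fix */
	long interval_margin = wd_margin + cs_margin;		/* 500us */

	long delay_n  = 100000;	/* #1 -> #2 disturbance in cycle N */
	long delay_n1 = 610000;	/* #4 -> #5 disturbance in cycle N + 1 */

	/*
	 * The scenario assumes that the full #1 -> #3 and #4 -> #6 delays
	 * stay within the 750us window, so both readouts are accepted.
	 */
	printf("disturbances below readout limit: %s\n",
	       (delay_n <= readout_limit && delay_n1 <= readout_limit) ? "yes" : "no");

	/*
	 * The TSC interval exceeds the HPET interval by the difference of
	 * the two disturbances, which busts the interval comparison margin.
	 */
	long excess = delay_n1 - delay_n;	/* cs_int - wd_int = 510us */
	printf("cs_int - wd_int = %ld ns, allowed %ld ns -> %s\n",
	       excess, interval_margin,
	       excess > interval_margin ? "TSC falsely marked unstable" : "ok");
	return 0;
}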

Fix this by using the same margin as the interval comparison. If the
delay between two watchdog reads is larger than that, then the readout
was disturbed by interconnect congestion, NMIs or SMIs.

Fixes: 4ac1dd3245b9 ("clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin")
Reported-by: Daniel J Blueman <daniel@...ra.org>
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Link: https://lore.kernel.org/lkml/20250602223251.496591-1-daniel@quora.org/
---
kernel/time/clocksource.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -252,7 +252,7 @@ enum wd_read_status {
 static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
 {
-	int64_t md = 2 * watchdog->uncertainty_margin;
+	int64_t md = watchdog->uncertainty_margin;
 	unsigned int nretries, max_retries;
 	int64_t wd_delay, wd_seq_delay;
 	u64 wd_end, wd_end2;
@@ -285,7 +285,7 @@ static enum wd_read_status cs_watchdog_r
 		 * watchdog test.
 		 */
 		wd_seq_delay = cycles_to_nsec_safe(watchdog, wd_end, wd_end2);
-		if (wd_seq_delay > md)
+		if (wd_seq_delay > 2 * md)
 			goto skip_test;
 	}