lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <aHkabelw1sZqu9JR@incl>
Date: Thu, 17 Jul 2025 17:44:45 +0200
From: Jiri Wiesner <jwiesner@...e.de>
To: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc: Jonathan Corbet <corbet@....net>, Steve Wahl <steve.wahl@....com>,
	Justin Ernst <justin.ernst@....com>,
	Kyle Meyer <kyle.meyer@....com>,
	Dimitri Sivanich <dimitri.sivanich@....com>,
	Russ Anderson <russ.anderson@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
	"H. Peter Anvin" <hpa@...or.com>
Subject: [PATCH] x86: UV RTC: Add parameter to disable RTC clocksource

Booting up an 8 NUMA node machine that has a UV RTC clocksource may
result in the TSC being marked unstable by the clocksource watchdog due to
time skew. The failures to verify the TSC happen soon after the current
clocksource is switched to the TSC (typically after the watchdog has run
twice). Delaying the clocksource watchdog checks until later after the
system has booted does not make a difference.

The clocksource watchdog compares two clocksources and assumes that any
time skew it measures was caused by the clocksource being verified. To
check the validity of this assumption, a debugging kernel was used that
added the HPET as a third clocksource. The messages reported by the
debugging kernel indicate that the time skew between the TSC and the HPET
was only 22 nanoseconds while the time skew between the TSC and sgi_rtc
was 591659 nanoseconds:

clocksource: timekeeping watchdog on CPU176: Marking clocksource 'tsc' as unstable because the skew is too large:
clocksource: 'sgi_rtc' wd_nsec: 479339803 wd_now: 1fab695e5a wd_last: 1f9e44dca0 mask: ffffffffffffff
clocksource: 'hpet' wd2_nsec: 479931440 wd2_now: 90a1af85 wd2_last: 8fea9b37 mask: ffffffff
clocksource: 'tsc' cs_nsec: 479931462 cs_now: 944e1c227d cs_last: 9412097879 mask: ffffffffffffffff
clocksource: Clocksource 'tsc' skewed 591659 ns (0 ms) over watchdog 'sgi_rtc' interval of 479339803 ns (479 ms)
clocksource: 'tsc' is current clocksource.
tsc: Marking TSC unstable due to clocksource watchdog
TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sched_clock: Marking unstable (90731283360, -1108605523)<-(95136368481, -5513690634)
clocksource: Checking clocksource tsc synchronization from CPU 501 to CPUs 0-500,502-767.
clocksource: CPU 501 check durations 1446ns - 32908ns for clocksource tsc.

This happened on CPU 176, which resides on NUMA node 3. The interval was
computed from timestamps taken on CPU 176 and on CPU 175, which also
resides on NUMA node 3. Since the time skew was reported between CPUs on
the same NUMA node, it is unlikely that the skew originates in the TSC.

The debugging kernel printed the last message of
clocksource_verify_percpu() (the "check durations" line) unconditionally,
and all CPUs were checked. None of the CPUs was reported as being behind
or ahead of CPU 501. The last message provides a worst-case estimate: the
value of 2 * cs_nsec_max (2 * 32908 ns) is the maximum possible time skew
between the TSCs of any two CPUs on the system, as measured by the TSC
sync check. The cs_nsec_max value itself is an upper bound rather than an
exact measurement because it includes the delays incurred by sending and
servicing an inter-processor interrupt synchronously, which has a
non-negligible cost. The maximum possible TSC skew of about 66
microseconds (65816 ns) is roughly nine times smaller than the 591659 ns
skew measured by the clocksource watchdog.

Testing has shown that the HPET is more stable than sgi_rtc, so the HPET
is a better choice for verifying the TSC. As a workaround, a nouvrtc
parameter that disables the sgi_rtc clocksource has been implemented. Its
name was inspired by 581f202bcd60 ("x86: UV RTC: Always enable RTC
clocksource") and by the existing nohpet and notsc parameters. The
existing uvrtcevt parameter, which was previously undocumented, has been
documented as well.

Signed-off-by: Jiri Wiesner <jwiesner@...e.de>
---
 Documentation/admin-guide/kernel-parameters.txt |  4 ++++
 arch/x86/platform/uv/uv_time.c                  | 11 ++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 07e22ba5bfe3..9839257181e3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4302,6 +4302,8 @@
 			This is required for the Braillex ib80-piezo Braille
 			reader made by F.H. Papenmeier (Germany).
 
+	nouvrtc		[X86] Disable the UV RTC clocksource (SGI RTC clock).
+
 	nosgx		[X86-64,SGX,EARLY] Disables Intel SGX kernel support.
 
 	nosmap		[PPC,EARLY]
@@ -7839,6 +7841,8 @@
 				16 - SIGBUS faults
 			Example: user_debug=31
 
+	uvrtcevt	[X86] Use UV RTC clock events (SGI RTC clock) for timers.
+
 	vdso=		[X86,SH,SPARC]
 			On X86_32, this is an alias for vdso32=.  Otherwise:
 
diff --git a/arch/x86/platform/uv/uv_time.c b/arch/x86/platform/uv/uv_time.c
index 3712afc3534d..03d59b87c371 100644
--- a/arch/x86/platform/uv/uv_time.c
+++ b/arch/x86/platform/uv/uv_time.c
@@ -61,6 +61,7 @@ struct uv_rtc_timer_head {
  */
 static struct uv_rtc_timer_head		**blade_info __read_mostly;
 
+static int				uv_rtc_enable = 1;
 static int				uv_rtc_evt_enable;
 
 /*
@@ -321,6 +322,14 @@ static void uv_rtc_interrupt(void)
 	ced->event_handler(ced);
 }
 
+static int __init uv_disable_rtc(char *str)
+{
+	uv_rtc_enable = 0;
+
+	return 1;
+}
+__setup("nouvrtc", uv_disable_rtc);
+
 static int __init uv_enable_evt_rtc(char *str)
 {
 	uv_rtc_evt_enable = 1;
@@ -342,7 +351,7 @@ static __init int uv_rtc_setup_clock(void)
 {
 	int rc;
 
-	if (!is_uv_system())
+	if (!uv_rtc_enable || !is_uv_system())
 		return -ENODEV;
 
 	rc = clocksource_register_hz(&clocksource_uv, sn_rtc_cycles_per_second);
-- 
2.43.0


-- 
Jiri Wiesner
SUSE Labs
