Message-ID: <CAMVG2ssXZKmw-YTKB5=CvhEofKeyEfaBCDZbyzfUcm2+P5rQsQ@mail.gmail.com>
Date: Mon, 2 Feb 2026 14:45:17 +0800
From: Daniel J Blueman <daniel@...ra.org>
To: Thomas Gleixner <tglx@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, "Paul E. McKenney" <paulmck@...nel.org>, 
	John Stultz <jstultz@...gle.com>, Waiman Long <longman@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Stephen Boyd <sboyd@...nel.org>, x86@...nel.org, 
	"Gautham R. Shenoy" <gautham.shenoy@....com>, Jiri Wiesner <jwiesner@...e.de>, 
	Scott Hamilton <scott.hamilton@...den.com>, Helge Deller <deller@....de>, linux-parisc@...r.kernel.org, 
	Thomas Bogendoerfer <tsbogend@...ha.franken.de>, linux-mips@...r.kernel.org
Subject: Re: [patch 5/5] clocksource: Rewrite watchdog code completely

Great work, Thomas!

On Sat, 24 Jan 2026 at 07:18, Thomas Gleixner <tglx@...nel.org> wrote:
>
> The clocksource watchdog code has over time reached the state of an
> impenetrable maze of duct tape and staples [..]
...
>   1) Restrict the validation against a reference clocksource to the boot
>      CPU, which is usually the CPU/Socket closest to the legacy block which
>      contains the reference source (HPET/ACPI-PM timer). Validate that the
>      reference readout is within a bound latency so that the actual
>      comparison against the TSC stays within 500ppm as long as the clocks
>      are stable.

On my 1920-thread BullSequana SH160 test system (16 sockets with
Numascale UPI Node Controller), I find this approach intrinsically
robust against system latency.
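
To illustrate the bracketing idea, here is a minimal userspace sketch
(not the patch code: the bound, retry count and helper names are made
up, and CLOCK_MONOTONIC stands in for the HPET/ACPI-PM reference):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define READOUT_MAX_NS	50000ULL	/* illustrative 50us latency bound */
#define MAX_RETRIES	8		/* illustrative retry limit */

static uint64_t now_ns(clockid_t id)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Read the (stand-in) reference clock bracketed by two fast local
 * reads; accept the sample only when the bracket is tight enough. */
static bool read_ref_bounded(uint64_t *ref, uint64_t *local)
{
	for (int i = 0; i < MAX_RETRIES; i++) {
		uint64_t t1 = now_ns(CLOCK_MONOTONIC_RAW);
		uint64_t r  = now_ns(CLOCK_MONOTONIC);
		uint64_t t2 = now_ns(CLOCK_MONOTONIC_RAW);

		if (t2 - t1 <= READOUT_MAX_NS) {
			*ref   = r;
			*local = t1 + (t2 - t1) / 2;
			return true;
		}
	}
	return false;	/* too much latency; skip this comparison cycle */
}

int main(void)
{
	uint64_t ref, local;

	if (read_ref_bounded(&ref, &local))
		printf("ref=%llu local=%llu\n",
		       (unsigned long long)ref, (unsigned long long)local);
	else
		printf("readout latency exceeded bound, sample skipped\n");
	return 0;
}

Discarding a sample that took too long to obtain, rather than
misattributing the delay to clock skew, is what keeps the comparison
within the stated 500ppm.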

>   2) Compare the TSCs of the other CPUs in a round robin fashion against
>      the boot CPU in the same way the TSC synchronization on CPU hotplug
>      works. This still can suffer from delayed reaction of the remote CPU
>      to the SMP function call and the latency of the control variable cache
>      line. But this latency is not affecting correctness. It only affects
>      the accuracy. With low contention the readout latency is in the low
>      nanoseconds range, which detects even slight skews between CPUs. Under
>      high contention this becomes obviously less accurate, but still
>      detects slow skews reliably as it solely relies on subsequent readouts
>      being monotonically increasing. It just can take slightly longer to
>      detect the issue.

On x86, I agree iterating at a per-thread level is needed rather than
one thread per NUMA node, since the IA32_TSC_ADJUST architectural MSR
is scoped per logical processor and we want detection completeness.

On other architectures, completeness could be traded off for lower
overhead if it were guaranteed that every processor thread uses the
same clock value, but that guarantee is exactly what the clocksource
watchdog seeks to validate, so I agree with the current approach there
too.
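
The monotonicity argument can likewise be sketched in userspace, with
a thread standing in for the remote CPU (the spin handshake below only
illustrates the idea, not the kernel's SMP function call machinery;
build with -pthread):

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static _Atomic uint64_t remote_sample;
static _Atomic int request;	/* 0 idle, 1 sample requested, 2 done */

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* "Remote CPU": samples the clock on request, like the target of the
 * SMP function call. Spins forever; process exit reaps it. */
static void *remote(void *arg)
{
	(void)arg;
	for (;;) {
		while (atomic_load(&request) != 1)
			;
		atomic_store(&remote_sample, now_ns());
		atomic_store(&request, 2);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, remote, NULL);
	for (int i = 0; i < 5; i++) {
		uint64_t t1 = now_ns();

		atomic_store(&request, 1);
		while (atomic_load(&request) != 2)
			;
		uint64_t rs = atomic_load(&remote_sample);
		uint64_t t2 = now_ns();

		atomic_store(&request, 0);
		/* A synchronized clock must read inside the local
		 * bracket; skew shows up as the sample escaping it,
		 * however long the handshake itself took. */
		printf("%s: %llu <= %llu <= %llu\n",
		       (t1 <= rs && rs <= t2) ? "ok" : "SKEW",
		       (unsigned long long)t1, (unsigned long long)rs,
		       (unsigned long long)t2);
	}
	return 0;
}

As in the patch, handshake latency only delays detection; correctness
comes from the bracket itself.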

> +/* Maximum time between two watchdog readouts */
> +#define WATCHDOG_READOUT_MAX_NS                (50 * NSEC_PER_USEC)

At 1920 threads, the default timeout threshold of 20us triggers
continuous warnings at idle, whereas 1000us causes none under an
8-hour adverse workload [1]; no HPET fallback was seen. A 500us
threshold still causes a low rate of timeouts [2], with the overhead
amplified by retries, so 1000us adds margin and should prevent
retries.
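
Concretely, the suggestion amounts to a one-line change to the macro
quoted above (the replacement value reflects my measurements here, not
something the patch itself proposes):

-#define WATCHDOG_READOUT_MAX_NS                (50 * NSEC_PER_USEC)
+#define WATCHDOG_READOUT_MAX_NS                (1000 * NSEC_PER_USEC)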

Thanks,
  Dan

-- [1]

n=$(($(getconf _NPROCESSORS_ONLN)/2))
stress-ng --msyncmany $n --vm $n --vm-bytes 50% --vm-keep --verify \
    --vmstat 30 --timeout 8h

-- [2]

[ 1873.419375] clocksource: Watchdog remote CPU 1807 read timed out
[ 1900.419375] clocksource: Watchdog remote CPU 1861 read timed out
[ 1925.924374] clocksource: Watchdog remote CPU 1912 read timed out
[ 1937.420453] clocksource: Watchdog remote CPU 15 read timed out
[ 1937.925028] clocksource: Watchdog remote CPU 16 read timed out
[ 1949.073317] workqueue: drm_fb_helper_damage_work hogged CPU for >13333us 515 times, consider switching to WQ_UNBOUND
[ 1954.924464] clocksource: Watchdog remote CPU 50 read timed out
[ 2032.923596] clocksource: Watchdog remote CPU 206 read timed out
[ 2042.924367] clocksource: Watchdog remote CPU 226 read timed out
[ 2066.420624] clocksource: Watchdog remote CPU 273 read timed out
[ 2072.924015] clocksource: Watchdog remote CPU 286 read timed out
[ 2115.602465] workqueue: drm_fb_helper_damage_work hogged CPU for >13333us 1027 times, consider switching to WQ_UNBOUND
[ 2139.924153] clocksource: Watchdog remote CPU 420 read timed out
[ 2143.419690] clocksource: Watchdog remote CPU 427 read timed out
[ 2147.420587] clocksource: Watchdog remote CPU 435 read timed out
[ 2160.924251] clocksource: Watchdog remote CPU 462 read timed out
[ 2165.419843] clocksource: Watchdog remote CPU 471 read timed out
[ 2170.442815] clocksource: Watchdog remote CPU 481 read timed out
[ 2221.420468] clocksource: Watchdog remote CPU 583 read timed out
--
Daniel J Blueman
