Message-ID: <20240102135620.GB3303@incl>
Date: Tue, 2 Jan 2024 14:56:20 +0100
From: Jiri Wiesner <jwiesner@...e.de>
To: Feng Tang <feng.tang@...el.com>
Cc: linux-kernel@...r.kernel.org, John Stultz <jstultz@...gle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Stephen Boyd <sboyd@...nel.org>,
	"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH] clocksource: Use proportional clocksource skew threshold

On Tue, Dec 26, 2023 at 10:16:33PM +0800, Feng Tang wrote:
> We've seen similar reports on LKML that the watchdog timer was delayed
> for a very long time (some was 100+ seconds). As you said, the
> scheduling issue should be addressed.

CFS was the scheduling policy when the delays happened. Hopefully, EEVDF 
will prove to be an improvement in this area.

> Meanwhile, instead of adding new complex logic to clocksource watchdog
> code, can we just printk_once a warning message and skip the current
> watchdog check if the duration is too long. ACPI_PM timer only has a
> 24 bit counter which will wrap around every 3~4 seconds, when the
> duration is too long, like 14.5 seconds here, the check is already
> meaningless.

Skipping the current watchdog check would solve the issue. It also has the
advantage that clocksources would not get marked unstable on account of
increased scheduling delays and the clocksource or watchdog counter
wrapping around. With the CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE option
enabled, the maximum clocksource_delta is only half of the whole range
(4.68 -> 2.34 secs for acpi_pm and 179.0 -> 89.5 secs for the HPET), which
makes it even more likely that acpi_pm gets marked unstable.
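
The ranges quoted above follow from the counter width and the counter
frequency. A minimal sketch of that arithmetic, assuming the ACPI PM timer
runs at 3.579545 MHz with a 24-bit counter and that the HPET in question is
a 32-bit counter at 24 MHz (the HPET frequency is an assumption inferred
from the 179.0 secs figure; wrap_seconds() and max_delta_seconds() are
hypothetical helpers, not kernel functions):

```c
#include <stdint.h>

/*
 * Seconds until a free-running counter of the given width wraps around.
 * Only valid for bits < 64.
 */
static double wrap_seconds(unsigned int bits, double freq_hz)
{
	return (double)(1ULL << bits) / freq_hz;
}

/*
 * Largest interval that can still be measured. The halving models the
 * effect described for CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE, where only
 * half of the counter range is accepted as a valid delta.
 */
static double max_delta_seconds(unsigned int bits, double freq_hz,
				int validate_last_cycle)
{
	double full = wrap_seconds(bits, freq_hz);

	return validate_last_cycle ? full / 2.0 : full;
}
```

Calling max_delta_seconds(24, 3579545.0, 1) gives roughly 2.34 secs, which
is shorter than the 14.5 secs interval discussed in this thread.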

Skipping the current watchdog check will require a threshold for evaluating
watchdog intervals. I guess WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1) 
would not be completely amiss. Depending on how tight a threshold is 
chosen, the printk_once message might become commonplace on busy systems. 
It would attract the attention of customers, which is not necessarily a bad
thing because the vendor would learn about the cases where the scheduling 
policy does not perform well.
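
To illustrate the idea, here is a rough userspace sketch of the skip logic,
not the patch itself. It assumes the watchdog interval is 0.5 secs (the
kernel defines WATCHDOG_INTERVAL as HZ >> 1 jiffies) and works in
nanoseconds for simplicity. The names WATCHDOG_INTERVAL_NS,
WATCHDOG_INTERVAL_MAX_NS and watchdog_interval_too_long() are hypothetical,
and fprintf() guarded by a flag stands in for printk_once():

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC			1000000000LL
/* Assumption: 0.5 secs, mirroring WATCHDOG_INTERVAL = HZ >> 1 jiffies. */
#define WATCHDOG_INTERVAL_NS		(NSEC_PER_SEC / 2)
/* Proposed threshold: WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1). */
#define WATCHDOG_INTERVAL_MAX_NS \
	(WATCHDOG_INTERVAL_NS + (WATCHDOG_INTERVAL_NS >> 1))

/* Emulates printk_once(): the warning is emitted at most one time. */
static bool warned;

/*
 * Returns true when the measured watchdog interval exceeds the threshold,
 * meaning the caller should skip the skew check for this round.
 */
static bool watchdog_interval_too_long(int64_t interval_ns)
{
	if (interval_ns <= WATCHDOG_INTERVAL_MAX_NS)
		return false;

	if (!warned) {
		warned = true;
		fprintf(stderr,
			"clocksource watchdog: interval of %lld ns is too long, skipping check\n",
			(long long)interval_ns);
	}
	return true;
}
```

With these values the threshold is 0.75 secs, so a 14.5 secs interval would
be skipped while a nominal 0.5 secs interval would still be checked.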

I am not sure how much of a problem it is that stricter limits on
skews will be imposed for watchdog intervals that are close to the
threshold. The reality of production systems is that the corner case that
causes the watchdog interval to get stretched is not uncommon. Considering
the proposed threshold, WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1), the 
current uncertainty margins (of the TSC and HPET) correspond to 333 ppm 
(microseconds of skew per second). So, I am still in favour of scaling the
margins proportionally to the watchdog interval. I am going to send a new
patch implementing skipping the current watchdog check. I could send a
modified version of the margin scaling patch later if there is interest.
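
For reference, a small sketch of the ppm arithmetic and of what
proportional scaling could look like. It assumes a fixed margin of 250
microseconds, chosen only because it reproduces the 333 ppm figure at the
0.75 secs threshold; margin_ppm() and scaled_margin_ns() are hypothetical
helpers and do not reflect the actual margin scaling patch:

```c
#include <stdint.h>

#define NSEC_PER_USEC		1000LL
#define NSEC_PER_SEC		1000000000LL
/* Assumption: nominal watchdog interval of 0.5 secs. */
#define WATCHDOG_INTERVAL_NS	(NSEC_PER_SEC / 2)
/* Assumption: fixed margin of 250 usecs, see the note above. */
#define ASSUMED_MARGIN_NS	(250 * NSEC_PER_USEC)

/* Skew tolerance, in parts per million, implied by a fixed margin. */
static int64_t margin_ppm(int64_t margin_ns, int64_t interval_ns)
{
	return margin_ns * 1000000LL / interval_ns;
}

/*
 * Margin scaled in proportion to the measured interval, so that the
 * tolerated ppm stays the same when the interval gets stretched.
 */
static int64_t scaled_margin_ns(int64_t margin_ns, int64_t interval_ns)
{
	return margin_ns * interval_ns / WATCHDOG_INTERVAL_NS;
}
```

Under these assumptions the fixed margin allows 500 ppm at 0.5 secs but only
333 ppm at 0.75 secs, whereas the scaled margin grows to 375 usecs at
0.75 secs and keeps the tolerance at 500 ppm.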
-- 
Jiri Wiesner
SUSE Labs