Message-ID: <20240102135620.GB3303@incl>
Date: Tue, 2 Jan 2024 14:56:20 +0100
From: Jiri Wiesner <jwiesner@...e.de>
To: Feng Tang <feng.tang@...el.com>
Cc: linux-kernel@...r.kernel.org, John Stultz <jstultz@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH] clocksource: Use proportional clocksource skew threshold
On Tue, Dec 26, 2023 at 10:16:33PM +0800, Feng Tang wrote:
> We've seen similar reports on LKML that the watchdog timer was delayed
> for a very long time (some was 100+ seconds). As you said, the
> scheduling issue should be addressed.
CFS was the scheduling policy when the delays happened. Hopefully, EEVDF
will prove to be an improvement in this area.
> Meanwhile, instead of adding new complex logic to clocksource watchdog
> code, can we just printk_once a warning message and skip the current
> watchdog check if the duration is too long. ACPI_PM timer only has a
> 24 bit counter which will wrap around every 3~4 seconds, when the
> duration is too long, like 14.5 seconds here, the check is already
> meaningless.
Skipping the current watchdog check would solve the issue. It also has the
advantage that clocksources would not get marked unstable on account of
increased scheduling delays and the clocksource or watchdog counter
wrapping around. With the CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE option
enabled, the maximum clocksource_delta is only half of the whole range
(4.68 -> 2.34 secs for acpi_pm and 179.0 -> 89.5 secs for the HPET), which
makes it even more probable that acpi_pm gets marked unstable.
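
For reference, the wrap-around figures above can be reproduced with a
small user-space sketch. This is not kernel code. The helper names are
made up for illustration, the acpi_pm frequency of 3.579545 MHz is the
standard ACPI PM timer rate, and the 24 MHz HPET frequency is an
assumption picked because it matches the 179.0 secs quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a free-running counter of `bits` width wraps at `freq_hz`.
 * Hypothetical helper, not a kernel function. */
static double wrap_seconds(unsigned int bits, double freq_hz)
{
	assert(bits > 0 && bits <= 63 && freq_hz > 0.0);
	return (double)(UINT64_C(1) << bits) / freq_hz;
}

/* Largest interval that still yields a usable delta. With
 * CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE only half of the range is usable. */
static double max_delta_seconds(unsigned int bits, double freq_hz,
				int validate_last_cycle)
{
	double full = wrap_seconds(bits, freq_hz);

	return validate_last_cycle ? full / 2.0 : full;
}
```

Calling max_delta_seconds(24, 3579545.0, 1) gives roughly 2.34 secs, which
is well below the 14.5 secs interval reported in this thread.
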
Skipping the current watchdog check will require a threshold for evaluating
watchdog intervals. I guess WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1)
would not be completely amiss. Depending on how tight a threshold is
chosen, the printk_once message might become commonplace on busy systems.
It would attract the attention of customers, which is not necessarily a bad
thing because the vendor would learn about the cases where the scheduling
policy does not perform well.
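
A minimal user-space sketch of the skip logic, assuming the intervals
have already been converted to nanoseconds and that WATCHDOG_INTERVAL
corresponds to 0.5 secs. The names watchdog_check, wd_verdict and the
_NS macros are hypothetical and the fprintf() merely stands in for
printk_once(); the actual patch may look quite different.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC		1000000000LL
/* Assumption: WATCHDOG_INTERVAL expressed in nanoseconds (0.5 secs). */
#define WATCHDOG_INTERVAL_NS	(NSEC_PER_SEC / 2)
/* Proposed threshold: WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1). */
#define WATCHDOG_MAX_INTERVAL_NS \
	(WATCHDOG_INTERVAL_NS + (WATCHDOG_INTERVAL_NS >> 1))

enum wd_verdict { WD_OK, WD_UNSTABLE, WD_SKIPPED };

/* wd_nsec: interval measured by the watchdog, cs_nsec: interval measured
 * by the clocksource under test, margin_ns: allowed skew. */
static enum wd_verdict watchdog_check(int64_t wd_nsec, int64_t cs_nsec,
				      int64_t margin_ns)
{
	static bool warned;
	int64_t skew;

	if (wd_nsec > WATCHDOG_MAX_INTERVAL_NS) {
		/* Warn once and skip: the counters may have wrapped. */
		if (!warned) {
			fprintf(stderr, "watchdog interval too long (%lld ns), check skipped\n",
				(long long)wd_nsec);
			warned = true;
		}
		return WD_SKIPPED;
	}

	skew = cs_nsec - wd_nsec;
	if (skew < 0)
		skew = -skew;
	return skew > margin_ns ? WD_UNSTABLE : WD_OK;
}
```

With these assumptions, an interval of 14.5 secs is skipped regardless of
the measured skew, while a 0.5 secs interval is still checked against the
margin as before.
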
I am not sure how much of a problem it is that stricter limits on
skews will be imposed for watchdog intervals that are close to the
threshold. The reality of production systems is that the corner case that
causes the watchdog interval to get stretched is not uncommon. Considering
the proposed threshold, WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1), the
current uncertainty margins (of the TSC and HPET) correspond to 333 ppm
(microseconds of skew per second). So, I am still in favour of scaling the
margins proportionally to the watchdog interval. I am going to send a new
patch implementing skipping the current watchdog check. I could send a
modified version of the margin scaling patch later if there was interest.
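
To illustrate the arithmetic, here is a user-space sketch of the ppm
conversion and of one possible way to scale a margin proportionally. It
is not the margin scaling patch. The helper names are hypothetical, and
the 250 usecs margin used in the example is an assumption chosen because
it yields 333 ppm at the 0.75 secs threshold.

```c
#include <stdint.h>

/* Express an absolute skew margin as parts per million of the interval,
 * i.e. microseconds of skew per second (integer division, rounds down). */
static int64_t margin_ppm(int64_t margin_ns, int64_t interval_ns)
{
	return margin_ns * 1000000 / interval_ns;
}

/* Scale the margin so that its ppm value stays constant when the
 * interval is stretched beyond the nominal one. Shorter intervals keep
 * the base margin. */
static int64_t scaled_margin_ns(int64_t base_margin_ns,
				int64_t base_interval_ns,
				int64_t interval_ns)
{
	if (interval_ns <= base_interval_ns)
		return base_margin_ns;
	return base_margin_ns * interval_ns / base_interval_ns;
}
```

Under these assumptions a fixed 250 usecs margin amounts to 500 ppm at
0.5 secs but only 333 ppm at 0.75 secs, whereas the scaled margin grows
to 375 usecs at 0.75 secs and stays at 500 ppm.
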
--
Jiri Wiesner
SUSE Labs